Help understanding retries

Hi,

I have a Temporal (Go) activity with a StartToClose timeout of 10 seconds.

	ao := workflow.ActivityOptions{
		StartToCloseTimeout: 10 * time.Second,
	}
	ctx = workflow.WithActivityOptions(ctx, ao)

	logger := workflow.GetLogger(ctx)

The workflow also has an activity that returns temporal.NewCanceledError(err).

I am really confused about 2 things:

  1. After 10 seconds the activity has reached its StartToClose timeout, so why does it stay running? My expectation was that the task would fail and be over, but I need to terminate it manually.

  2. When temporal.NewCanceledError is returned, why does it schedule a new retry? I was expecting the task to be over. If I use an ordinary error, the behaviour is the same.

I’ve really been struggling to understand why Temporal behaves this way, as it's very unintuitive, and I'm hoping someone can help.

Even pointing me to a sample that shows how to deal with the behaviour would be really helpful. I am really quite stuck on this issue.

Regarding activity timeouts, make sure you watch Maxim's video: The 4 Types of Timeouts in Temporal (YouTube).
The specific place in the video covering retries and activity timeouts is here.

By default, Temporal retries activities automatically. As described in that video, this can be controlled through the activity retry policy as well as the activity ScheduleToClose timeout.

StartToClose timeout is the maximum execution time for a single activity invocation.

To limit activity retries, it's recommended to rely on the ScheduleToClose timeout, which limits the total time the activity can execute including all retries (rather than setting RetryPolicy.MaximumAttempts in the activity retry options).
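
As a rough sketch of what that could look like in the Go SDK: the helper name withBoundedRetries and all the durations below are made up for illustration, so adjust them to your case. With options along these lines, each attempt is limited by StartToCloseTimeout, and retries stop once ScheduleToCloseTimeout has elapsed.

```go
package sample

import (
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// withBoundedRetries is a hypothetical helper; the durations are illustrative only.
func withBoundedRetries(ctx workflow.Context) workflow.Context {
	ao := workflow.ActivityOptions{
		// Limit for a single attempt of the activity.
		StartToCloseTimeout: 10 * time.Second,
		// Limit for the whole activity, all retries included.
		ScheduleToCloseTimeout: time.Minute,
		// Controls how long to wait between attempts.
		RetryPolicy: &temporal.RetryPolicy{
			InitialInterval:    time.Second,
			BackoffCoefficient: 2.0,
			MaximumInterval:    10 * time.Second,
		},
	}
	return workflow.WithActivityOptions(ctx, ao)
}
```

Once the ScheduleToClose timeout expires, the error is returned to the workflow code instead of another attempt being scheduled.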

Regarding error handling, see this post and response, especially the "Non-retryable error not failing workflow problem" section, which is relevant to your second question.


An activity times out from the point of view of the Temporal workflow, so it is either retried or the timeout error is returned to the workflow. From the point of view of the activity implementation, the Context object's Done channel is closed. If your activity implementation ignores this, it keeps running, because there is no way to proactively kill the activity goroutine in Go.

This is actually quite clear and a good design.

Could you point me to the simplest example from the Go samples that shows how to handle a timed-out activity gracefully? If there is one that also shows which activity timed out, that would be quite helpful too.