We have a Temporal activity that we use to poll whether some action has occurred by a certain time. The activity returns an error if the action hasn't occurred yet, and it uses the following activity options and retry policy, loosely based on this recommendation:
Usually, this activity works as intended. However, today we hit a case where the activity finished well after the expected deadline. We scheduled the activity at 13:57 UTC with a ScheduleToCloseTimeout of 1h29m, so we expected it to complete by 15:26 UTC. Instead, it completed (with a ScheduleToClose timeout error) at 16:56 UTC. I've attached a screenshot of the Temporal UI below.
Our logs show that the activity started running at 13:57 UTC and that the last retry occurred at 15:26 UTC. So why did the activity fail at 16:56 UTC instead of at 15:26 UTC? We're not sure whether this is related, but the Temporal UI shows "ActivityTaskStarted" at 15:26 UTC, and the timeout occurred 1h29m after that "ActivityTaskStarted" (which matches the ScheduleToClose duration).
Hi, could you share the full event history JSON for this execution, please? Do you have service metrics configured that we could look at? Also, which server version are you using?
In general, it's recommended to always set an activity StartToClose timeout. Use ScheduleToClose to limit retries by time, or, if you need to limit retries by number of attempts, use RetryPolicy -> MaximumAttempts; I don't think you need both.
(Also, just in case: this does look like it could be a bug, so knowing your server and SDK versions, and whether there is a way to reproduce this, would help.)
Hi, we're using v1.25.1 of the Go SDK and Temporal Cloud. I'll send you the JSON via Slack or email (this page doesn't seem to allow uploading raw JSON).
Good to know about also setting the activity StartToClose timeout; I'll give that a try!
Thanks. Please always set a StartToClose timeout on your activities.
We will have more info on the ticket, but for now: if the last activity attempt (retry) starts right before the ScheduleToClose deadline, the ScheduleToClose timeout can get extended. This is something the team will fix in the future.
Setting an appropriately small StartToClose timeout for the activity is always recommended, and in your case it would have failed the activity a lot sooner.
Another thing to look into is what's going on inside the activity itself. If you have SDK metrics enabled, please look at activity_execution_latency as well as the request_latency metric for the RespondActivityTaskCompleted operation.
Did you get a chance to open a ticket for this, and has there been any progress on making it work with only the ScheduleToCloseTimeout? I'm reviving this topic in the hope of simplifying the activity options in our system.