Activity timeout bug

Vitaly · December 18, 2020, 3:16am

@nithin I was able to reproduce your problem locally. It looks like this issue can occur when activity handlers cannot keep up with pollers. In your case you use the default number of pollers (5) and reduce the number of handlers from the default (200) to 1. What happens is that activity tasks get polled and the server starts counting time toward the activity heartbeat timeout, but instead of being processed right away the tasks wait for handler capacity to become available. By the time capacity frees up it may already be too late, because the heartbeat timeout might have passed. We consider this a bug, and it will be fixed in the coming release.

In the meantime, for temporary relief you may try adding setActivityPollThreadCount(1) to your worker options, which should reduce the frequency of the exception or even make it go away.
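For readers who want to see where this setting goes, here is a minimal sketch of the worker options, assuming the Temporal Java SDK is on the classpath. Only setActivityPollThreadCount(1) comes from this thread; using setMaxConcurrentActivityExecutionSize(1) to get a single handler is an assumption about how the handler count was reduced, and the class name is hypothetical. Later SDK versions may deprecate or rename the poller setting, so check it against the version you run.

```java
import io.temporal.worker.WorkerOptions;

// Hypothetical holder class; only the builder calls matter.
final class SingleHandlerWorkerOptions {
    static WorkerOptions build() {
        return WorkerOptions.newBuilder()
            // Assumption: one handler slot, matching the setup described above.
            .setMaxConcurrentActivityExecutionSize(1)
            // Workaround from this thread: one poller instead of the default 5.
            .setActivityPollThreadCount(1)
            .build();
    }
}
```

The resulting options would be passed when the worker is created for its task queue.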

nithin · December 18, 2020, 4:07am

@Vitaly Thanks for looking into this further. I tried setting setActivityPollThreadCount(1) a couple of days back. This reduced the number of errors, but we are still seeing them.

Vitaly · December 18, 2020, 6:21am

That’s to be expected: the number of errors in this case is proportional to the number of pollers, so changing it from 5 to 1 reduces the errors by roughly a factor of 5. I’m working on a proper fix.
My expectation is that you should no longer be blocked by this issue, since heartbeating early on would simply fail an activity execution that was queued for too long, which should result in a retry with no actual work done in the failed attempt.
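The proportionality can be illustrated with a deliberately simplified model, not the SDK's actual scheduling. It assumes every handler slot has just started an activity of fixed duration, each poller holds exactly one polled task while waiting for a slot, and the server's heartbeat clock for a task starts when it is polled. The class and method names are hypothetical.

```java
// Simplified model: counts polled tasks that wait longer than the heartbeat timeout.
final class QueuedTaskModel {
    /**
     * Assumes all handler slots are busy with activities that just started,
     * and each poller holds one task that was polled at time zero.
     */
    static int lateStarts(int pollers, int handlers,
                          long activityMillis, long heartbeatTimeoutMillis) {
        int late = 0;
        for (int j = 1; j <= pollers; j++) {
            // The j-th waiting task starts after ceil(j / handlers) rounds of work.
            long rounds = (j + handlers - 1) / handlers;
            long waitMillis = rounds * activityMillis;
            if (waitMillis > heartbeatTimeoutMillis) {
                late++;
            }
        }
        return late;
    }

    public static void main(String[] args) {
        // 10 s activities, 5 s heartbeat timeout, a single handler.
        System.out.println(lateStarts(5, 1, 10_000, 5_000)); // 5 pollers
        System.out.println(lateStarts(1, 1, 10_000, 5_000)); // 1 poller
    }
}
```

Under these assumptions, five pollers leave five tasks waiting past the timeout while one poller leaves one, which matches the factor-of-5 reduction described above and also shows why a single poller does not remove the errors entirely.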

nithin · December 18, 2020, 7:45pm

Unfortunately we heartbeat in a separate wrapper thread, so right now we don't kill the activity when the heartbeat fails.

So to summarize, there are 2 issues here:

1. When activity_poll_thread_count is greater than concurrent_executions, the poller fetches more tasks than the handler can process, and some of them time out. We can fix this on our side by setting activity_poll_thread_count <= concurrent_executions.
2. The poller keeps polling tasks even though the handler has not completed the current activity execution. This is the bug that you'll be fixing in the next release.

Is this an apt summary?

maxim · December 18, 2020, 8:41pm

Yes.

And you want to make sure that your activity execution stops on any exception thrown from the heartbeat method.

Hoda_Torabi · February 24, 2025, 10:28pm

Hi @maxim, I wanted to get some clarification on what is meant by shutting down the activity on an exception thrown from the heartbeat method, which is also mentioned here. Does it simply mean letting the error propagate instead of catching it and allowing the activity to continue running? Or is there something more to it?

maxim · February 24, 2025, 10:29pm

Yes, ensure it propagates, or at least the activity function returns.
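For the case raised earlier in the thread, where heartbeats are sent from a separate wrapper thread, one possible way to make the failure reach the activity is sketched below. It uses only the standard library: Heartbeater is a hypothetical stand-in for the SDK's heartbeat call, and HeartbeatGuard and runActivity are illustrative names, not SDK classes. The idea is that the heartbeat thread records the first failure rather than swallowing it, and the activity rethrows it between units of work.

```java
import java.util.concurrent.atomic.AtomicReference;

final class HeartbeatGuard {
    /** Hypothetical stand-in for the SDK heartbeat call; throws when the heartbeat is rejected. */
    interface Heartbeater {
        void heartbeat();
    }

    private final AtomicReference<RuntimeException> failure = new AtomicReference<>();
    private final Heartbeater heartbeater;

    HeartbeatGuard(Heartbeater heartbeater) {
        this.heartbeater = heartbeater;
    }

    /** Called from the heartbeat thread: records the first failure instead of dropping it. */
    void beat() {
        try {
            heartbeater.heartbeat();
        } catch (RuntimeException e) {
            failure.compareAndSet(null, e);
        }
    }

    /** Called from the activity thread between units of work: rethrows a recorded failure. */
    void throwIfFailed() {
        RuntimeException e = failure.get();
        if (e != null) {
            throw e;
        }
    }

    /** Illustrative activity body: stops as soon as a heartbeat failure has been recorded. */
    static int runActivity(int units, HeartbeatGuard guard) {
        int done = 0;
        for (int i = 0; i < units; i++) {
            guard.throwIfFailed(); // lets the exception propagate out of the activity
            done++;                // placeholder for one unit of real work
        }
        return done;
    }
}
```

When the heartbeat is sent from the activity thread itself, none of this is needed: simply not catching the exception from the heartbeat call has the same effect.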

Hoda_Torabi · February 24, 2025, 10:30pm

Thanks so much! @maxim
