I have a simple workflow with 2 activity steps. The activities are long-running (up to an hour) and memory-intensive (a host can only run 1 activity at a time). I currently don't do any heartbeating in the activities.
So my worker is configured with MaxConcurrentActivityExecutionSize set to 1.
And my activity stub is configured with ScheduleToStartTimeout and StartToCloseTimeout set to 1 hour each.
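A minimal sketch of what this configuration might look like with the Temporal Java SDK. The actual code is not shown in this post, so the class name and the way the options are wired together are assumptions; only the three option values come from the description above.

```java
import io.temporal.activity.ActivityOptions;
import io.temporal.worker.WorkerOptions;
import java.time.Duration;

// Hypothetical class name, used only for illustration.
public class SetupSketch {
    public static void main(String[] args) {
        // Worker-side limit: at most one activity executes on this host at a time.
        // Typically passed to WorkerFactory.newWorker(taskQueue, workerOptions).
        WorkerOptions workerOptions = WorkerOptions.newBuilder()
            .setMaxConcurrentActivityExecutionSize(1)
            .build();

        // Activity stub options: both timeouts set to 1 hour, no heartbeat timeout.
        // Typically passed to Workflow.newActivityStub(SomeActivities.class, activityOptions).
        ActivityOptions activityOptions = ActivityOptions.newBuilder()
            .setScheduleToStartTimeout(Duration.ofHours(1))
            .setStartToCloseTimeout(Duration.ofHours(1))
            .build();

        System.out.println(workerOptions);
        System.out.println(activityOptions);
    }
}
```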
I am noticing a bug where my first activity completes fine and the second activity is scheduled but does not start executing immediately. Instead, after an hour (the activity timeout), the second activity gets executed twice.
Here is the sequence of steps (from the Temporal UI plus my logs):
00:00 workflow started
00:00 first activity scheduled
00:00 first activity started
00:01 first activity complete
00:01 second activity scheduled; I can see from the logs that it did not start executing for an hour
– 1 hour later –
01:01 second activity started (with last error as “activity timeout”) see screenshot
01:02 second activity complete
– duplicate (from host logs) –
01:03 second activity triggered again
01:04 PollerOptions: uncaught exception java.lang.RuntimeException: NOT_FOUND: invalid activityID or activity already timed out or invoking workflow is completed
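As a sanity check on the timeline above, here is a small sketch in plain Java (no Temporal dependency) comparing the observed start of the second activity with a 1 hour timeout measured from the moment it was scheduled. The class and method names are made up for illustration. It only does the arithmetic on the timestamps listed above and does not show which timeout actually fired.

```java
import java.time.Duration;

// Hypothetical helper, used only to do arithmetic on the reported timeline.
public class TimeoutTimeline {
    // Offset from workflow start at which a timeout of the given length,
    // counted from the given offset, would fire.
    static Duration expiry(Duration from, Duration timeout) {
        return from.plus(timeout);
    }

    // Format an offset as HH:mm, matching the timeline notation.
    static String hhmm(Duration d) {
        return String.format("%02d:%02d", d.toHours(), d.toMinutes() % 60);
    }

    public static void main(String[] args) {
        Duration secondScheduled = Duration.ofMinutes(1);  // 00:01 from the timeline
        Duration secondStarted = Duration.ofMinutes(61);   // 01:01 from the timeline
        Duration timeout = Duration.ofHours(1);            // configured timeout value

        Duration deadline = expiry(secondScheduled, timeout);
        System.out.println("timeout counted from scheduling fires at " + hhmm(deadline));
        System.out.println("second activity observed to start at " + hhmm(secondStarted));
        System.out.println("same minute: " + deadline.equals(secondStarted));
    }
}
```

The two times land on the same minute, which fits the "activity timeout" last error shown on the second attempt, though the timeline alone does not say why the task was not picked up earlier.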
If you look at my activity_schedule_to_start p99 metric, it never goes above 13 minutes.
I am trying to understand why my second activity task is not executed immediately even though it shows as scheduled. And after the timeout happens, why does it get executed twice?
Let me know if you need any more details, or if there is a better place where I should report bugs.