Today an activity running in a workflow caused a memory spike that forced our worker service in Kubernetes to restart. After the worker service started back up, the activity was not reprocessed and eventually timed out based on its scheduleToCloseTimeout setting.
Is this behaviour determined by the retry settings for that activity, or are there other settings that could cause the activity not to be retried? When a worker becomes unavailable like this, how does Temporal detect it and reschedule the activity on a new worker?
Also, is there a better or quicker way to know when the activity has died and needs to be retried? I was thinking that a timeout based on an activity heartbeat would be a lot quicker than waiting for the scheduleToCloseTimeout to trigger.
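
As a sketch of the heartbeat idea, here is roughly what the activity options might look like, assuming the TypeScript SDK (the SDK in use is not stated above). The activity name `processBatch`, the workflow name, and all durations are hypothetical placeholders, not values from our actual setup.

```typescript
// workflow.ts (hypothetical file name)
import { proxyActivities } from '@temporalio/workflow';

// Hypothetical activity interface; the real activity is not shown in this post.
interface Activities {
  processBatch(input: string): Promise<void>;
}

const { processBatch } = proxyActivities<Activities>({
  // Overall budget across all attempts, including retries (placeholder value).
  scheduleToCloseTimeout: '1 hour',
  // Maximum duration of a single attempt (placeholder value).
  startToCloseTimeout: '10 minutes',
  // The attempt is failed if no heartbeat arrives within this window (placeholder value).
  heartbeatTimeout: '30 seconds',
  // Retry policy applied when an attempt fails or times out (placeholder value).
  retry: { maximumAttempts: 5 },
});

export async function exampleWorkflow(input: string): Promise<void> {
  await processBatch(input);
}
```

For the heartbeat timeout to be meaningful, the activity itself would need to report progress periodically, for example via `Context.current().heartbeat()` from `@temporalio/activity`. The expectation is that a missed heartbeat fails only the current attempt, so a retry can be scheduled on another worker while the scheduleToCloseTimeout budget has not yet run out.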