If a workflow has the default execution timeout (which is infinite) and the pod it’s running on crashes, will it be retried?
Workflow executions are not tied to a specific worker. If a worker crashes, your executions can be continued by a worker in a different process. I’d recommend watching this video, where Maxim explains this in more detail.
Unlike activities, workflows do not have a default retry policy; you have to enable one explicitly via WorkflowOptions. If you enable workflow retries and your workflow execution fails or times out, it will be retried up to the WorkflowExecutionTimeout (or “infinitely” if you don’t specify one). If your worker process is down, the retry can run on a different worker process.
By default, workflows do not fail on intermittent errors; they block workflow execution while waiting for a fix. You don’t need to set up workflow retries to handle a worker crash, since, again, execution can be continued on a different worker process.
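To make this concrete, here is a minimal sketch assuming the Go SDK (the thread doesn’t name a language; the workflow, task queue name, and timeout values are all hypothetical). Workflow retries are opt-in via a RetryPolicy on the start options, and the WorkflowExecutionTimeout bounds how long retries may continue:

```go
package main

import (
	"context"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// MyWorkflow is a hypothetical placeholder workflow.
func MyWorkflow(ctx workflow.Context) error { return nil }

func main() {
	c, err := client.Dial(client.Options{}) // defaults to localhost:7233
	if err != nil {
		panic(err)
	}
	defer c.Close()

	opts := client.StartWorkflowOptions{
		ID:        "retrying-workflow",
		TaskQueue: "my-task-queue",
		// Bounds the whole execution, retries included; leaving it
		// unset means retries can effectively continue "infinitely".
		WorkflowExecutionTimeout: 24 * time.Hour,
		// Opt-in: without this, a failed workflow is not retried.
		RetryPolicy: &temporal.RetryPolicy{
			InitialInterval:    time.Second,
			BackoffCoefficient: 2.0,
		},
	}
	if _, err := c.ExecuteWorkflow(context.Background(), opts, MyWorkflow); err != nil {
		panic(err)
	}
}
```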
Regarding activities, you say that StartToClose must be set in order for a timeout to ever be reached.
For activities you have to set either the StartToClose or the ScheduleToClose timeout; see this video for more info. Activities have a default retry policy, so in your scenario, where a worker crashes, the activity would be retried on a different worker process.
Activities are retried up to the configured ScheduleToClose timeout. When activity retries are exhausted, an ActivityFailure is delivered to your workflow code.
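A sketch of how the two activity timeouts interact, again assuming the Go SDK (MyActivity and MyWorkflow are hypothetical placeholders). StartToClose bounds a single attempt, ScheduleToClose bounds all attempts together, and once retries are exhausted the failure surfaces in the workflow (as a *temporal.ActivityError in Go):

```go
package myworkflows

import (
	"context"
	"errors"
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// MyActivity is a hypothetical placeholder activity.
func MyActivity(ctx context.Context) error { return nil }

func MyWorkflow(ctx workflow.Context) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout:    time.Minute,      // bounds one attempt
		ScheduleToCloseTimeout: 10 * time.Minute, // bounds all attempts, retries included
	})

	err := workflow.ExecuteActivity(ctx, MyActivity).Get(ctx, nil)
	var actErr *temporal.ActivityError
	if errors.As(err, &actErr) {
		// Retries are exhausted (or the error was non-retryable);
		// the activity failure is now the workflow's to handle.
		workflow.GetLogger(ctx).Error("activity failed after retries", "error", err)
		return err
	}
	return err
}
```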
Namely, if StartToClose is infinite and the heartbeat is 20s, won’t that solve the issue of crashing and retrying?
For long-running activities (when you have a long StartToClose timeout), a heartbeat timeout can be used to detect activity failures quickly; see here for more info. In this case too, if the worker crashes, your activity can be retried on a different worker process and continue execution.
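Here is roughly what that looks like, assuming the Go SDK (LongActivity and its work loop are hypothetical). The short HeartbeatTimeout lets the server detect a crashed worker within about 20 seconds instead of waiting out the long StartToClose, and heartbeat details let the retried attempt resume from the last checkpoint:

```go
package myworkflows

import (
	"context"
	"time"

	"go.temporal.io/sdk/activity"
	"go.temporal.io/sdk/workflow"
)

func LongRunningWorkflow(ctx workflow.Context) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 8 * time.Hour,    // long-running work
		HeartbeatTimeout:    20 * time.Second, // fast crash detection
	})
	return workflow.ExecuteActivity(ctx, LongActivity).Get(ctx, nil)
}

func LongActivity(ctx context.Context) error {
	// On a retry, pick up from the last heartbeated checkpoint.
	var progress int
	if activity.HasHeartbeatDetails(ctx) {
		_ = activity.GetHeartbeatDetails(ctx, &progress)
	}
	for i := progress; i < 1000; i++ {
		processChunk(i) // hypothetical unit of work
		// Each heartbeat proves liveness and checkpoints progress.
		activity.RecordHeartbeat(ctx, i+1)
	}
	return nil
}

func processChunk(i int) { /* hypothetical */ }
```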
For an activity, if maxRetries = 1 and the pod crashes, does that mean there will be no retries?
Setting maxRetries to 1 in ActivityOptions disables activity retries. In the case of a worker failure, execution can still be continued on a different worker process.
We don’t recommend limiting retries via maxRetries; use the ScheduleToClose timeout in ActivityOptions instead.
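For reference, in the Go SDK the setting the question calls “maxRetries” is RetryPolicy.MaximumAttempts. A sketch of the recommended shape, assuming the Go SDK, bounding retries by an overall deadline rather than an attempt count:

```go
package myworkflows

import (
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// Preferred: bound retries by a total deadline. Retries simply stop
// once the ScheduleToClose window closes.
var recommended = workflow.ActivityOptions{
	StartToCloseTimeout:    time.Minute,      // one attempt
	ScheduleToCloseTimeout: 30 * time.Minute, // all attempts together
}

// Not recommended: MaximumAttempts of 1 disables retries entirely, so
// a worker crash mid-activity fails the activity with no retry on
// another worker.
var capped = workflow.ActivityOptions{
	StartToCloseTimeout: time.Minute,
	RetryPolicy:         &temporal.RetryPolicy{MaximumAttempts: 1},
}
```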