How to test server outages

Case1:
I have configured RetryOptions for my activity.

  • StartToClose : Maximum time for a single Activity Execution attempt.
  • ScheduleToClose : Maximum time for the overall Activity Execution.

I then restart my worker server.
If I configure StartToClose, the activity is retried, but if I configure ScheduleToClose, it is not.

May I know the reason?

Case2:
My workflow code calls a single activity.
The Temporal server is restarted; the worker server stays up.

In this case, my activity completes, but because the Temporal server is down, the worker cannot report the ActivityCompleted event back to it.

So the activity is started again and finishes again. The activity runs twice. Is that expected?

Case1:

What are the values of StartToClose and ScheduleToClose? Or could you just post the history of the workflow here?

Case2:

If an activity cannot report its completion to the service, it is going to time out once the service is back. By default, an activity is retried upon timeout, so it executes a second time.

Normal retry works. But if I stop and start my worker to simulate a worker outage, the retry does not happen. If I set .setStartToCloseTimeout, the retry works.

Please find my history JSON below:

[{"eventId":"1","eventTime":{"seconds":"1622569025","nanos":956258900},"eventType":"EVENT_TYPE_WORKFLOW_EXECUTION_STARTED","version":"0","taskId":"7391964","workflowExecutionStartedEventAttributes":{"workflowType":{"name":"PilotWorkflow"},"parentWorkflowNamespace":"","parentInitiatedEventId":"0","taskQueue":{"name":"test","kind":"TASK_QUEUE_KIND_NORMAL"},"input":{"payloads":[{"metadata":{"encoding":"anNvbi9wbGFpbg=="},"data":"eyJpbnB1dCI6IjEwIiwiZmxvd0xpc3QiOlt7IkBjbGFzcyI6ImNvbS5wZXJwaWxvdC5jb250cm9sLlNpbXBsZUZsb3ciLCJ0YXNrTGlzdCI6W3siQGNsYXNzIjoiY29tLnBlcnBpbG90LmNsaWVudC50YXNrLkV4Y2VwdGlvblRhc2sifV19XX0="}]},"workflowExecutionTimeout":{"seconds":"0","nanos":0},"workflowRunTimeout":{"seconds":"0","nanos":0},"workflowTaskTimeout":{"seconds":"10","nanos":0},"continuedExecutionRunId":"","initiator":"CONTINUE_AS_NEW_INITIATOR_UNSPECIFIED","originalExecutionRunId":"84dfff4a-cbd3-41a8-aa98-1f018b209cb3","identity":"5903@Digitals-MacBook-Pro.local","firstExecutionRunId":"84dfff4a-cbd3-41a8-aa98-1f018b209cb3","attempt":1,"cronSchedule":"","firstWorkflowTaskBackoff":{"seconds":"0","nanos":0},"header":{"fields":{}}},"attributes":"workflowExecutionStartedEventAttributes"},{"eventId":"2","eventTime":{"seconds":"1622569025","nanos":956417900},"eventType":"EVENT_TYPE_WORKFLOW_TASK_SCHEDULED","version":"0","taskId":"7391965","workflowTaskScheduledEventAttributes":{"taskQueue":{"name":"test","kind":"TASK_QUEUE_KIND_NORMAL"},"startToCloseTimeout":{"seconds":"10","nanos":0},"attempt":1},"attributes":"workflowTaskScheduledEventAttributes"},{"eventId":"3","eventTime":{"seconds":"1622569025","nanos":990183900},"eventType":"EVENT_TYPE_WORKFLOW_TASK_STARTED","version":"0","taskId":"7391969","workflowTaskStartedEventAttributes":{"scheduledEventId":"2","identity":"5903@Digitals-MacBook-Pro.local","requestId":"0e77080a-f16b-4045-abf2-01fabc7f3dc4"},"attributes":"workflowTaskStartedEventAttributes"},{"eventId":"4","eventTime":{"seconds":"1622569026","nanos":237938800},"eventType":"EVENT_TYP
E_WORKFLOW_TASK_COMPLETED","version":"0","taskId":"7391972","workflowTaskCompletedEventAttributes":{"scheduledEventId":"2","startedEventId":"3","identity":"5903@Digitals-MacBook-Pro.local","binaryChecksum":""},"attributes":"workflowTaskCompletedEventAttributes"},{"eventId":"5","eventTime":{"seconds":"1622569026","nanos":238974200},"eventType":"EVENT_TYPE_ACTIVITY_TASK_SCHEDULED","version":"0","taskId":"7391973","activityTaskScheduledEventAttributes":{"activityId":"2b995bb2-93ad-3592-b1ca-a7102d30a51e","activityType":{"name":"Run"},"namespace":"","taskQueue":{"name":"test","kind":"TASK_QUEUE_KIND_NORMAL"},"header":{"fields":{}},"input":{"payloads":[{"metadata":{"encoding":"anNvbi9wbGFpbg=="},"data":"eyJzeXN0ZW1EYXRhIjpudWxsLCJ1c2VyRGF0YSI6eyJ0YXNrIjp7IkBjbGFzcyI6ImNvbS5wZXJwaWxvdC5jbGllbnQudGFzay5FeGNlcHRpb25UYXNrIn0sInByZXZSZXN1bHQiOiIxMCJ9fQ=="}]},"scheduleToCloseTimeout":{"seconds":"40","nanos":0},"scheduleToStartTimeout":{"seconds":"40","nanos":0},"startToCloseTimeout":{"seconds":"40","nanos":0},"heartbeatTimeout":{"seconds":"0","nanos":0},"workflowTaskCompletedEventId":"4","retryPolicy":{"nonRetryableErrorTypes":[],"initialInterval":{"seconds":"1","nanos":0},"backoffCoefficient":2,"maximumInterval":{"seconds":"100","nanos":0},"maximumAttempts":2}},"attributes":"activityTaskScheduledEventAttributes"},{"eventId":"6","eventTime":{"seconds":"1622569026","nanos":272071800},"eventType":"EVENT_TYPE_ACTIVITY_TASK_STARTED","version":"0","taskId":"7391978","activityTaskStartedEventAttributes":{"scheduledEventId":"5","identity":"5903@Digitals-MacBook-Pro.local","requestId":"abd1d821-08ea-46e2-b0a6-8022c46b3979","attempt":1},"attributes":"activityTaskStartedEventAttributes"},{"eventId":"7","eventTime":{"seconds":"1622569066","nanos":248583400},"eventType":"EVENT_TYPE_ACTIVITY_TASK_TIMED_OUT","version":"0","taskId":"7391979","activityTaskTimedOutEventAttributes":{"failure":{"message":"activity 
timeout","source":"Server","stackTrace":"","timeoutFailureInfo":{"timeoutType":"TIMEOUT_TYPE_SCHEDULE_TO_CLOSE"},"failureInfo":"timeoutFailureInfo"},"scheduledEventId":"5","startedEventId":"6","retryState":"RETRY_STATE_NON_RETRYABLE_FAILURE"},"attributes":"activityTaskTimedOutEventAttributes"},{"eventId":"8","eventTime":{"seconds":"1622569066","nanos":248651500},"eventType":"EVENT_TYPE_WORKFLOW_TASK_SCHEDULED","version":"0","taskId":"7391980","workflowTaskScheduledEventAttributes":{"taskQueue":{"name":"5903@Digitals-MacBook-Pro.local:da369b41-3aba-4f28-b3ca-2fb32ddcec93","kind":"TASK_QUEUE_KIND_STICKY"},"startToCloseTimeout":{"seconds":"10","nanos":0},"attempt":1},"attributes":"workflowTaskScheduledEventAttributes"},{"eventId":"9","eventTime":{"seconds":"1622569071","nanos":258963900},"eventType":"EVENT_TYPE_WORKFLOW_TASK_TIMED_OUT","version":"0","taskId":"7391984","workflowTaskTimedOutEventAttributes":{"scheduledEventId":"8","startedEventId":"0","timeoutType":"TIMEOUT_TYPE_SCHEDULE_TO_START"},"attributes":"workflowTaskTimedOutEventAttributes"},{"eventId":"10","eventTime":{"seconds":"1622569071","nanos":259013100},"eventType":"EVENT_TYPE_WORKFLOW_TASK_SCHEDULED","version":"0","taskId":"7391985","workflowTaskScheduledEventAttributes":{"taskQueue":{"name":"test","kind":"TASK_QUEUE_KIND_NORMAL"},"startToCloseTimeout":{"seconds":"10","nanos":0},"attempt":1},"attributes":"workflowTaskScheduledEventAttributes"},{"eventId":"11","eventTime":{"seconds":"1622569071","nanos":279365500},"eventType":"EVENT_TYPE_WORKFLOW_TASK_STARTED","version":"0","taskId":"7391988","workflowTaskStartedEventAttributes":{"scheduledEventId":"10","identity":"5904@Digitals-MacBook-Pro.local","requestId":"ef7d913c-3236-4386-b07e-b6bcd456ef1e"},"attributes":"workflowTaskStartedEventAttributes"},{"eventId":"12","eventTime":{"seconds":"1622569071","nanos":801303200},"eventType":"EVENT_TYPE_WORKFLOW_TASK_COMPLETED","version":"0","taskId":"7391991","workflowTaskCompletedEventAttributes":{"scheduledEventId
":"10","startedEventId":"11","identity":"5904@Digitals-MacBook-Pro.local","binaryChecksum":""},"attributes":"workflowTaskCompletedEventAttributes"},{"eventId":"13","eventTime":{"seconds":"1622569071","nanos":801436800},"eventType":"EVENT_TYPE_WORKFLOW_EXECUTION_FAILED","version":"0","taskId":"7391992","workflowExecutionFailedEventAttributes":{"failure":{"message":"Activity task timedOut","source":"","stackTrace":"","cause":{"message":"activity timeout","source":"Server","stackTrace":"","timeoutFailureInfo":{"timeoutType":"TIMEOUT_TYPE_SCHEDULE_TO_CLOSE"},"failureInfo":"timeoutFailureInfo"},"activityFailureInfo":{"scheduledEventId":"5","startedEventId":"6","identity":"","activityType":{"name":"Run"},"activityId":"2b995bb2-93ad-3592-b1ca-a7102d30a51e","retryState":"RETRY_STATE_NON_RETRYABLE_FAILURE"},"failureInfo":"activityFailureInfo"},"retryState":"RETRY_STATE_RETRY_POLICY_NOT_SET","workflowTaskCompletedEventId":"12"},"attributes":"workflowExecutionFailedEventAttributes"}]

Understood :+1:

If I have 2 or 3 Temporal servers and one of them goes down, the others will pick up the request and continue, right?

Here are the activity timeouts from the history:

      "scheduleToCloseTimeout": {
        "seconds": "40",
        "nanos": 0
      },
      "scheduleToStartTimeout": {
        "seconds": "40",
        "nanos": 0
      },
      "startToCloseTimeout": {
        "seconds": "40",
        "nanos": 0
      },

When your worker is restarted, the activity is going to time out after the StartToClose timeout. But it cannot be retried, because a ScheduleToClose timeout of the same value (40 seconds) leaves no time for a retry. The solution is either to not specify the ScheduleToClose timeout (it then defaults to infinity) or to make it long enough to give the activity a chance to retry.
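To make the arithmetic concrete, here is a rough sketch in plain Java. It is a simplified model, not the Temporal server's actual scheduling logic: it assumes every attempt runs until its StartToClose timeout fires (as happens when the worker is down), and the class and method names are made up for illustration.

```java
import java.time.Duration;

// Simplified model (an assumption, not Temporal's real algorithm): every attempt
// is taken to consume its full StartToClose window, as when the worker is down.
class RetryBudget {

    /** Counts how many attempts can start before the ScheduleToClose budget is used up. */
    static int attemptsThatFit(Duration scheduleToClose, Duration startToClose,
                               Duration initialInterval, double backoffCoefficient,
                               int maximumAttempts) {
        long budgetMs = scheduleToClose.toMillis();
        long elapsedMs = 0;
        double backoffMs = initialInterval.toMillis();
        int attempts = 0;
        while (attempts < maximumAttempts && elapsedMs < budgetMs) {
            attempts++;
            // Worst case: the attempt uses its whole StartToClose window.
            elapsedMs += startToClose.toMillis();
            // The next attempt can only start after the backoff interval.
            elapsedMs += (long) backoffMs;
            backoffMs *= backoffCoefficient;
        }
        return attempts;
    }

    public static void main(String[] args) {
        Duration forty = Duration.ofSeconds(40);
        Duration one = Duration.ofSeconds(1);
        // Values from the history above: both timeouts are 40s, maximumAttempts is 2.
        System.out.println(attemptsThatFit(forty, forty, one, 2.0, 2));
        // A larger ScheduleToClose (120s, chosen arbitrarily) leaves room for a retry.
        System.out.println(attemptsThatFit(Duration.ofSeconds(120), forty, one, 2.0, 2));
    }
}
```

Under these assumptions the model allows one attempt with the 40-second ScheduleToClose from the history and two attempts with 120 seconds, which matches the behaviour described above.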

We also recommend not specifying the ScheduleToStart timeout, as it is not retryable and is not needed in 99% of cases.
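As a minimal sketch of options that follow this advice, assuming the Temporal Java SDK is on the classpath: only StartToClose is set, and the retry values are copied from the history above. The class name `PilotActivityOptions` is hypothetical.

```java
import io.temporal.activity.ActivityOptions;
import io.temporal.common.RetryOptions;
import java.time.Duration;

// Hypothetical holder class; only the builder calls come from the Temporal Java SDK.
class PilotActivityOptions {

    static ActivityOptions build() {
        return ActivityOptions.newBuilder()
            // Limit for a single attempt; a timed-out attempt can then be retried.
            .setStartToCloseTimeout(Duration.ofSeconds(40))
            // ScheduleToClose and ScheduleToStart are deliberately left unset.
            .setRetryOptions(RetryOptions.newBuilder()
                .setInitialInterval(Duration.ofSeconds(1))
                .setBackoffCoefficient(2.0)
                .setMaximumInterval(Duration.ofSeconds(100))
                .setMaximumAttempts(2)
                .build())
            .build();
    }
}
```

If the total retry duration should be bounded, a `setScheduleToCloseTimeout` call with a value comfortably larger than StartToClose could be added to the same builder.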

Okay, so the only recommended timeout is startToCloseTimeout, right?

One more question: what if I don’t specify limits such as MaximumAttempts or MaximumInterval?
How many times will it retry?

It will keep retrying until the workflow times out or is closed in some other way.

If you do want to limit the duration of retries, use the ScheduleToClose timeout.
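For a feel of how the delays between attempts grow when no attempt limit is set, here is a small plain-Java sketch of the usual exponential backoff formula, using the initial interval (1s), coefficient (2) and maximum interval (100s) from the retry policy in the history above. The class and method names are made up, and this is an illustration of the formula rather than the server's own code.

```java
import java.time.Duration;

// Illustrative only: standard exponential backoff with a cap, using hypothetical names.
class BackoffSchedule {

    /** Delay before the retry that follows the given attempt (1-based), capped at maximumInterval. */
    static Duration delayAfterAttempt(int attempt, Duration initialInterval,
                                      double backoffCoefficient, Duration maximumInterval) {
        double ms = initialInterval.toMillis() * Math.pow(backoffCoefficient, attempt - 1);
        return Duration.ofMillis((long) Math.min(ms, maximumInterval.toMillis()));
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofSeconds(1);
        Duration max = Duration.ofSeconds(100);
        for (int attempt = 1; attempt <= 9; attempt++) {
            long seconds = delayAfterAttempt(attempt, initial, 2.0, max).getSeconds();
            System.out.println("after attempt " + attempt + ": wait " + seconds + "s");
        }
    }
}
```

The delay doubles after each attempt until it reaches the maximum interval and then stays there, so without a ScheduleToClose timeout or a maximum attempt count the retries simply continue at that capped interval.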
