NextRetryDelay issue in Activity

We have an activity that must not execute during a fixed downtime window (02:00–02:30 IST). If the activity starts within the downtime window, we immediately fail it and return a failure with an explicit nextRetryDelay, calculated so that the next retry should occur after the downtime ends. This works in most cases.
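As a point of reference, the delay calculation described above can be sketched with the standard library alone. This is a minimal sketch, not the poster's actual code: the window values (02:00 start, 30 minutes long) come from the post, while the names `delayUntilDowntimeEnds` and `remainingAt` are hypothetical and the jitter the poster adds on top is left out. The resulting duration is what would be returned to the server as nextRetryDelay.

```go
package main

import (
	"fmt"
	"time"
)

// ist is a fixed +05:30 zone so the sketch does not depend on tzdata.
var ist = time.FixedZone("IST", 5*3600+30*60)

// Downtime window from the post: 02:00 to 02:30 IST.
const (
	downtimeStart    = 2 * time.Hour
	downtimeDuration = 30 * time.Minute
)

// delayUntilDowntimeEnds reports whether now falls inside the window and,
// if it does, how long remains until the window closes.
func delayUntilDowntimeEnds(now time.Time) (time.Duration, bool) {
	local := now.In(ist)
	midnight := time.Date(local.Year(), local.Month(), local.Day(), 0, 0, 0, 0, ist)
	start := midnight.Add(downtimeStart)
	end := start.Add(downtimeDuration)
	if local.Before(start) || !local.Before(end) {
		return 0, false
	}
	return end.Sub(local), true
}

// remainingAt is a hypothetical convenience wrapper that accepts an
// RFC 3339 timestamp and returns the remaining time as a string.
func remainingAt(ts string) (string, bool, error) {
	t, err := time.Parse(time.RFC3339Nano, ts)
	if err != nil {
		return "", false, err
	}
	d, inside := delayUntilDowntimeEnds(t)
	return d.String(), inside, nil
}

func main() {
	fmt.Println(remainingAt("2025-12-20T20:42:12Z")) // 02:12:12 IST, inside the window
	fmt.Println(remainingAt("2025-12-20T21:00:00Z")) // 02:30:00 IST, window already closed
}
```

The end of the window is treated as exclusive, so an attempt that starts at exactly 02:30:00 IST is allowed to run.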

However, in a small number of workflows, we’re seeing the activity get scheduled again within the downtime window, which ends up consuming retry attempts and eventually exhausting retries. From the logs, it looks like another retry task may already be scheduled or in flight, even though nextRetryDelay is set.

Any idea why this is happening?

Activity retry config (for reference)

retryPolicy:
  initialInterval: 1s
  backoffCoefficient: 2.0
  maximumInterval: 1m
  maximumAttempts: 3
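To see why retries are exhausted so quickly when the explicit delay is not applied, it helps to work out what this policy produces by default. The sketch below assumes the usual exponential backoff rule (initialInterval multiplied by backoffCoefficient for each earlier attempt, capped at maximumInterval); the type and function names are hypothetical and this is not the server's implementation.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// retryPolicy mirrors the four fields shown in the config above.
type retryPolicy struct {
	InitialInterval    time.Duration
	BackoffCoefficient float64
	MaximumInterval    time.Duration
	MaximumAttempts    int
}

// policy holds the values from the post.
var policy = retryPolicy{
	InitialInterval:    time.Second,
	BackoffCoefficient: 2.0,
	MaximumInterval:    time.Minute,
	MaximumAttempts:    3,
}

// defaultDelay returns the wait after the given failed attempt (1-based),
// or false when no attempts are left.
func defaultDelay(p retryPolicy, failedAttempt int) (time.Duration, bool) {
	if p.MaximumAttempts > 0 && failedAttempt >= p.MaximumAttempts {
		return 0, false
	}
	d := float64(p.InitialInterval) * math.Pow(p.BackoffCoefficient, float64(failedAttempt-1))
	if ceiling := float64(p.MaximumInterval); d > ceiling {
		d = ceiling
	}
	return time.Duration(d), true
}

func main() {
	for attempt := 1; attempt <= policy.MaximumAttempts; attempt++ {
		d, ok := defaultDelay(policy, attempt)
		fmt.Printf("after attempt %d: delay=%s retry=%v\n", attempt, d, ok)
	}
}
```

Under this rule the policy waits 1s after the first failure and 2s after the second, so all three attempts can be spent within a few seconds of entering the downtime window.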

Worker config (for reference)

workers:
  debitWorker:
    maxConcurrentActivityExecutionSize: 100
    taskQueueActivitiesPerSecond: 30

However, in a small number of workflows, we’re seeing the activity get scheduled again within the downtime window,

Can we try to eliminate the possibility that those activity attempts were retried because of an activity timeout (StartToClose)? I'm asking because in that case the server wouldn't have received the retry delay set in your activity code, and would use the retry policy you set in the activity options.

From the worker (SDK) metrics, take a look at temporal_request_failure for operation RespondActivityTaskFailed with status_code NotFound or ResourceExhausted. If you see any other error code, please share it.

If you cannot see anything via worker metrics, check also server metrics as activity worker could have crashed before responding failure to service in which case only server metrics would be able to record this:

sum(rate(start_to_close_timeout{operation="TimerActiveTaskActivityTimeout",namespace="your-ns-name-here"}[1m])) by(namespace,operation)

sum by (temporal_namespace,operation) (rate(schedule_to_start_timeout{operation="TimerActiveTaskActivityTimeout",namespace="your-ns-name-here"}[1m]))

Would also look at resource exhausted graphs from server metrics

sum(rate(service_errors_resource_exhausted{}[1m])) by (operation, resource_exhausted_cause)


If there are no metrics, try to find one of these executions, look at the ActivityTaskFailed event in the event history, and check the last failure details to see whether it's a timeout or not.

Hi @tihomir

I was unable to find any relevant information in the server metrics.

If there are no metrics, try to find one of these executions, look at the ActivityTaskFailed event in the event history, and check the last failure details to see whether it's a timeout or not.

Failed Event JSON

{
  "eventId": "23",
  "eventTime": "2025-12-17T20:44:48.471038382Z",
  "eventType": "EVENT_TYPE_WORKFLOW_EXECUTION_FAILED",
  "taskId": "51380685",
  "workflowExecutionFailedEventAttributes": {
    "failure": {
      "message": "activity error",
      "source": "GoSDK",
      "cause": {
        "message": "Current time 2025-12-18 02:14:48.446255802 +0530 IST is within IBL downtime window skipping debit attempt. Retrying after 1h22m10.113371457s. Downtime ends in 15m11.553744198s adding 1h6m58.559627259s jitter",
        "source": "GoSDK",
        "applicationFailureInfo": {
          "type": "NextDelay",
          "nextRetryDelay": "4930.113371457s"
        }
      },
      "activityFailureInfo": {
        "scheduledEventId": "17",
        "startedEventId": "18",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "activityType": {
          "name": "DebitActivity"
        },
        "activityId": "17",
        "retryState": "RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED"
      }
    },
    "retryState": "RETRY_STATE_RETRY_POLICY_NOT_SET",
    "workflowTaskCompletedEventId": "22"
  }
}

Can we try to eliminate the possibility that those activity attempts were retried because of an activity timeout (StartToClose)? I'm asking because in that case the server wouldn't have received the retry delay set in your activity code, and would use the retry policy you set in the activity options.

Is the issue arising because my startToCloseTimeout is set to 2 minutes, and the worker has not responded to the server within that timeframe before the next retry began? If so, what would be the optimal value for startToCloseTimeout, considering that my activity interacts with another service that has a latency of only 7 to 10 seconds?

Also, why would the activity be unable to connect to the server? I'm asking because I haven't encountered any errors indicating resource exhaustion.

Activity Options (for reference)

activities:
  debitActivity:
    options:
      taskQueue: "ump-debit-queue"
      startToCloseTimeout: 2m 
      retryPolicy:
        maximumAttempts: 3 
        initialInterval: 1s   
        maximumInterval: 1m      
        backoffCoefficient: 2.0           

Hi @tihomir,

I observed a similar issue yesterday. Additionally, if the startToCloseTimeout is set to 2 minutes, how is it possible for attempt 2 to begin at 02:12:12 when attempt 1 started at 02:11:12? Ideally, the first attempt should take a minimum of 2 minutes to close before the next attempt can start. Is my understanding correct?

Also, do you think that the maximumInterval (set to 1m) is capping the nextRetryDelay?

Activity options:

activities:
  debitActivity:
    options:
      taskQueue: "ump-debit-queue"
      startToCloseTimeout: 2m 
      retryPolicy:
        maximumAttempts: 3 
        initialInterval: 1s   
        maximumInterval: 1m      
        backoffCoefficient: 2.0  

Server Metrics:

From your graphs it does look like you have a constant, small rate of activity timeouts on StartToClose, so I don't think we have been able to fully eliminate that as the cause of the issue.

Thanks for sharing the JSON. I think the more relevant event to look into is the ActivityTaskFailed event, not the WorkflowExecutionFailed event, as ActivityTaskFailed would include last failure info if attempt > 0.

One more thing you can do: once you check err != nil from workflow.ExecuteActivity, you can log from workflow code whether the last attempt failed due to a timeout. Something like:

logger := workflow.GetLogger(ctx)
err := workflow.ExecuteActivity(ctx, Activity, name).Get(ctx, &result)
if err != nil {
	// ...
	// errors.As walks the wrapped error chain looking for a timeout.
	var timeoutErr *temporal.TimeoutError
	if errors.As(err, &timeoutErr) {
		logger.Error("Activity timeout", "type", timeoutErr.TimeoutType())
	}
	return "", err
}

Looking at the timestamps of activity worker logs is good imho, but won't fully help us here, as there could be other things involved, like activity schedule-to-start latencies, dispatch latencies, db latencies, and the small rate of ResourceExhausted that your graphs also show.
Any chance you could share the full event history JSON of one of those failed executions?

Also if you have worker (sdk) metrics configured, can you look at temporal_request_failure metric by operation and status_code and share during time period where you see this happen?

It would also be nice to rule out worker pod/container restarts at those times as a possible reason: if a worker is shut down or drops while executing activities, that would lead to activity timeouts on StartToClose.
From server metrics, maybe look at
sum(rate(service_requests{service_name="frontend"}[1m]) or on () vector(0))
and look for the operations ShutdownWorker, DescribeNamespace, GetSystemInfo around the times where you see this issue happen.

Hi @tihomir ,

I am unable to identify any errors in the server metrics. Additionally, there are no errors in the SDK metrics. A few of them are present, but they fall outside the downtime window.


Event History:

{
  "events": [
    {
      "eventId": "1",
      "eventTime": "2025-12-19T12:18:39.961066345Z",
      "eventType": "EVENT_TYPE_WORKFLOW_EXECUTION_STARTED",
      "taskId": "53537750",
      "userMetadata": {
        "summary": {
          "metadata": {
            "encoding": "json/plain"
          },
          "data": "UMP Workflow for ID 8107166 and due date 2025-12-21"
        }
      },
      "workflowExecutionStartedEventAttributes": {
        "workflowType": {
          "name": "UMPWorkflow"
        },
        "taskQueue": {
          "name": "ump-wf-queue",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "input": {
          "payloads": [
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "Id": "f27afdfb-1589-4543-ac47-16d5dcde66d3",
                "ClientCode": "",
                "ClientName": "Test User ",
                "InstallmentAmount": 4500,
                "FileDueDate": "",
                "AccountNo": "",
                "BankCode": "UPI",
                "MandateRefNo": "",
                "MandateId": "",
                "FileGenDate": "",
                "RegistrationId": "8107166",
                "UmrnNo": "@okhdfc",
                "ActualDueDate": "2025-12-21T00:00:00Z",
                "DueDate": "2025-12-21T00:00:00Z",
                "IsDeleted": false,
                "CounterNo": 2487,
                "IFSCCode": "",
                "MandateCreatedDate": "2025-02-13T00:00:00Z",
                "BankName": "",
                "SchemeName": "",
                "SchemeCode": "",
                "PaymentStatus": "",
                "SettlementType": "",
                "EstimatedDate": "0001-01-01T00:00:00Z",
                "UserEntity": "",
                "PgTransactionId": ""
              }
            },
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "MinDebitOffset": 300000000000,
                "MaxDebitOffset": 43200000000000,
                "JitterWindow": 7200000000000,
                "MinSleepDuration": 86400000000000,
                "IBLDowntimeStart": 7200000000000,
                "IBLDowntimeDuration": 1800000000000
              }
            },
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "PreDebitSleep": 7000000000,
                "DebitSleep": 5000000000
              }
            }
          ]
        },
        "workflowExecutionTimeout": "0s",
        "workflowRunTimeout": "0s",
        "workflowTaskTimeout": "10s",
        "originalExecutionRunId": "019b368c-40d9-7100-b591-a4037dfde0d3",
        "identity": "54@ip-10-3-202-194.ap-south-1.compute.internal@",
        "firstExecutionRunId": "019b368c-40d9-7100-b591-a4037dfde0d3",
        "attempt": 1,
        "firstWorkflowTaskBackoff": "0s",
        "header": {},
        "workflowId": "ump-8107166-2025-12-21"
      }
    },
    {
      "eventId": "2",
      "eventTime": "2025-12-19T12:18:39.961128618Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED",
      "taskId": "53537751",
      "workflowTaskScheduledEventAttributes": {
        "taskQueue": {
          "name": "ump-wf-queue",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "startToCloseTimeout": "10s",
        "attempt": 1
      }
    },
    {
      "eventId": "3",
      "eventTime": "2025-12-19T12:18:39.998264165Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_STARTED",
      "taskId": "53537756",
      "workflowTaskStartedEventAttributes": {
        "scheduledEventId": "2",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "requestId": "46652215-37b8-4675-8f02-b5a45bdacafd",
        "historySizeBytes": "1436",
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        }
      }
    },
    {
      "eventId": "4",
      "eventTime": "2025-12-19T12:18:40.020173262Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_COMPLETED",
      "taskId": "53537765",
      "workflowTaskCompletedEventAttributes": {
        "scheduledEventId": "2",
        "startedEventId": "3",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        },
        "sdkMetadata": {
          "langUsedFlags": [
            3
          ],
          "sdkName": "temporal-go",
          "sdkVersion": "1.38.0"
        },
        "meteringMetadata": {}
      }
    },
    {
      "eventId": "5",
      "eventTime": "2025-12-19T12:18:40.020238897Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_SCHEDULED",
      "taskId": "53537766",
      "activityTaskScheduledEventAttributes": {
        "activityId": "5",
        "activityType": {
          "name": "PreDebitActivity"
        },
        "taskQueue": {
          "name": "ump-pre-debit-queue",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "header": {},
        "input": {
          "payloads": [
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "Id": "f27afdfb-1589-4543-ac47-16d5dcde66d3",
                "ClientCode": "",
                "ClientName": "Test User",
                "InstallmentAmount": 4500,
                "FileDueDate": "",
                "AccountNo": "",
                "BankCode": "UPI",
                "MandateRefNo": "",
                "MandateId": "",
                "FileGenDate": "",
                "RegistrationId": "8107166",
                "UmrnNo": "@okhdfc",
                "ActualDueDate": "2025-12-21T00:00:00Z",
                "DueDate": "2025-12-21T00:00:00Z",
                "IsDeleted": false,
                "CounterNo": 2487,
                "IFSCCode": "",
                "MandateCreatedDate": "2025-02-13T00:00:00Z",
                "BankName": "",
                "SchemeName": "",
                "SchemeCode": "",
                "PaymentStatus": "",
                "SettlementType": "",
                "EstimatedDate": "0001-01-01T00:00:00Z",
                "UserEntity": "",
                "PgTransactionId": ""
              }
            },
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "MinDebitOffset": 300000000000,
                "MaxDebitOffset": 43200000000000,
                "JitterWindow": 7200000000000,
                "MinSleepDuration": 86400000000000,
                "IBLDowntimeStart": 7200000000000,
                "IBLDowntimeDuration": 1800000000000
              }
            },
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "PreDebitSleep": 7000000000,
                "DebitSleep": 5000000000
              }
            }
          ]
        },
        "scheduleToCloseTimeout": "0s",
        "scheduleToStartTimeout": "0s",
        "startToCloseTimeout": "120s",
        "heartbeatTimeout": "0s",
        "workflowTaskCompletedEventId": "4",
        "retryPolicy": {
          "initialInterval": "1s",
          "backoffCoefficient": 2,
          "maximumInterval": "60s",
          "maximumAttempts": 3
        }
      }
    },
    {
      "eventId": "6",
      "eventTime": "2025-12-19T12:55:41.343535325Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_STARTED",
      "taskId": "54536489",
      "activityTaskStartedEventAttributes": {
        "scheduledEventId": "5",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "requestId": "ae38c84a-deae-4349-844f-92753fa28fc7",
        "attempt": 1,
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        }
      }
    },
    {
      "eventId": "7",
      "eventTime": "2025-12-19T12:55:48.355476419Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_COMPLETED",
      "taskId": "54536490",
      "activityTaskCompletedEventAttributes": {
        "result": {
          "payloads": [
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "error_code": "",
                "message": "",
                "bank_error_code": "00",
                "bank_error_desc": "APPROVED OR COMPLETED SUCCESSFULLY",
                "bank_res_code": "00",
                "bank_res_desc": "APPROVED OR COMPLETED SUCCESSFULLY",
                "status": "Success",
                "statusdesc": "Request Processed Successfully",
                "cp_mandate_ref_no": "",
                "seqno": "MOCK-SEQ"
              }
            }
          ]
        },
        "scheduledEventId": "5",
        "startedEventId": "6",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@"
      }
    },
    {
      "eventId": "8",
      "eventTime": "2025-12-19T12:55:48.355481168Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED",
      "taskId": "54536491",
      "workflowTaskScheduledEventAttributes": {
        "taskQueue": {
          "name": "ip-10-3-195-33.ap-south-1.compute.internal:e34f6e73-c4e7-4658-8d12-4ff192be22cc",
          "kind": "TASK_QUEUE_KIND_STICKY",
          "normalName": "ump-wf-queue"
        },
        "startToCloseTimeout": "10s",
        "attempt": 1
      }
    },
    {
      "eventId": "9",
      "eventTime": "2025-12-19T12:55:48.372474045Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_STARTED",
      "taskId": "54536495",
      "workflowTaskStartedEventAttributes": {
        "scheduledEventId": "8",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "requestId": "1ee9cc22-f892-4007-bdf5-81a107a0195d",
        "historySizeBytes": "3640",
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        }
      }
    },
    {
      "eventId": "10",
      "eventTime": "2025-12-19T12:55:48.392135128Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_COMPLETED",
      "taskId": "54536499",
      "workflowTaskCompletedEventAttributes": {
        "scheduledEventId": "8",
        "startedEventId": "9",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        },
        "sdkMetadata": {},
        "meteringMetadata": {}
      }
    },
    {
      "eventId": "11",
      "eventTime": "2025-12-19T12:55:48.392164516Z",
      "eventType": "EVENT_TYPE_MARKER_RECORDED",
      "taskId": "54536500",
      "markerRecordedEventAttributes": {
        "markerName": "SideEffect",
        "details": {
          "data": {
            "payloads": [
              {
                "metadata": {
                  "encoding": "json/plain"
                },
                "data": 1613
              }
            ]
          },
          "side-effect-id": {
            "payloads": [
              {
                "metadata": {
                  "encoding": "json/plain"
                },
                "data": 1
              }
            ]
          }
        },
        "workflowTaskCompletedEventId": "10"
      }
    },
    {
      "eventId": "12",
      "eventTime": "2025-12-19T12:55:48.392168988Z",
      "eventType": "EVENT_TYPE_TIMER_STARTED",
      "taskId": "54536501",
      "userMetadata": {
        "summary": {
          "metadata": {
            "encoding": "json/plain"
          },
          "data": "Sleep"
        }
      },
      "timerStartedEventAttributes": {
        "timerId": "12",
        "startToFireTimeout": "108364.627525955s",
        "workflowTaskCompletedEventId": "10"
      }
    },
    {
      "eventId": "13",
      "eventTime": "2025-12-20T19:01:53.031903343Z",
      "eventType": "EVENT_TYPE_TIMER_FIRED",
      "taskId": "63966945",
      "timerFiredEventAttributes": {
        "timerId": "12",
        "startedEventId": "12"
      }
    },
    {
      "eventId": "14",
      "eventTime": "2025-12-20T19:01:53.031906733Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED",
      "taskId": "63966946",
      "workflowTaskScheduledEventAttributes": {
        "taskQueue": {
          "name": "ip-10-3-195-33.ap-south-1.compute.internal:e34f6e73-c4e7-4658-8d12-4ff192be22cc",
          "kind": "TASK_QUEUE_KIND_STICKY",
          "normalName": "ump-wf-queue"
        },
        "startToCloseTimeout": "10s",
        "attempt": 1
      }
    },
    {
      "eventId": "15",
      "eventTime": "2025-12-20T19:01:53.251184628Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_STARTED",
      "taskId": "63966955",
      "workflowTaskStartedEventAttributes": {
        "scheduledEventId": "14",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "requestId": "c160cbea-76b4-4861-8d0b-11e3b38c7acc",
        "historySizeBytes": "4297",
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        }
      }
    },
    {
      "eventId": "16",
      "eventTime": "2025-12-20T19:01:53.284531499Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_COMPLETED",
      "taskId": "63966962",
      "workflowTaskCompletedEventAttributes": {
        "scheduledEventId": "14",
        "startedEventId": "15",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        },
        "sdkMetadata": {
          "sdkName": "temporal-go",
          "sdkVersion": "1.38.0"
        },
        "meteringMetadata": {}
      }
    },
    {
      "eventId": "17",
      "eventTime": "2025-12-20T19:01:53.284590910Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_SCHEDULED",
      "taskId": "63966963",
      "activityTaskScheduledEventAttributes": {
        "activityId": "17",
        "activityType": {
          "name": "DebitActivity"
        },
        "taskQueue": {
          "name": "ump-debit-queue",
          "kind": "TASK_QUEUE_KIND_NORMAL"
        },
        "header": {},
        "input": {
          "payloads": [
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "Id": "f27afdfb-1589-4543-ac47-16d5dcde66d3",
                "ClientCode": "",
                "ClientName": "Test User",
                "InstallmentAmount": 4500,
                "FileDueDate": "",
                "AccountNo": "",
                "BankCode": "UPI",
                "MandateRefNo": "",
                "MandateId": "",
                "FileGenDate": "",
                "RegistrationId": "8107166",
                "UmrnNo": "@okhdfc",
                "ActualDueDate": "2025-12-21T00:00:00Z",
                "DueDate": "2025-12-21T00:00:00Z",
                "IsDeleted": false,
                "CounterNo": 2487,
                "IFSCCode": "",
                "MandateCreatedDate": "2025-02-13T00:00:00Z",
                "BankName": "",
                "SchemeName": "",
                "SchemeCode": "",
                "PaymentStatus": "",
                "SettlementType": "",
                "EstimatedDate": "0001-01-01T00:00:00Z",
                "UserEntity": "",
                "PgTransactionId": ""
              }
            },
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": "MOCK-SEQ"
            },
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "MinDebitOffset": 300000000000,
                "MaxDebitOffset": 43200000000000,
                "JitterWindow": 7200000000000,
                "MinSleepDuration": 86400000000000,
                "IBLDowntimeStart": 7200000000000,
                "IBLDowntimeDuration": 1800000000000
              }
            },
            {
              "metadata": {
                "encoding": "json/plain"
              },
              "data": {
                "PreDebitSleep": 7000000000,
                "DebitSleep": 5000000000
              }
            }
          ]
        },
        "scheduleToCloseTimeout": "0s",
        "scheduleToStartTimeout": "0s",
        "startToCloseTimeout": "120s",
        "heartbeatTimeout": "0s",
        "workflowTaskCompletedEventId": "16",
        "retryPolicy": {
          "initialInterval": "1s",
          "backoffCoefficient": 2,
          "maximumInterval": "60s",
          "maximumAttempts": 3
        }
      }
    },
    {
      "eventId": "18",
      "eventTime": "2025-12-20T20:43:13.753574358Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_STARTED",
      "taskId": "72351836",
      "activityTaskStartedEventAttributes": {
        "scheduledEventId": "17",
        "identity": "54@ip-10-3-202-194.ap-south-1.compute.internal@",
        "requestId": "632fa414-0095-4d46-91d3-4ad6b957084d",
        "attempt": 3,
        "lastFailure": {
          "message": "Current time 2025-12-21 02:12:12.840106543 +0530 IST is within IBL downtime window skipping debit attempt. Retrying after 30m58.024571025s. Downtime ends in 17m47.159893457s adding 13m10.864677568s jitter",
          "source": "GoSDK",
          "applicationFailureInfo": {
            "type": "NextDelay",
            "nextRetryDelay": "1858.024571025s"
          }
        },
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        }
      }
    },
    {
      "eventId": "19",
      "eventTime": "2025-12-20T20:43:13.771881525Z",
      "eventType": "EVENT_TYPE_ACTIVITY_TASK_FAILED",
      "taskId": "72351837",
      "activityTaskFailedEventAttributes": {
        "failure": {
          "message": "Current time 2025-12-21 02:13:13.770729179 +0530 IST is within IBL downtime window skipping debit attempt. Retrying after 1h21m34.331942638s. Downtime ends in 16m46.229270821s adding 1h4m48.102671817s jitter",
          "source": "GoSDK",
          "applicationFailureInfo": {
            "type": "NextDelay",
            "nextRetryDelay": "4894.331942638s"
          }
        },
        "scheduledEventId": "17",
        "startedEventId": "18",
        "identity": "54@ip-10-3-202-194.ap-south-1.compute.internal@",
        "retryState": "RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED"
      }
    },
    {
      "eventId": "20",
      "eventTime": "2025-12-20T20:43:13.771889383Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED",
      "taskId": "72351838",
      "workflowTaskScheduledEventAttributes": {
        "taskQueue": {
          "name": "ip-10-3-195-33.ap-south-1.compute.internal:e34f6e73-c4e7-4658-8d12-4ff192be22cc",
          "kind": "TASK_QUEUE_KIND_STICKY",
          "normalName": "ump-wf-queue"
        },
        "startToCloseTimeout": "10s",
        "attempt": 1
      }
    },
    {
      "eventId": "21",
      "eventTime": "2025-12-20T20:43:13.791727564Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_STARTED",
      "taskId": "72351842",
      "workflowTaskStartedEventAttributes": {
        "scheduledEventId": "20",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "requestId": "098a6d3b-2a69-4b49-a194-95b4c4097264",
        "historySizeBytes": "6662",
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        }
      }
    },
    {
      "eventId": "22",
      "eventTime": "2025-12-20T20:43:13.814954218Z",
      "eventType": "EVENT_TYPE_WORKFLOW_TASK_COMPLETED",
      "taskId": "72351846",
      "workflowTaskCompletedEventAttributes": {
        "scheduledEventId": "20",
        "startedEventId": "21",
        "identity": "53@ip-10-3-195-33.ap-south-1.compute.internal@",
        "workerVersion": {
          "buildId": "b5a8f6156c3d7ccade82d2edcde699c3"
        },
        "sdkMetadata": {},
        "meteringMetadata": {}
      }
    },
    {
      "eventId": "23",
      "eventTime": "2025-12-20T20:43:13.814999858Z",
      "eventType": "EVENT_TYPE_WORKFLOW_EXECUTION_FAILED",
      "taskId": "72351847",
      "workflowExecutionFailedEventAttributes": {
        "failure": {
          "message": "activity error",
          "source": "GoSDK",
          "cause": {
            "message": "Current time 2025-12-21 02:13:13.770729179 +0530 IST is within IBL downtime window skipping debit attempt. Retrying after 1h21m34.331942638s. Downtime ends in 16m46.229270821s adding 1h4m48.102671817s jitter",
            "source": "GoSDK",
            "applicationFailureInfo": {
              "type": "NextDelay",
              "nextRetryDelay": "4894.331942638s"
            }
          },
          "activityFailureInfo": {
            "scheduledEventId": "17",
            "startedEventId": "18",
            "identity": "54@ip-10-3-202-194.ap-south-1.compute.internal@",
            "activityType": {
              "name": "DebitActivity"
            },
            "activityId": "17",
            "retryState": "RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED"
          }
        },
        "retryState": "RETRY_STATE_RETRY_POLICY_NOT_SET",
        "workflowTaskCompletedEventId": "22"
      }
    }
  ]
}
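One way to check a history like this for timeouts is to scan every ActivityTaskStarted event and classify its lastFailure. The sketch below is illustrative only: the struct and function names are hypothetical, `sample` is a trimmed copy of event 18 above with a shortened message, and the `timeoutFailureInfo` / `timeoutType` field names are assumed from the Temporal failure format because no timeout appears in this history to confirm them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// failure covers only the fields needed to tell a timeout from an
// application failure.
type failure struct {
	Message                string `json:"message"`
	ApplicationFailureInfo *struct {
		NextRetryDelay string `json:"nextRetryDelay"`
	} `json:"applicationFailureInfo"`
	TimeoutFailureInfo *struct {
		TimeoutType string `json:"timeoutType"`
	} `json:"timeoutFailureInfo"`
}

type event struct {
	EventID   string `json:"eventId"`
	EventTime string `json:"eventTime"`
	Started   *struct {
		Attempt     int      `json:"attempt"`
		LastFailure *failure `json:"lastFailure"`
	} `json:"activityTaskStartedEventAttributes"`
}

type history struct {
	Events []event `json:"events"`
}

// describeRetries returns one line per ActivityTaskStarted event that
// records the failure of a previous attempt.
func describeRetries(raw []byte) ([]string, error) {
	var h history
	if err := json.Unmarshal(raw, &h); err != nil {
		return nil, err
	}
	var out []string
	for _, e := range h.Events {
		if e.Started == nil || e.Started.LastFailure == nil {
			continue
		}
		f := e.Started.LastFailure
		kind := "other"
		switch {
		case f.TimeoutFailureInfo != nil:
			kind = "timeout " + f.TimeoutFailureInfo.TimeoutType
		case f.ApplicationFailureInfo != nil:
			kind = "application failure, nextRetryDelay=" + f.ApplicationFailureInfo.NextRetryDelay
		}
		out = append(out, fmt.Sprintf("event %s attempt %d at %s: last failure was %s",
			e.EventID, e.Started.Attempt, e.EventTime, kind))
	}
	return out, nil
}

// sample is a trimmed copy of event 18 from the history above.
const sample = `{"events":[{"eventId":"18","eventTime":"2025-12-20T20:43:13.753574358Z",
"eventType":"EVENT_TYPE_ACTIVITY_TASK_STARTED","activityTaskStartedEventAttributes":{
"scheduledEventId":"17","attempt":3,"lastFailure":{"message":"within IBL downtime window",
"applicationFailureInfo":{"type":"NextDelay","nextRetryDelay":"1858.024571025s"}}}}]}`

func main() {
	lines, err := describeRetries([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

For the history above, event 18 records attempt 3 with an application failure carrying nextRetryDelay as the last failure, so at least attempt 2 did not end in a timeout.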

One more thing you can do is once you check err != nil from workflow.ExecuteActivity
you can log from workflow code if last attempt failed due to timeout. So something like:

  • I will try to include this in the next load test run.

I’m also unable to locate the TimerActiveTaskActivityTimeout metrics for start_to_close and schedule_to_start timeouts for activities.

sorry to ask but then what is the query for the graph on the left side that shows values?

Hi @tihomir

what is the query for the graph on the left side that shows values?


Ah ok thanks, so top graph is workflow task timeouts so probably not directly related to this issue.

Did you get chance to check from sdk metrics temporal_request_failure per operation and status_code? Something like:

sum(rate(temporal_request_failure{namespace="default"}[1m])) by (operation, status_code)

and from server metrics, can you again show
sum by (resource_exhausted_cause,namespace,operation) (rate(service_errors_resource_exhausted{service_name="frontend"}[1m]))

Hi @tihomir

I could not find anything specific in the graphs that indicates the cause of the error during the downtime window.

Attached screenshots

Do you think that the maximumInterval (set to 1m) is capping the nextRetryDelay?
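The numbers in the shared event history can be used to put a figure on this. The sketch below takes the time printed in attempt 2's failure message (worker clock, IST) and the eventTime of the ActivityTaskStarted event for attempt 3 (server clock, UTC), and compares the gap with the requested nextRetryDelay and the policy's maximumInterval. The function and constant names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// nextRetryDelay requested by attempt 2 ("1858.024571025s" in event 18).
	requestedDelay = 1858024571025 * time.Nanosecond
	// maximumInterval from the activity retry policy.
	maximumInterval = time.Minute
)

// observedGap returns the time between attempt 2 reporting its failure
// and attempt 3 being started.
func observedGap() (time.Duration, error) {
	// Time printed in the lastFailure message of event 18.
	failedAt, err := time.Parse("2006-01-02 15:04:05.999999999 -0700",
		"2025-12-21 02:12:12.840106543 +0530")
	if err != nil {
		return 0, err
	}
	// eventTime of event 18 (ActivityTaskStarted, attempt 3).
	startedAt, err := time.Parse(time.RFC3339Nano, "2025-12-20T20:43:13.753574358Z")
	if err != nil {
		return 0, err
	}
	return startedAt.Sub(failedAt), nil
}

func main() {
	gap, err := observedGap()
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("requested delay: ", requestedDelay)
	fmt.Println("maximum interval:", maximumInterval)
	fmt.Println("observed gap:    ", gap)
}
```

The observed gap is about 61 seconds, far short of the roughly 31 minutes requested and close to maximumInterval. That is consistent with the delay being capped, but it is not proof: the two timestamps come from different clocks, and the gap also includes any schedule-to-start latency.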

Any chance you could align the graphs (including time zone) with the timestamps in the event history you shared?
I'm having a bit of a hard time aligning the two. From the timestamps:
ActivityTaskScheduled - "eventTime": "2025-12-20T19:01:53.284590910Z",

ActivityTaskStarted (start of last attempt of activity) - "eventTime": "2025-12-20T20:43:13.753574358Z",

ActivityTaskFailed - "eventType": "EVENT_TYPE_ACTIVITY_TASK_FAILED",

Also, any chance to see your activity code? I'm trying to find out whether, when the activity fails, there is any path in the activity code that does not return an application failure with a delay.

Hi @tihomir

The event time is recorded in GMT while all our calculations are in IST.
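For anyone lining up the two clocks, the conversion is a fixed +05:30 offset. A minimal sketch, with a hypothetical helper name `toIST`, applied to the two event times quoted in this thread:

```go
package main

import (
	"fmt"
	"time"
)

// ist is a fixed +05:30 zone so the sketch does not depend on tzdata.
var ist = time.FixedZone("IST", 5*3600+30*60)

// toIST converts an event history eventTime (RFC 3339, UTC) into IST.
func toIST(eventTime string) (string, error) {
	t, err := time.Parse(time.RFC3339Nano, eventTime)
	if err != nil {
		return "", err
	}
	return t.In(ist).Format("2006-01-02 15:04:05 MST"), nil
}

func main() {
	for _, ts := range []string{
		"2025-12-20T19:01:53.284590910Z", // ActivityTaskScheduled (event 17)
		"2025-12-20T20:43:13.753574358Z", // ActivityTaskStarted, attempt 3 (event 18)
	} {
		local, err := toIST(ts)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(ts, "->", local)
	}
}
```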


ActivityTaskScheduled - "eventTime": "2025-12-20T19:01:53.284590910Z" → 2025-12-21 00:31:53 IST, which aligns with the log message below

{
  "eventId": "17",
  "eventTime": "2025-12-20T19:01:53.284590910Z",
  "eventType": "EVENT_TYPE_ACTIVITY_TASK_SCHEDULED",
  "taskId": "63966963",
  "activityTaskScheduledEventAttributes": {
    "activityId": "17",
    "activityType": {
      "name": "DebitActivity"
    },
    "taskQueue": {
      "name": "ump-debit-queue",
      "kind": "TASK_QUEUE_KIND_NORMAL"
    },
    "header": {},
    "input": {
      "payloads": [
        {
          "metadata": {
            "encoding": "json/plain"
          },
          "data": {
            "Id": "f27afdfb-1589-4543-ac47-16d5dcde66d3",
            "ClientCode": "",
            "ClientName": "Test User",
            "InstallmentAmount": 4500,
            "FileDueDate": "",
            "AccountNo": "",
            "BankCode": "UPI",
            "MandateRefNo": "",
            "MandateId": "",
            "FileGenDate": "",
            "RegistrationId": "8107166",
            "UmrnNo": "@okhdfc",
            "ActualDueDate": "2025-12-21T00:00:00Z",
            "DueDate": "2025-12-21T00:00:00Z",
            "IsDeleted": false,
            "CounterNo": 2487,
            "IFSCCode": "",
            "MandateCreatedDate": "2025-02-13T00:00:00Z",
            "BankName": "",
            "SchemeName": "",
            "SchemeCode": "",
            "PaymentStatus": "",
            "SettlementType": "",
            "EstimatedDate": "0001-01-01T00:00:00Z",
            "UserEntity": "",
            "PgTransactionId": ""
          }
        },
        {
          "metadata": {
            "encoding": "json/plain"
          },
          "data": "MOCK-SEQ"
        },
        {
          "metadata": {
            "encoding": "json/plain"
          },
          "data": {
            "MinDebitOffset": 300000000000,
            "MaxDebitOffset": 43200000000000,
            "JitterWindow": 7200000000000,
            "MinSleepDuration": 86400000000000,
            "IBLDowntimeStart": 7200000000000,
            "IBLDowntimeDuration": 1800000000000
          }
        },
        {
          "metadata": {
            "encoding": "json/plain"
          },
          "data": {
            "PreDebitSleep": 7000000000,
            "DebitSleep": 5000000000
          }
        }
      ]
    }
  }
}


ActivityTaskStarted (start of last attempt of activity) - "eventTime": "2025-12-20T20:43:13.753574358Z" → 2025-12-21 02:13:13 IST.

{
  "eventId": "23",
  "eventTime": "2025-12-20T20:43:13.814999858Z",
  "eventType": "EVENT_TYPE_WORKFLOW_EXECUTION_FAILED",
  "taskId": "72351847",
  "workflowExecutionFailedEventAttributes": {
    "failure": {
      "message": "activity error",
      "source": "GoSDK",
      "cause": {
        "message": "Current time 2025-12-21 02:13:13.770729179 +0530 IST is within IBL downtime window skipping debit attempt. Retrying after 1h21m34.331942638s. Downtime ends in 16m46.229270821s adding 1h4m48.102671817s jitter",
        "source": "GoSDK",
        "applicationFailureInfo": {
          "type": "NextDelay",
          "nextRetryDelay": "4894.331942638s"
        }
      },
      "activityFailureInfo": {
        "scheduledEventId": "17",
        "startedEventId": "18",
        "identity": "54@ip-10-3-202-194.ap-south-1.compute.internal@",
        "activityType": {
          "name": "DebitActivity"
        },
        "activityId": "17",
        "retryState": "RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED"
      }
    },
    "retryState": "RETRY_STATE_RETRY_POLICY_NOT_SET",
    "workflowTaskCompletedEventId": "22"
  }
}

Also, any chance to see your activity code? I'm trying to find out whether, when the activity fails, there is any path in the activity code that does not return an application failure with a delay.

Since we are currently conducting load testing, the code provided is the mock implementation.