Activity Scehduled but not started and timedout

I have a situation where temporal activies are getting scheduled but not started.

Blockquote

context deadline exceeded

Blockquote

Stack Trace

go.temporal.io/server/common/log.(*zapLogger).Error
	/home/builder/temporal/common/log/zap_logger.go:144
go.temporal.io/server/common/log.(*lazyLogger).Error
	/home/builder/temporal/common/log/lazy_logger.go:68
go.temporal.io/server/service/history/queues.(*executableImpl).HandleErr
	/home/builder/temporal/service/history/queues/executable.go:339
go.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1
	/home/builder/temporal/common/tasks/fifo_scheduler.go:232
go.temporal.io/server/common/backoff.ThrottleRetry.func1
	/home/builder/temporal/common/backoff/retry.go:170
go.temporal.io/server/common/backoff.ThrottleRetryContext
	/home/builder/temporal/common/backoff/retry.go:194
go.temporal.io/server/common/backoff.ThrottleRetry
	/home/builder/temporal/common/backoff/retry.go:171
go.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask
	/home/builder/temporal/common/tasks/fifo_scheduler.go:241
go.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask
	/home/builder/temporal/common/tasks/fifo_scheduler.go:217

All retries are failed and the activity is marked failed due to Start to close timeout.

Configuration details

"workflowExecutionTimeout": "315360000s",
      "workflowRunTimeout": "315360000s",
      "workflowTaskTimeout": "120s",
"startToCloseTimeout": "120s",

Workflow history JSON


[
  {
    "eventId": "1",
    "eventTime": "2024-03-27T23:29:31.542287229Z",
    "eventType": "WorkflowExecutionStarted",
    "version": "0",
    "taskId": "140537923",
    "workerMayIgnore": false,
    "workflowExecutionStartedEventAttributes": {
      "workflowType": {
        "name": "Workflows::ThirdParty::BankAccountValidationWorkflow"
      },
      "parentWorkflowNamespace": "",
      "parentWorkflowNamespaceId": "",
      "parentWorkflowExecution": null,
      "parentInitiatedEventId": "0",
      "taskQueue": {
        "name": "low-priority-queue",
        "kind": "Normal"
      },
      "input": {
        "payloads": [
          {
            "metadata": {
              "encoding": "anNvbi9wbGFpbg=="
            },
            "data": "IntcInZlcmlmaWNhdGlvbl9pZFwiOjU3NDc0MDF9Ig=="
          }
        ]
      },
      "workflowExecutionTimeout": "315360000s",
      "workflowRunTimeout": "315360000s",
      "workflowTaskTimeout": "120s",
      "continuedExecutionRunId": "",
      "initiator": "Unspecified",
      "continuedFailure": null,
      "lastCompletionResult": null,
      "originalExecutionRunId": "9958818a-d228-40ca-b67b-1d13dd98f7ec",
      "identity": "123@ip-10-202-16-199.ap-south-1.compute.internal\n",
      "firstExecutionRunId": "9958818a-d228-40ca-b67b-1d13dd98f7ec",
      "retryPolicy": null,
      "attempt": 1,
      "workflowExecutionExpirationTime": "2034-03-25T23:29:31.542Z",
      "cronSchedule": "",
      "firstWorkflowTaskBackoff": "0s",
      "memo": {
        "fields": {}
      },
      "searchAttributes": {
        "indexedFields": {}
      },
      "prevAutoResetPoints": null,
      "header": {
        "fields": {}
      },
      "parentInitiatedEventVersion": "0"
    }
  },
  {
    "eventId": "2",
    "eventTime": "2024-03-27T23:29:31.542348355Z",
    "eventType": "WorkflowTaskScheduled",
    "version": "0",
    "taskId": "140537924",
    "workerMayIgnore": false,
    "workflowTaskScheduledEventAttributes": {
      "taskQueue": {
        "name": "low-priority-queue",
        "kind": "Normal"
      },
      "startToCloseTimeout": "120s",
      "attempt": 1
    }
  },
  {
    "eventId": "3",
    "eventTime": "2024-03-27T23:29:52.945544800Z",
    "eventType": "WorkflowTaskStarted",
    "version": "0",
    "taskId": "140540440",
    "workerMayIgnore": false,
    "workflowTaskStartedEventAttributes": {
      "scheduledEventId": "2",
      "identity": "8@colend-temporal-worker-74859889bf-zc5g6\n",
      "requestId": "cd08dc04-47a2-4c5f-aa8b-c2b069c5fc8f",
      "suggestContinueAsNew": false,
      "historySizeBytes": "0"
    }
  },
  {
    "eventId": "4",
    "eventTime": "2024-03-27T23:29:56.151500134Z",
    "eventType": "WorkflowTaskCompleted",
    "version": "0",
    "taskId": "140540999",
    "workerMayIgnore": false,
    "workflowTaskCompletedEventAttributes": {
      "scheduledEventId": "2",
      "startedEventId": "3",
      "identity": "8@colend-temporal-worker-74859889bf-zc5g6\n",
      "binaryChecksum": "",
      "workerVersioningId": null
    }
  },
  {
    "eventId": "5",
    "eventTime": "2024-03-27T23:29:56.151552769Z",
    "eventType": "ActivityTaskScheduled",
    "version": "0",
    "taskId": "140541000",
    "workerMayIgnore": false,
    "activityTaskScheduledEventAttributes": {
      "activityId": "5",
      "activityType": {
        "name": "Activities::ThirdParty::UpdateRequestIdToVerificationActivity"
      },
      "taskQueue": {
        "name": "low-priority-queue",
        "kind": "Normal"
      },
      "header": {
        "fields": {
          "test-header": {
            "metadata": {
              "encoding": "anNvbi9wbGFpbg=="
            },
            "data": "InRlc3Qi"
          }
        }
      },
      "input": {
        "payloads": [
          {
            "metadata": {
              "encoding": "anNvbi9wbGFpbg=="
            },
            "data": "IntcInZlcmlmaWNhdGlvbl9pZFwiOjU3NDc0MDF9Ig=="
          }
        ]
      },
      "scheduleToCloseTimeout": "315360000s",
      "scheduleToStartTimeout": "315360000s",
      "startToCloseTimeout": "3600s",
      "heartbeatTimeout": "0s",
      "workflowTaskCompletedEventId": "4",
      "retryPolicy": {
        "initialInterval": "1s",
        "backoffCoefficient": 1,
        "maximumInterval": "40s",
        "maximumAttempts": 3,
        "nonRetryableErrorTypes": [
          "ActiveRecord::RecordNotFound",
          "CustomErrors::PreConditionFailed",
          "CustomErrors::UnprocessableEntity",
          "CustomErrors::NotFound",
          "CustomErrors::ValidationError"
        ]
      }
    }
  },
  {
    "eventId": "6",
    "eventTime": "2024-03-28T01:30:10.799747570Z",
    "eventType": "ActivityTaskStarted",
    "version": "0",
    "taskId": "143725399",
    "workerMayIgnore": false,
    "activityTaskStartedEventAttributes": {
      "scheduledEventId": "5",
      "identity": "7@colend-temporal-worker-74859889bf-4hgww\n",
      "requestId": "e7b823e4-cbe9-4877-94a2-4f4c376f16d5",
      "attempt": 3,
      "lastFailure": {
        "message": "activity StartToClose timeout",
        "source": "Server",
        "stackTrace": "",
        "encodedAttributes": null,
        "cause": null,
        "timeoutFailureInfo": {
          "timeoutType": "StartToClose",
          "lastHeartbeatDetails": null
        }
      }
    }
  },
  {
    "eventId": "7",
    "eventTime": "2024-03-28T02:30:11.235325680Z",
    "eventType": "ActivityTaskTimedOut",
    "version": "0",
    "taskId": "143725400",
    "workerMayIgnore": false,
    "activityTaskTimedOutEventAttributes": {
      "failure": {
        "message": "activity StartToClose timeout",
        "source": "Server",
        "stackTrace": "",
        "encodedAttributes": null,
        "cause": {
          "message": "activity StartToClose timeout",
          "source": "Server",
          "stackTrace": "",
          "encodedAttributes": null,
          "cause": null,
          "timeoutFailureInfo": {
            "timeoutType": "StartToClose",
            "lastHeartbeatDetails": null
          }
        },
        "timeoutFailureInfo": {
          "timeoutType": "StartToClose",
          "lastHeartbeatDetails": null
        }
      },
      "scheduledEventId": "5",
      "startedEventId": "6",
      "retryState": "MaximumAttemptsReached"
    }
  },
  {
    "eventId": "8",
    "eventTime": "2024-03-28T02:30:11.235331496Z",
    "eventType": "WorkflowTaskScheduled",
    "version": "0",
    "taskId": "143725401",
    "workerMayIgnore": false,
    "workflowTaskScheduledEventAttributes": {
      "taskQueue": {
        "name": "low-priority-queue",
        "kind": "Normal"
      },
      "startToCloseTimeout": "120s",
      "attempt": 1
    }
  },
  {
    "eventId": "9",
    "eventTime": "2024-03-28T02:30:16.999112316Z",
    "eventType": "WorkflowTaskStarted",
    "version": "0",
    "taskId": "143726515",
    "workerMayIgnore": false,
    "workflowTaskStartedEventAttributes": {
      "scheduledEventId": "8",
      "identity": "7@colend-temporal-worker-74859889bf-sr5pf\n",
      "requestId": "7dc6d1bc-8ba2-46a5-8d5d-62612e1bcdaf",
      "suggestContinueAsNew": false,
      "historySizeBytes": "0"
    }
  },
  {
    "eventId": "10",
    "eventTime": "2024-03-28T02:32:17.988239465Z",
    "eventType": "WorkflowTaskTimedOut",
    "version": "0",
    "taskId": "143751355",
    "workerMayIgnore": false,
    "workflowTaskTimedOutEventAttributes": {
      "scheduledEventId": "8",
      "startedEventId": "9",
      "timeoutType": "StartToClose"
    }
  },
  {
    "eventId": "11",
    "eventTime": "2024-03-28T02:32:17.988244898Z",
    "eventType": "WorkflowTaskScheduled",
    "version": "0",
    "taskId": "143753339",
    "workerMayIgnore": false,
    "workflowTaskScheduledEventAttributes": {
      "taskQueue": {
        "name": "low-priority-queue",
        "kind": "Normal"
      },
      "startToCloseTimeout": "120s",
      "attempt": 2
    }
  },
  {
    "eventId": "12",
    "eventTime": "2024-03-28T02:32:24.802564856Z",
    "eventType": "WorkflowTaskStarted",
    "version": "0",
    "taskId": "143753340",
    "workerMayIgnore": false,
    "workflowTaskStartedEventAttributes": {
      "scheduledEventId": "11",
      "identity": "8@colend-temporal-worker-74859889bf-ncs8f\n",
      "requestId": "ae0e1c09-a05b-4d06-95b2-48d1a2ad5ffa",
      "suggestContinueAsNew": false,
      "historySizeBytes": "0"
    }
  },
  {
    "eventId": "13",
    "eventTime": "2024-03-28T02:32:27.086689656Z",
    "eventType": "WorkflowTaskCompleted",
    "version": "0",
    "taskId": "143753341",
    "workerMayIgnore": false,
    "workflowTaskCompletedEventAttributes": {
      "scheduledEventId": "11",
      "startedEventId": "12",
      "identity": "8@colend-temporal-worker-74859889bf-ncs8f\n",
      "binaryChecksum": "",
      "workerVersioningId": null
    }
  },
  {
    "eventId": "14",
    "eventTime": "2024-03-28T02:32:27.086725009Z",
    "eventType": "WorkflowExecutionFailed",
    "version": "0",
    "taskId": "143753342",
    "workerMayIgnore": false,
    "workflowExecutionFailedEventAttributes": {
      "failure": {
        "message": "Timeout type: TIMEOUT_TYPE_START_TO_CLOSE",
        "source": "",
        "stackTrace": "/usr/local/bundle/bundler/gems/temporal-ruby-80f00631a025/lib/temporal/workflow/context.rb:106:in `execute_activity!'\n/usr/local/bundle/bundler/gems/temporal-ruby-80f00631a025/lib/temporal/activity/workflow_convenience_methods.rb:30:in `execute!'\n/colend-api/app/temporal/workflows/third_party/bank_account_validation_workflow.rb:9:in `execute'\n(eval):3:in `block in execute_with_newrelic_transaction_trace'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/instrumentation/controller_instrumentation.rb:384:in `perform_action_with_newrelic_trace'\n(eval):2:in `execute_with_newrelic_transaction_trace'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/method_tracer.rb:339:in `block (3 levels) in _nr_define_traced_method'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/method_tracer.rb:74:in `block in trace_execution_scoped'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/method_tracer_helpers.rb:37:in `block in trace_execution_scoped'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/tracer.rb:356:in `capture_segment_error'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/method_tracer_helpers.rb:37:in `trace_execution_scoped'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/method_tracer.rb:72:in `trace_execution_scoped'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/method_tracer.rb:334:in `block (2 levels) in _nr_define_traced_method'\n/usr/local/bundle/bundler/gems/temporal-ruby-80f00631a025/lib/temporal/workflow.rb:16:in `execute_in_context'\n/usr/local/bundle/bundler/gems/temporal-ruby-80f00631a025/lib/temporal/workflow/executor.rb:76:in `block (2 levels) in execute_workflow'\n/usr/local/bundle/bundler/gems/temporal-ruby-80f00631a025/lib/temporal/middleware/chain.rb:14:in `block in invoke'\n/usr/local/bundle/bundler/gems/temporal-ruby-80f00631a025/lib/temporal/middleware/chain.rb:20:in `invoke'\n/usr/local/bundle/bundler/gems/temporal-ruby-80f00631a025/lib/temporal/workflow/executor.rb:75:in `block in execute_workflow'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/tracer.rb:434:in `block (2 levels) in thread_block_with_current_transaction'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/tracer.rb:356:in `capture_segment_error'\n/usr/local/bundle/gems/newrelic_rpm-9.3.1/lib/new_relic/agent/tracer.rb:433:in `block in thread_block_with_current_transaction'",
        "encodedAttributes": null,
        "cause": null,
        "applicationFailureInfo": {
          "type": "Temporal::TimeoutError",
          "nonRetryable": false,
          "details": {
            "payloads": [
              {
                "metadata": {
                  "encoding": "anNvbi9wbGFpbg=="
                },
                "data": "IlRpbWVvdXQgdHlwZTogVElNRU9VVF9UWVBFX1NUQVJUX1RPX0NMT1NFIg=="
              }
            ]
          }
        }
      },
      "retryState": "RetryPolicyNotSet",
      "workflowTaskCompletedEventId": "13",
      "newExecutionRunId": ""
    }
  }
]

Activity retry policy specifies

        "maximumAttempts": 3,

The ActivityTaskStartedEvent specifies the number of actual attempts:

    "eventId": "6",
    "eventTime": "2024-03-28T01:30:10.799747570Z",
    "eventType": "ActivityTaskStarted",
...
      "attempt": 3,

So, the activity was retried 3 times, and it failed to complete within the specified timeout per

    "activityTaskTimedOutEventAttributes": {
      "failure": {
        "message": "activity StartToClose timeout",

It looks like it works as designed. You need to find out why your activity implementation times out repeatedly.

Hello @maxim Yes, the activity has excceded its allowe limits. The issue is it never got into execution. It got scehduled but never started and gave up all the allowed attempts. There is no error at application side and within temporal I could see error as “context_deadlineExceededError”

This happened because you set your StartToClose timeout to the same value as ScheduelToClose timeout. This told the service that this activity should not be retried if lost for some reason. I recommend setting StartToClose timeout to much sorter value (maximum expected single activity attempt execution time) and not setting ScheduleToClose at all if you are setting maxAttempts.

Hello @maxim Agree on the feedback of removing ScheduletoCLose if the cap is on max attempts. But for the above issue, my StartToClose is 120s Vs ScheduletoCLose is 315360000s

Sorry, you are right about the StartToClose timeout.

The history shows that the activity was executed three times before failing. I recommend looking into your activity implementation to see why it was invoked three times, even though you don’t see anything in the logs.

Yes, I am trying to find that. What I am not able to follow, is no trace in application and activites got failed in temporal itself. The only error I see in temporal logs in context deadline exceeded during transfer to task queue

There are rare intermittent conditions that could lead to an activity task loss. But they are so rare that a single retry is enough for the activity to start executing.