Long running workflows with undeterminable scheduleToCloseTimeout/startToCloseTimeout

I am running a workflow with undeterminable scheduleToCloseTimeout/startToCloseTimeout.

  1. Start the workflow.
  2. Calculate the time required to sleep.
  3. Sleep until sundown.
  4. Wake up and perform the task.
  5. Sleep until the next sundown.
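
Step 2 above could look roughly like the sketch below. The function name msUntilNextSundown and its parameters are hypothetical, and sundown is approximated as a fixed UTC time of day; a real implementation would presumably derive sunset from the date and location.

```typescript
// Hypothetical helper: milliseconds from `now` until the next sundown.
// Assumption: sundown is approximated as a fixed UTC hour and minute.
// A real implementation would compute sunset from date and location.
function msUntilNextSundown(
  now: Date,
  sundownHourUtc = 18,
  sundownMinuteUtc = 0,
): number {
  const target = new Date(now.getTime());
  target.setUTCHours(sundownHourUtc, sundownMinuteUtc, 0, 0);
  // If today's sundown has already passed (or is right now), aim for tomorrow's.
  if (target.getTime() <= now.getTime()) {
    target.setUTCDate(target.getUTCDate() + 1);
  }
  return target.getTime() - now.getTime();
}
```

The returned value is the duration to sleep in steps 3 and 5; because the next sundown is never more than 24 hours away under this assumption, it also gives an upper bound for any single sleep.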

Start the workflow with the following options

const activityOptions: ActivityOptions = {
  startToCloseTimeout: '5m', // the timeout in question
  scheduleToStartTimeout: '30s',
  retry: {
    maximumAttempts: 1,
  },
  heartbeatTimeout: '5m',
};

then

import { activityInfo, log, sleep, CancelledFailure, heartbeat } from '@temporalio/activity';
..

export async function sleepWithHeartbeatAndTimeout(elapsedSeconds: number) {
  // elapsedSeconds holds the number of seconds still left to wait
  while (elapsedSeconds > 0) {
    try {
      const sleepSeconds = Math.min(elapsedSeconds, 60); // Limit each sleep to 1 minute
      // Send a heartbeat to signal that the activity is still alive
      heartbeat();
      // Sleep for this chunk (sleep takes milliseconds)
      await sleep(sleepSeconds * 1000);
      elapsedSeconds -= sleepSeconds; // Subtract the time actually slept
    } catch (error) {
      // Log and rethrow so that cancellations and timeouts propagate
      console.error("Error during sleepWithHeartbeatAndTimeout:", error);
      throw error;
    }
  }
}

After 5 minutes, the activity throws an exception because startToCloseTimeout is set to 5 minutes:

Error during sleepWithHeartbeatAndTimeout: CancelledFailure: NOT_FOUND
at Activity.cancel (xxxx\node_modules@temporalio\worker\src\activity.ts:53:16)

In this case, there is no known upper bound for startToCloseTimeout. The only solution seems to be setting it to a very large value, such as 10 years.

Is this the correct way to handle this?

I think you are confusing activities with workflows. Both the question and the code are about an activity, not a workflow.

StartToClose timeout is an activity timeout. You also disabled the activity retries by setting maximumAttempts to 1. So if the activity doesn’t complete in 5 minutes it times out. My guess is that the workflow doesn’t handle the activity failure exception correctly and fails.

A few comments about your activity timeouts:

startToCloseTimeout: '5m' is the maximum time a single activity attempt is allowed to execute. So if the activity is expected to run longer, you have to set it to a larger value.

scheduleToStartTimeout: '30s' is the maximum time an activity task can stay in a queue before being picked up by a worker. We don't recommend setting this value at all unless you are using a worker-specific task queue.

heartbeatTimeout: '5m' is the maximum time allowed between heartbeat calls. Setting it to the same value as the StartToClose timeout doesn't make sense, since a missed heartbeat would then never be detected before the attempt itself times out.
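
Taken together, these comments suggest options along the lines of the sketch below. The specific values are assumptions chosen for illustration: '25h' presumes a single sleep never exceeds about a day, and '2m' presumes the activity heartbeats roughly every 60 seconds, as in the code from the question.

```typescript
import type { ActivityOptions } from '@temporalio/workflow';

// Illustrative values only; tune them to the real workload.
const activityOptions: ActivityOptions = {
  // Upper bound for one attempt: must exceed the longest expected sleep.
  startToCloseTimeout: '25h',
  // Kept from the question: fail fast if no worker picks the task up in time.
  scheduleToStartTimeout: '30s',
  // Well below startToCloseTimeout, yet above the 60-second heartbeat interval.
  heartbeatTimeout: '2m',
  retry: {
    maximumAttempts: 1,
  },
};
```

With a short heartbeatTimeout, a crashed or stuck worker should be noticed within minutes instead of only when the long startToCloseTimeout expires.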

NOT_FOUND happens when a heartbeat is emitted for an activity that has already timed out.

Thanks! That makes sense now. I will increase the startToCloseTimeout for the activity.

The initial reasoning for scheduleToStartTimeout: '30s' is that if all the workers are busy and unable to pick up this activity from the queue, I want it to time out rather than be picked up later. In my use case, it's important that it is picked up within 30 seconds at most.

Thanks again.

I see. This is a valid use case for the scheduleToStartTimeout.