I’m building a “job status” system that will let users see the status (and possibly the results or errors) of long-running tasks that we execute via Temporal.
Roughly, I’m thinking I’ll keep track of each customer’s jobs by saving the workflowId/runId to the database when a job is started.
In non-exceptional cases (a job completes successfully or reaches a known error state), I can easily write the success/failure status back to the database from a final activity in Temporal.
However, I’m struggling to find a way to capture the WorkflowExecutionStatus for states such as cancelled, terminated, or timed out, or even for exceptional failures (unless I wrap all of my workflows in try-catch).
Ideally I’d be able to always execute some callback when WorkflowExecutionStatus changes for specific workflow types.
If that doesn’t exist, the only alternative I can think of is a cron workflow that periodically scans my database for “running” jobs, calls DescribeWorkflowExecution to get each one’s WorkflowExecutionStatus, and writes that status back to the database if the workflow has reached a terminal state.
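A minimal sketch of one pass of that scan, assuming a hypothetical `JobRow` table shape and hypothetical function names (`reconcileStatus`, `scanRunningJobs`). The describe call is injected so the logic runs without a server; with the TypeScript client it would presumably be something like `(await client.workflow.getHandle(workflowId).describe()).status.name`. Describing by workflow ID alone targets the latest run, so a job that has continued as new still reads as RUNNING instead of being mistaken for finished.

```typescript
// Workflow execution status names as reported by a describe call.
type TemporalStatus =
  | 'RUNNING'
  | 'COMPLETED'
  | 'FAILED'
  | 'CANCELLED'
  | 'TERMINATED'
  | 'CONTINUED_AS_NEW'
  | 'TIMED_OUT';

// Hypothetical shape of a row in the job table.
interface JobRow {
  jobId: string;
  workflowId: string;
  status: string;
}

// Statuses after which this workflow ID has no run left executing.
// CONTINUED_AS_NEW is excluded because a new run carries on the job.
const TERMINAL_STATUSES: ReadonlySet<TemporalStatus> = new Set<TemporalStatus>([
  'COMPLETED',
  'FAILED',
  'CANCELLED',
  'TERMINATED',
  'TIMED_OUT',
]);

// Returns the status to write back, or null if the job is still in progress.
function reconcileStatus(temporalStatus: TemporalStatus): TemporalStatus | null {
  return TERMINAL_STATUSES.has(temporalStatus) ? temporalStatus : null;
}

// One pass of the periodic scan. `describeLatestRun` is injected so the
// logic can be exercised without a Temporal server.
async function scanRunningJobs(
  jobs: JobRow[],
  describeLatestRun: (workflowId: string) => Promise<TemporalStatus>,
): Promise<JobRow[]> {
  const updates: JobRow[] = [];
  for (const job of jobs) {
    if (job.status !== 'RUNNING') continue; // only "running" rows need checking
    const next = reconcileStatus(await describeLatestRun(job.workflowId));
    if (next !== null) updates.push({ ...job, status: next });
  }
  return updates;
}
```

The returned rows are the ones to write back; keeping the database write outside the function makes it easy to run the scan from an activity and batch the updates.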
I would recommend not using termination and workflow timeouts for business-level features. For all other scenarios, you can write an interceptor that stores the status of a workflow in your DB.
Re: termination – I don’t use termination as part of the workflow; I’m just trying to guard against edge cases, such as an engineer having to terminate a workflow from the Temporal UI because of an unforeseen bug. In that case I’d like my job status system to eventually reflect that the workflow was terminated.
Re: workflow timeouts – Do you also suggest not using activity timeouts? For some types of jobs it feels strange to allow unlimited time. One example is bulk importing data from an external system, where I’ve seen the connection to the external system hang. Ideally we implement the workflow so that a hung connection hits a connection timeout and throws an appropriate error that fails the workflow, but a sensible activity timeout has previously saved me from a workflow that would otherwise have run forever.
For termination, you can also implement a separate process that detects terminated workflows and updates the job status.
Do you also suggest not using activity timeouts?
No. You must use activity timeouts, as activities use retries to deal with failures. Timeouts are the only way to detect process crashes and other edge cases. So use a StartToClose timeout that is as short as possible, and for activities that can take a long time, use heartbeating with a HeartbeatTimeout.
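A sketch of how that advice might look for the bulk-import case, under stated assumptions: the options object mirrors what you would pass to `proxyActivities()` from `@temporalio/workflow` (the values are placeholders), and `heartbeat` is injected where a real activity would call `Context.current().heartbeat(...)` from `@temporalio/activity`. The names `bulkImport`, `fetchPage`, and `savePage` are hypothetical.

```typescript
// Illustrative options for proxyActivities() (values are placeholders).
const bulkImportOptions = {
  startToCloseTimeout: '2 hours', // upper bound for one whole import attempt
  heartbeatTimeout: '30 seconds', // a hang is detected ~30s after the last heartbeat
  retry: { maximumAttempts: 5 },
};

// Heartbeat callback; in a real activity this would be Context.current().heartbeat.
type Heartbeat = (details: unknown) => void;

// Imports data page by page and heartbeats after each page, so a hang inside
// fetchPage or savePage stops the heartbeats and trips the HeartbeatTimeout.
async function bulkImport(
  fetchPage: (offset: number) => Promise<string[]>,
  savePage: (rows: string[]) => Promise<void>,
  heartbeat: Heartbeat,
  startOffset = 0,
): Promise<number> {
  let offset = startOffset;
  for (;;) {
    const rows = await fetchPage(offset);
    if (rows.length === 0) return offset; // no more data: total rows imported
    await savePage(rows);
    offset += rows.length;
    heartbeat(offset); // progress details recorded with the heartbeat
  }
}
```

Because the offset is sent as heartbeat details, a retried attempt could read it back (in the TypeScript SDK, presumably via `Context.current().info.heartbeatDetails`) and pass it as `startOffset` instead of starting over.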
Thanks again! Last question (I think) – which interceptor method should I implement to handle workflow cancellation due to activity timeout / no more activity retries?
Any workflow failure is reported as an exception thrown from the main workflow method, so the interceptor needs to handle that exception. Implement WorkflowInboundCallsInterceptor.execute for this.
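A framework-free sketch of what such an `execute` interceptor has to do: record RUNNING, await the workflow, record the outcome, and rethrow any error unchanged. The names `trackStatus`, `statusForError`, and `ReportStatus` are hypothetical. In a real interceptor, `run` would be `() => next(input)`, `isCancellation` would be the helper exported by `@temporalio/workflow`, and `report` would call an activity obtained from `proxyActivities` (workflow code cannot talk to a database directly). ContinueAsNew is detected by its error name here only to keep the sketch dependency-free; with the SDK, `error instanceof ContinueAsNew` is the more robust check.

```typescript
type JobStatus = 'RUNNING' | 'COMPLETED' | 'CONTINUED_AS_NEW' | 'CANCELLED' | 'FAILED';

// Persists a status; in a workflow interceptor this would be an activity call.
type ReportStatus = (status: JobStatus) => Promise<void>;

// Maps the exception thrown by the workflow's main method to a job status.
function statusForError(err: unknown, isCancellation: (e: unknown) => boolean): JobStatus {
  if (err instanceof Error && err.name === 'ContinueAsNew') return 'CONTINUED_AS_NEW';
  if (isCancellation(err)) return 'CANCELLED';
  return 'FAILED';
}

// Same shape as an execute() interceptor wrapping next(input).
async function trackStatus<T>(
  run: () => Promise<T>,
  report: ReportStatus,
  isCancellation: (e: unknown) => boolean,
): Promise<T> {
  await report('RUNNING');
  try {
    const result = await run();
    await report('COMPLETED');
    return result;
  } catch (err) {
    await report(statusForError(err, isCancellation));
    throw err; // rethrow the original error so Temporal records the real outcome
  }
}
```

One caveat when adapting this: once a workflow has been cancelled, activities scheduled in the cancelled scope are cancelled as well, so the `report` call in the catch block would likely need to be wrapped in `CancellationScope.nonCancellable(...)` to actually run.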
In the TypeScript SDK, is it possible to get the runId from the handle returned by client.start()? I see that firstExecutionRunId is available, but not runId. Perhaps my question is irrelevant and the two are always the same right after client.start()?
ContinueAsNew seems to surface as an exception in WorkflowInboundCallsInterceptor.execute, and I can’t figure out how to distinguish it from a real failure using any of the exception’s properties. Any ideas?
To your first question, please see this comment in code:
```typescript
// runId is not used in handles created with `start*` calls because these
// handles should allow interacting with the workflow if it continues as new.
```
I was expecting to see the ContinueAsNew error in the first log. Somewhere in that log there should be the error name, ContinueAsNew, along with whatever other properties a ContinueAsNew error has (the second log, log.error(e.name), does print ContinueAsNew).
@antonio.perez does this make sense? I can file a bug on GitHub if that’s a better place, but it seems that logging an error only logs the Temporal context and none of the error’s own properties.
@antonio.perez How can I handle, in an interceptor, workflows that are cancelled (not terminated) via the Temporal UI? I assumed this implementation would mark them as failed, but it doesn’t (the workflow stays RUNNING).
```typescript
import {
  ContinueAsNew,
  Next,
  WorkflowExecuteInput,
  WorkflowInboundCallsInterceptor,
} from '@temporalio/workflow'

export class WorkflowStatusInterceptor
  implements WorkflowInboundCallsInterceptor
{
  public async execute(
    input: WorkflowExecuteInput,
    next: Next<WorkflowInboundCallsInterceptor, 'execute'>
  ): Promise<unknown> {
    try {
      // update status to RUNNING
      const ret = await next(input)
      // update status to COMPLETED
      return ret
    } catch (error) {
      // ContinueAsNew is thrown to start a new run; it is not a real failure
      if (error instanceof ContinueAsNew) {
        // update status to CONTINUEDASNEW
        throw error
      }
      // update status to FAILED
      throw error
    }
  }
}
```
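I’m not sure exactly what is wrong with your code. You can definitely catch and handle the cancellation exception in an interceptor. If the workflow keeps running, it is usually because an unexpected exception type was thrown instead of a Temporal exception. (In the TypeScript SDK, an error that is not a Temporal failure type fails the workflow task rather than the workflow, so the task is retried and the workflow stays RUNNING.)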