We want to use Temporal as a microservice orchestrator. Most of the microservice calls are fast and complete in 500 ms to 1 second. The entire orchestration should finish within 10 seconds. We are trying to use Local activities in some cases to bring down the overall orchestration time.
Can we call a microservice asynchronously from a Local activity? When I use the following code from Local activity:
io.temporal.failure.ApplicationFailure: message='getTaskToken is not supported for local activities', type='java.lang.UnsupportedOperationException', nonRetryable=false
at io.temporal.internal.sync.LocalActivityExecutionContextImpl.getTaskToken(LocalActivityExecutionContextImpl.java:60) ~[temporal-sdk-0.29.1.jar:na]
As per my understanding, Local activities are executed in the same process where Workflow executes and results are not persisted in Cassandra. Does it mean, like Workflow, we can’t execute non-deterministic functions (example, UUID.randomUUID() or System.currentTimeMillis()) from Local activities?
Can we call a microservice asynchronously from a Local activity? When I use the following code from Local activity:
It is not currently supported, though it could be added. I’m not sure there is much value in using async completion for very fast API calls.
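Since the task token is not available in a local activity, one workaround is to start the asynchronous client call inside the activity method and block until it finishes, so the activity returns the result directly. The following is only a minimal sketch in plain Java under that assumption: `BlockingLocalActivitySketch` and `callAndWait` are hypothetical names, the async client is stubbed with `CompletableFuture.supplyAsync`, and no Temporal SDK types are involved.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for a local activity implementation; not a Temporal API.
class BlockingLocalActivitySketch {

    // Starts the asynchronous call and waits for it, so the calling
    // activity method only returns once the result is available.
    static <T> T callAndWait(Supplier<CompletableFuture<T>> asyncCall, Duration timeout) {
        CompletableFuture<T> future = asyncCall.get();
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // give up on the pending call
            throw new IllegalStateException("service call timed out after " + timeout, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("service call failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for service call", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real asynchronous microservice client.
        String reply = BlockingLocalActivitySketch.<String>callAndWait(
            () -> CompletableFuture.supplyAsync(() -> "order-accepted"),
            Duration.ofSeconds(1));
        System.out.println(reply); // prints "order-accepted"
    }
}
```

The timeout passed to `callAndWait` would normally be kept below whatever timeout is configured for the local activity itself, so the activity fails with a clear error instead of being cut off.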
As per my understanding, Local activities are executed in the same process where Workflow executes and results are not persisted in Cassandra. Does it mean, like Workflow, we can’t execute non-deterministic functions (example, UUID.randomUUID() or System.currentTimeMillis()) from Local activities?
Local activity results are persisted in Cassandra as Marker events. So they can execute any non-deterministic operations the same way normal activities do.
Thank you, @maxim . Had a few follow-up questions:
If local activities are persisted in Cassandra, does that mean if the workflow needs to be restarted in a different worker node because of some hardware failure, the local activity will not be re-executed?
Local activities are executed in the worker process, while normal activities are scheduled through task queues across different workers. Is this the only difference?
We have a very aggressive SLA (10 seconds) to complete the end-to-end orchestration. All the activities are short-lived (5 seconds max). Can we create the entire orchestration using just local activities?
If we use local activities, do we need to adjust any config values like setWorkflowTaskTimeout() based on how much time is taken by the local activities?
If local activities are persisted in Cassandra, does that mean if the workflow needs to be restarted in a different worker node because of some hardware failure, the local activity will not be re-executed?
It depends on the point of the workflow worker failure. If it fails after the local activity state was persisted (workflow task completed successfully) then the local activity is not reexecuted. If it fails after the local activity executed but the results weren’t committed then it will be reexecuted as part of the workflow task retry.
Local activities are executed in the worker process, while normal activities are scheduled through task queues across different workers. Is this the only difference?
They are implemented very differently. For example, when a local activity is retried, the whole workflow task is retried; when a normal activity is retried, only the activity task is retried. Local activities are not expected to be long-running, so they cannot heartbeat.
We have a very aggressive SLA (10 seconds) to complete the end-to-end orchestration. All the activities are short-lived (5 seconds max). Can we create the entire orchestration using just local activities?
Yes, you can. The downside is that the whole orchestration will be retried from the beginning in the case of worker failures.
If we use local activities, do we need to adjust any config values like setWorkflowTaskTimeout() based on how much time is taken by the local activities?
The SDK already has logic to extend the duration of the workflow task if local activities take longer than the workflow task timeout.
Thanks, @maxim. I have a few more questions. I am just trying to make sure I understand the pros and cons of local activities before using them in production.
I think the behavior will be the same for normal activities if the task worker fails before committing the result from the activity task. Please let me know if I am wrong here.
Is retrying the whole workflow applicable only in the case of hardware failures? For example, if the local activity times out or an exception is thrown from the activity, is the whole workflow retried in those cases too?
If the worker fails, the entire workflow task will be re-executed. But neither the local nor the normal activities will be executed again, the activity output will be fetched from Cassandra. Please correct me if I am wrong here. The only difference is, if we use only local activities in a workflow, the workflow task will be relatively long and may take longer to retry.
For our use case, the entire orchestration can take 10-15 seconds, each activity can take up to 5 seconds. If we use only local activities, are the following settings reasonable?
I think the behavior will be the same for normal activities if the task worker fails before committing the result from the activity task. Please let me know if I am wrong here.
The difference is that a normal activity is retried only if its retry options request it. For example, if maxAttempts in the retry options is set to 1, Temporal doesn’t retry a failed activity and delivers the timeout to the workflow. A local activity can be re-executed even if its maxAttempts is 1.
Is retrying the whole workflow applicable only in the case of hardware failures? For example, if the local activity times out or an exception is thrown from the activity, is the whole workflow retried in those cases too?
I didn’t say that the whole workflow is retried. I said that a workflow task is retried. So all the local activities that were executed as part of that particular task can be reexecuted.
If the worker fails, the entire workflow task will be re-executed. But neither the local nor the normal activities will be executed again, the activity output will be fetched from Cassandra. Please correct me if I am wrong here. The only difference is, if we use only local activities in a workflow, the workflow task will be relatively long and may take longer to retry.
Activities and local activities that were executed as part of the previous workflow tasks will not be reexecuted as their results were already saved. All local activities executed as part of the failed workflow task (which is the whole orchestration in your case) will be retried on a workflow task failure.
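To make that rule concrete, here is a toy model in plain Java. It is not how the SDK is implemented and uses no Temporal APIs: `WorkflowTaskRetryModel`, `runTask`, and the step names are all made up for illustration. Results are treated as durable only once the workflow task "commits"; anything run in a task that fails before the commit is run again on retry.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names and structure are illustrative, not Temporal SDK APIs.
class WorkflowTaskRetryModel {
    // Results already recorded in history (stands in for persisted marker events).
    final Map<String, String> committed = new LinkedHashMap<>();
    // How many times each local activity body actually ran.
    final Map<String, Integer> executions = new LinkedHashMap<>();

    // Runs one workflow task. Returns true if its results were committed.
    boolean runTask(List<String> localActivities, boolean workerFailsBeforeCommit) {
        Map<String, String> pending = new LinkedHashMap<>();
        for (String name : localActivities) {
            if (committed.containsKey(name)) {
                continue; // already in history: the result is reused, the body is not rerun
            }
            executions.merge(name, 1, Integer::sum);
            pending.put(name, "result-of-" + name);
        }
        if (workerFailsBeforeCommit) {
            return false; // pending results are lost together with the worker
        }
        committed.putAll(pending);
        return true;
    }

    public static void main(String[] args) {
        WorkflowTaskRetryModel model = new WorkflowTaskRetryModel();
        List<String> steps = Arrays.asList("reserve", "charge");
        model.runTask(steps, true);   // worker dies before the task completes
        model.runTask(steps, false);  // retry of the same workflow task
        model.runTask(steps, false);  // later replay: nothing reruns
        System.out.println(model.executions); // prints {reserve=2, charge=2}
    }
}
```

In this model both steps run twice, once in the failed task and once in the retry, and never again after the commit. With an orchestration built entirely from local activities, the one workflow task covers every step, so a failure near the end repeats all of them.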
For our use case, the entire orchestration can take 10-15 seconds, each activity can take up to 5 seconds. If we use only local activities, are the following settings reasonable?
What is the reason to use Temporal if you don’t care about the completion of the orchestration after 15 seconds?