I want to start a workflow (file_loader) which runs all the time:
- it takes an XML file as input
- for each name in the file, a new independent main workflow (no relation to file_loader) should be started for that name
- if a file has 10,000 names, 10,000 independent workflows have to be created and run
- these independent workflows can have other child workflows

Please suggest best practices and possible solutions for this.
> it takes an XML file as input

Given the payload size limit (the limit on input to your workflow), consider passing the path to your XML file rather than the file contents themselves if they could exceed 2 MB.
> for each name in the file, there should be a new independent main workflow (no relation to file_loader) associated with that name

You could pass the path to the XML file to an activity; the activity reads the file row by row and, for each row, uses the client API to start a workflow execution. After each start, the activity heartbeats with the current row number as the heartbeat payload.

If the activity retries (the activity worker restarts, or you hit the activity start-to-close timeout), the activity should also check the last heartbeat details at the beginning, so it can resume from the latest heartbeat payload (row number) rather than trying to start all executions from the beginning.
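A minimal sketch of that resume-from-heartbeat pattern, in Python with the Temporal wiring abstracted behind callables (in a real activity, `heartbeat_fn` would wrap the SDK's heartbeat call and `start_fn` would wrap starting a workflow via the client; the XML layout with `<name>` elements is an assumption):

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Optional


def names_from_xml(xml_text: str) -> list[str]:
    # Assumes a layout like <names><name>alice</name>...</names>
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("name") if el.text]


def start_all(
    names: Iterable[str],
    start_fn: Callable[[str], None],      # e.g. wraps client.start_workflow(...)
    heartbeat_fn: Callable[[int], None],  # e.g. wraps activity heartbeat(row_num)
    last_heartbeat: Optional[int] = None, # row number from a previous attempt, if any
) -> int:
    """Start one workflow per name, resuming after the last heartbeated row."""
    resume_from = (last_heartbeat + 1) if last_heartbeat is not None else 0
    started = 0
    for row, name in enumerate(names):
        if row < resume_from:
            continue  # already started on a previous attempt
        start_fn(name)
        heartbeat_fn(row)  # record progress so a retry can resume here
        started += 1
    return started
```

On a retry, the activity would read its recorded heartbeat details and pass the row number in as `last_heartbeat`, so only the remaining rows are started.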
A couple of things to think about for this approach:
- pass the client to the activity when you register it with the worker (don't create a client inside the activity each time)
- handle client errors in the activity code when you use the client API to start executions
- log each start so you have a record of which executions started and which failed (relevant for the workflow ID reuse policy, for example)
- depending on your service and worker setup, you might need to throttle workflow starts in your activity; one idea is to start 100, then "sleep" for a bit before starting the next 100. If your service and workers are already optimized for this load, you don't have to worry about this part
- detect unexpected activity retries: the SDK worker metric temporal_request_failure for operation RespondActivityTaskCompleted can help. You can also detect this from the server side; a sample Grafana query:
  sum(rate(start_to_close_timeout{operation="TimerActiveTaskActivityTimeout"}[5m])) by (namespace, operation)
- check for any resource-exhausted issues on the service side; a sample Grafana query:
  sum(rate(service_errors_resource_exhausted{}[1m])) by (operation, resource_exhausted_cause)
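The "start 100, then sleep" throttle from the list above could be sketched like this (batch size and pause are illustrative defaults, and `time.sleep` stands in for whatever pacing mechanism fits your activity):

```python
import time
from typing import Callable, Sequence


def batched(items: Sequence[str], size: int) -> list[list[str]]:
    # Split the names into chunks of at most `size`.
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def start_throttled(
    names: Sequence[str],
    start_fn: Callable[[str], None],  # e.g. wraps client.start_workflow(...)
    batch_size: int = 100,
    pause_s: float = 1.0,
) -> None:
    """Start workflows in batches, pausing between batches to spread the load."""
    batches = batched(names, batch_size)
    for i, batch in enumerate(batches):
        for name in batch:
            start_fn(name)
        if i < len(batches) - 1:  # no pause needed after the final batch
            time.sleep(pause_s)
```

If you combine this with the heartbeat approach, heartbeat once per batch (with the last row number of the batch) rather than per row, which also reduces heartbeat traffic.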
Thank you for the insights and covering all the edge cases.