I’m building an ETL server on top of Temporal that exports data in batches. I would like to use Schedules, as Cron Schedules are fairly limited. When a Schedule executes my ETL Workflow, I would like to pass the bounds for the batch export, i.e. the start and end times of the batch, or just the start time, deducing the end time from the Schedule frequency.
I’ve been digging through the Schedule documentation and examples, but I can’t find any relevant examples. The best thing I can come up with is extracting the timestamp of the Action that starts the Workflow from the Workflow ID, as the docs state that it’s appended by the Action. However, I imagine this timestamp will not match the expected boundaries of each batch export. Moreover, it’s unclear if this would work when backfilling and running multiple Workflows simultaneously.
Are there any established solutions on how to achieve this?
Can you clarify this a bit? Are you asking how to add a start and end for the schedule to ensure it doesn’t run outside of that window? You can set StartAt and EndAt on the ScheduleSpec in addition to the intervals/calendars.
Sure, let me clarify: I have a long-running Schedule that exports data every hour by triggering a Workflow, and would like to pass to the Workflow the start time of the current batch. So, I’m not referring to the start and end times of the Schedule (this is long-running, and doesn’t have an end time), but the start and end times of the current batch. For example, in any day, there will be 24 Workflow executions triggered by the Schedule, and each one would receive the current hour as some sort of input: 2023-04-11 00:00, 2023-04-11 01:00, 2023-04-11 02:00, …, 2023-04-11 23:00.
You could use the current timestamp but this loses idempotency and makes the Workflow un-retryable (as you can’t go back in time).
If you are familiar with Airflow, I’m asking for something like the data_interval_start and data_interval_end variables assigned to each DagRun.
Will the Action timestamp match the exact hour mark (assuming an hourly schedule)? I need to exactly match the hour mark to avoid losing high frequency data. Of course, this is something I could correct in code.
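For illustration, the in-code correction mentioned here could be as small as truncating a timestamp to the hour. This is only a sketch: `floor_to_hour` is a hypothetical helper name, and it assumes an hourly Schedule and a timezone-aware `datetime`.

```python
from datetime import datetime, timezone


def floor_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds so the value lands on the hour mark."""
    # Hypothetical helper: only meaningful for an hourly schedule.
    return ts.replace(minute=0, second=0, microsecond=0)


# Example: a timestamp a few seconds past the hour is snapped back to 01:00.
snapped = floor_to_hour(datetime(2023, 4, 11, 1, 0, 7, tzinfo=timezone.utc))
```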
I’m not sure exactly what the author had in mind with that recommendation. I think the warning may be overly broad. In general it’s better for things that affect the business logic to be passed in the input, but in this case, the workflow is being started by the server itself, which doesn’t have access to user data converters, so we can’t pass it in the input. That said, using the search attribute is a little messy. There’s a feature request to make a better accessor for it: [Feature Request] Expose schedule specific info in Workflows through a API · Issue #243 · temporalio/features · GitHub
The timestamp is the exact time matched by the schedule, with second resolution, before jitter is added.
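As a rough sketch of the Workflow ID approach discussed in this thread, the batch bounds could be derived by parsing the timestamp suffix. Several things here are assumptions rather than confirmed API: that the suffix has the form `YYYY-MM-DDTHH:MM:SSZ` in UTC, that the Schedule is hourly, and the names `batch_bounds` and `etl-export`, which are purely illustrative. It is worth checking a real Workflow ID from your Schedule before relying on this.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Schedule appends the matched time as "YYYY-MM-DDTHH:MM:SSZ".
_SUFFIX_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
_SUFFIX_LENGTH = 20  # len("2023-04-11T01:00:00Z")


def batch_bounds(workflow_id: str, interval: timedelta = timedelta(hours=1)):
    """Return (start, end) of the batch, derived from the Workflow ID suffix."""
    suffix = workflow_id[-_SUFFIX_LENGTH:]
    # strptime raises ValueError if the suffix does not match the assumed format.
    start = datetime.strptime(suffix, _SUFFIX_FORMAT).replace(tzinfo=timezone.utc)
    return start, start + interval


# Hypothetical Workflow ID for the 01:00 run on 2023-04-11.
start, end = batch_bounds("etl-export-2023-04-11T01:00:00Z")
```

Because the bounds come from the matched schedule time rather than the wall clock, a retried or backfilled run with the same Workflow ID would compute the same window. Whether the matched time marks the start or the end of the window is a design choice; this sketch treats it as the start.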