Passing workflow start time via schedule

Hello,

I’m building an ETL server on top of Temporal that exports data in batches. I would like to use Schedules, since Cron Schedules are fairly limited. When a Schedule executes my ETL Workflow, I would like to pass the bounds for the batch export, i.e. the start and end times of the batch, or just the start time, deducing the end time from the Schedule frequency.

I’ve been digging through the Schedule documentation and examples, but I can’t find anything relevant. The best thing I can come up with is extracting the timestamp of the Action that starts the Workflow from the Workflow Id, since the docs state that it’s appended by the Action. However, I imagine this timestamp will not match the expected boundaries of each batch export. Moreover, it’s unclear whether this would work when backfilling and running multiple Workflows simultaneously.

Are there any established solutions on how to achieve this?

Thanks!

Can you clarify this a bit? Are you asking how to add a start and end for the schedule to ensure it doesn’t run outside of that window? You can set StartAt and EndAt on the ScheduleSpec in addition to the intervals/calendars.
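
For example, a Spec along these lines would keep it inside a window (Go SDK; the interval and dates are just illustrative):

// Only fire between StartAt and EndAt, once an hour.
spec := client.ScheduleSpec{
    Intervals: []client.ScheduleIntervalSpec{{Every: time.Hour}},
    StartAt:   time.Date(2023, 4, 1, 0, 0, 0, 0, time.UTC),
    EndAt:     time.Date(2023, 5, 1, 0, 0, 0, 0, time.UTC),
}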

Sure, let me clarify: I have a long-running Schedule that exports data every hour by triggering a Workflow, and I would like to pass the start time of the current batch to the Workflow. So I’m not referring to the start and end times of the Schedule (it is long-running and doesn’t have an end time), but to the start and end times of the current batch. For example, on any given day there will be 24 Workflow executions triggered by the Schedule, and each one would receive the current hour as some sort of input: 2023-04-11 00:00, 2023-04-11 01:00, 2023-04-11 02:00, …, 2023-04-11 23:00.
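
To illustrate what I mean, a rough Go SDK sketch of the Schedule (ExportBatchWorkflow, the task queue, and the Ids are placeholders; c is an existing client.Client):

// Hourly Schedule whose Action starts the ETL Workflow; the scheduled time is appended to the Workflow Id.
_, err := c.ScheduleClient().Create(ctx, client.ScheduleOptions{
    ID: "hourly-etl-schedule",
    Spec: client.ScheduleSpec{
        Intervals: []client.ScheduleIntervalSpec{{Every: time.Hour}},
    },
    Action: &client.ScheduleWorkflowAction{
        ID:        "hourly-etl",
        Workflow:  ExportBatchWorkflow, // placeholder ETL Workflow
        TaskQueue: "etl-task-queue",    // placeholder
    },
})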

You could use the current timestamp, but that loses idempotency and makes the Workflow non-retryable (as you can’t go back in time).

If you are familiar with Airflow, I’m asking for something like the data_interval_start and data_interval_end variables assigned to each DagRun.

You can get the start time from the TemporalScheduledStartTime search attribute. E.g. in Go this would be something like:

// Decode the scheduled start time from the TemporalScheduledStartTime search attribute.
var startTime time.Time
p := workflow.GetInfo(ctx).SearchAttributes.GetIndexedFields()["TemporalScheduledStartTime"]
if err := converter.GetDefaultDataConverter().FromPayload(p, &startTime); err != nil {
    return err
}

This is the time specified by the schedule, before jitter is applied, regardless of the time it actually runs. So if you do a backfill, it’ll be the times that were backfilled.
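
For example, a backfill sketch with the Go SDK (the schedule Id and window are illustrative; enumspb is go.temporal.io/api/enums/v1, c is a client.Client); each backfilled run then sees its own scheduled time in the search attribute:

// Backfill a past day; every Action fired gets its original scheduled time.
handle := c.ScheduleClient().GetHandle(ctx, "hourly-etl-schedule")
err := handle.Backfill(ctx, client.ScheduleBackfillOptions{
    Backfill: []client.ScheduleBackfill{{
        Start:   time.Date(2023, 4, 10, 0, 0, 0, 0, time.UTC),
        End:     time.Date(2023, 4, 11, 0, 0, 0, 0, time.UTC),
        Overlap: enumspb.SCHEDULE_OVERLAP_POLICY_ALLOW_ALL,
    }},
})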

(This is in the docs for Workflow Schedules under “Action”, but maybe it’s not entirely clear. Where were you looking and what info do you think was missing?)

I did spot this, which is why I mentioned parsing the Workflow Id (it’s also in that list of additional properties). However, I have some doubts:

  • The docs say search attributes shouldn’t be used for business logic, which is exactly what I intend to do here. I’m not experienced enough to understand why the docs make this recommendation.
  • Will the Action timestamp match the exact hour mark (assuming an hourly schedule)? I need it to match the hour mark exactly to avoid losing high-frequency data. Of course, this is something I could correct in code.

I’m not sure exactly what the author had in mind with that recommendation; I think the warning may be overly broad. In general, it’s better for things that affect business logic to be passed in the input, but in this case the Workflow is being started by the server itself, which doesn’t have access to user data converters, so we can’t pass it in the input. That said, using the search attribute is a little messy. There’s a feature request to make a better accessor for it: [Feature Request] Expose schedule specific info in Workflows through a API · Issue #243 · temporalio/features · GitHub

The timestamp is the exact time matched by the schedule, with second resolution, before jitter is added.
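
So with an hourly interval you can derive the batch bounds directly from it, e.g. (sketch, assuming startTime was decoded as above):

// Batch window for an hourly schedule; Truncate is just defensive, the value should already be on the hour.
batchStart := startTime.Truncate(time.Hour)
batchEnd := batchStart.Add(time.Hour)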

Great, thanks for the replies!

I’ll use the search attribute for now and keep an eye on the feature request you linked. I think I was initially put off by the warning in the docs.