I am building an ETL workflow in which the first activity downloads a list of URLs. For each URL an activity will be spawned to fetch the content of the page it references.
The issue I am facing is that the activity which downloads the URLs returns a list of 22k strings, which exceeds the maximum payload size that can be returned by an activity.
So my question is: would it be possible to use an interceptor to slice this output into chunks before it gets returned to the workflow? If not, what other ways are there to stay under this limit without saving the list to a file?
I don’t think you can slice an activity response into chunks as you’re describing: Temporal will register a single activity completion event for an activity, but it sounds like you’d like multiple.
For ETL use cases, I tend to recommend keeping data that you’re working on in a blob store like S3 (or some other datastore), then passing pointers to this data around in your workflow & activities. Doing so will allow you to work with arbitrary data sizes, and you’ll more easily avoid workflow history size limits.
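As a rough sketch of what I mean (the bucket name, key scheme, and placeholder list below are assumptions on my part, not anything Temporal provides):

```python
import json
import uuid
from typing import List

import boto3  # assumption: the data lives in S3; any blob store works the same way
from temporalio import activity

BUCKET = "my-etl-bucket"  # hypothetical bucket name


@activity.defn
async def download_url_list() -> str:
    """Build the full URL list, stash it in the blob store, return only a pointer."""
    urls: List[str] = []  # replace with however you currently build the 22k-item list
    key = f"url-lists/{uuid.uuid4()}.json"
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=json.dumps(urls).encode())
    return key  # the workflow history only ever stores this small string


@activity.defn
async def process_url_list(list_key: str) -> None:
    """Resolve the pointer back into the URL list inside a downstream activity."""
    obj = boto3.client("s3").get_object(Bucket=BUCKET, Key=list_key)
    urls = json.loads(obj["Body"].read())
    # fetch each page here and write the results back to the blob store
```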
If you need these items in the workflow itself, for instance to spawn a child workflow per item, then consider writing an activity that accepts a page token (as you would for pagination) and calling it repeatedly from the workflow, with each call returning one subset of the data, until the returned token is None.
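Something along these lines (a rough sketch; the page size, the offset-style token, and how the activity sources its pages are all assumptions, so adapt them to your upstream source):

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional

from temporalio import activity, workflow

PAGE_SIZE = 500  # hypothetical; keep each page well under the payload limit


@dataclass
class UrlPage:
    urls: List[str]            # one page of URLs
    next_token: Optional[str]  # None once the last page has been returned


@activity.defn
async def fetch_url_page(token: Optional[str]) -> UrlPage:
    # Assumption: the activity can cheaply re-derive or cache the full list and
    # slice into it with an offset token. A DB or API cursor works the same way.
    all_urls: List[str] = []  # replace with however you build the 22k-item list
    offset = int(token) if token else 0
    page = all_urls[offset : offset + PAGE_SIZE]
    next_offset = offset + PAGE_SIZE
    return UrlPage(
        urls=page,
        next_token=str(next_offset) if next_offset < len(all_urls) else None,
    )


@workflow.defn
class EtlWorkflow:
    @workflow.run
    async def run(self) -> None:
        token: Optional[str] = None
        while True:
            page: UrlPage = await workflow.execute_activity(
                fetch_url_page,
                token,
                start_to_close_timeout=timedelta(minutes=1),
            )
            for url in page.urls:
                pass  # e.g. start a child workflow or per-URL work here
            if page.next_token is None:
                break
            token = page.next_token
```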
Hey Rob! Thanks for the reply!
I have taken the suggestion and am looking into how to paginate the activity.
Another option that I explored was for the activity to return a reference to an iterator object that will return the links one by one. Would it be possible to have such an object stored in the context?
“The context” being the workflow context? You don’t really put objects into the workflow context directly, and what you’re describing is essentially what I proposed initially, except that instead of returning pages you’d be returning a single item at a time. Personally, I wouldn’t do this, because it would mean 22,000 activity calls: that’s a lot! Not only will it make your workflow slower and its history bigger, it’ll also be pretty expensive. I’d recommend batching instead, for those reasons.
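To put rough numbers on it: with batching (and the same hypothetical blob-store setup as above), one activity call can handle a whole batch of URLs and return only small object keys, so 22k URLs at a batch size of 500 is on the order of 44 activity calls instead of 22,000. A sketch of the batch activity, which you could call from the pagination loop above in place of the per-URL step:

```python
import hashlib
from typing import List

import aiohttp  # assumption: an async HTTP client is available on the worker
import boto3
from temporalio import activity

BUCKET = "my-etl-bucket"  # same hypothetical bucket as above


@activity.defn
async def fetch_page_batch(urls: List[str]) -> List[str]:
    """Fetch every page in one batch and return only small blob-store keys."""
    s3 = boto3.client("s3")
    keys: List[str] = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            async with session.get(url) as resp:
                body = await resp.text()
            key = f"pages/{hashlib.sha256(url.encode()).hexdigest()}.html"
            s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode())
            keys.append(key)
    return keys
```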