I recently experimented with converting high-workload queued jobs to Temporal workflows and activities.
One in particular involves processing very large CSV files by splitting the work across multiple activities.
The issue on my end is that the activity eventually re-attempts, but I am unsure what the cause could be.
I have posted the error below, but if my colleague runs the exact same code, the file processes correctly and the activities never retry in the first place.
We both use the same number of workers (1) and num_workers: 10 in .rr.yaml.
We both run the same Docker setup via docker compose, which defines the containers needed to run the application.
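For reference, the worker-count setting lives under the temporal section of the RoadRunner config; ours looks roughly like this (the address value is illustrative, not our actual endpoint):

```yaml
temporal:
  address: temporal:7233   # illustrative; whatever your Temporal frontend is
  activities:
    num_workers: 10        # the setting mentioned above
```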
Has anyone hit a similar issue? It seems like something to do with my setup; any help would be very much appreciated.
Not sure this is Temporal specific; it looks like it could be a resource utilization issue on your worker when processing the large file. Maybe it's running out of memory or disk space?
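If it is memory, it's worth checking whether the worker container has an explicit cap that's lower than the host's resources; with docker compose, a limit like this (service name and value are illustrative) would kill or stall a large-file run well before the host itself runs out:

```yaml
services:
  roadrunner:        # illustrative service name for the worker container
    mem_limit: 2g    # container is OOM-killed past this, regardless of host RAM
```

Two machines running "the same" compose file can still differ here if one has an override file or different Docker Desktop resource settings.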
One in particular involves processing very large CSV files by splitting them in multiple activities.
Can you give more details on how you have implemented this use case?
There’s something that is not working correctly on my machine specifically; the other machine the code is tested on actually has fewer resources than the one I am working on.
With 64GB RAM / 4TB disk (25GB RAM / 200GB disk to Docker):
Current usage: ~8GB RAM, ~1% CPU, 89GB storage
There’s a chance it is not Temporal specific. The CSV is essentially split into chunks, and indexes are kept for which rows each chunk should process.
The processing itself is the same as it was with the queue system we used before, except that with that queue system the job would complete and the file would be processed.
We don’t actually split the file itself. Instead, we create lightweight “chunk descriptors” that tell each activity which rows to process from the original file.
Step-by-Step Process
Initialization (happens once):
Read the CSV file to count total rows and identify any rows to skip
Create chunk metadata - just row numbers, not actual data
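A minimal sketch of that initialization step in Python (function and field names are illustrative, not our actual code):

```python
import csv


def build_chunk_descriptors(path, chunk_size=1000):
    """Scan the CSV once and emit lightweight chunk descriptors.

    Each descriptor holds only row numbers, never row data, so the
    payload handed to each activity stays tiny.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        total_rows = sum(1 for _ in reader)  # count data rows without storing them

    # One descriptor per chunk: half-open row ranges [start_row, end_row)
    return [
        {"start_row": start, "end_row": min(start + chunk_size, total_rows)}
        for start in range(0, total_rows, chunk_size)
    ]
```

Each descriptor can then be passed as the input to one activity invocation.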
We’re not copying the CSV data 40 times. Each activity streams directly from the original file and only reads its assigned rows.
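A sketch of what the activity-side reader might look like, streaming only its assigned slice (names are hypothetical; itertools.islice keeps memory flat regardless of file size):

```python
import csv
from itertools import islice


def process_chunk(path, start_row, end_row):
    """Stream only the assigned rows [start_row, end_row) from the original CSV.

    islice advances the reader lazily, so rows outside the assigned range
    are skipped without ever being materialized as a list.
    """
    processed = 0
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in islice(reader, start_row, end_row):
            # ... real per-row work would happen here ...
            processed += 1
    return processed
```

Because every activity re-opens and scans the file from the start, later chunks pay a small skip cost, but no chunk ever holds more than one row in memory at a time.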
I am reviewing my setup again, but since we use Docker and the same OS, it should be essentially the same, I believe.
It looks a lot more like an issue with my environment than with the code, but I am unsure what exactly. If I find the answer, I will post it here in case anyone runs into a similar issue.