How to deal with long-running loops that cannot be offloaded to activities

I’m making a new project using Temporal that needs to parse Excel sheets.

I’m using Apache POI for Excel manipulation.

I’ve had issues passing the Workbook instance from Workflow to Activity. I later learned that this is because anything passed to or returned from an activity must be serializable, and this class is not.

To work around this, I’ve had to parse the Excel sheet in the workflow method. I know this goes against the nature of Temporal, and that any “long execution” code should be moved to activities.

Naturally, Temporal is now throwing warnings about potential deadlocks because it can’t poll the workflow while it’s in the loop.

Is there any way to pass non serializable classes to activities? I’ve tried making a wrapper class for the workbook instance and tagging it with @JsonIgnore but the result is null when it’s returned to the workflow.

I’ve had issues passing the Workbook instance from Workflow to Activity.

Are you creating the Workbook in workflow code? You could offload its creation to an activity, so you would not have to pass it in as input.

Is there any way to pass non serializable classes to activities?

You can pass them to the activity implementation’s constructor when you register the activity with the worker. If you need to pass your type as activity input instead, you could write a custom data converter for it (by default, inputs must be serializable to JSON using Jackson).
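As a rough illustration of the constructor approach, here is a minimal sketch in plain Java. The names `SheetReader`, `ExcelActivities`, and `ExcelActivitiesImpl` are hypothetical and not part of Temporal or Apache POI. The idea is that the non-serializable helper lives inside the activity implementation, so only a plain file path crosses the workflow/activity boundary.

```java
import java.util.List;

// Hypothetical helper that is NOT serializable (in practice it might wrap Apache POI).
interface SheetReader {
    List<String> readFirstColumn(String filePath);
}

// Hypothetical activity interface: inputs and outputs are plain, serializable values.
interface ExcelActivities {
    int countRows(String filePath);
}

// The helper is injected once, when the implementation object is created.
class ExcelActivitiesImpl implements ExcelActivities {
    private final SheetReader reader;

    ExcelActivitiesImpl(SheetReader reader) {
        this.reader = reader;
    }

    @Override
    public int countRows(String filePath) {
        // The workbook is opened and used entirely inside the activity.
        return reader.readFirstColumn(filePath).size();
    }
}

class ConstructorInjectionSketch {
    public static void main(String[] args) {
        // Fake reader so the sketch runs without any Excel file.
        SheetReader fakeReader = path -> List.of("a", "b", "c");
        ExcelActivities activities = new ExcelActivitiesImpl(fakeReader);

        // With the Temporal Java SDK, this same object would be handed to the worker,
        // e.g. worker.registerActivitiesImplementations(activities);
        System.out.println(activities.countRows("/tmp/input.xlsx"));
    }
}
```

Note that this only helps for dependencies that exist when the worker starts; a Workbook built from a file that arrives later would still have to be created inside the activity itself.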

Once you parse the workbook, do you iterate through each entry and perform actions? If so, you might want to use the “iterator workflow pattern” mentioned in this forum post.

I’ll read the post and see if it answers my question - however, the structure currently is as follows:

  1. Workflow is triggered with a URL to the Excel file in a Google Cloud bucket
  2. URL is given to an activity to download the file and write it to disk (because of the serialization limitation, I can’t return the instance directly from the activity)
  3. Workflow creates a Workbook instance from the file and iterates over all the rows (this is the loop that bothers Temporal, because it can’t poll the workflow)

Regarding your solutions,
Passing it to an activity: I think I tried passing the Workbook instance to an activity, but I’ll give it another go. If I don’t need to return the Workbook instance itself, then I should be safe from the serialization failure.

As to the data converter, how would that work exactly? I tried making a custom data converter and it didn’t help, but I probably did something wrong.

What would implementing a custom converter do exactly? My issue is that Jackson tries to serialize a class that isn’t serializable. If I remember correctly, what I tried to do with the custom data converter was to explicitly tell Jackson not to serialize this class, but then it would still be null.

Take a look at the file-processing sample. IMO you can have one activity that writes the file to disk, and then call another activity on the same host to read it and return a “batch” of rows, or whatever the iteration parameters are for processing.
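To make the “batch of rows” idea concrete, here is a minimal sketch in plain Java. `readBatch` and `processAll` are hypothetical names, and a `List<String>` stands in for the sheet; in a real setup `readBatch` would be an activity that opens the file on local disk and returns only serializable row values, while `processAll` would be the workflow-side loop.

```java
import java.util.ArrayList;
import java.util.List;

class RowBatchSketch {
    // Stand-in for an activity: returns at most batchSize rows starting at offset.
    // An empty list signals that there are no more rows.
    static List<String> readBatch(List<String> sheetRows, int offset, int batchSize) {
        int end = Math.min(offset + batchSize, sheetRows.size());
        if (offset >= end) {
            return List.of();
        }
        return new ArrayList<>(sheetRows.subList(offset, end));
    }

    // Stand-in for the workflow loop: each iteration is one short activity call,
    // so the workflow thread is never busy parsing the whole sheet itself.
    static int processAll(List<String> sheetRows, int batchSize) {
        int offset = 0;
        int processed = 0;
        while (true) {
            List<String> batch = readBatch(sheetRows, offset, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            // ... per-row work, or further activity calls, would go here ...
            processed += batch.size();
            offset += batch.size();
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r0", "r1", "r2", "r3", "r4");
        System.out.println(processAll(rows, 2)); // prints 5
    }
}
```

The loop still exists, but the expensive part of each iteration happens in the activity, and the workflow only holds the offset and a small batch at a time.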

I’ve taken a look at that already, actually.

The issue with returning a “batch of rows” is that it will still be a loop regardless. Iterating over the rows in the workflow method simply doesn’t work.

Is there a way to get the worker’s ActivityExecutionContext from the workflow? That way I can pass the Workbook instance in the constructor. I don’t have access to the Excel file during registration, so I can’t do it that way.

If it’s just impossible, as you mention, you could do all the processing in an activity. Just make sure it heartbeats; you can set the heartbeat payload to the current iteration so that, in case of a retry, it can continue from the last completed one. Would it be possible to see some of your current implementation? That could help with a data converter for this type.
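Here is a minimal sketch of the heartbeat-and-resume idea in plain Java. `ProgressContext` and `InMemoryProgress` are hypothetical stand-ins so the example runs without a Temporal server; in the Java SDK the corresponding calls would be `Activity.getExecutionContext().heartbeat(details)` and `getHeartbeatDetails(Class)` on the same context. The `failAtIndex` parameter exists only to simulate a failed attempt.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the heartbeat calls on Temporal's activity context.
interface ProgressContext {
    Optional<Integer> lastHeartbeat();  // progress recorded by a previous attempt, if any
    void heartbeat(int nextRowIndex);   // record progress for the current attempt
}

// In-memory implementation so the sketch is runnable on its own.
class InMemoryProgress implements ProgressContext {
    private Integer last;

    @Override
    public Optional<Integer> lastHeartbeat() {
        return Optional.ofNullable(last);
    }

    @Override
    public void heartbeat(int nextRowIndex) {
        last = nextRowIndex;
    }
}

class RowProcessingActivity {
    // Processes rows from the last heartbeated position onward and
    // returns how many rows this attempt handled.
    static int processRows(List<String> rows, ProgressContext ctx, int failAtIndex) {
        int start = ctx.lastHeartbeat().orElse(0);
        int handled = 0;
        for (int i = start; i < rows.size(); i++) {
            if (i == failAtIndex) {
                throw new IllegalStateException("simulated failure at row " + i);
            }
            // ... real per-row work would go here ...
            handled++;
            ctx.heartbeat(i + 1); // index of the next row to process on retry
        }
        return handled;
    }
}

class HeartbeatResumeSketch {
    public static void main(String[] args) {
        List<String> rows = List.of("r0", "r1", "r2", "r3", "r4");
        ProgressContext ctx = new InMemoryProgress();
        try {
            RowProcessingActivity.processRows(rows, ctx, 3); // first attempt fails at row 3
        } catch (IllegalStateException e) {
            // Temporal would retry here according to the activity's retry policy.
        }
        int handledOnRetry = RowProcessingActivity.processRows(rows, ctx, -1);
        System.out.println("rows handled on retry: " + handledOnRetry); // prints 2
    }
}
```

Because the heartbeat stores the index of the next row rather than the Workbook itself, the payload stays trivially serializable, and the retry reopens the file and skips ahead.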