Best way to stream data between activities in Temporal

No, you can’t route large amounts of data through the Temporal server; it’s not designed for that. You need to store the data somewhere else, such as in S3 as you mentioned, and then pass only a reference to the data in the signal (such as a filename, or an S3 bucket and key).
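As a rough illustration of the "pass a reference, not the data" idea, here is a minimal sketch in plain Python. The `BlobRef` class, its fields, and the bucket and key values are all hypothetical names invented for this example; nothing here is Temporal SDK API. The only point is that what travels in the signal stays tiny no matter how big the stored object is.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BlobRef:
    """Pointer to data stored outside Temporal (hypothetical shape)."""
    bucket: str
    key: str
    size_bytes: int


def to_signal_payload(ref: BlobRef) -> str:
    """Serialize the reference; this small string is what the signal would carry."""
    return json.dumps(asdict(ref))


def from_signal_payload(payload: str) -> BlobRef:
    """Rebuild the reference on the receiving side."""
    return BlobRef(**json.loads(payload))


# The object itself is ~5 GB, but the payload is well under a kilobyte.
ref = BlobRef(bucket="my-batches", key="2024/batch-0001.bin", size_bytes=5_000_000_000)
payload = to_signal_payload(ref)
```

The activity that receives the reference would then read the object directly from the store, so the bulk data never passes through the Temporal server.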

You mentioned that the batch might be too large for a single worker process to handle. Does that mean you’d need to split the batch into chunks and have multiple workers process the chunks in parallel? Something like MapReduce, perhaps?
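If chunking is what you need, one possible shape for it is sketched below in plain Python. It assumes the stored object can be split on arbitrary byte boundaries, which may not be true of your format, and `ChunkRef` and `split_into_chunks` are hypothetical names made up for this example. Each chunk is again only a small reference (bucket, key, offset, length) that could be handed to a separate worker, which would then fetch just its own byte range from the store.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkRef:
    """Reference to one byte range of a stored object (hypothetical shape)."""
    bucket: str
    key: str
    offset: int
    length: int


def split_into_chunks(bucket: str, key: str, total_bytes: int, chunk_bytes: int) -> List[ChunkRef]:
    """Cover [0, total_bytes) with consecutive ranges of at most chunk_bytes each."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [
        # The last chunk is shortened so the ranges never run past the end.
        ChunkRef(bucket, key, offset, min(chunk_bytes, total_bytes - offset))
        for offset in range(0, total_bytes, chunk_bytes)
    ]


# A 250-byte object split into 100-byte chunks gives ranges of 100, 100 and 50 bytes.
chunks = split_into_chunks("my-batches", "2024/batch-0001.bin", 250, 100)
```

If the data is record-oriented rather than byte-splittable, the same idea works with one key per chunk instead of offsets.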

Temporal doesn’t implement MapReduce itself. You could process the batch using a MapReduce system, and use a workflow to control the MapReduce system through its API.