Let's say we have an API that accepts file uploads (Content-Type: multipart/form-data) and needs to start a workflow involving the file. The general guideline is to upload the file to object storage and pass the URL to the workflow to keep the payload footprint small.
However, the act of uploading the file is itself an activity-like operation. Should this prerequisite be handled outside a workflow (outside Temporal), with regular primitives such as an upload wrapped in a try/catch block? Or is there a way we can do this within Temporal?
Edit: Media processing workflows. This blog describes an approach in which the workflow polls IoT devices for video files, essentially a pull-based approach. What can be done in push-based approaches, where API consumers do not necessarily adhere to contracts set by us, and therefore do not expose an endpoint we can poll for files? Instead, they push the files to our endpoint.
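To make the push-based shape concrete, here is a minimal sketch of the intake path. All names here are hypothetical placeholders: `upload_to_object_store` stands in for your S3 client call and `start_workflow` for the Temporal client call that starts the media-processing workflow.

```python
# Sketch of a push-based intake flow. upload_to_object_store() and
# start_workflow() are hypothetical stand-ins for an S3 client call
# and a Temporal client call respectively.
import uuid


def upload_to_object_store(data: bytes, key: str) -> str:
    """Placeholder: upload the bytes and return an object-store URL."""
    return f"s3://incoming-bucket/{key}"


def start_workflow(file_url: str) -> str:
    """Placeholder: start the media-processing workflow with the URL,
    returning a workflow ID."""
    return f"media-processing-{file_url.rsplit('/', 1)[-1]}"


def handle_push(file_bytes: bytes) -> str:
    """API handler body: the consumer pushes the file to us; we upload
    it first, then hand only the reference to the workflow."""
    key = str(uuid.uuid4())
    url = upload_to_object_store(file_bytes, key)  # happens outside Temporal
    return start_workflow(url)  # the workflow only ever sees a URL, not bytes
```

The open question in this thread is whether the `upload_to_object_store` step can itself live inside Temporal, or must stay in the handler.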
The general guideline is to upload the file to an object storage and pass on the URL to the workflow to keep footprint small.
I believe it's not just about keeping a small footprint; there are concrete limits behind the guideline: the blob size limit (2 MB for workflow/activity inputs and results), the gRPC max message size (4 MB), and event history limits (50K events and 50 MB of history per single execution).
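Given those limits, one practical pattern is a guard at the API edge: inline the payload only when it fits comfortably under the 2 MB blob limit, otherwise upload first and pass a reference. This is only a sketch; the threshold should match your server's actual configuration, and `url_of` is a hypothetical upload helper.

```python
# Guard sketch: inline small payloads; larger ones must go to object
# storage first. 2 MB matches the default payload limit mentioned
# above; adjust to your deployment's configured limits.
PAYLOAD_LIMIT = 2 * 1024 * 1024  # 2 MB default blob/payload limit


def workflow_input_for(file_bytes: bytes, url_of) -> dict:
    """Return either the inline bytes or an object-store reference,
    depending on whether the payload fits under the blob limit."""
    if len(file_bytes) <= PAYLOAD_LIMIT:
        return {"inline": file_bytes}
    return {"url": url_of(file_bytes)}  # the upload happens inside url_of
```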
Can you give more info on the push mechanism you are using now? Once the end user submits a file, is it the upload to a blob store that you are trying to retry?
Can you give more info on this push mechanism that you are using now?
Ideally (if there were no size limitations), we would have started the (email) workflow with the provided large file (15 MB) directly. To get around the size limit, we first upload the large file to S3 and pass only its reference to the workflow. But the act of pushing to S3 is itself a candidate for retries, and today we wrap it in a regular try/catch to handle any errors during the upload.
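For reference, the hand-rolled retry we're describing looks roughly like the sketch below: a bounded retry loop with exponential backoff around the upload call, i.e. the same treatment Temporal's retry policy would give an activity, but done manually in the handler. The `upload` callable is a placeholder for the real S3 put.

```python
# Hand-rolled retry sketch: bounded attempts with exponential backoff
# around an upload callable, mirroring what a Temporal activity retry
# policy would otherwise provide.
import time


def upload_with_retry(upload, data: bytes, max_attempts: int = 3,
                      base_delay: float = 0.5) -> str:
    """Call upload(data), retrying on any exception up to max_attempts
    times, sleeping base_delay * 2**(attempt-1) between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return upload(data)
        except Exception:
            if attempt == max_attempts:
                raise  # exhausted; surface the last error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))
```

The question is whether this belongs here at all, or whether the upload could instead be the first activity of the workflow, which is hard given that the raw bytes themselves exceed the payload limits.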