Workflow replay behaviour after crash

Hi, I'd like to understand how workflow replay behaves after a crash. Assume I have the following method in my workflow. Let's say we are trying to download 1000 files, so args.getSourceFilenames() returns a list of 1000 strings. In the middle of the execution, when only 500 downloads have been started, the server running the workflow crashes.

A bit later the workflow execution starts again on another server. At this point the Temporal server knows that the workflow stopped its latest execution attempt in the method processFile, correct?

When the code starts to call the activities to download the files, is the Temporal SDK / server smart enough to know that some of these calls have already happened?

Another similar question: does Temporal memoize and return the memoized results for the execution of an activity in a method inside a workflow?

In this scenario, we may be downloading files that are not so small, so after the server crashes and the workflow starts running again, the activities may not have finished yet.

  public void processFile(Arguments args) {
    List<Promise<String>> localNamePromises = new ArrayList<>();
    List<String> processedNames = null;
    try {
      // Download all files in parallel.
      for (String sourceFilename : args.getSourceFilenames()) {
        Promise<String> localName =
            Async.function(activities::download, args.getSourceBucketName(), sourceFilename);
        localNamePromises.add(localName);
      }
      List<String> localNames = new ArrayList<>();
      for (Promise<String> localName : localNamePromises) {
        localNames.add(localName.get());
      }
      processedNames = activities.processFiles(localNames);

      // Upload all results in parallel.
      List<Promise<Void>> uploadedList = new ArrayList<>();
      for (String processedName : processedNames) {
        Promise<Void> uploaded =
            Async.procedure(
                activities::upload,
                args.getTargetBucketName(),
                args.getTargetFilename(),
                processedName);
        uploadedList.add(uploaded);
      }
      // Wait for all uploads to complete.
      Promise.allOf(uploadedList).get();
    } finally {
      for (Promise<String> localNamePromise : localNamePromises) {
        // Skip files that haven't completed downloading.
        if (localNamePromise.isCompleted()) {
          activities.deleteLocalFile(localNamePromise.get());
        }
      }
      if (processedNames != null) {
        for (String processedName : processedNames) {
          activities.deleteLocalFile(processedName);
        }
      }
    }
  }

“Starts again” is not the right mental model. The workflow resumes from the point it reached on the previous server. So from the developer's point of view, a workflow doesn't belong to any server and executes exactly once.

When the code starts to call the activities to download the files, is the Temporal SDK / server smart enough to know that some of these calls have already happened?

Yes.

Another similar question: does Temporal memoize and return the memoized results for the execution of an activity in a method inside a workflow?

Yes, but this is an implementation detail of workflow recovery. When implementing a workflow, think of it as executing exactly once.
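
To make the "memoized results" idea concrete, here is a deliberately simplified toy model in plain Java. It does not use the Temporal SDK and is not how Temporal is implemented internally. The names ReplaySketch, History, Context and callActivity are all hypothetical. The sketch assumes only that completed activity results are recorded durably and that the workflow code issues its activity calls in the same order when it is re-run. It runs the calls sequentially and does not model activities that were still in flight at the time of the crash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Toy model only: illustrates replay from recorded results, not Temporal's real internals.
class ReplaySketch {

  // Hypothetical stand-in for the durable event history that survives a worker crash.
  static class History {
    private final Map<Integer, String> completedResults = new HashMap<>();
  }

  // Hypothetical per-run context: numbers activity calls in the order the workflow makes them.
  static class Context {
    private final History history;
    private int nextCallId = 0;
    int executions = 0; // how many activities this run actually executed

    Context(History history) {
      this.history = history;
    }

    String callActivity(Supplier<String> activity) {
      int callId = nextCallId++;
      String recorded = history.completedResults.get(callId);
      if (recorded != null) {
        return recorded; // replay: reuse the recorded result, do not run the activity again
      }
      executions++;
      String result = activity.get();
      history.completedResults.put(callId, result);
      return result;
    }
  }

  // "Workflow" code: must issue the same calls in the same order on every run.
  // crashAfter simulates the worker dying after that many downloads (-1 means never).
  static List<String> workflow(Context ctx, List<String> files, int crashAfter) {
    List<String> localNames = new ArrayList<>();
    for (String file : files) {
      if (localNames.size() == crashAfter) {
        throw new IllegalStateException("simulated worker crash");
      }
      localNames.add(ctx.callActivity(() -> "/tmp/" + file));
    }
    return localNames;
  }

  public static void main(String[] args) {
    History history = new History();
    List<String> files = List.of("a", "b", "c", "d");

    Context first = new Context(history);
    try {
      workflow(first, files, 2);
    } catch (IllegalStateException e) {
      System.out.println("first run: " + e.getMessage());
    }

    // A new "worker" re-runs the same code against the same history.
    Context second = new Context(history);
    List<String> result = workflow(second, files, -1);
    System.out.println("first run executed " + first.executions
        + " activities, second run executed " + second.executions);
    System.out.println(result);
  }
}
```

In this toy run the first attempt executes two downloads before the simulated crash, and the second attempt executes only the remaining two while returning all four results. This is also why the answer says to treat it as an implementation detail: the workflow method itself contains no recovery logic.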

In this scenario, we may be downloading files that are not so small, so after the server crashes and the workflow starts running again, the activities may not have finished yet.

The workflow is not going to reschedule activities on recovery. But if an activity is interrupted by the server crash, it will eventually time out (usually after StartToCloseTimeout or HeartbeatTimeout) and be retried according to its retry options.
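
As a rough illustration of where those timeouts and retry options are set in the Java SDK, the sketch below builds an activity stub with explicit options. The FileActivities interface and the DownloadActivityConfig class are hypothetical stand-ins for the activities used in the question, and all durations and attempt counts are made-up values, not recommendations.

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.common.RetryOptions;
import io.temporal.workflow.Workflow;
import java.time.Duration;

class DownloadActivityConfig {

  // Hypothetical activity interface, standing in for the `activities` stub in the question.
  @ActivityInterface
  public interface FileActivities {
    String download(String bucketName, String sourceFilename);
  }

  // Must be called from workflow code, e.g. when initializing a field of the workflow implementation.
  static FileActivities newStub() {
    ActivityOptions options =
        ActivityOptions.newBuilder()
            // Upper bound for a single download attempt (illustrative value).
            .setStartToCloseTimeout(Duration.ofMinutes(30))
            // If no heartbeat arrives within this window, the attempt is considered failed (illustrative value).
            .setHeartbeatTimeout(Duration.ofSeconds(30))
            .setRetryOptions(
                RetryOptions.newBuilder()
                    .setInitialInterval(Duration.ofSeconds(5))
                    .setBackoffCoefficient(2.0)
                    .setMaximumAttempts(5)
                    .build())
            .build();
    return Workflow.newActivityStub(FileActivities.class, options);
  }
}
```

The heartbeat timeout only detects a dead worker quickly if the activity implementation reports heartbeats periodically, for example via Activity.getExecutionContext().heartbeat(...). Without heartbeats, an interrupted download would be noticed only once StartToCloseTimeout expires.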

Also note that workflows have hard limits that should be obeyed: