Best practice for handling lost external events and resuming paused workflows

Abhinav_Verma · November 10, 2025, 8:04am

Hi Temporal Community,

I have a workflow that pauses while waiting for an asynchronous downstream operation to complete. The downstream system publishes an event to Kafka once the operation is done, and my service listens to this event and triggers a webhook to send a signal to resume the Temporal workflow.

However, there are cases where these downstream events might be lost or not delivered, causing the workflow to remain paused indefinitely and eventually time out.

To handle this, I’m exploring the idea of adding a polling fallback mechanism — so that when the workflow is reset or retried, it can poll the downstream system to verify if the operation has already completed and then resume accordingly.

I wanted to check:

Does Temporal provide any built-in support or recommended pattern for handling such scenarios where external signals might be lost?
If not, what would be the best practice to implement this kind of recovery or polling mechanism within a Temporal workflow?

Thanks in advance for your insights!

Abhinav_Verma · November 10, 2025, 11:01am

@maxim Also, I plan to poll once every hour.

maxim · November 10, 2025, 3:11pm

Abhinav_Verma · November 11, 2025, 3:14am

@maxim I’m planning to implement periodic polling that runs every hour. My idea is to create a child workflow responsible for polling and notifying the parent workflow once it completes (as provided in samples-java repo), based on the downstream state. Do you see any potential concerns or drawbacks with this approach?

Abhinav_Verma · November 11, 2025, 7:44am

@maxim I’m implementing a parent workflow that must do two things at once:

Kick off a child workflow that periodically polls some downstream state and returns true when the polling condition is met.
Listen for an external signal that can set a workflowPaused flag at any time.

Current code (simplified):

PollingChildWorkflow childWorkflow =
  Workflow.newChildWorkflowStub(PollingChildWorkflow.class,
    ChildWorkflowOptions.newBuilder().setWorkflowId("ChildWorkflowPoll").build());

boolean success = childWorkflow.exec(pollingIntervalInSeconds);

Workflow.await(() -> success || workflowPaused);

The issue: childWorkflow.exec(...) blocks the parent until the child completes, so the parent can’t concurrently react to signals while the child is polling.

Question: what are the recommended patterns in Temporal to run the polling child and wait for a signal in parallel? For example, should I invoke the child asynchronously, have the child send a signal back to the parent on completion, convert the poller into an activity, use promises/async APIs, or something else? Any example snippets, pitfalls, or best practices would be really helpful.

antonio.perez · November 11, 2025, 2:07pm

Hi,

The post "Best way to create an async child workflow" shows how to start a child workflow asynchronously in Java, and there is sample code for it as well.

Something like this should work:

ChildWorkflow child = Workflow.newChildWorkflowStub(ChildWorkflow.class, childWorkflowOptions);
Promise<String> result = Async.function(child::executeChild);
result.thenApply(
    (String r) -> {
      done = true;
      return r;
    });


Workflow.await(() -> done || signal_received);

You can put the logic within a cancellation scope to cancel the timer if the signal arrives first.

Another approach is to start the child workflow only after the workflow await times out:

Workflow.await(duration, () -> signal_received);

// child workflow polling here

This gives Kafka time to deliver the signal to the parent workflow before any polling starts.
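A minimal sketch of how the two ideas above (a cancellation scope, and waiting for the signal before polling) might be combined. The interface names, the operationCompleted signal, the pollUntilDone activity and the 30 minute grace period are all assumptions made for illustration, not part of the original code:

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Async;
import io.temporal.workflow.CancellationScope;
import io.temporal.workflow.Promise;
import io.temporal.workflow.SignalMethod;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;

final class SignalThenPoll {
  // Assumed grace period for the Kafka-driven signal; pick your own value.
  static final Duration GRACE_PERIOD = Duration.ofMinutes(30);

  @ActivityInterface
  public interface PollActivities {
    // Hypothetical activity: returns once the downstream operation is done.
    boolean pollUntilDone(String operationId);
  }

  @WorkflowInterface
  public interface ParentWorkflow {
    @WorkflowMethod
    void run(String operationId);

    @SignalMethod
    void operationCompleted();
  }

  public static class ParentWorkflowImpl implements ParentWorkflow {
    private boolean signalReceived;
    private final PollActivities activities =
        Workflow.newActivityStub(
            PollActivities.class,
            ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofMinutes(1)).build());

    @Override
    public void run(String operationId) {
      // Returns true if the signal arrived, false if the grace period elapsed.
      if (Workflow.await(GRACE_PERIOD, () -> signalReceived)) {
        return;
      }
      // Start polling inside a scope so it can be cancelled as a unit.
      AtomicReference<Promise<Boolean>> poll = new AtomicReference<>();
      CancellationScope scope =
          Workflow.newCancellationScope(
              () -> poll.set(Async.function(activities::pollUntilDone, operationId)));
      scope.run();
      // Neither side of the condition blocks, so either event can wake the workflow.
      Workflow.await(() -> signalReceived || poll.get().isCompleted());
      if (!poll.get().isCompleted()) {
        scope.cancel(); // the late signal won, so stop the poller
      }
    }

    @Override
    public void operationCompleted() {
      signalReceived = true;
    }
  }
}
```

The same shape should work if the poller is a child workflow started with Async.function inside the scope instead of an activity.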

Abhinav_Verma · November 11, 2025, 2:47pm

Thanks @antonio.perez . I’m implementing periodic polling as a child workflow, which will be triggered by the parent workflow.

In the official Temporal Java samples, the polling logic is implemented directly inside the workflow code. However, that is not considered good practice (see "Polling in workflow vs. Activity?", reply #2 by maxim).

Since in my design the polling runs inside a child workflow, and the parent workflow only records the start and end events, am I correct in assuming that this approach won’t negatively impact the parent workflow’s history size or performance?

antonio.perez · November 11, 2025, 4:28pm

Hi,

In the official Temporal Java samples, the polling logic is implemented directly inside the workflow code

Could you show me where? Note that this relies on activity retries, not on calling an activity in a loop.

am I correct in assuming that this approach won’t negatively impact the parent workflow’s history size or performance?

It will add three events to the workflow history (StartChildWorkflowExecutionInitiated/Failed, ChildWorkflowExecutionStarted, ChildWorkflowExecutionCompleted/Failed/Cancelled).

Abhinav_Verma · November 11, 2025, 6:30pm

@antonio.perez See samples-java/core/src/main/java/io/temporal/samples/polling/periodicsequence/PeriodicPollingChildWorkflowImpl.java on the main branch of temporalio/samples-java on GitHub. There you can see the activity being called in a loop for a number of attempts. Also, if my parent workflow runs for 2-3 days, will the child workflow's history size be a problem if I use it to poll every hour?

antonio.perez · November 11, 2025, 6:43pm

Thanks

From github.com/temporalio/samples-java, core/src/main/java/io/temporal/samples/polling/periodicsequence/README.md (commit 626bf032c):

This is a rare scenario where polling requires execution of a sequence of Activities, or Activity arguments need to change between polling retries.

For this case we use a Child Workflow to call polling Activities a set number of times in a loop and then periodically calls continue-as-new.

Also, if my parent workflow runs for 2-3 days, will the child workflow's history size be a problem if I use it to poll every hour?

Sorry, does this answer your question or is this a different one?

It will add three events to the workflow history (StartChildWorkflowExecutionInitiated/Failed, ChildWorkflowExecutionStarted, ChildWorkflowExecutionCompleted/Failed/Cancelled).

I have the feeling that you don’t need a child workflow; infrequent polling should do what you need.
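As a rough sketch of what infrequent polling could look like for a one hour interval: the activity throws while the operation is still pending, and the retry policy does the waiting. The class name and the 30 second attempt timeout are assumptions for illustration:

```java
import io.temporal.activity.ActivityOptions;
import io.temporal.common.RetryOptions;
import java.time.Duration;

final class HourlyPolling {
  /** Options for a poll activity that is retried once an hour until it succeeds. */
  static ActivityOptions hourlyPollOptions() {
    return ActivityOptions.newBuilder()
        // Each individual poll attempt should be short (30 s is an assumed value).
        .setStartToCloseTimeout(Duration.ofSeconds(30))
        .setRetryOptions(
            RetryOptions.newBuilder()
                .setInitialInterval(Duration.ofHours(1))
                // A coefficient of 1 keeps the interval fixed instead of growing it.
                .setBackoffCoefficient(1)
                .build())
        .build();
  }
}
```

These options would be passed to Workflow.newActivityStub(...) in the parent workflow. As far as I know, retries of a single activity are not written to the workflow history as separate events, so polling for days this way should not grow the parent's history with every attempt.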

Abhinav_Verma · November 12, 2025, 2:54am

@antonio.perez For infrequent polling, I’ll need to start a new workflow as well, right? I was thinking of using a child workflow for this.

Can this be achieved without creating a new workflow? Are you suggesting that I call an activity directly from my parent workflow instead? IMO that will also add events like ActivityTaskScheduled/Completed for each retry.

Abhinav_Verma · November 12, 2025, 7:35am

@maxim @antonio.perez

I have the following code snippet in my workflow:

Promise<Boolean> result = Async.function(pollHandlerActivity::poll, req);
Workflow.await(() -> isPaused || result.get());

As I understand it, the activity will continue executing asynchronously in the background (including any retries, as per its retry policy). Meanwhile, the workflow will remain paused at the Workflow.await() line until either the activity completes (result.get() == true) or the isPaused flag becomes true.

As I see it, though, when the predicate is evaluated, result.get() will block until the activity completes. Is there a better way to do this?

awwx · November 12, 2025, 9:45am

I think result.get() is not what you want here, because get() is a blocking call, it will wait until the activity has completed. I think what you want here is result.isCompleted(), which will tell you if the activity has completed or not.

Abhinav_Verma · November 12, 2025, 10:22am

From github.com/temporalio/samples-java, core/src/main/java/io/temporal/samples/polling/infrequent/InfrequentPollingActivityImpl.java (commit 626bf032c):

            public InfrequentPollingActivityImpl(TestService service) {
              this.service = service;
            }
          
            @Override
            public String doPoll() {
              try {
                return service.getServiceResult();
              } catch (TestService.TestServiceException e) {
                // We want to rethrow the service exception so we can poll via activity retries
                throw ApplicationFailure.newBuilder()
                    .setMessage(e.getMessage())
                    .setType(e.getClass().getName())
                    .setCause(e)
                    // This failure is expected so we set it as benign to avoid excessive logging
                    .setCategory(ApplicationErrorCategory.BENIGN)
                    .build();
              }
            }
          }

I’m using Temporal SDK 1.23.1, and its ApplicationFailure class doesn’t have newBuilder(). What is the alternative?

Can I do something like:

throw Activity.wrap(e);
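One possible alternative, sketched under the assumption that the static factory ApplicationFailure.newFailureWithCause(message, type, cause, details...) is available in the SDK version in use (I believe it predates the builder). The helper class and method names are hypothetical:

```java
import io.temporal.failure.ApplicationFailure;

final class PollFailures {
  /** Builds a retryable failure meaning "the operation has not finished yet". */
  static ApplicationFailure notReadyYet(Exception cause) {
    // The result is retryable unless the retry policy lists its type under
    // doNotRetry. There is no BENIGN category on this path, so the failure
    // is logged like any other activity failure.
    return ApplicationFailure.newFailureWithCause(
        cause.getMessage(), cause.getClass().getName(), cause);
  }
}
```

Inside doPoll() the catch block would then be `throw PollFailures.notReadyYet(e);`. Activity.wrap(e) should also compile and lead to a retry, since it only wraps a checked exception into an unchecked one; the explicit factory just makes the failure type and message deliberate.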

Abhinav_Verma · November 12, 2025, 11:11am

@awwx When I use result.isCompleted(), if it evaluates to true, that could also mean the activity completed with an exception, not just successfully. In that case, my Workflow.await() condition would still evaluate to true, even though the activity actually failed — which isn’t what I want. So this approach wouldn’t work correctly for my use case.

awwx · November 12, 2025, 11:34am

To use automatic activity retries, you throw an exception from the activity if you check and the operation hasn’t completed yet. The activity won’t be retried if it completes normally, even if it returns false or something. See samples-java/core/src/main/java/io/temporal/samples/polling/TestService.java at commit 626bf032cd168ffd353305a7662c2e72f6bc0ce1 in temporalio/samples-java on GitHub.

Now the activity will only complete if the poll determines that the operation has completed, and the activity returns normally without throwing an exception.

If you don’t want to use infinite retries and so need to check whether the activity completed normally or with an exception, I think you’d want to do something like wait for isCompleted() and then check the result; I think just calling get() is going to block your await.
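Roughly what I have in mind, as a sketch rather than tested code. The interfaces, the pause signal and the decide helper are made up for illustration; isCompleted() and getFailure() are the Promise methods I'm relying on:

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Async;
import io.temporal.workflow.Promise;
import io.temporal.workflow.SignalMethod;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

final class PollOrPause {
  @ActivityInterface
  public interface PollActivities {
    boolean poll(String operationId); // hypothetical polling activity
  }

  @WorkflowInterface
  public interface ParentWorkflow {
    @WorkflowMethod
    String run(String operationId);

    @SignalMethod
    void pause();
  }

  /** Pure decision logic, kept separate from the workflow so it is easy to test. */
  static String decide(boolean paused, boolean completed, boolean failed) {
    if (paused) return "PAUSED";
    if (!completed) return "WAITING";
    return failed ? "POLL_FAILED" : "POLL_SUCCEEDED";
  }

  public static class ParentWorkflowImpl implements ParentWorkflow {
    private boolean paused;
    private final PollActivities activities =
        Workflow.newActivityStub(
            PollActivities.class,
            ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofSeconds(30)).build());

    @Override
    public String run(String operationId) {
      Promise<Boolean> result = Async.function(activities::poll, operationId);
      // isCompleted() never blocks, so the pause signal can unblock the await too.
      Workflow.await(() -> paused || result.isCompleted());
      boolean completed = result.isCompleted();
      // Only inspect the failure once the promise is done, so this cannot block.
      boolean failed = completed && result.getFailure() != null;
      return decide(paused, completed, failed);
    }

    @Override
    public void pause() {
      paused = true;
    }
  }
}
```

On the POLL_SUCCEEDED branch, result.get() can then be called safely to read the value, because the promise is already complete.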

antonio.perez · November 12, 2025, 11:59am

Workflow.await(() -> isPaused || result.get());

I think this should work; the only problem I see is that .get() will throw an exception if the activity fails (after all retries are exhausted).

This is another approach:

Promise<Boolean> result = Async.function(activity::execute);
result
    .thenApply(
        (Boolean r) -> {
          done = true;
          return r;
        })
    .exceptionally(
        ex -> {
          // your logic to handle the failure here
          done = false;
          return null;
        });

Workflow.await(() -> done || signal_received);

Abhinav_Verma · November 12, 2025, 1:12pm

Thanks @antonio.perez. result.thenApply is non-blocking, right?

maxim · November 17, 2025, 12:15am

Right, it returns a Promise without blocking

Abhinav_Verma · November 18, 2025, 3:23pm

antonio.perez:

Promise<Boolean> result = Async.function(activity::execute);
result
    .thenApply(
        (Boolean r) -> {
          done = true;
          return r;
        })
    .exceptionally(
        ex -> {
          // your logic to handle the failure here
          done = false;
          return null;
        });

Workflow.await(() -> done || signal_received);

@antonio.perez @maxim Is this workflow code deterministic?

Live Execution

Activity runs
Callback (thenApply) fires
done is set to true
await() resumes
Workflow moves forward

Replay Execution (Happens later)

During replay:

Activity does not run
As a result, thenApply(...) is never triggered
done is never set to true
done remains false

But since:

done = false
workflowPaused = false

The workflow enters an infinite wait.

Is my understanding correct?
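On the replay question, a sketch of the usual record-and-replay model may help. As far as I understand it, the activity is indeed not re-executed on replay, but its recorded result from the event history completes the same Promise, so thenApply still fires and done becomes true again. The following is a toy model in plain Java, with CompletableFuture standing in for Temporal's Promise and a list standing in for the history. Every name here is hypothetical, and this is not how the SDK is implemented internally:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Toy model of record and replay. Not Temporal's implementation. */
final class ReplayModel {
  /** Recorded activity results, standing in for the workflow event history. */
  static final class History {
    final List<Boolean> results = new ArrayList<>();
  }

  /** Counts how often the "activity" body really executed. */
  static int activityRuns = 0;

  /** Hypothetical activity body; only live execution calls it. */
  static boolean pollDownstream() {
    activityRuns++;
    return true;
  }

  /**
   * Runs the "workflow" code once. If the history already holds a result, the
   * activity is skipped and the recorded value completes the future instead.
   * Returns the value of the done flag after the workflow code has run.
   */
  static boolean runWorkflow(History history) {
    boolean[] done = {false};
    CompletableFuture<Boolean> result = new CompletableFuture<>();
    // The callback is registered the same way on live execution and on replay.
    result.thenApply(
        r -> {
          done[0] = true;
          return r;
        });

    if (history.results.isEmpty()) {
      boolean value = pollDownstream(); // live: really run the activity
      history.results.add(value);       // and record its result
      result.complete(value);
    } else {
      result.complete(history.results.get(0)); // replay: feed the recorded result
    }
    return done[0];
  }
}
```

Running runWorkflow twice against the same History executes pollDownstream only once, yet the callback fires both times. That is the property that would keep the await from hanging on replay, provided done is only changed by workflow code such as this callback.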
