Hi,
I have a temporal implementation that has the following structure. Its a user-interactive workflow that should be responsive to stop/cancel requests.
The workflows and activities are modelled as follows.
##parent workflow
*the parent workflow contains a seqence of steps to be executed one after another. each step is modelled as a workflow embedded inside a cancellation scope that can be cancelled and the parent workflow stopped at that point
*when user wants to stop the workflow, the parent workflow will receive a STOP signal using a signal method. At that point, the parent workflow will trigger cancel() on the currently executing cancellationscope(which is excecuting a child workflow) and will return, skipping the execution of remaining steps(cancellation scopes).
##cancellationscope(each step of a parent workflow)
*Each cancellationscope will trigger a child workflow
*Because the cancellation scope needs to be cancelled, it triggers the embedded child workflows asynchronously.
*Inside the cancellation scope, Async.procedure() is used to trigger the child workflow asynchronously
*After workflow is triggered, the parent workflow needs to wait till this step is complete. So a Promise.get()(promise returned by Async.procedure()) is used to wait till the child workflow is complete. this promise.get() is executed by the parentworkflow.
##childworkflow
*The child workflow is embedded inside the cancellation scope.
*The child workflow will trigger a set of activities in parallel asynchronously using Async.function().
*The child workflow will do a promise.get() for all the promises returned by Async.function() to wait for all the activities execution to complete and return
##activity
*The activity does some processing and handles DB persistence.
*Each activity heartbeats and will return, when it detects that the child workflow that triggered the activity got cancelled.
Configuration
* every Activity/Child workflow is set with RetryOptions.maximumattempts as 1 so that no retry happens.
when we signal the parent workflow to stop, the parent workflow inturn cancels the cancellationscope(the workflow step that is currently executing) that triggered the child workflow and returns
Expectation:
* The child workflow receives the cancellation exception.
* the child workflow gets cancelled
* all the activities that are triggered from the child workflow(scheduled for execution) will get cancelled
* the parent workflow gets completed. the child workflow gets cancelled( this canceled/completed state) is expected to be reflected in the temporal web UI
What actually happens:
* in the temporal we can see both parent and child workflows continue to run.
* first exception we see is io.temporal.failure.ActivityFailure(caused by io.temporal.failure.CanceledFailure) at the point where the childworkflow is waiting for the activity result we see cancelled failures in the logs at the point where child workflow is doing a get() on activitiy promises.
* Second exception is: WARN [temporalworker,] [Workflow Executor taskQueue=“WorkflowQueue”, namespace=“default”: 100] WARN i.t.i.sync.DeterministicRunnerImpl.close - Promise completed with exception and was never accessed. The ignored exception:
io.temporal.failure.ActivityFailure: scheduledEventId=32, startedEventId=0, activityType=‘GenerateAndPersist’, activityId=‘461240c0-9005-3e25-9f39-424b920e77ba’, identity=‘workflow’, retryState=RETRY_STATE_UNSPECIFIED.
This second exception keeps coming in the logs and doesnt stop.
* In the temporal web UI we can see that both the child and parent workflows continue in the running state indefinitely
full details of the exceptions are below:
* First exception
06-05-2022 09:45:40.058 ERROR [temporalworker,,,] [workflow-method] ERROR i.t.i.s.POJOWorkflowImplementationFactory.logWorkflowExecutionException - Workflow execution failure WorkflowId=658f5ea3-f9a9-34f7-9c9c-c8fc61bb7961, RunId=9c34a875-d2f6-464f-b0d1-b2f2aaf10952, WorkflowType=ReachLinkEnumerationWorkflow io.temporal.failure.ActivityFailure: scheduledEventId=23, startedEventId=0, activityType='GenerateAndPersist', activityId='bce747c8-db73-38db-838f-b1a3ee2ffd5b', identity='workflow', retryState=RETRY_STATE_UNSPECIFIED
at java.base/java.lang.Thread.getStackTrace(Unknown Source)
at io.temporal.internal.sync.CompletablePromiseImpl.throwFailure(CompletablePromiseImpl.java:136)
at io.temporal.internal.sync.CompletablePromiseImpl.getImpl(CompletablePromiseImpl.java:95)
at io.temporal.internal.sync.CompletablePromiseImpl.get(CompletablePromiseImpl.java:74)
at java.base/java.util.ArrayList.forEach(Unknown Source)
at XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX(application code line where its doing a get() on promise)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:324)
at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:282)
at io.temporal.internal.sync.WorkflowExecuteRunnable.run(WorkflowExecuteRunnable.java:53)
at io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:126)
at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:101)
at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:107)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.temporal.failure.CanceledFailure:
at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:128)
at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:72)
at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:86)
at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:72)
at o.temporal.internal.sync.SyncWorkflowContext$ActivityCallback.lambda$invoke$0(SyncWorkflowContext.java:198)
... 7 common frames omitted
- second exception (one that is coming indefinitely)
06-05-2022 09:45:40.063 WARN [temporalworker,] [Workflow Executor taskQueue=“WorkflowQueue”, namespace=“default”: 100] WARN i.t.i.sync.DeterministicRunnerImpl.close - Promise completed with exception and was never accessed. The ignored exception:
io.temporal.failure.ActivityFailure: scheduledEventId=32, startedEventId=0, activityType=‘GenerateAndPersist’, activityId=‘461240c0-9005-3e25-9f39-424b920e77ba’, identity=‘workflow’, retryState=RETRY_STATE_UNSPECIFIED
at java.base/java.lang.Thread.getStackTrace(Unknown Source)
at io.temporal.internal.sync.CompletablePromiseImpl.throwFailure(CompletablePromiseImpl.java:136)
at io.temporal.internal.sync.CompletablePromiseImpl.getImpl(CompletablePromiseImpl.java:95)
at io.temporal.internal.sync.CompletablePromiseImpl.get(CompletablePromiseImpl.java:74)
at io.temporal.internal.sync.DeterministicRunnerImpl.close(DeterministicRunnerImpl.java:387)
at io.temporal.internal.sync.DeterministicRunnerImpl.runUntilAllBlocked(DeterministicRunnerImpl.java:317)
at io.temporal.internal.sync.SyncWorkflow.eventLoop(SyncWorkflow.java:144)
at io.temporal.internal.replay.ReplayWorkflowExecutor.eventLoop(ReplayWorkflowExecutor.java:74)
at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler$EntityManagerListenerImpl.eventLoop(ReplayWorkflowRunTaskHandler.java:328)
at io.temporal.internal.statemachines.WorkflowStateMachines.eventLoop(WorkflowStateMachines.java:388)
at io.temporal.internal.statemachines.WorkflowStateMachines.lambda$scheduleActivityTask$ff6e4b43$1(WorkflowStateMachines.java:449)
at io.temporal.internal.statemachines.ActivityStateMachine.notifyCanceled(ActivityStateMachine.java:318)
at io.temporal.internal.statemachines.FixedTransitionAction.apply(FixedTransitionAction.java:45)
at io.temporal.internal.statemachines.StateMachine.executeTransition(StateMachine.java:137)
at io.temporal.internal.statemachines.StateMachine.handleHistoryEvent(StateMachine.java:91)
at io.temporal.internal.statemachines.EntityStateMachineBase.handleEvent(EntityStateMachineBase.java:63)
at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventImpl(WorkflowStateMachines.java:210)
at io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:178)
at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleEvent(ReplayWorkflowRunTaskHandler.java:140)
at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:180)
at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:150)
at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithEmbeddedQuery(ReplayWorkflowTaskHandler.java:201)
at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:114)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:319)
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:279)
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.temporal.failure.CanceledFailure:
at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:128)
at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:72)
at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:86)
at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:72)
at io.temporal.internal.sync.SyncWorkflowContext$ActivityCallback.lambda$invoke$0(SyncWorkflowContext.java:198)
at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:101)
at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:107)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
… 3 common frames omitted