Tried with the v1.7.0 Java SDK; the issue persists. I also see the following additional logs (3 times):
internal.worker.WorkflowWorker: Workflow task failure during replying to the server. startedEventId=19, WorkflowId=45F46FAD4EB8455AA8C7F0B3F25C17DC@AVABCgA-06437495323552-0, RunId=415789e6-c560-4d4f-ac46-4368a849618b. If seen continuously the workflow might be stuck. io.grpc.StatusRuntimeException: NOT_FOUND: workflow execution already completed
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[grpc-stub-1.43.2.jar:1.43.2]
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[grpc-stub-1.43.2.jar:1.43.2]
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[grpc-stub-1.43.2.jar:1.43.2]
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:2671) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendReply$0(WorkflowWorker.java:320) ~[temporal-sdk-1.7.0.jar:?]
at io.temporal.internal.retryer.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:44) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:61) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:51) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.retryer.GrpcRetryer.retry(GrpcRetryer.java:41) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendReply(WorkflowWorker.java:313) ~[temporal-sdk-1.7.0.jar:?]
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:243) ~[temporal-sdk-1.7.0.jar:?]
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:188) ~[temporal-sdk-1.7.0.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:83) ~[temporal-sdk-1.7.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
internal.worker.WorkflowWorker: Workflow task failure during replying to the server. startedEventId=14, WorkflowId=5089E07B66E342388F0FEF2FE4735E18@AVABCgA-06437495323552-0, RunId=21f5ef39-bf54-458e-876a-2a57c98f4449. If seen continuously the workflow might be stuck. io.grpc.StatusRuntimeException: INVALID_ARGUMENT: encouter invalid commands sequence: RecordMarker, RecordMarker, RecordMarker, CompleteWorkflowExecution, RecordMarker
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[grpc-stub-1.43.2.jar:1.43.2]
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[grpc-stub-1.43.2.jar:1.43.2]
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[grpc-stub-1.43.2.jar:1.43.2]
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:2671) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendReply$0(WorkflowWorker.java:320) ~[temporal-sdk-1.7.0.jar:?]
at io.temporal.internal.retryer.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:44) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:61) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:51) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.retryer.GrpcRetryer.retry(GrpcRetryer.java:41) ~[temporal-serviceclient-1.7.0.jar:?]
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendReply(WorkflowWorker.java:313) ~[temporal-sdk-1.7.0.jar:?]
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:243) ~[temporal-sdk-1.7.0.jar:?]
at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:188) ~[temporal-sdk-1.7.0.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:83) ~[temporal-sdk-1.7.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
I am unable to reproduce this using 1.7.0 or 1.8.0. In both cases, the local activity that throws the non-retryable error fails and produces a MarkerRecorded event with the failure. The second local activity retries and then fails with the retryable error.
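To make the expected behaviour concrete, here is a plain-Java sketch of the retry semantics described above: a failure whose type is listed as non-retryable fails on the first attempt, while any other failure is retried until the attempt limit is reached. The names `RetrySketch`, `NonRetryableError`, `RetryableError` and `callWithRetry` are hypothetical. This is only a model (no backoff, no timeouts), not how the Temporal SDK implements local activity retries.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of the retry behaviour described above; NOT the Temporal SDK's implementation.
class RetrySketch {

    // Hypothetical stand-ins for the errors thrown by the two local activities.
    static class NonRetryableError extends RuntimeException {
        NonRetryableError(String message) { super(message); }
    }

    static class RetryableError extends RuntimeException {
        RetryableError(String message) { super(message); }
    }

    // Calls `call` up to maxAttempts times (no backoff in this sketch). A failure whose type is
    // in doNotRetry is rethrown immediately; any other failure is retried, and the last one is
    // rethrown once the attempts run out.
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts,
                               List<Class<? extends RuntimeException>> doNotRetry) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (doNotRetry.stream().anyMatch(type -> type.isInstance(e))) {
                    throw e; // non-retryable: fail on this attempt, no further retries
                }
                last = e; // retryable: remember it and try again
            }
        }
        throw last; // attempts exhausted: surface the last retryable failure
    }
}
```

In Temporal itself the equivalent knob is the non-retryable error types in the retry options; the sketch only mirrors the observable outcome (one attempt versus several).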
What server version are you using?
Could you show your relevant code so I can make sure I’m doing the same thing?
Actually, I take back what I said about being unable to reproduce this using 1.7.0 or 1.8.0: it just does not happen every time. After running it 10 times, I was able to get:
13:56:45.271 [Workflow Executor taskQueue="HelloParallelActivityTaskQueue", namespace="default": 1] WARN i.t.internal.worker.WorkflowWorker - Workflow task failure during replying to the server. startedEventId=3, WorkflowId=HelloParallelActivityWorkflow, RunId=790382b4-27c5-4f0c-bbd4-cdc88e699036. If seen continuously the workflow might be stuck.
io.grpc.StatusRuntimeException: INVALID_ARGUMENT: encouter invalid commands sequence: CompleteWorkflowExecution, RecordMarker
@ravikiran are you by chance not waiting for the asynchronously started local activities to complete? I can see the mentioned issue if I start the async invocations but then don’t wait for them to complete. This could be a race condition between the workflow completing and the marker being recorded in history.
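The two rejected sequences in the logs (`CompleteWorkflowExecution, RecordMarker` and `RecordMarker, RecordMarker, RecordMarker, CompleteWorkflowExecution, RecordMarker`) have one thing in common: a marker comes after the completion command. Below is a minimal sketch of the rule the server appears to enforce, assuming (from the error text, not from the server source) that nothing may follow a workflow-closing command in a single workflow task reply. `CommandSequenceCheck` and its enum are hypothetical names.

```java
import java.util.List;

// Hypothetical model inferred from the "invalid commands sequence" errors in the logs;
// this is NOT the Temporal server's validation code.
class CommandSequenceCheck {

    // Hypothetical subset of command types relevant to the logs.
    enum CommandType { RECORD_MARKER, COMPLETE_WORKFLOW_EXECUTION, FAIL_WORKFLOW_EXECUTION }

    // Assumption: these commands close the workflow, so they must be the last command in a reply.
    static boolean closesWorkflow(CommandType type) {
        return type == CommandType.COMPLETE_WORKFLOW_EXECUTION
                || type == CommandType.FAIL_WORKFLOW_EXECUTION;
    }

    // Returns false if any command appears after a workflow-closing command.
    static boolean isValid(List<CommandType> commands) {
        boolean closed = false;
        for (CommandType type : commands) {
            if (closed) {
                return false; // e.g. a RecordMarker emitted after CompleteWorkflowExecution
            }
            closed = closesWorkflow(type);
        }
        return true;
    }
}
```

Under this assumption, waiting for every local activity promise before the workflow method returns keeps all RecordMarker commands ahead of CompleteWorkflowExecution, which is why the race only shows up when the promises are left unawaited.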
For example, if I do:
List<Promise<String>> promiseList = new ArrayList<>();
promiseList.add(Async.function(localActivityOne::doSomething));
promiseList.add(Async.function(localActivityTwo::doSomethingElse));
// ...
// wait for the local activities to complete to get their results;
// I get the error if I don't have this block in the workflow
for (Promise<String> promise : promiseList) {
    // ...
    promise.get();
}
With that wait in place, I am unable to get this error; I tried running it over 30 times.
@ravikiran could you please provide a reproducible sample? If not, can you show your local activity and workflow options, and the local activity code where you throw the exceptions?
I ask because I ran a test using your latest code over 200 times and was not able to get the failure.
I can, however, reproduce it without the:
Are you sure you are throwing the NonRetryable exception first, followed by the Retryable one?
Anyway, thanks for your question. It is indeed a race condition: when we get a “NonRetryable” exception, we just complete the other promise and ignore its result or exception.
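As a purely hypothetical illustration of that idea (not the code the poster used), here is a plain-Java analogy with `CompletableFuture`: wait for every call to settle, rethrow a non-retryable failure if one occurred, and ignore the other call's result or exception in that case. `SettleAllSketch`, `NonRetryableError` and `awaitAll` are made-up names. In real workflow code the same idea would be expressed with Temporal's `Async`/`Promise` (for example, calling `get()` on each promise inside a try/catch), because workflow code must stay deterministic and should not use `CompletableFuture`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical plain-Java analogy of the workaround; real workflow code must use Temporal's
// Async/Promise rather than CompletableFuture.
class SettleAllSketch {

    // Hypothetical stand-in for the non-retryable failure.
    static class NonRetryableError extends RuntimeException {
        NonRetryableError(String message) { super(message); }
    }

    // Waits for every future to settle before returning, so no call is still in flight when the
    // caller finishes. A NonRetryableError takes precedence and the other calls' results and
    // failures are ignored; otherwise the first other failure (if any) is rethrown.
    // Cancellation is not handled in this sketch.
    static void awaitAll(List<CompletableFuture<String>> futures) {
        NonRetryableError nonRetryable = null;
        CompletionException otherFailure = null;
        for (CompletableFuture<String> future : futures) {
            try {
                future.join(); // blocks until this future completes, normally or exceptionally
            } catch (CompletionException e) {
                if (e.getCause() instanceof NonRetryableError) {
                    if (nonRetryable == null) {
                        nonRetryable = (NonRetryableError) e.getCause();
                    }
                } else if (otherFailure == null) {
                    otherFailure = e;
                }
            }
        }
        if (nonRetryable != null) {
            throw nonRetryable;
        }
        if (otherFailure != null) {
            throw otherFailure;
        }
    }
}
```

The key design point is that the loop never exits early: even after the non-retryable failure is seen, it still waits for the remaining calls, which is what keeps late results from landing after the workflow has completed.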
Basically, this is what we did to work around the issue.