RuntimeException: Failure processing activity response

Hi,

We are seeing intermittent errors in activity execution. From the logs we can see that the activity execution completes, but something breaks while sending the completion event to the Temporal server, and the exception below is thrown:

WARN i.i.internal.worker.ActivityWorker - Failure during reporting of activity result to the server

java.lang.RuntimeException: Failure processing activity response. WorkflowId=bff0ea8b-3e93-4fa6-b382-cea474962e6c_609595626_37, RunId=dd618f4f-e50a-49e5-925e-500f91078b3e, ActivityType=PublishAudit, ActivityId=28e44819-a8c6-3c10-b360-31aca1a4d4bc

These exceptions are very generic and don't shed any light on what caused the failure.

Since the StartToClose timeout is reached, the workflow retries the activity and the call then succeeds. So we don't suspect a code issue here, as the first retry works.

Could you please help troubleshoot this issue? Please suggest if we can tweak any settings.

@Dhanraj

I wonder if this could happen because the server times out the activity before the worker sends the completion request to the server.

From logs we could see that activity execution completes

Do you know how long your activity takes to complete, and what the StartToClose timeout is?

Is this the full exception? I think it should have some additional exception as a cause.

@antonio.perez @maxim sharing the log screenshot

The StartToClose timeout is set to 2 hours because this service sometimes takes longer to respond in the worst case. In this particular case, the activity completed within seconds, but something failed while sending the completion event to Temporal. As a result, Temporal didn't receive any event and waited 2 hours before initiating a retry, which is expected given the 2-hour StartToClose timeout.

The question is what went wrong while sending the activity completion event, as the logs are not helpful in diagnosing the issue.

WARN i.i.internal.worker.ActivityWorker - Failure during reporting of activity result to the server

java.lang.RuntimeException: Failure processing activity response. WorkflowId=bff0ea8b-3e93-4fa6-b382-cea474962e6c_609595626_37, RunId=dd618f4f-e50a-49e5-925e-500f91078b3e, ActivityType=PublishAudit, ActivityId=28e44819-a8c6-3c10-b360-31aca1a4d4bc

It looks like the logger is skipping the original cause. The code that throws that exception does attach the cause: sdk-java/temporal-sdk/src/main/java/io/temporal/internal/worker/ActivityWorker.java at c215a78b55020bec7f67b31c0dd8c62bcc007f62 (temporalio/sdk-java on GitHub)
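
If the logging setup drops the nested exception, one way to check what is attached is to walk the cause chain by hand and print a single line per throwable. The sketch below is only an illustration in plain Java: the class name CauseChain, the describe helper, and the IOException used as the cause are all hypothetical stand-ins, since the real cause in this thread is not known.

```java
import java.util.ArrayList;
import java.util.List;

public class CauseChain {

    /** Returns one line per throwable in the chain, outermost first. */
    public static List<String> describe(Throwable t) {
        List<String> lines = new ArrayList<>();
        // Cap the depth so a self-referential cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 20; depth++) {
            lines.add(t.getClass().getName() + ": " + t.getMessage());
            t = t.getCause();
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical cause, used only to demonstrate the wrapping.
        Throwable cause = new java.io.IOException("hypothetical transport error");
        Throwable wrapped =
            new RuntimeException("Failure processing activity response", cause);

        // Prints the wrapper first, then each nested cause.
        describe(wrapped).forEach(System.out::println);
    }
}
```

Because each line is short, this kind of output is less likely to be truncated than a full stack trace, so it can help confirm whether a cause is attached at all.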

I've seen this a couple of times. It could be that the stack trace is too large and, for some reason, is not printed.