Find cause of "Complete result exceeds size limit" error

An activity failed with the following error.
ActivityTaskFailed: Complete result exceeds size limit

It’s not retried because it’s a NonRetryableFailure.

What exactly does this error mean?

I thought maybe the result of the activity is too large to store in the event log. But I’ve checked and in this case the JSON-encoded result would be ~100 bytes. That’s peanuts compared to some other activity results that we have.

Besides the JSON-encoded result I guess more data is part of the event (headers? metadata?), but none of the other activity executions have failed so far.
There have been between 30k and 50k successful executions of this activity in the past 2 weeks.
If I pick a random ActivityTaskCompleted event for one of the successful executions and inspect the payloads in the web UI, I see something like this:

{
  "payloads": [
    {
      "metadata": {
        "encoding": "json"
      },
      "data": "{FieldOne:1234,FieldTwo:null,FieldThree:null,FieldFour:null}"
    }
  ]
}

How do I debug the failed activity? Where do I start finding the cause of the size limit that’s exceeded?

I’m on version v0.27.0.

It is a tricky one. We have never heard of such a problem. Could you add some logging that records the activity result size before completion? We will think about whether there is a way to expose more information about this particular type of failure.
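
As a rough illustration of that logging, here is a minimal Java sketch. It assumes the result is available as a JSON string before the activity returns. The class `ResultSizeCheck`, its methods, and the threshold constants are hypothetical and not part of any Temporal SDK. The UTF-8 byte count only approximates what the server measures, since payload metadata adds to it.

```java
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

// Hypothetical helper: measures a serialized activity result before it is returned.
class ResultSizeCheck {
    private static final Logger LOG = Logger.getLogger(ResultSizeCheck.class.getName());

    // Assumed thresholds in bytes; adjust them to whatever your server is configured with.
    static final long WARN_BYTES = 256L * 1024;
    static final long ERROR_BYTES = 2L * 1024 * 1024;

    // Size of the serialized result when encoded as UTF-8.
    static long sizeInBytes(String serializedResult) {
        return serializedResult.getBytes(StandardCharsets.UTF_8).length;
    }

    // Classifies a size against the assumed thresholds.
    static String classify(long bytes) {
        if (bytes > ERROR_BYTES) return "ERROR";
        if (bytes > WARN_BYTES) return "WARN";
        return "OK";
    }

    // Logs the size and returns it, so it can be called right before the activity returns.
    static long logSize(String activityName, String serializedResult) {
        long bytes = sizeInBytes(serializedResult);
        String message = activityName + " result size: " + bytes + " bytes (" + classify(bytes) + ")";
        if ("OK".equals(classify(bytes))) {
            LOG.info(message);
        } else {
            LOG.warning(message);
        }
        return bytes;
    }

    public static void main(String[] args) {
        String json = "{\"FieldOne\":1234,\"FieldTwo\":null,\"FieldThree\":null,\"FieldFour\":null}";
        logSize("exampleActivity", json);
    }
}
```

Logging the size on every completion is cheap and would show whether the one failing execution really produced a much larger result than the others.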

Hey @Slijkhuis,
Each time an activity fails because its result exceeds the size limit, Temporal server emits both a metric and a warning log with information about the payload sizes and the server limits for the namespace.
Here is the logic on the server which checks for blob size limits. Can you look for the following server log to see what payloads are being passed to the server:

"Blob size exceeds limit."

Default limits are the following:

BlobSizeLimitError = 2MB
BlobSizeLimitWarn = 256KB

These are configurable per namespace, but I highly recommend not increasing them beyond these values, as they directly correlate with the size of transactions on the underlying persistence store.

Another thing you can check is the ActivityTaskScheduledEvent in the history, which should contain the input parameters used to invoke the activity. You can then test your activity implementation with those input parameters to see if it returns a large result.

Ah, I see. I’ve queried the logs and found it:

{
  "level": "warn",
  "ts": "2020-08-14T10:59:23.388Z",
  "msg": "Blob size exceeds limit.",
  "service": "frontend",
  "wf-namespace-id": "67a51123-4947-4020-8d51-f7e0b60bebd9",
  "wf-id": "b0372af3-895a-44be-af83-c527bfbd5dd5_28",
  "wf-run-id": "a74406e7-4928-4b39-9436-5f39f11b43ad",
  "wf-size": 2784428,
  "blob-size-violation-operation": "RespondActivityTaskCompleted",
  "logging-call-at": "util.go:431"
}

If wf-size is in bytes, then I guess it is indeed >2MB. I can’t reproduce that though: the same activity with the same input produces a payload of only around 100 bytes if I try it again.
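
To make the comparison concrete, here is a small Java sketch of the arithmetic. It assumes wf-size is in bytes and that the 2MB default means 2 * 1024 * 1024 bytes. Both are assumptions, and the class name is made up for illustration.

```java
// Hypothetical helper that compares a logged wf-size with the assumed error limit.
class BlobLimitMath {
    // Assumption: "2MB" means 2 MiB, i.e. 2 * 1024 * 1024 bytes.
    static final long ERROR_LIMIT_BYTES = 2L * 1024 * 1024;

    // Number of bytes by which a payload exceeds the limit, or 0 if it fits.
    static long bytesOverLimit(long sizeBytes) {
        return Math.max(0L, sizeBytes - ERROR_LIMIT_BYTES);
    }

    public static void main(String[] args) {
        long wfSize = 2_784_428L; // value taken from the log entry above
        System.out.println(wfSize + " bytes is " + bytesOverLimit(wfSize)
                + " bytes over the assumed limit of " + ERROR_LIMIT_BYTES);
        // prints: 2784428 bytes is 687276 bytes over the assumed limit of 2097152
    }
}
```

Even if the limit were the decimal 2,000,000 bytes, 2,784,428 would still be over it, so the conclusion does not depend on which unit is meant.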

I didn’t see the server limits in the logs. But:

I didn’t touch the limits, so they will be set to the defaults. The default limits seem to be more than we (should) need, so I probably don’t want to change them.

When I re-ran it with the exact same input, it worked… So for now I can’t reproduce it and I don’t have any more info.
Still, I’m very confused about how it happened that one time.

Thank you for all the information. If I encounter it again and I have more debugging logs I will let you know.

Hi @samar

I am able to reproduce this issue on my local machine.
I am using the Temporal Java SDK.

I have an activity that returns a list containing 40000 records.
The activity fails with the same error: “Complete result exceeds size limit”.

As you suggested, instead of increasing the blob size limit I will need to find some other way to fetch these records.
I will probably not return such a huge list of records from the activity, and instead store it somewhere and read it from there when needed.

Below is the stack trace of the exception for the same activity.

2023-05-27 12:47:30.343 [workflow-method-cc2306e2-2b40-4973-925b-945840802454-d92f73b4-e76b-402e-b272-c50a712ef491] WARN i.t.i.sync.WorkflowExecutionHandler - Workflow execution failure WorkflowId='cc2306e2-2b40-4973-925b-945840802454', RunId=d92f73b4-e76b-402e-b272-c50a712ef491, WorkflowType='RandomWorkflow'
io.temporal.failure.ActivityFailure: Activity with activityType='FindRecords' failed: 'Activity task failed'. scheduledEventId=5, startedEventId=6, activityId=11934d9c-b42c-3ddd-813f-900dc9af5368, identity='35060@SAG-8C1ZVL3', retryState=RETRY_STATE_NON_RETRYABLE_FAILURE
at java.base/java.lang.Thread.getStackTrace(Thread.java:1610)
at io.temporal.internal.sync.ActivityStubBase.execute(ActivityStubBase.java:49)
at io.temporal.internal.sync.ActivityInvocationHandler.lambda$getActivityFunc$0(ActivityInvocationHandler.java:78)
at io.temporal.internal.sync.ActivityInvocationHandlerBase.invoke(ActivityInvocationHandlerBase.java:60)
at jdk.proxy2/jdk.proxy2.$Proxy259.findRecords(Unknown Source)
at com.local.RandomWorkflowImpl.execute(RandomWorkflowImpl.java:37)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:302)
at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:277)
at io.temporal.internal.sync.WorkflowExecutionHandler.runWorkflowMethod(WorkflowExecutionHandler.java:70)
at io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:116)
at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:102)
at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:106)
at io.temporal.worker.ActiveThreadReportingExecutor.lambda$submit$0(ActiveThreadReportingExecutor.java:53)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.temporal.failure.ServerFailure: Complete result exceeds size limit.
at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:136)
at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)
at io.temporal.failure.FailureConverter.failureToExceptionImpl(FailureConverter.java:93)
at io.temporal.failure.FailureConverter.failureToException(FailureConverter.java:79)
at io.temporal.internal.sync.SyncWorkflowContext$ActivityCallback.lambda$invoke$0(SyncWorkflowContext.java:294)
… 8 common frames omitted

It is an anti-pattern to return very large payloads as an activity result. There are several other ways to model your use case:

  1. Store the activity result as a blob in another store and return a pointer to it.
  2. Keep the activity result on the same worker and route other activities to it.
  3. If you want to iterate over a large data set, have a long-running activity which heartbeats its progress back to Temporal.
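
The first option could look roughly like the Java sketch below. Everything in it is hypothetical: `BlobStore`, `InMemoryBlobStore`, and `ClaimCheckSketch` are illustrative names, the in-memory map stands in for a real external store, and the static methods stand in for activity bodies rather than using any Temporal API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage abstraction; a real one might be backed by object storage or a database.
interface BlobStore {
    String put(byte[] data);   // stores the data and returns a key (the "pointer")
    byte[] get(String key);    // loads the data for a key
}

// In-memory stand-in, only suitable for a demonstration in a single process.
class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    public String put(byte[] data) {
        String key = UUID.randomUUID().toString();
        blobs.put(key, data.clone());
        return key;
    }

    public byte[] get(String key) {
        byte[] data = blobs.get(key);
        if (data == null) {
            throw new IllegalArgumentException("unknown key: " + key);
        }
        return data.clone();
    }
}

class ClaimCheckSketch {
    // Stands in for an activity that produces many records: it stores them
    // externally and returns only the small key, which is what would end up in history.
    static String findRecordsAndStore(BlobStore store, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("record-").append(i).append('\n');
        }
        return store.put(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Stands in for a later activity that receives the key and loads the records.
    static List<String> loadRecords(BlobStore store, String key) {
        String text = new String(store.get(key), StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        BlobStore store = new InMemoryBlobStore();
        String key = findRecordsAndStore(store, 40000);
        System.out.println("pointer: " + key + ", records: " + loadRecords(store, key).size());
    }
}
```

With this shape the activity result is a short key regardless of how many records were found, so it stays far below the blob size limit.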

If you can tell me more about your use case I can provide more targeted advice.