Find cause of "Complete result exceeds size limit" error

An activity failed with the following error.
ActivityTaskFailed: Complete result exceeds size limit

It’s not retried because it’s a NonRetryableFailure.

What exactly does this error mean?

I thought maybe the result of the activity is too large to store in the event log. But I’ve checked and in this case the JSON-encoded result would be ~100 bytes. That’s peanuts compared to some other activity results that we have.

Besides the JSON-encoded result, I guess more data is part of the event (headers? metadata?), but none of the other activity executions have failed so far.
There have been between 30k and 50k successful executions of this activity in the past 2 weeks.
If I pick a random ActivityTaskCompleted event for one of the successful executions and inspect the payloads in the web UI, it looks something like this:

{
  "payloads": [
    {
      "metadata": {
        "encoding": "json"
      },
      "data": "{FieldOne:1234,FieldTwo:null,FieldThree:null,FieldFour:null}"
    }
  ]
}
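To put a number on "peanuts", here is a rough Go sketch that sums the bytes of the payload exactly as the web UI shows it. Go is an assumption, since the thread doesn't name the SDK language. The sketch also assumes the UI's `data` string is the already-decoded result. It is only an estimate: the server checks the encoded protobuf message, and protobuf framing adds a few bytes per field.

```go
package main

import "encoding/json"

// payload mirrors one entry of the "payloads" array as shown in the web UI.
// Assumption: Data is the decoded string the UI displays, not raw protobuf bytes.
type payload struct {
	Metadata map[string]string `json:"metadata"`
	Data     string            `json:"data"`
}

type payloads struct {
	Payloads []payload `json:"payloads"`
}

// approxPayloadBytes adds up every data field plus every metadata key and value.
// Rough estimate only: protobuf framing on the wire adds a few bytes per field.
func approxPayloadBytes(uiJSON []byte) (int, error) {
	var p payloads
	if err := json.Unmarshal(uiJSON, &p); err != nil {
		return 0, err
	}
	total := 0
	for _, pl := range p.Payloads {
		total += len(pl.Data)
		for k, v := range pl.Metadata {
			total += len(k) + len(v)
		}
	}
	return total, nil
}
```

Fed the payload above, this gives well under 100 bytes, which matches the estimate in the question.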

How do I debug the failed activity? Where do I start finding the cause of the size limit that’s exceeded?

I’m on version v0.27.0.

1 Like

It is a tricky one; we have never heard of such a problem. Could you add some logging that records the activity result size before completion? We will think about whether there is a way to expose more information about this particular type of failure.
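A minimal sketch of that logging in Go follows. Go is an assumption, and `logResultSize` is a hypothetical helper, not an SDK API. The sketch also assumes the SDK's JSON payload converter produces roughly the same bytes as `encoding/json`. You would call it just before the activity returns its result.

```go
package main

import (
	"encoding/json"
	"log"
)

// Assumed to match the 256KB default warn limit discussed in this thread.
const blobSizeLimitWarn = 256 * 1024

// logResultSize (hypothetical helper) JSON-encodes the result, logs its size,
// logs a second line if it is over the warn limit, and returns the size.
// Assumption: encoding/json approximates what the SDK's JSON converter produces.
func logResultSize(activityName string, result interface{}) (int, error) {
	encoded, err := json.Marshal(result)
	if err != nil {
		return 0, err
	}
	size := len(encoded)
	log.Printf("activity %s: encoded result is %d bytes", activityName, size)
	if size > blobSizeLimitWarn {
		log.Printf("activity %s: result is over the %d-byte warn limit", activityName, blobSizeLimitWarn)
	}
	return size, nil
}
```

Logging the size on every call means that if the failure happens again, the size of the offending result is recorded alongside it.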

Hey @Slijkhuis,
Each time an activity fails because it exceeds the size limit, the Temporal server emits both a metric and a warning log with information about the payload size and the server limits for the namespace.
This comes from the server logic that checks blob size limits. Can you look for the following server log to see which payloads are being passed to the server:

"Blob size exceeds limit."

Default limits are the following:

BlobSizeLimitError = 2MB
BlobSizeLimitWarn = 256KB
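As I understand the check, it has two tiers. A size over the warn limit only produces the log line. A size over the error limit also fails the call. The Go sketch below shows that logic. It assumes KB and MB mean 1024 and 1024×1024 bytes and that both comparisons are strict. `blobSizeVerdict` is a hypothetical name, not the server's function.

```go
package main

// Defaults from this thread; assumes binary units (1KB = 1024 bytes).
const (
	blobSizeLimitWarn  = 256 * 1024      // 262,144 bytes
	blobSizeLimitError = 2 * 1024 * 1024 // 2,097,152 bytes
)

// blobSizeVerdict (hypothetical) classifies a payload size against both limits.
// Assumption: the server compares with ">", so a size equal to a limit passes it.
func blobSizeVerdict(size int) string {
	switch {
	case size > blobSizeLimitError:
		return "error" // call is rejected, e.g. "Complete result exceeds size limit"
	case size > blobSizeLimitWarn:
		return "warn" // only the "Blob size exceeds limit." log line
	default:
		return "ok"
	}
}
```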

These are configurable per namespace, but I strongly recommend not increasing them beyond these values, as they directly correlate to the size of a transaction on the underlying persistence store.

You can also look at the ActivityTaskScheduledEvent in the history, which should contain the input parameters used to invoke the activity. Then test your activity implementation with those input parameters to see whether it returns a large result.
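One way to do that locally is sketched below in Go, as an assumption. `exampleInput`, `exampleResult` and `exampleActivity` are hypothetical stand-ins for your real types and activity. The input JSON is whatever you copy from the scheduled event. The sketch runs the activity several times and keeps the largest encoded result, which may help if the output varies between calls. If your activity depends on its context, a direct call like this won't work, and the SDK's test tooling is the better route.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical input/result types and activity; replace with your real ones.
type exampleInput struct {
	ID string `json:"id"`
}

type exampleResult struct {
	FieldOne int `json:"FieldOne"`
}

func exampleActivity(in exampleInput) (exampleResult, error) {
	return exampleResult{FieldOne: len(in.ID)}, nil
}

// replayWithScheduledInput decodes the input copied from the scheduled event,
// calls the activity directly `runs` times, and returns the largest
// JSON-encoded result size it observed.
func replayWithScheduledInput(inputJSON []byte, runs int) (int, error) {
	var in exampleInput
	if err := json.Unmarshal(inputJSON, &in); err != nil {
		return 0, fmt.Errorf("decode input: %w", err)
	}
	maxSize := 0
	for i := 0; i < runs; i++ {
		res, err := exampleActivity(in)
		if err != nil {
			return 0, fmt.Errorf("run %d: %w", i, err)
		}
		encoded, err := json.Marshal(res)
		if err != nil {
			return 0, err
		}
		if len(encoded) > maxSize {
			maxSize = len(encoded)
		}
	}
	return maxSize, nil
}
```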

Ah, I see. I’ve queried the logs and found it:

{
  "level": "warn",
  "ts": "2020-08-14T10:59:23.388Z",
  "msg": "Blob size exceeds limit.",
  "service": "frontend",
  "wf-namespace-id": "67a51123-4947-4020-8d51-f7e0b60bebd9",
  "wf-id": "b0372af3-895a-44be-af83-c527bfbd5dd5_28",
  "wf-run-id": "a74406e7-4928-4b39-9436-5f39f11b43ad",
  "wf-size": 2784428,
  "blob-size-violation-operation": "RespondActivityTaskCompleted",
  "logging-call-at": "util.go:431"
}

If wf-size is in bytes, then I guess it is indeed >2MB. I can't reproduce it, though: the same activity with the same input only produces a payload of around 100 bytes when I try it again.
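To check the arithmetic from the log line itself, here is a small Go sketch. It assumes wf-size is in bytes and that the 2MB default is 2×1024×1024 bytes. `overLimit` is a hypothetical helper.

```go
package main

import "encoding/json"

// Assumes the 2MB default means 2*1024*1024 = 2,097,152 bytes.
const blobSizeLimitError = 2 * 1024 * 1024

// blobSizeWarning keeps only the log fields needed here; encoding/json
// ignores the rest (level, ts, wf-id, ...).
type blobSizeWarning struct {
	Msg       string `json:"msg"`
	WfSize    int64  `json:"wf-size"`
	Operation string `json:"blob-size-violation-operation"`
}

// overLimit parses one JSON log line and returns how many bytes wf-size is
// over the error limit (zero or negative means it was within the limit).
func overLimit(line []byte) (blobSizeWarning, int64, error) {
	var w blobSizeWarning
	if err := json.Unmarshal(line, &w); err != nil {
		return w, 0, err
	}
	return w, w.WfSize - blobSizeLimitError, nil
}
```

For the log entry above, this reports 687,276 bytes over the limit, so the rejected result was roughly 2.66 MiB rather than ~100 bytes.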

That part I didn’t see in the logs. But:

I didn't touch the limits, so they are at the defaults. The defaults already seem to be more than we (should) need, so I probably don't want to change them.

When I re-ran it with the exact same input, it worked… So for now I can't reproduce it, and I don't have any more info.
Still, I'm very confused about how it happened that one time.

Thank you for all the information. If I encounter it again and I have more debugging logs I will let you know.