Hi Temporal Community,
We observed a burst of workflows, each with a large number of activities. The total number of workflows at a given point in time was not very high (~39), but the total number of activities at that point was ~79200.
As a result, we started seeing a lot of errors on our server. All requests to the history client failed at this point.
The errors that look problematic to me are:
error type 1:
{"level":"info","ts":"2023-04-28T23:13:58.725Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metric_client.go:90"}
error type 2:
{"level":"info","ts":"2023-04-28T23:14:32.542Z","msg":"Activity task not found","service":"matching","component":"matching-engine","wf-namespace-id":"a13de8cb-9392-4914-bb77-656159403a7f","wf-id":"83ec3b13-8ce5-4a5c-8486-9fbeb4d45976","wf-run-id":"fd1e3d61-b610-4581-a23a-fe2ecc52ff56","wf-task-queue-name":"/_sys/TestCallTreeCreation/3","queue-task-id":2950492,"queue-task-visibility-timestamp":"2023-04-28T23:14:30.947Z","wf-history-event-id":2161,"error":"invalid activityID or activity already timed out or invoking workflow is completed","logging-call-at":"matchingEngine.go:502"}
error type 3:
{"level":"info","ts":"2023-04-28T23:15:39.214Z","msg":"Workflow task not found","service":"matching","component":"matching-engine","wf-task-queue-name":"90@ltx1-app4600.prod.linkedin.com:99b943d2-9aa7-49f9-9b64-c15a3eb48bbc","wf-namespace-id":"a13de8cb-9392-4914-bb77-656159403a7f","wf-id":"83ec3b13-8ce5-4a5c-8486-9fbeb4d45976","wf-run-id":"fd1e3d61-b610-4581-a23a-fe2ecc52ff56","wf-task-queue-name":"90@ltx1-app4600.prod.linkedin.com:99b943d2-9aa7-49f9-9b64-c15a3eb48bbc","queue-task-id":-137,"queue-task-visibility-timestamp":"2023-04-28T23:15:39.209Z","wf-history-event-id":12014,"error":"Workflow task not found.","logging-call-at":"matchingEngine.go:423"}
There were also log entries of the following type, but these look benign and were also present in workflows that passed:
{"level":"warn","ts":"2023-04-28T23:12:01.439Z","msg":"history size exceeds warn limit.","shard-id":453,"address":"10.154.98.59:7234","component":"history-cache","wf-namespace-id":"a13de8cb-9392-4914-bb77-656159403a7f","wf-id":"34dbae70-d99a-44f3-863e-7773dbd29934","wf-run-id":"b951ff2b-4481-40b9-b82b-7b15b63062c2","wf-history-size":3043550,"wf-event-count":17197,"logging-call-at":"context.go:915"}
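In case it helps anyone compare against their own cluster, here is a minimal sketch of how log lines like the ones above could be bucketed by level, message, and error to see which type dominates during the burst. It assumes one JSON object per line, as in the excerpts above; the function name `tally_log_lines` and the `SAMPLE` list are hypothetical, and the sample entries are trimmed copies of the lines quoted above with most fields dropped.

```python
import json
from collections import Counter


def tally_log_lines(lines):
    """Count JSON log entries by (level, msg, error)."""
    counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            # Skip anything that is not a single JSON value per line.
            continue
        if not isinstance(entry, dict):
            continue
        # Entries without an "error" field (e.g. the warn line) get "".
        key = (entry.get("level", ""), entry.get("msg", ""), entry.get("error", ""))
        counts[key] += 1
    return counts


# Trimmed copies of the log lines quoted above (assumption: most fields
# are dropped here for brevity; the real lines carry more context).
SAMPLE = [
    '{"level":"info","msg":"history client encountered error","service":"matching","error":"Activity task already started."}',
    '{"level":"info","msg":"Activity task not found","service":"matching","error":"invalid activityID or activity already timed out or invoking workflow is completed"}',
    '{"level":"info","msg":"Workflow task not found","service":"matching","error":"Workflow task not found."}',
    '{"level":"warn","msg":"history size exceeds warn limit.","wf-event-count":17197}',
]

if __name__ == "__main__":
    # Print the most frequent (level, msg, error) combinations first.
    for (level, msg, error), n in tally_log_lines(SAMPLE).most_common():
        print(n, level, msg, error)
```

To run it against a real log file, the same function could be fed an open file handle instead of `SAMPLE`, since it only iterates over lines.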