We have a Temporal.io server deployed in our Kubernetes cluster. Our application runs a workflow with 3 activities, and recently we have been receiving a lot of error messages related to the activities:
Activity not found on completion. This may happen if the activity has already been cancelled but completed anyway. message: "invalid activityID or activity already timed out or invoking workflow is completed", details: b"\x08\x05\x12Rinvalid activityID or activity already timed out or invoking workflow is completed\x1aB\n@type.googleapis.com/temporal.api.errordetails.v1.NotFoundFailure"
Does this mean that the activity was tried multiple times and then failed?
Also, more recently we have been seeing multiple disconnects between the application and the Temporal server, with a "deadline exceeded" error when connecting. Is any specific configuration required to avoid this?
The most probable cause of this error is that the activity worker reports completion after the activity task has already timed out on the server side. An activity timeout does not forcibly stop execution of the activity code.
For example, suppose that the activity SomeVeryLongActivity is scheduled with a start-to-close timeout of 10 seconds, but sometimes takes more than 20 seconds to complete. In that case, the activity code keeps executing after the 10-second timeout, but when the activity's completion is reported (after 20 seconds), the server refuses it because the activity task token is no longer valid.
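To make the timing concrete, here is a toy model of that scenario in plain Python. It does not use the Temporal SDK at all; ToyServer, NotFoundError and run_activity are hypothetical names made up for this illustration, and the real server logic is certainly more involved. It only shows why a completion reported after the timeout gets rejected.

```python
from dataclasses import dataclass


class NotFoundError(Exception):
    """Stands in for the server's NotFound response (hypothetical name)."""


@dataclass
class ToyServer:
    """Toy stand-in for the server: tracks one activity task and its timeout."""

    start_to_close_timeout: float  # seconds
    started_at: float = 0.0

    def complete_activity(self, now: float) -> str:
        # The task token is only honoured while the timeout has not elapsed.
        if now - self.started_at > self.start_to_close_timeout:
            raise NotFoundError(
                "invalid activityID or activity already timed out "
                "or invoking workflow is completed"
            )
        return "completed"


def run_activity(server: ToyServer, duration: float) -> str:
    # The worker is not interrupted by the timeout: it runs for the full
    # duration and only then reports completion to the server.
    finished_at = server.started_at + duration
    try:
        return server.complete_activity(now=finished_at)
    except NotFoundError as err:
        return f"rejected: {err}"


# A 20-second activity against a 10-second timeout: the report arrives too late.
print(run_activity(ToyServer(start_to_close_timeout=10), duration=20))
# The same activity against a 30-second timeout: the report is accepted.
print(run_activity(ToyServer(start_to_close_timeout=30), duration=20))
```

The first call prints the "rejected: ..." message and the second prints "completed", which mirrors what the worker logs when the server refuses a late completion.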
Possible solutions to this are:
Increase the timeout constraints on your activities (start-to-close and/or schedule-to-close)