We have Temporal 1.0.0 installed on k8s from the official Helm chart. Temporal works fine until, at some point, workflow processing gets stuck and we see numerous errors in the clients' logs:
Failure in thread Workflow Poller taskQueue="indexing-orchestrator", namespace="default": 2
io.grpc.StatusRuntimeException: INTERNAL: corrupted history event batch, eventID is not continuous
We have seen this error since 0.28 and attributed it either to instability of the alpha version or to a failure caused by Cassandra node rotation.
But now it is happening on 1.0.0 with a 6-node Cassandra cluster, and no node had restarted, so a restart cannot be the cause.
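
For anyone unfamiliar with the message: as far as we understand it, the server expects the event IDs inside a stored history batch to increase by exactly one, and reports this error when they do not. Below is a minimal sketch of that kind of check, for illustration only. The class and method names are hypothetical, and this is not Temporal's actual code.

```java
public final class EventIdContinuity {

    // Hypothetical helper, NOT part of the Temporal SDK or server.
    // Returns the index of the first event whose ID is not (previous ID + 1),
    // or -1 if every ID in the batch follows its predecessor by exactly one.
    public static int firstGapIndex(long[] eventIds) {
        for (int i = 1; i < eventIds.length; i++) {
            if (eventIds[i] != eventIds[i - 1] + 1) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A continuous batch: no gap is found.
        System.out.println(firstGapIndex(new long[] {5, 6, 7}));
        // ID 7 is missing, so the gap is reported at index 2 (the event with ID 8).
        System.out.println(firstGapIndex(new long[] {5, 6, 8}));
    }
}
```

If this reading is right, a missing or duplicated event in a batch as stored in Cassandra would be enough to trigger the error on every poll that reads that history.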