Got error: corrupted history event batch, eventID is not continuous

We have Temporal 1.0.0 installed in k8s from the official Helm chart. Temporal works fine until, at some point, workflow processing gets stuck and we see numerous errors in the clients’ logs:

"Failure in thread Workflow Poller taskQueue=“indexing-orchestrator”, namespace=“default”: 2
io.grpc.StatusRuntimeException: INTERNAL: corrupted history event batch, eventID is not continuous
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollWorkflowTaskQueue(WorkflowServiceGrpc.java:2658)
at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:77)

We have had this error since 0.28 and attributed it either to instability of the alpha version or to failures caused by Cassandra node rotation.

But now it is happening on 1.0.0 with a 6-node Cassandra cluster, and none of the nodes had restarted before the error appeared.

How is your Cassandra cluster deployed? Can you provide some details on your configuration? Are you using a replication factor of 3 for Temporal’s keyspace? How many racks are in your Cassandra cluster?

Other details of your configuration could help here as well.
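
For anyone reading along who is unsure what the message means: "eventID is not continuous" suggests that a batch of history events was read back with a gap or a repeat in its event IDs. The sketch below only illustrates that kind of check. It is not Temporal's server code, and the class and method names (EventIdContinuity, firstGap) are made up for this example. It assumes event IDs within a batch should each be exactly one greater than the previous one.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical illustration only; not taken from the Temporal code base.
public class EventIdContinuity {

    /**
     * Returns the index of the first event whose ID is not exactly one
     * greater than its predecessor's ID, or an empty OptionalInt if the
     * whole batch is continuous.
     */
    public static OptionalInt firstGap(List<Long> eventIds) {
        for (int i = 1; i < eventIds.size(); i++) {
            long previous = eventIds.get(i - 1);
            long current = eventIds.get(i);
            if (current != previous + 1) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // A continuous batch: no gap is reported.
        System.out.println(firstGap(List.of(5L, 6L, 7L)).isPresent());

        // Event 7 is missing, so the event at index 2 breaks continuity.
        System.out.println(firstGap(List.of(5L, 6L, 8L)).getAsInt());
    }
}
```

A batch that fails a check like this would point to events that were lost, overwritten, or read inconsistently in the persistence layer, which is why the Cassandra replication and consistency details asked about above are relevant.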