Start workflow operation randomly fails with internal error

We are using Temporal with the Java SDK on the client side and intermittently get errors when starting workflows. The exception is not very descriptive, and reviewing the Temporal logs has not given us any additional insight.
Here is the exception we are encountering.


```
io.temporal.client.WorkflowStubImpl.wrapStartException(WorkflowStubImpl.java:455)
Caused by: io.grpc.StatusRuntimeException: INTERNAL: unexpected error
io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.startWorkflowExecution(WorkflowServiceGrpc.java:3873)
io.temporal.internal.client.external.GenericWorkflowClientImpl.lambda$start$1(GenericWorkflowClientImpl.java:74)
io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:69)
io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:60)
io.temporal.internal.client.external.GenericWorkflowClientImpl.start(GenericWorkflowClientImpl.java:69)
io.temporal.internal.client.RootWorkflowClientInvoker.start(RootWorkflowClientInvoker.java:101)
io.temporal.client.WorkflowStubImpl.startWithOptions(WorkflowStubImpl.java:104)
io.temporal.client.WorkflowStubImpl.start(WorkflowStubImpl.java:129)
io.temporal.client.WorkflowInvocationHandler$StartWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:239)
io.temporal.client.WorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:175)
jdk.proxy2/jdk.proxy2.$Proxy183.trigger(Unknown Source)
io.temporal.client.WorkflowClientInternalImpl.lambda$start$431e99ec$1(WorkflowClientInternalImpl.java:369)
io.temporal.client.WorkflowClientInternalImpl.start(WorkflowClientInternalImpl.java:361)
io.temporal.client.WorkflowClientInternalImpl.start(WorkflowClientInternalImpl.java:369)
io.temporal.client.WorkflowClient.start(WorkflowClient.java:369)
```

I would start by looking at your server's service error metrics:

```
sum(rate(service_error_with_type[5m])) by (error_type)
```

to see which service(s) are reporting serviceerror.Internal for the StartWorkflowExecution API.

I would also look at your persistence errors around the same times (server metric persistence_error_with_type), since most of the times I have seen this come up, it was related to database issues.
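Comparing against those server metrics "around the same times" is easier if the client records exactly when each start failed. A minimal sketch, assuming plain Java logging on the client: a small helper that flattens the exception's cause chain into one line together with a timestamp and the workflow ID. `StartFailureDiagnostics`, `causeChain` and `describe` are hypothetical names for illustration, not part of the Temporal SDK.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of the Temporal SDK): renders the cause chain of a
// failed workflow start so it can be logged with a timestamp and lined up with
// server-side metrics such as service_error_with_type or persistence_error_with_type.
final class StartFailureDiagnostics {

    private StartFailureDiagnostics() {}

    // Returns "SimpleClassName: message" (or just the class name when the message is null)
    // for each exception in the cause chain, outermost first.
    // Stops early if the chain loops back to an exception already visited.
    static List<String> causeChain(Throwable t) {
        List<String> links = new ArrayList<>();
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            String name = cur.getClass().getSimpleName();
            String msg = cur.getMessage();
            links.add(msg == null ? name : name + ": " + msg);
        }
        return links;
    }

    // Builds a single log line: ISO-8601 UTC timestamp, workflow ID, and the full cause chain.
    static String describe(String workflowId, Throwable t, Instant when) {
        return when + " workflowId=" + workflowId
                + " causes=" + String.join(" caused by ", causeChain(t));
    }
}
```

At the call site, one option is to wrap `WorkflowClient.start(...)` in a `try`/`catch (WorkflowServiceException e)` and log `StartFailureDiagnostics.describe(workflowId, e, Instant.now())`, where `workflowId` is whatever ID you set in your `WorkflowOptions`. For the trace in the question, the chain should end in `StatusRuntimeException: INTERNAL: unexpected error`, and the timestamp tells you which window to inspect in the server metrics.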