Temporal mysql possible bug

Hi,

We are running temporal with mysql as backend.

We run into frequent errors when a doing a parent → child workflow especially when child workflow id here is re-used. The effect is child workflow takes very long time to get initiated.

(I tried to replicate it with a simple setup cutting out our complex business logics, but I cannot replicate the issue. Not sure what part of code is causing this)

There appears to be something wrong in the mysql query and the code which I’ll explain below.

This is the error we are getting.

AppendHistoryNodes: expected 1 row to be affected for node table, got 2","metric-scope":4,"logging-call-at":"persistenceMetricClients.go:640","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error
	/temporal/common/log/zap_logger.go:143
go.temporal.io/server/common/persistence.(*executionPersistenceClient).updateErrorMetric
	/temporal/common/persistence/persistenceMetricClients.go:640
go.temporal.io/server/common/persistence.(*executionPersistenceClient).CreateWorkflowExecution
	/temporal/common/persistence/persistenceMetricClients.go:215
go.temporal.io/server/service/history/shard.(*ContextImpl).CreateWorkflowExecution
	/temporal/service/history/shard/context_impl.go:442
go.temporal.io/server/service/history/workflow.createWorkflowExecutionWithRetry.func1
	/temporal/service/history/workflow/transaction_impl.go:306
go.temporal.io/server/common/backoff.Retry.func1
	/temporal/common/backoff/retry.go:104
go.temporal.io/server/common/backoff.RetryContext
	/temporal/common/backoff/retry.go:125
go.temporal.io/server/common/backoff.Retry
	/temporal/common/backoff/retry.go:105
go.temporal.io/server/service/history/workflow.createWorkflowExecutionWithRetry
	/temporal/service/history/workflow/transaction_impl.go:310
go.temporal.io/server/service/history/workflow.(*ContextImpl).CreateWorkflowExecution
	/temporal/service/history/workflow/context.go:439
go.temporal.io/server/service/history.(*historyEngineImpl).StartWorkflowExecution
	/temporal/service/history/historyEngine.go:575
go.temporal.io/server/service/history.(*Handler).StartWorkflowExecution
	/temporal/service/history/handler.go:551
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1
	/temporal/api/historyservice/v1/service.pb.go:861
go.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept
	/temporal/common/rpc/interceptor/rate_limit.go:83
google.golang.org/grpc.getChainUnaryHandler.func1
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept
	/temporal/common/rpc/interceptor/telemetry.go:108
google.golang.org/grpc.getChainUnaryHandler.func1
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1
	/temporal/common/metrics/grpc.go:113
google.golang.org/grpc.getChainUnaryHandler.func1
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1
	/temporal/common/metrics/grpc.go:66
google.golang.org/grpc.getChainUnaryHandler.func1
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/rpc.ServiceErrorInterceptor
	/temporal/common/rpc/grpc.go:131
google.golang.org/grpc.chainUnaryServerInterceptors.func1
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1119
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler
	/temporal/api/historyservice/v1/service.pb.go:863
google.golang.org/grpc.(*Server).processUnaryRPC
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1292
google.golang.org/grpc.(*Server).handleStream
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1617
google.golang.org/grpc.(*Server).serveStreams.func1.2
	/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:940

And the error seems to be coming from here

if rowsAffected != 1 {
    return fmt.Errorf("expected 1 row to be affected for node table, got %v", rowsAffected)
}

Corresponding insert statement is this

		INSERT INTO history_node (...)
		ON DUPLICATE KEY UPDATE (...) (...)

This particular statement can affect 2 rows at the max. But the check above was if rowsAffected != 1 which seems to be causing the issue

I cloned temporal, changed the if check to if rowsAffected > 2 and the issue got fixed. I don’t see the errors any more.

Could this be a bug ?

Thank you for reporting this. From the first look yes it does, but please open an issue here with all the details and your fix, so server team can take a look and confirm.

Thanks, I created one. I can probably create a PR as well if it helps, it is just a one line fix.
Thanks.

2 Likes

The issue is fixed and merged by the temporal team. It would be great if you could release a version at your earliest convenience. Thanks.