sp13
January 22, 2022, 4:04am
1
Hi,
We are running temporal with mysql as backend.
We run into frequent errors when doing a parent → child workflow, especially when the child workflow ID is re-used. The effect is that the child workflow takes a very long time to get initiated.
(I tried to replicate it with a simple setup that cuts out our complex business logic, but I cannot reproduce the issue. I am not sure what part of the code is causing this.)
There appears to be something wrong in the MySQL query and the code, which I'll explain below.
This is the error we are getting.
```
AppendHistoryNodes: expected 1 row to be affected for node table, got 2","metric-scope":4,"logging-call-at":"persistenceMetricClients.go:640","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error
/temporal/common/log/zap_logger.go:143
go.temporal.io/server/common/persistence.(*executionPersistenceClient).updateErrorMetric
/temporal/common/persistence/persistenceMetricClients.go:640
go.temporal.io/server/common/persistence.(*executionPersistenceClient).CreateWorkflowExecution
/temporal/common/persistence/persistenceMetricClients.go:215
go.temporal.io/server/service/history/shard.(*ContextImpl).CreateWorkflowExecution
/temporal/service/history/shard/context_impl.go:442
go.temporal.io/server/service/history/workflow.createWorkflowExecutionWithRetry.func1
/temporal/service/history/workflow/transaction_impl.go:306
go.temporal.io/server/common/backoff.Retry.func1
/temporal/common/backoff/retry.go:104
go.temporal.io/server/common/backoff.RetryContext
/temporal/common/backoff/retry.go:125
go.temporal.io/server/common/backoff.Retry
/temporal/common/backoff/retry.go:105
go.temporal.io/server/service/history/workflow.createWorkflowExecutionWithRetry
/temporal/service/history/workflow/transaction_impl.go:310
go.temporal.io/server/service/history/workflow.(*ContextImpl).CreateWorkflowExecution
/temporal/service/history/workflow/context.go:439
go.temporal.io/server/service/history.(*historyEngineImpl).StartWorkflowExecution
/temporal/service/history/historyEngine.go:575
go.temporal.io/server/service/history.(*Handler).StartWorkflowExecution
/temporal/service/history/handler.go:551
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1
/temporal/api/historyservice/v1/service.pb.go:861
go.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept
/temporal/common/rpc/interceptor/rate_limit.go:83
google.golang.org/grpc.getChainUnaryHandler.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept
/temporal/common/rpc/interceptor/telemetry.go:108
google.golang.org/grpc.getChainUnaryHandler.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1
/temporal/common/metrics/grpc.go:113
google.golang.org/grpc.getChainUnaryHandler.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1
/temporal/common/metrics/grpc.go:66
google.golang.org/grpc.getChainUnaryHandler.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/rpc.ServiceErrorInterceptor
/temporal/common/rpc/grpc.go:131
google.golang.org/grpc.chainUnaryServerInterceptors.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1119
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler
/temporal/api/historyservice/v1/service.pb.go:863
google.golang.org/grpc.(*Server).processUnaryRPC
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1292
google.golang.org/grpc.(*Server).handleStream
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1617
google.golang.org/grpc.(*Server).serveStreams.func1.2
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:940
```
And the error seems to be coming from here:
```
if rowsAffected != 1 {
	return fmt.Errorf("expected 1 row to be affected for node table, got %v", rowsAffected)
}
```
The corresponding insert statement is this:
```
INSERT INTO history_node (...)
ON DUPLICATE KEY UPDATE (...) (...)
```
In MySQL, this statement can affect at most 2 rows (1 for a fresh insert, 2 when `ON DUPLICATE KEY UPDATE` modifies an existing row). But the check above is `if rowsAffected != 1`, which seems to be causing the issue.
I cloned Temporal, changed the check to `if rowsAffected > 2`, and the issue got fixed. I don't see the errors any more.
Could this be a bug?
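For reference, MySQL reports 1 affected row for a fresh insert and 2 when `ON DUPLICATE KEY UPDATE` modifies an existing row, so a check that tolerates ID re-use would accept both values. A minimal sketch of such a check (the helper name `checkNodeRowsAffected` is hypothetical, not Temporal's actual code):

```go
package main

import "fmt"

// checkNodeRowsAffected illustrates a check that matches MySQL's
// semantics for INSERT ... ON DUPLICATE KEY UPDATE:
// 1 = new row inserted, 2 = existing row updated.
func checkNodeRowsAffected(rowsAffected int64) error {
	switch rowsAffected {
	case 1, 2:
		return nil // insert or update of an existing row: both acceptable
	default:
		return fmt.Errorf("expected 1 or 2 rows to be affected for node table, got %v", rowsAffected)
	}
}

func main() {
	fmt.Println(checkNodeRowsAffected(1)) // fresh insert: no error
	fmt.Println(checkNodeRowsAffected(2)) // duplicate-key update: no error
	fmt.Println(checkNodeRowsAffected(3)) // unexpected count: error
}
```

Note that MySQL can also report 0 affected rows when the update leaves the row byte-for-byte unchanged, so depending on the intended invariant that case may need handling as well.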
Thank you for reporting this. At first look, yes, it does look like one, but please open an issue here with all the details and your fix so the server team can take a look and confirm.
sp13
January 24, 2022, 4:30pm
3
Thanks, I created one. I can probably create a PR as well if it helps; it is just a one-line fix.
Thanks.
opened 04:28PM - 24 Jan 22 UTC
potential-bug
## Expected Behavior
We get frequent errors in Temporal; when this happens we see a considerable delay in creating child workflows.
```
AppendHistoryNodes: expected 1 row to be affected for node table, got 2","metric-scope":4,"logging-call-at":"persistenceMetricClients.go:640","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error
/temporal/common/log/zap_logger.go:143
go.temporal.io/server/common/persistence.(*executionPersistenceClient).updateErrorMetric
/temporal/common/persistence/persistenceMetricClients.go:640
go.temporal.io/server/common/persistence.(*executionPersistenceClient).CreateWorkflowExecution
/temporal/common/persistence/persistenceMetricClients.go:215
go.temporal.io/server/service/history/shard.(*ContextImpl).CreateWorkflowExecution
/temporal/service/history/shard/context_impl.go:442
go.temporal.io/server/service/history/workflow.createWorkflowExecutionWithRetry.func1
/temporal/service/history/workflow/transaction_impl.go:306
go.temporal.io/server/common/backoff.Retry.func1
/temporal/common/backoff/retry.go:104
go.temporal.io/server/common/backoff.RetryContext
/temporal/common/backoff/retry.go:125
go.temporal.io/server/common/backoff.Retry
/temporal/common/backoff/retry.go:105
go.temporal.io/server/service/history/workflow.createWorkflowExecutionWithRetry
/temporal/service/history/workflow/transaction_impl.go:310
go.temporal.io/server/service/history/workflow.(*ContextImpl).CreateWorkflowExecution
/temporal/service/history/workflow/context.go:439
go.temporal.io/server/service/history.(*historyEngineImpl).StartWorkflowExecution
/temporal/service/history/historyEngine.go:575
go.temporal.io/server/service/history.(*Handler).StartWorkflowExecution
/temporal/service/history/handler.go:551
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1
/temporal/api/historyservice/v1/service.pb.go:861
go.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept
/temporal/common/rpc/interceptor/rate_limit.go:83
google.golang.org/grpc.getChainUnaryHandler.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept
/temporal/common/rpc/interceptor/telemetry.go:108
google.golang.org/grpc.getChainUnaryHandler.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1
/temporal/common/metrics/grpc.go:113
google.golang.org/grpc.getChainUnaryHandler.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1
/temporal/common/metrics/grpc.go:66
google.golang.org/grpc.getChainUnaryHandler.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1133
go.temporal.io/server/common/rpc.ServiceErrorInterceptor
/temporal/common/rpc/grpc.go:131
google.golang.org/grpc.chainUnaryServerInterceptors.func1
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1119
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler
/temporal/api/historyservice/v1/service.pb.go:863
google.golang.org/grpc.(*Server).processUnaryRPC
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1292
google.golang.org/grpc.(*Server).handleStream
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:1617
google.golang.org/grpc.(*Server).serveStreams.func1.2
/go/pkg/mod/google.golang.org/grpc@v1.39.0/server.go:940
```
## Actual Behavior
The error seems to be coming from [here](https://github.com/temporalio/temporal/blob/14ed6a8f720fdc675955a5692c35ce6996368239/common/persistence/sql/history_store.go#L104):
```
if rowsAffected != 1 {
return fmt.Errorf("expected 1 row to be affected for node table, got %v", rowsAffected)
}
```
The corresponding insert statement is [this](https://github.com/temporalio/temporal/blob/14ed6a8f720fdc675955a5692c35ce6996368239/common/persistence/sql/sqlplugin/mysql/events.go#L36):
```
INSERT INTO history_node (...)
ON DUPLICATE KEY UPDATE (...) (...)
```
In MySQL, this statement can affect at most 2 rows (1 for a fresh insert, 2 when `ON DUPLICATE KEY UPDATE` modifies an existing row). But the check above is `if rowsAffected != 1`, which seems to be causing the issue.
I cloned Temporal, changed the check to `if rowsAffected > 2`, and the issue got fixed. I don't see the errors any more.
Could this be a bug?
## Steps to Reproduce the Problem
This happens in our workflow code with business logic. I am not sure exactly what code is causing the issue, and I am not able to reproduce it with a simple example.
## Specifications
- Version: 1.14.3
- Platform: macOS, docker-compose
sp13
January 31, 2022, 3:51pm
4
The issue has been fixed and the change merged by the Temporal team. It would be great if you could release a version at your earliest convenience. Thanks.