I’m trying to resolve an issue where read queries against the visibility_tasks table (in the temporal database) are failing because they return too many rows. I’m evaluating the impact of purging old rows from the table, but I couldn’t find any information on the visibility_tasks table in the docs or on this forum. Any context would be appreciated.
visibility_task is an internal concept, similar to transfer_task or timer_task. These tasks are used to update the visibility store every time a workflow starts, closes, or calls the UpdateSearchAttributes API. A visibility_task is also created when retention deletes a workflow from the main database and from the visibility store. These tasks are short-lived, and if the visibility task processor (which lives inside the history service) is working correctly, they should not accumulate.
So I would have two questions here:
- How did you end up with too many rows in the visibility_tasks table?
- How do you know those rows are old? (See the sketch below for one way to check.)
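If it helps, a rough query along these lines would show how many visibility tasks each shard is holding and their task_id ranges. This is just a sketch against the standard MySQL visibility_tasks schema (shard_id, task_id, data, data_encoding); note the table has no timestamp column, so the age of a task can only be inferred indirectly.

```sql
-- Sketch: per-shard backlog of visibility tasks.
-- A healthy visibility task processor keeps these counts small.
SELECT shard_id,
       COUNT(*)     AS pending_tasks,
       MIN(task_id) AS oldest_task_id,
       MAX(task_id) AS newest_task_id
FROM visibility_tasks
GROUP BY shard_id
ORDER BY pending_tasks DESC;
```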
Hey Alex, this question was pretty badly framed on my part; I completely misread an error trace.
Neither the number of rows in the visibility_tasks table nor their age was ever the issue.
These are the errors I was seeing in my history service logs:
{"level":"error","ts":"2022-06-29T19:14:17.080Z","msg":"Operation failed with internal error.","service":"history","error":"GetVisibilityTasks operation failed. Select failed. Error: Error 10001: target: temporal.55-aa.primary: vttablet: rpc error: code = Aborted desc = Row count exceeded 15000 (errno 10001) (sqlstate HY000) (CallerID: temporal): Sql: \"select task_id,
data, data_encoding from visibility_tasks where shard_id = :v1 and task_id > :v2 and task_id <= :v3 order by shard_id asc, task_id asc\", BindVars: {#maxLimit: \"type:INT64 value:\\\"15001\\\"\"v1: \"type:INT64 value:\\\"1145\\\"\"v2: \"type:INT64 value:\\\"870337237\\\"\"v3: \"type:INT64 value:\\\"931136076\\\"\"}","metric-scope":19,"shard-id":1145,"logging-call-at":"persistenceMetricClients.go:676","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/common/persistence.(*workflowExecutionPersistenceClient).updateErrorMetric\n\t/temporal/common/persistence/persistenceMetricClients.go:676\ngo.temporal.io/server/common/persistence.(*workflowExecutionPersistenceClient).GetVisibilityTasks\n\t/temporal/common/persistence/persistenceMetricClients.go:419\ngo.temporal.io/server/service/history.(*visibilityQueueProcessorImpl).readTasks\n\t/temporal/service/history/visibilityQueueProcessor.go:320\ngo.temporal.io/server/service/history.(*queueAckMgrImpl).readQueueTasks.func1\n\t/temporal/service/history/queueAckMgr.go:107\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*queueAckMgrImpl).readQueueTasks\n\t/temporal/service/history/queueAckMgr.go:111\ngo.temporal.io/server/service/history.(*queueProcessorBase).processBatch\n\t/temporal/service/history/queueProcessor.go:263\ngo.temporal.io/server/service/history.(*queueProcessorBase).processorPump\n\t/temporal/service/history/queueProcessor.go:220"}
It’s not the number of rows in the table that’s the issue; it’s the number of rows in a single query’s result. Our SQL database (Vitess, per the vttablet error above) enforces a cap of 15,000 rows per query response.
While I can request an exception to that cap, is there a config I can tweak so the history service never pulls more than a certain number of visibility_tasks in one read?
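For example, would something along these lines in the dynamic config file do it? I’m guessing at the key name by analogy with the transfer and timer queue batch sizes (history.transferTaskBatchSize / history.timerTaskBatchSize), so it would need to be verified against the server version we’re running.

```yaml
# Hypothetical dynamic config entry: the key name is a guess by analogy with
# the transfer/timer queue processors; please verify before relying on it.
history.visibilityTaskBatchSize:
  - value: 1000   # cap on how many visibility tasks the history service reads per poll
    constraints: {}
```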