The behavior looks OK from the metrics provided above (task requests per second, task type, etc.).
Hi @Wenquan_Xing, hope you are feeling better.
OK, thanks for confirming.
But we have been running these tests for a long time, and only this time did the delete tasks continue well after the test was done. Is there a way to throttle them? We have to size Cassandra accordingly, and the operation rate is too high.
Also, the data size just keeps increasing in the temporal.history_node table. We even tried changing gc_grace_seconds to 1 day. Does the table have a TTL?
The Temporal namespace contains a retention config.
This config controls how long after a workflow finishes Temporal deletes the workflow history (from the executions, history_tree, and history_node tables).
The deletion is a timer task (so it goes through task processing), and this timer task is created the moment a workflow finishes.
To reduce the history_node table size, try decreasing the retention to 1 day:
tctl --ns <namespace> namespace update -rd 1
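To make the timing concrete, here is a minimal sketch of when a history becomes eligible for deletion, assuming the timer is scheduled for the workflow close time plus the namespace retention, as described above. `RetentionSketch` and `earliestDeletion` are hypothetical names for illustration only and are not part of the Temporal SDK or server; the actual delete can happen later if timer task processing is backed up.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not a Temporal API.
final class RetentionSketch {
    // Earliest time the history-deletion timer can fire:
    // workflow close time plus the namespace retention.
    static Instant earliestDeletion(Instant closeTime, Duration retention) {
        return closeTime.plus(retention);
    }

    public static void main(String[] args) {
        Instant closed = Instant.parse("2021-01-01T12:00:00Z"); // example close time
        Duration retention = Duration.ofDays(1);                // retention of 1 day
        System.out.println(earliestDeletion(closed, retention)); // 2021-01-02T12:00:00Z
    }
}
```

Under this assumption, workflows that close during a test only start being deleted one retention period later, so delete traffic is expected to trail the test rather than overlap with it.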
We already have that set through code. Does it have the same behavior?
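// Duration here is com.google.protobuf.Duration; 86400 seconds = 1 day.
RegisterNamespaceRequest request = RegisterNamespaceRequest.newBuilder()
    .setName(NAMESPACE)
    .setWorkflowExecutionRetentionPeriod(Duration.newBuilder().setSeconds(86400).build())
    .build();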
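As a quick unit check, and assuming the `-rd` flag is expressed in days, the 86400 seconds set in the builder is the same value as `-rd 1`. The sketch below uses only `java.time`; `RetentionUnits` and `retentionDays` are hypothetical names, and the code does not query the server, so it says nothing about which value the namespace currently holds.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not a Temporal API.
final class RetentionUnits {
    // Whole days contained in a retention period given in seconds.
    static long retentionDays(long retentionSeconds) {
        return Duration.ofSeconds(retentionSeconds).toDays();
    }

    public static void main(String[] args) {
        System.out.println(retentionDays(86400)); // 1
    }
}
```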
The screenshot below shows the table size increasing linearly, along with the current table size. It doesn't seem to delete anything, though.
Are you still creating new workflows?
NOTE:
One of the pictures above shows DeleteHistoryEvent tasks being executed at 1.6K tasks/second.
Maybe worth checking: are any cron workflows being created? Or workflows with a retry policy?
Yes, we do have retries on all individual activities:
RetryOptions retryoptions = RetryOptions.newBuilder()
    .setInitialInterval(oneSec)
    .setBackoffCoefficient(2)
    .setMaximumAttempts(3)
    .build();
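For reference, these options imply a short retry schedule. The sketch below works it out under three assumptions: `oneSec` is a one-second duration, the maximum attempts count includes the first attempt, and no maximum interval cap is reached. `RetryScheduleSketch` and `retryDelays` are hypothetical names and do not call the Temporal SDK.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Temporal API.
final class RetryScheduleSketch {
    // Delay before each retry: initial * coefficient^(k-1) for the k-th retry.
    // With maxAttempts total attempts there are maxAttempts - 1 retries.
    static List<Duration> retryDelays(Duration initial, double coefficient, int maxAttempts) {
        List<Duration> delays = new ArrayList<>();
        double millis = initial.toMillis();
        for (int attempt = 2; attempt <= maxAttempts; attempt++) {
            delays.add(Duration.ofMillis(Math.round(millis)));
            millis *= coefficient; // exponential backoff
        }
        return delays;
    }

    public static void main(String[] args) {
        // Initial interval 1s, coefficient 2, 3 attempts in total.
        System.out.println(retryDelays(Duration.ofSeconds(1), 2.0, 3)); // [PT1S, PT2S]
    }
}
```

So a failing activity would be retried after roughly 1 second and then 2 seconds before giving up, which stays well inside the 45-second workflow execution timeout.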
But there are no retry options set when the workflow is submitted:
WorkflowOptions workflowOptions = WorkflowOptions.newBuilder()
    .setWorkflowExecutionTimeout(Duration.ofSeconds(45))
    .setWorkflowTaskTimeout(Duration.ofSeconds(15))
    .setTaskQueue(TASK_QUEUE)
    .build();
Yes, I am still running the test, and the data size just keeps increasing. The deletes also continue well past the completion of the test, sometimes for 4 to 10 hours.