Temporal in idle state generating huge read/write traffic to Cassandra

The behavior looks OK from the metrics provided above (task requests per second, task type, etc.).


Hi @Wenquan_Xing, hope you are feeling better.
OK, thanks for confirming.

  1. We have run these tests many times before, but only this time did the delete tasks continue long after the test was done. Is there a way to throttle them? We have to size Cassandra accordingly, and the operation rate is too high.

  2. Also, the data size just keeps increasing in the temporal.history_node table. We even tried changing gc_grace_seconds to 1 day. Does the table have a TTL?

The Temporal namespace contains a retention config.
This config controls how long after a workflow finishes Temporal keeps its history before deleting it (from the executions, history_tree, and history_node tables).

The deletion is a timer task (so it goes through task processing), and this timer task is created the moment a workflow finishes.

To reduce the history_node table size, try decreasing the retention to 1 day:

tctl --ns <namespace> namespace update -rd 1
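
If the namespace is already registered, the retention can also be lowered programmatically. A minimal Java sketch, assuming the UpdateNamespaceRequest / NamespaceConfig messages from the Temporal gRPC API and a WorkflowServiceStubs connection from the Java SDK; the namespace name is a placeholder:

    import com.google.protobuf.Duration;
    import io.temporal.api.namespace.v1.NamespaceConfig;
    import io.temporal.api.workflowservice.v1.UpdateNamespaceRequest;
    import io.temporal.serviceclient.WorkflowServiceStubs;

    // Connect to the frontend and lower retention to 1 day (86400 seconds)
    // for an existing namespace; "your-namespace" is a placeholder.
    WorkflowServiceStubs service = WorkflowServiceStubs.newInstance();

    UpdateNamespaceRequest request =
            UpdateNamespaceRequest.newBuilder()
                    .setNamespace("your-namespace")
                    .setConfig(
                            NamespaceConfig.newBuilder()
                                    .setWorkflowExecutionRetentionTtl(
                                            Duration.newBuilder().setSeconds(86400).build())
                                    .build())
                    .build();

    service.blockingStub().updateNamespace(request);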

We already have that set through code; does it have the same behavior?

RegisterNamespaceRequest.newBuilder()
        .setName(NAMESPACE)
        .setWorkflowExecutionRetentionPeriod(Duration.newBuilder().setSeconds(86400).build())
        .build();
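
To double-check what retention the namespace actually ended up with, we can describe it and print what the server reports. A rough sketch, assuming the DescribeNamespaceRequest message and the blockingStub() call from the Java SDK; service here is our existing WorkflowServiceStubs instance:

    import io.temporal.api.workflowservice.v1.DescribeNamespaceRequest;
    import io.temporal.api.workflowservice.v1.DescribeNamespaceResponse;

    // Prints the retention the server actually applies to this namespace.
    DescribeNamespaceResponse response =
            service.blockingStub().describeNamespace(
                    DescribeNamespaceRequest.newBuilder()
                            .setNamespace(NAMESPACE)
                            .build());

    System.out.println(response.getConfig().getWorkflowExecutionRetentionTtl());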

The screenshot below depicts the linearly increasing table size and the current table size; it doesn't seem to be deleting anything, though.

Screenshot here.

Are you still creating new workflows?

NOTE:
One of the pics above shows the DeleteHistoryEvent task being executed at 1.6K tasks/second.

Maybe worth checking: are there any cron workflows created, or workflows created with a retry policy?
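
For reference, a cron workflow would be one started with a cron schedule on its WorkflowOptions, roughly like the sketch below (the task queue name and schedule are illustrative). Each completed run leaves behind history that retention later cleans up, which can keep DeleteHistoryEvent tasks flowing long after a test ends:

    import io.temporal.client.WorkflowOptions;

    // A workflow started with a cron schedule keeps producing new runs.
    WorkflowOptions cronOptions = WorkflowOptions.newBuilder()
            .setTaskQueue("some-task-queue")
            .setCronSchedule("* * * * *")   // every minute
            .build();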


Yes, we do have retries on all individual activities:

        // oneSec is a java.time.Duration of one second, defined elsewhere in our code.
        RetryOptions retryoptions = RetryOptions.newBuilder()
                .setInitialInterval(oneSec)
                .setBackoffCoefficient(2)
                .setMaximumAttempts(3)
                .build();
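
For context, retry options like these are typically attached to an activity stub via ActivityOptions inside the workflow implementation; a rough sketch, where the activity interface MyActivities and the start-to-close timeout are illustrative:

    import io.temporal.activity.ActivityOptions;
    import io.temporal.workflow.Workflow;
    import java.time.Duration;

    // Attach the retry options above to an activity stub.
    ActivityOptions activityOptions = ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofSeconds(10))
            .setRetryOptions(retryoptions)
            .build();

    MyActivities activities = Workflow.newActivityStub(MyActivities.class, activityOptions);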

But there are no retry options set when the workflow is submitted:

        WorkflowOptions workflowOptions = WorkflowOptions.newBuilder()
                .setWorkflowExecutionTimeout(Duration.ofSeconds(45))
                .setWorkflowTaskTimeout(Duration.ofSeconds(15))
                .setTaskQueue(TASK_QUEUE)
                .build();
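
For completeness, options like the above are used when starting the workflow, roughly as sketched here; the workflow interface MyWorkflow, its run method, and the input argument are illustrative, and service is the existing WorkflowServiceStubs instance:

    import io.temporal.client.WorkflowClient;

    // Build a client from the existing service stubs and start the workflow
    // asynchronously with the options above.
    WorkflowClient client = WorkflowClient.newInstance(service);
    MyWorkflow workflow = client.newWorkflowStub(MyWorkflow.class, workflowOptions);
    WorkflowClient.start(workflow::run, "input");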

Yeah, I am still running the test, and the data size just keeps increasing. The deletes also keep happening well past the completion of the test, sometimes 4 to 10 hours later.