Temporal in idle state generating huge read/write traffic to Cassandra

The behavior looks OK from the metrics provided above (task requests per second, task type, etc.).


Hi @Wenquan_Xing, hope you are feeling better.
OK, thanks for confirming.

  1. We have run these tests many times before, but only this time did the delete tasks continue long after the test was done. Is there a way to throttle them? We have to size Cassandra accordingly, and the operation rate is too high.

  2. Also, the data size just keeps increasing in the temporal.history_node table. We even tried changing gc_grace_seconds to 1 day. Does the table have a TTL?

The Temporal namespace contains a retention config.
This config controls how long after a workflow finishes Temporal keeps its history before deleting it (from the executions, history_tree, and history_node tables).

The deletion is a timer task (so it goes through task processing), and this timer task is created the moment a workflow finishes.

To reduce the history_node table size, try decreasing the retention to 1 day:

tctl --ns <namespace> namespace update -rd 1
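
If the namespace is already registered, the retention can also be lowered programmatically. A minimal Java sketch, assuming the UpdateNamespaceRequest / NamespaceConfig messages from the Temporal gRPC API and a WorkflowServiceStubs connection from the Java SDK; the namespace name is a placeholder:

    import com.google.protobuf.Duration;
    import io.temporal.api.namespace.v1.NamespaceConfig;
    import io.temporal.api.workflowservice.v1.UpdateNamespaceRequest;
    import io.temporal.serviceclient.WorkflowServiceStubs;

    // Connect to the frontend and lower retention to 1 day (86400 seconds)
    // for an existing namespace; "your-namespace" is a placeholder.
    WorkflowServiceStubs service = WorkflowServiceStubs.newInstance();

    UpdateNamespaceRequest request =
            UpdateNamespaceRequest.newBuilder()
                    .setNamespace("your-namespace")
                    .setConfig(
                            NamespaceConfig.newBuilder()
                                    .setWorkflowExecutionRetentionTtl(
                                            Duration.newBuilder().setSeconds(86400).build())
                                    .build())
                    .build();

    service.blockingStub().updateNamespace(request);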

We already have that set through code; does it have the same behavior?

RegisterNamespaceRequest.newBuilder()
        .setName(NAMESPACE)
        .setWorkflowExecutionRetentionPeriod(Duration.newBuilder().setSeconds(86400).build())
        .build();
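
To double-check what retention the namespace actually ended up with, we can describe it and print what the server reports. A rough sketch, assuming the DescribeNamespaceRequest message and the blockingStub() call from the Java SDK; service here is our existing WorkflowServiceStubs instance:

    import io.temporal.api.workflowservice.v1.DescribeNamespaceRequest;
    import io.temporal.api.workflowservice.v1.DescribeNamespaceResponse;

    // Prints the retention the server actually applies to this namespace.
    DescribeNamespaceResponse response =
            service.blockingStub().describeNamespace(
                    DescribeNamespaceRequest.newBuilder()
                            .setNamespace(NAMESPACE)
                            .build());

    System.out.println(response.getConfig().getWorkflowExecutionRetentionTtl());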

The screenshot below depicts the linearly increasing table size and the current table size; it doesn't seem to be deleting anything, though.

Screenshot here.

Are you still creating new workflows?

NOTE:
One of the pics above shows the DeleteHistoryEvent task being executed at 1.6K tasks/second.

Maybe worth checking: are there any cron workflows created, or workflows created with a retry policy?
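
For reference, a cron workflow would be one started with a cron schedule on its WorkflowOptions, roughly like the sketch below (the task queue name and schedule are illustrative). Each completed run leaves behind history that retention later cleans up, which can keep DeleteHistoryEvent tasks flowing long after a test ends:

    import io.temporal.client.WorkflowOptions;

    // A workflow started with a cron schedule keeps producing new runs.
    WorkflowOptions cronOptions = WorkflowOptions.newBuilder()
            .setTaskQueue("some-task-queue")
            .setCronSchedule("* * * * *")   // every minute
            .build();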


Yes, we do have retries on all individual activities:

        // oneSec is a java.time.Duration of one second, defined elsewhere in our code.
        RetryOptions retryoptions = RetryOptions.newBuilder()
                .setInitialInterval(oneSec)
                .setBackoffCoefficient(2)
                .setMaximumAttempts(3)
                .build();
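
For context, retry options like these are typically attached to an activity stub via ActivityOptions inside the workflow implementation; a rough sketch, where the activity interface MyActivities and the start-to-close timeout are illustrative:

    import io.temporal.activity.ActivityOptions;
    import io.temporal.workflow.Workflow;
    import java.time.Duration;

    // Attach the retry options above to an activity stub.
    ActivityOptions activityOptions = ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofSeconds(10))
            .setRetryOptions(retryoptions)
            .build();

    MyActivities activities = Workflow.newActivityStub(MyActivities.class, activityOptions);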

But there are no retry options set when the workflow is submitted:

        WorkflowOptions workflowOptions = WorkflowOptions.newBuilder()
                .setWorkflowExecutionTimeout(Duration.ofSeconds(45))
                .setWorkflowTaskTimeout(Duration.ofSeconds(15))
                .setTaskQueue(TASK_QUEUE)
                .build();
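
For completeness, options like the above are used when starting the workflow, roughly as sketched here; the workflow interface MyWorkflow, its run method, and the input argument are illustrative, and service is the existing WorkflowServiceStubs instance:

    import io.temporal.client.WorkflowClient;

    // Build a client from the existing service stubs and start the workflow
    // asynchronously with the options above.
    WorkflowClient client = WorkflowClient.newInstance(service);
    MyWorkflow workflow = client.newWorkflowStub(MyWorkflow.class, workflowOptions);
    WorkflowClient.start(workflow::run, "input");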

Yeah, I am still running the test, and the data size just keeps increasing. The deletes also keep happening well past the completion of the test, sometimes 4 to 10 hours later.