State transition builds up and suddenly drops

We are doing capacity planning. We are finding our target state transition per second to estimate resources. However, we notice a weird pattern of state transition in the current setup. The Datadog metric we are reading from is temporal.server.state_transition.count

We have around 1000 workflows running per 30 seconds. In each workflow, 4 child workflows are running. After each workflow run, it will sleep for 30 seconds and start again with continue-as-new.

Do you know why the state transition graph builds up and suddenly drops like this?

Can you compare graph with that of persistence request for create and update workflow exec (which is same thing really).
Grafana query would look like:

sum(rate(persistence_requests{operation=~"CreateWorkflowExecution|UpdateWorkflowExecution"}[1m]))

Would also try see if dips match possible service host restarts, service emits restarts metric