How to understand workflow task metrics

Good day! During testing we measure workflow metrics such as:

  • rate of Started Tasks
  • rate of Completed Tasks
  • rate of Scheduled Tasks

I thought that when the system can handle the load, all of these metrics should be equal. But the Scheduled Tasks rate is always larger than the Completed and Started rates. Can you explain this behavior, and what should we see on our dashboard when the system handles the load without problems?

Can you provide the metrics below?


And persistence_requests for the CreateTask operation?

Thank you for the answer! I’ll attach the necessary metrics here, all for the same time range.
By the way, can you tell me what the difference is between poll_success and poll_success_sync?

Hey @Roman,
Based on the graphs, it looks like you don’t have enough pollers to keep up with the load. Can you bring up more workers and see if that helps? The metric poll_success_sync is a counter of how many sync matches happen on the matching engine. A sync match is a special optimization that lets us dispatch a task directly to a waiting poller without writing it to the database. If poll_success_sync is lower than poll_success, it typically means you don’t have enough concurrent pollers.
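To make the comparison concrete, here is a minimal sketch that turns the two counters into a single ratio you could chart or alert on. It assumes you have already queried the rates of poll_success and poll_success_sync over the same time window; the function names and the 0.9 threshold are hypothetical and chosen only for illustration, not a documented recommendation.

```python
def sync_match_ratio(poll_success: float, poll_success_sync: float) -> float:
    """Fraction of successful polls that were served by a sync match.

    Both arguments are rates (or counts) taken over the same time window.
    """
    if poll_success <= 0:
        # No successful polls in the window, so there is nothing to compare.
        return 0.0
    return poll_success_sync / poll_success


def looks_poller_starved(poll_success: float, poll_success_sync: float,
                         threshold: float = 0.9) -> bool:
    """Return True when the sync match ratio falls below the threshold.

    The threshold is an assumed value for illustration; tune it against
    your own dashboards.
    """
    return sync_match_ratio(poll_success, poll_success_sync) < threshold


if __name__ == "__main__":
    # Example: 2500 successful polls/s, of which 1200/s were sync matches.
    print(sync_match_ratio(2500, 1200))
    print(looks_poller_starved(2500, 1200))
```

A ratio close to 1 means most tasks are handed straight to a waiting poller; a low ratio means many tasks take the slower path through the database first.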
Another possibility is that you are running into a scalability bottleneck on the number of TaskQueue partitions. By default we create 4 partitions for a TaskQueue. For a throughput of 2.5k tasks per second, I recommend at least 10 partitions (250 tasks per second per partition). Here are the dynamic config knobs that control the number of partitions.
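As a side note on the partition arithmetic (not the config knobs themselves), the recommendation can be sketched as a small calculation. It assumes the rule of thumb of roughly 250 tasks per second per partition implied by the numbers above; the function name is hypothetical, and the right per-partition figure for your cluster may differ.

```python
import math


def partitions_needed(tasks_per_second: float,
                      tasks_per_partition: float = 250.0) -> int:
    """Estimate the TaskQueue partition count for a target throughput.

    tasks_per_partition defaults to the ~250 tasks/s rule of thumb from
    this thread; treat it as an assumption, not a hard limit.
    """
    if tasks_per_partition <= 0:
        raise ValueError("tasks_per_partition must be positive")
    # Round up so the estimate never under-provisions, with a floor of 1.
    return max(1, math.ceil(tasks_per_second / tasks_per_partition))


if __name__ == "__main__":
    print(partitions_needed(2500))  # 10
    print(partitions_needed(1000))  # 4
```

Under the same assumption, the default of 4 partitions corresponds to about 1k tasks per second, which is why a 2.5k tasks per second load would need more.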