Mismatch between Temporal UI workflow counts and official Grafana dashboard metrics

Hi Team,

We are using Temporal Cloud and have installed the official Grafana dashboard for Temporal (dashboards/cloud/temporal_cloud.json in the temporalio/dashboards GitHub repository).

While comparing the Grafana dashboard metrics with the values shown in the Temporal UI, we are noticing significant discrepancies.
For example, for a given time range (24 Nov, 11:00 AM – 12:30 PM):

  • Temporal UI shows:

    • X Completed

    • Y Running

    • Z Continued As New

    • etc.

But the official Grafana dashboard panels (which use metrics like temporal_cloud_v0_workflow_success_count, temporal_cloud_v0_workflow_failed_count, temporal_cloud_v0_workflow_terminate_count, temporal_cloud_v0_workflow_continued_as_new_count, etc.) show very different numbers.
Even after aligning the time range, the counts from Prometheus do not match what the UI reports.

Based on our investigation, it appears that:

  • The Temporal UI is showing the number of workflows in each final state.

  • The Grafana dashboard is showing the number of state transition events, where counters can increment multiple times for a single workflow (e.g., retries, continues-as-new, failures before eventual success).

Because of this, a single workflow may generate multiple increments in Prometheus metrics, so the values can’t match 1:1 with the UI.

My questions:

  1. Is this interpretation correct?

  2. Is the mismatch expected for the official Grafana dashboard?

  3. Is there any recommended way to get UI-equivalent workflow counts (based on Visibility state) exposed as Prometheus metrics?

We want to verify if our dashboards are behaving correctly, or if something needs to be configured differently.

Thanks in advance!

The data you see in the UI is Visibility data. It is bound by your namespace retention period, so the number of completed executions, for example, covers only executions that completed within the last X, where X is your namespace retention period, e.g. 10 days (IIRC the max is 90 days on Cloud).

"The Grafana dashboard is showing the number of state transition events, where counters can increment multiple times for a single workflow"

Not sure this is correct, as these are server metrics. Your SDK (worker) metrics could be larger, since an SDK worker can request, for example, completion of a workflow execution that the server may reject under some conditions. But what you are showing are server metrics, and the server would, for example, fail an execution just once (it has the final say).
My best guess is that the time range for the metrics differs from the namespace retention period, or that you are looking at an accumulated value. What metric queries are you using for these counters?
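
To illustrate the accumulation point, here is a rough Python sketch of the difference between reading a cumulative counter's raw value and computing its increase over a window. It is a simplification, not how Prometheus is implemented: the real PromQL increase() also extrapolates to the window boundaries. The function name and sample data are made up for illustration.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative counter value); hypothetical sample shape
Sample = Tuple[float, float]


def counter_increase(samples: List[Sample], start: float, end: float) -> float:
    """Approximate the increase of a cumulative counter over [start, end].

    Simplification: sums deltas between consecutive in-window samples and
    does not extrapolate to the window edges as PromQL increase() does.
    """
    window = [value for ts, value in sorted(samples) if start <= ts <= end]
    total = 0.0
    for prev, cur in zip(window, window[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # A drop is treated as a counter reset: the counter restarted
            # from zero, so the whole current value counts as new increase.
            total += cur
    return total


# Made-up samples: the counter resets between t=120 and t=180.
samples = [(0, 100.0), (60, 110.0), (120, 125.0), (180, 5.0), (240, 20.0)]
print(counter_increase(samples, 0, 240))
```

If a panel plots the raw counter value instead of an increase over the selected range, the number shown reflects everything accumulated since the counter started, not just the window, which is one way the two views can drift apart.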

Hi @tihomir, thanks for the response.

To clarify, the time ranges I’m using in Grafana and in the Temporal UI are exactly the same.
Here is one example with concrete numbers for the range:

24 Nov, 08:00 AM → 12:00 PM

Temporal UI values for that window (Visibility API):

  • Completed: X

  • Terminated: Y

  • ContinuedAsNew: Z
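
For comparison with close-event counters, here is a minimal sketch of a Visibility list filter that selects executions by status and by close time within a window. It assumes the count being compared should be based on CloseTime rather than StartTime, which is an assumption and not something confirmed in this thread. The helper name is hypothetical; ExecutionStatus and CloseTime are standard Visibility search attributes.

```python
from datetime import datetime


def closed_in_window_filter(status: str, start: datetime, end: datetime) -> str:
    """Build a Visibility list filter for executions with the given status
    whose CloseTime falls inside [start, end].

    Hypothetical helper for illustration; it only builds the query string.
    """
    if start.tzinfo is None or end.tzinfo is None:
        # Naive datetimes would make the window ambiguous across time zones.
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be after start")
    return (
        f"ExecutionStatus = '{status}' "
        f"AND CloseTime BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
    )
```

The resulting string could be pasted into the UI's query box or passed to a Visibility count call (for example the Python SDK's Client.count_workflows, if your SDK version provides it), so that the UI-side number and the metric are at least filtered on the same timestamp.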

Grafana (Prometheus) values for the same window:

Query:

sum(increase(temporal_cloud_v0_workflow_success_count{temporal_namespace="xyz"}[$__range]))

This returns: 9.17k

Temporal UI Completed in the same range: X (significantly different)

Another example:

sum(increase(temporal_cloud_v0_workflow_terminate_count{temporal_namespace="xyz"}[$__range]))

Grafana result: 886
Temporal UI Terminated in the same window: 35

So even though both UI and Grafana are using the same timestamp window, the numbers are still far apart.

I’m trying to determine whether:

  1. This discrepancy is expected (UI using Visibility store vs metrics coming from server counters),

  2. Or whether our setup is missing some configuration needed to get UI-equivalent counts.

Thanks again for your help!