Attaching custom tags to workflow metrics

Hi, I tried something similar to what's suggested in this post,

and, as pointed out in the original post, I do not see these metrics.
I tried this with both 0.28.0 and 0.29.0 (Java SDK and server).

Upon taking a deeper look, I see that the tags do get added to the metrics scope, and they are available at the time of workflow completion (i.e., in the completeWorkflow method of ReplayWorkflowExecutor) and inside ScopeImpl (Tally). (Refer to the screenshot.)

I see that ReplayWorkflowExecutor calls metricsScope.counter(MetricsType.WORKFLOW_COMPLETED_COUNTER).inc(1); at line 105.

However, I don't see this metric in the Prometheus server.

Also, I see a metric "workflow_success" in the Prometheus server, but it is not listed in MetricsType.java. I'm not sure where workflow_success is being populated from. (Does it come from temporal_frontend or temporal_worker?)

Since the temporal_* metrics are not available, the Temporal dashboard also looks pretty much empty.

I'm not sure why the Java SDK is not reporting these metrics correctly. Is this a bug?

Also, my metric reporter code looks something like this:

StatsReporter reporter = new MicrometerClientStatsReporter(new SimpleMeterRegistry());

// Custom tag that should appear on every metric reported through this scope
Map<String, String> map = ImmutableMap.of("Usecase", "Greeting");

Scope scope = new RootScopeBuilder()
        .reporter(reporter)
        .reportEvery(com.uber.m3.util.Duration.ofSeconds(10))
        .tagged(map);

WorkflowServiceStubsOptions options =
        WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build();

I'm not sure where things are getting dropped.


I tried the Go SDK examples (greetings and helloworld) as well, but that did not make any difference; the Temporal dashboards in Grafana were still empty.

It would be really useful if there were an example of a custom metric/tag, etc.


What I am not sure about is how this SimpleMeterRegistry works.
Should it have the endpoint of the Prometheus server and push the metrics directly?
Or does the metrics scope get passed to the Temporal server (frontend/history) as part of the gRPC payload, with those components pushing the metrics/tags to Prometheus?

OK, I replaced the StatsReporter with the PrintStatsReporter from the Uber Tally example

so that I could print to the console. I see that only the timers are being called; the counters etc. are not.
I think only the metrics below are getting published to Prometheus, too.

  Flushed
19:46:35.686 [main] INFO  i.t.i.grpc.WorkflowServiceStubsImpl - Created GRPC client for channel: ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=1, target=127.0.0.1:7233}}
19:46:35.932 [main] INFO  io.temporal.internal.worker.Poller - start(): Poller{options=PollerOptions{maximumPollRateIntervalMilliseconds=1000, maximumPollRatePerSecond=0.0, pollBackoffCoefficient=2.0, pollBackoffInitialInterval=PT0.1S, pollBackoffMaximumInterval=PT1M, pollThreadCount=2, pollThreadNamePrefix='Workflow Poller taskQueue="HelloActivity2", namespace="default"'}, identity=40168@G4FSLQ2}
19:46:35.934 [main] INFO  io.temporal.internal.worker.Poller - start(): Poller{options=PollerOptions{maximumPollRateIntervalMilliseconds=1000, maximumPollRatePerSecond=0.0, pollBackoffCoefficient=2.0, pollBackoffInitialInterval=PT0.1S, pollBackoffMaximumInterval=PT1M, pollThreadCount=1, pollThreadNamePrefix='Local Activity Poller taskQueue="HelloActivity2", namespace="default"'}, identity=40168@G4FSLQ2}
19:46:35.936 [main] INFO  io.temporal.internal.worker.Poller - start(): Poller{options=PollerOptions{maximumPollRateIntervalMilliseconds=1000, maximumPollRatePerSecond=0.0, pollBackoffCoefficient=2.0, pollBackoffInitialInterval=PT0.1S, pollBackoffMaximumInterval=PT1M, pollThreadCount=5, pollThreadNamePrefix='Activity Poller taskQueue="HelloActivity2", namespace="default"'}, identity=40168@G4FSLQ2}
19:46:35.936 [main] INFO  io.temporal.internal.worker.Poller - start(): Poller{options=PollerOptions{maximumPollRateIntervalMilliseconds=1000, maximumPollRatePerSecond=0.0, pollBackoffCoefficient=2.0, pollBackoffInitialInterval=PT0.1S, pollBackoffMaximumInterval=PT1M, pollThreadCount=5, pollThreadNamePrefix='Host Local Workflow Poller'}, identity=77372c64-166a-44dd-91d0-f82881dd252c}
TimerImpl temporal_request_latency: 1.407889s
TimerImpl temporal_long_request_latency: 1.596152801s
TimerImpl temporal_workflow_task_schedule_to_start_latency: 6ms
TimerImpl temporal_request_latency: 22.6256ms
TimerImpl temporal_workflow_task_replay_latency: 14.006901ms
TimerImpl temporal_workflow_task_execution_latency: 148.963201ms
TimerImpl temporal_request_latency: 9.1853ms
TimerImpl temporal_workflow_task_execution_total_latency: 160.273001ms
TimerImpl temporal_long_request_latency: 1.765705599s
TimerImpl temporal_activity_schedule_to_start_latency: 7ms
TimerImpl temporal_activity_schedule_to_start_latency: 7ms
TimerImpl temporal_activity_execution_latency: 5.434201ms
TimerImpl temporal_request_latency: 42.7891ms
TimerImpl temporal_activity_endtoend_latency: 40ms
TimerImpl temporal_long_request_latency: 1.8299811s
TimerImpl temporal_workflow_task_schedule_to_start_latency: 6ms
TimerImpl temporal_workflow_task_replay_latency: 1.012ms
TimerImpl temporal_workflow_endtoend_latency: 254ms
TimerImpl temporal_workflow_task_execution_latency: 7.860499ms
TimerImpl temporal_request_latency: 42.438199ms
TimerImpl temporal_request_latency: 43.4042ms
TimerImpl temporal_workflow_task_execution_total_latency: 50.8911ms
TimerImpl temporal_long_request_latency: 303.166ms
Hello World!

Also, inside the metrics scope object I see only these two counters initialized:

temporal_sticky_cache_hit=com.uber.m3.tally.CounterImpl@6d9c7d6a,

temporal_sticky_cache_total_forced_eviction=com.uber.m3.tally.CounterImpl@4f62e365}

Why are counters like MetricsType.WORKFLOW_COMPLETED_COUNTER, MetricsType.WORKFLOW_CONTINUE_AS_NEW_COUNTER, etc. not present in the metrics scope?

Not sure if I am heading in the right direction!


Have you considered using PrometheusMeterRegistry?
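For reference, a minimal sketch of what that wiring could look like (the port, endpoint path, and report interval below are illustrative, not from this thread). The key difference from SimpleMeterRegistry is that PrometheusMeterRegistry can render its contents in the Prometheus text format, which you then expose over HTTP for the Prometheus server to scrape:

// Sketch: PrometheusMeterRegistry instead of SimpleMeterRegistry.
PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

Scope scope = new RootScopeBuilder()
        .reporter(new MicrometerClientStatsReporter(registry))
        .reportEvery(com.uber.m3.util.Duration.ofSeconds(10))
        .tagged(ImmutableMap.of("Usecase", "Greeting"));

// Expose the scrape output at http://localhost:8077/metrics (illustrative port/path).
HttpServer server = HttpServer.create(new InetSocketAddress(8077), 0);
server.createContext("/metrics", exchange -> {
    byte[] body = registry.scrape().getBytes(StandardCharsets.UTF_8);
    exchange.sendResponseHeaders(200, body.length);
    try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
    }
});
server.start();

Note that nothing is pushed to Prometheus here; the Prometheus server pulls from this endpoint, which also answers the earlier push-vs-pull question.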


Thanks for the pointer, @maxim.
Yes, with a Spring-injected PrometheusMeterRegistry I am able to see the temporal_* metrics at my actuator/prometheus endpoint.
This is cool.
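In case it helps others, the Spring wiring looks roughly like this (a sketch; it assumes Spring Boot with actuator and micrometer-registry-prometheus on the classpath, which auto-configures a PrometheusMeterRegistry bean and exposes it at /actuator/prometheus):

// Sketch: hand the Spring-managed PrometheusMeterRegistry to the Temporal stubs.
@Configuration
public class TemporalMetricsConfig {

    @Bean
    public WorkflowServiceStubs workflowServiceStubs(PrometheusMeterRegistry registry) {
        Scope scope = new RootScopeBuilder()
                .reporter(new MicrometerClientStatsReporter(registry))
                .reportEvery(com.uber.m3.util.Duration.ofSeconds(10));
        return WorkflowServiceStubs.newInstance(
                WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build());
    }
}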

However, I am still not clear on how I can get this information into Grafana (especially the Temporal Dashboard).

I am interested in metrics like workflow success by WorkflowType/Namespace, etc.

Should the Prometheus server be configured to scrape data from the Spring actuator? I know this is not really a Temporal question, but things are still not clear in my head.

Is this understanding of the data flow correct?

Temporal workflow impl (Java SDK/Spring) ----pushes data to----> PrometheusMeterRegistry <----pulls data from registry---- Prometheus Server <----pulls data from Prometheus---- Grafana (Temporal Dashboard)
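If that flow is right, I assume the Prometheus server just needs a scrape job pointed at the actuator endpoint, something like this (job name, host, and port are illustrative):

scrape_configs:
  - job_name: 'temporal-sdk'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']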

After configuring the PrometheusMeterRegistry, I see the metrics below:

temporal_poller_start_total{Namespace="mynamespace",TaskQueue="MyQ",} 8.0
temporal_workflow_task_replay_latency_seconds_count{Namespace="mynamespace",TaskQueue="MyQ",WorkflowType="Sample1",} 3.0

but I see an additional comma, and the custom tags are still missing.
Is this the same issue being discussed in a similar thread, @manu?

In the Prometheus format this extra comma is always there, so this is not an issue.

OK, sorry, I am not a Prometheus expert and did not know the comma is always there. Yet the tags seem to be missing, even though they are actually present in the metrics scope, as shown in my earlier comment with the PrintStatsReporter example.

I was more curious to know what the actual bug is that @manu was talking about. :slight_smile:
