How to enable metrics support using the Java SDK

Experts,

Can someone share details on how to get out-of-the-box metrics support in Temporal with the Java SDK?

I don't see any documentation on how to enable those metrics.

Thanks in advance

The Temporal Java SDK reports metrics through the Uber Tally interface. You can use the MicrometerClientStatsReporter implementation of the Tally StatsReporter to adapt it to Prometheus and other backends supported by Micrometer, or implement your own StatsReporter for unsupported backends.

Use WorkflowServiceStubsOptions.Builder.setMetricsScope to set the metrics scope with the SDK.

      StatsReporter reporter = new MicrometerClientStatsReporter(registry);
      Scope scope = new RootScopeBuilder()
              .reporter(reporter)
              .reportEvery(com.uber.m3.util.Duration.ofSeconds(10));
      WorkflowServiceStubsOptions options = WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build();
      WorkflowServiceStubs stubs = WorkflowServiceStubs.newInstance(options);
      WorkflowClient client = WorkflowClient.newInstance(stubs);
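
In this snippet, registry is any Micrometer MeterRegistry. As a minimal sketch (assuming the micrometer-registry-prometheus dependency is on the classpath; the port 8077 and the class name are just illustrative), one way to back it with Prometheus and expose a scrape endpoint using the JDK's built-in HTTP server:

    import com.sun.net.httpserver.HttpServer;
    import io.micrometer.prometheus.PrometheusConfig;
    import io.micrometer.prometheus.PrometheusMeterRegistry;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class PrometheusScrapeEndpoint {

      public static PrometheusMeterRegistry startRegistry() throws Exception {
        // Micrometer registry backed by Prometheus; pass it to MicrometerClientStatsReporter.
        PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        // Expose the registry on /metrics so a Prometheus server can scrape it.
        // Port 8077 is an arbitrary choice for this sketch.
        HttpServer server = HttpServer.create(new InetSocketAddress(8077), 0);
        server.createContext("/metrics", exchange -> {
          byte[] body = registry.scrape().getBytes(StandardCharsets.UTF_8);
          exchange.sendResponseHeaders(200, body.length);
          try (OutputStream os = exchange.getResponseBody()) {
            os.write(body);
          }
        });
        server.start();
        return registry;
      }
    }

A Prometheus server can then scrape http://<host>:8077/metrics, and the registry returned by startRegistry() is what you pass to new MicrometerClientStatsReporter(...).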

Thanks @maxim,

I tried what you suggested, but I am getting an error in HelloActivity.java.

Following are the steps I performed.

  1. Steps as mentioned in https://github.com/temporalio/helm-charts
    1.a> TemporalImage\v28\helm-charts-0.28.0> helm dependencies update
    1.b> disabled Elasticsearch and Kafka in the values.yaml file
    1.c> set the Cassandra cluster size to 1
    1.d> TemporalImage\v28\helm-charts-0.28.0> helm install testhelm . --timeout 900s
    where it was successfully deployed
    1.e> did some port forwarding:
    1.e.1> kubectl port-forward services/testhelm-temporal-frontend-headless 7233:7233
    1.e.2> kubectl port-forward services/testhelm-temporal-web 8088:8088
    1.e.3> kubectl port-forward services/testhelm-prometheus-server 9292:80
    1.e.4> kubectl port-forward services/ravihelm-grafana 9393:80

Then I tried running HelloActivity.java with your changes:


public static void main(String[] args) {
  // gRPC stubs wrapper that talks to the Temporal service, with the metrics scope attached.
  StatsReporter reporter = new MicrometerClientStatsReporter(new SimpleMeterRegistry());
  Scope scope =
      new RootScopeBuilder()
          .reporter(reporter)
          .reportEvery(com.uber.m3.util.Duration.ofSeconds(10));
  WorkflowServiceStubsOptions options =
      WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build();
  WorkflowServiceStubs stubs = WorkflowServiceStubs.newInstance(options);

  // Client that can be used to start and signal workflows.
  WorkflowClient client = WorkflowClient.newInstance(stubs);

  // Worker factory that can be used to create workers for specific task queues.
  WorkerFactory factory = WorkerFactory.newInstance(client);
  // Worker that listens on a task queue and hosts both workflow and activity implementations.
  Worker worker = factory.newWorker(TASK_QUEUE);
  // Workflows are stateful, so you need a type to create instances.
  worker.registerWorkflowImplementationTypes(GreetingWorkflowImpl.class);
  // Activities are stateless and thread safe, so a shared instance is used.
  worker.registerActivitiesImplementations(new GreetingActivitiesImpl());
  // Start listening to the workflow and activity task queues.
  factory.start();

  // Start a workflow execution. Usually this is done from another program.
  // Uses the task queue from the GreetingWorkflow @WorkflowMethod annotation.
  GreetingWorkflow workflow =
      client.newWorkflowStub(
          GreetingWorkflow.class, WorkflowOptions.newBuilder().setTaskQueue(TASK_QUEUE).build());
  // Execute a workflow, waiting for it to complete. See {@link io.temporal.samples.hello.HelloSignal}
  // for an example of starting a workflow without waiting synchronously for its result.
  String greeting = workflow.getGreeting("World");
  System.out.println(greeting);
  System.out.println("Reporter->" + reporter);

  System.exit(0);
}
}

But i am getting the following exception
12:32:34.816 [Workflow Poller taskQueue="HelloActivity", namespace="default": 1] ERROR io.temporal.internal.worker.Poller - Failure in thread Workflow Poller taskQueue="HelloActivity", namespace="default": 1
io.grpc.StatusRuntimeException: NOT_FOUND: Namespace default does not exist.
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:244)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:225)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:142)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollWorkflowTaskQueue(WorkflowServiceGrpc.java:2658)
at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:77)
at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:37)
at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:273)
at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I also can't view anything in the Temporal Web UI at localhost:8088, although I am able to bring up the Grafana dashboard.

If I run the same sample with your changes against a Temporal started directly with docker-compose up, it works. I am not sure why.

I wanted to see the metrics in Grafana, but the graph is not showing anything. I tried running https://github.com/temporalio/helm-charts#running-temporal-cli-from-the-admin-tools-container to resolve the namespace issue, but the command failed:

PS \TemporalImage\v28\helm-charts-0.28.0> kubectl exec -it helmtest-temporal-admintools-58fdf949fc-78dzg /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
bash-5.0# tctl namespace list
Error: Error when list namespaces info
Error Details: context deadline exceeded
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
bash-5.0# tctl namespace list
Error: Error when list namespaces info
Error Details: last connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup helmtest-frontend on 10.96.0.10:53: no such host"
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
bash-5.0# tctl --namespace nonesuch namespace desc
Error: Operation DescribeNamespace failed.
Error Details: last connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup helmtest-frontend on 10.96.0.10:53: no such host"
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)

Thanks in advance

I believe there was a known problem with the helm chart not registering the default namespace. @markmark, was it already fixed?

In the meantime, register it using the Temporal CLI:

tctl -ns default namespace register
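
If tctl is hard to reach from inside the cluster, the namespace can also be registered programmatically through the same gRPC API the SDK uses. This is only a sketch, assuming your Java client can already reach the frontend (it could above, since it got NOT_FOUND rather than a connection error); the class name and the retention value are arbitrary:

    import com.google.protobuf.Duration;
    import io.grpc.StatusRuntimeException;
    import io.temporal.api.workflowservice.v1.RegisterNamespaceRequest;
    import io.temporal.serviceclient.WorkflowServiceStubs;

    public class RegisterDefaultNamespace {
      public static void main(String[] args) {
        // Connects to localhost:7233 by default (matches the port-forwarded frontend above).
        WorkflowServiceStubs stubs = WorkflowServiceStubs.newInstance();
        try {
          stubs.blockingStub()
              .registerNamespace(
                  RegisterNamespaceRequest.newBuilder()
                      .setNamespace("default")
                      // 3-day retention, an arbitrary value for this sketch.
                      .setWorkflowExecutionRetentionPeriod(
                          Duration.newBuilder().setSeconds(3 * 24 * 3600).build())
                      .build());
          System.out.println("Namespace 'default' registered");
        } catch (StatusRuntimeException e) {
          // ALREADY_EXISTS simply means the namespace was registered previously.
          System.out.println("registerNamespace: " + e.getStatus());
        }
      }
    }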

Hi @maxim,

I tried this command after repeating all the steps above, but it also failed:

PS C:\Ravi\Software\TemporalImage\v28\helm-charts-0.28.0> kubectl exec -it ravihelm-temporal-admintools-58fdf949fc-67kqk /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
bash-5.0# tctl -ns default namespace register
Error: Register namespace operation failed.
Error Details: context deadline exceeded
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
bash-5.0# tctl --namespace default namespace re
Error: Register namespace operation failed.
Error Details: context deadline exceeded
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
bash-5.0#

When I run helm list I can see the release in the default namespace:
PS C:\Ravi\Software\TemporalImage\v28\helm-charts-0.28.0> helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
ravihelm default 1 2020-08-12 15:15:33.5666403 +0530 IST deployed temporal-0.2.0 0.28.0

Try adding the --address frontendaddress:port parameter.

./tctl --address 127.0.0.1:7233 -ns MyNamespace namespace register

In your case change the address to the frontend DNS/IP address.

If you are running it from your pod, you could also try

./tctl --address yourfrontendpod-headless:7233 -ns …
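
On the Java SDK side, the frontend address is the target on the stub options. A small sketch, assuming the port-forwarded frontend at 127.0.0.1:7233 and the metrics scope built earlier in this thread:

    WorkflowServiceStubsOptions options =
        WorkflowServiceStubsOptions.newBuilder()
            .setTarget("127.0.0.1:7233") // host:port of the Temporal frontend
            .setMetricsScope(scope)      // metrics scope from the snippet above
            .build();
    WorkflowServiceStubs stubs = WorkflowServiceStubs.newInstance(options);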


Thanks, the program now runs successfully after registering the default namespace. Unfortunately this is not mentioned in the documentation; adding it would be a great help for beginners.

Thanks all


Hi @techFool, apologies for the not-so-great initial experience, and thank you so much for bringing this to our attention!

We are treating this issue as a bug and will address this in the near-ish future.

Thank you!
Mark.

PS @techFool, just to double-check: running the sequence described here works in our test pipelines, so I am curious why it didn't seem to work in your test. I see that you installed with a different helm installation name (although I am not sure if it's helmtest or testhelm; I see references to both). So I will play with this a bit to see whether we have a bug in our helm charts around deployment names. Thank you for bringing this to my attention!

Hi @markmark,

I was running the HelloActivity example to view the metrics. After all the configuration, HelloActivity was failing because the default namespace was not present, and the command mentioned in the docs was not working either. After the suggestion from @maxim and @madhu, the command was able to create the default namespace, and HelloActivity then ran successfully.


@maxim I am trying to get Prometheus metrics working for our Java SDK workers using your snippet:

      StatsReporter reporter = new MicrometerClientStatsReporter(registry);
      Scope scope = new RootScopeBuilder()
              .reporter(reporter)
              .reportEvery(com.uber.m3.util.Duration.ofSeconds(10));
      WorkflowServiceStubsOptions options = WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build();
      WorkflowServiceStubs stubs = WorkflowServiceStubs.newInstance(options);
      WorkflowClient client = WorkflowClient.newInstance(stubs);

And I am using a registry created with:

    registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

This is resulting in the following stack trace every time we try to run a workflow. Any pointers would be greatly appreciated:

Oct 20, 2020 8:48:56 AM io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@7f1b896b
java.lang.IllegalArgumentException: Prometheus requires that all meters with the same name have the same set of tag keys. There is already an existing meter named 'temporal_request_latency_seconds' containing tag keys [Operation]. The meter you are attempting to register has keys [Namespace, Operation, TaskQueue, WorkflowType].
	at io.micrometer.prometheus.PrometheusMeterRegistry.lambda$applyToCollector$16(PrometheusMeterRegistry.java:420)
	at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1932)
	at io.micrometer.prometheus.PrometheusMeterRegistry.applyToCollector(PrometheusMeterRegistry.java:404)
	at io.micrometer.prometheus.PrometheusMeterRegistry.newTimer(PrometheusMeterRegistry.java:198)
	at io.micrometer.core.instrument.MeterRegistry.lambda$timer$2(MeterRegistry.java:308)
	at io.micrometer.core.instrument.MeterRegistry.getOrCreateMeter(MeterRegistry.java:612)
	at io.micrometer.core.instrument.MeterRegistry.registerMeterIfNecessary(MeterRegistry.java:566)
	at io.micrometer.core.instrument.MeterRegistry.timer(MeterRegistry.java:306)
	at io.micrometer.core.instrument.Timer$Builder.register(Timer.java:539)
	at io.micrometer.core.instrument.MeterRegistry.timer(MeterRegistry.java:433)
	at io.temporal.common.reporter.MicrometerClientStatsReporter.reportTimer(MicrometerClientStatsReporter.java:69)
	at com.uber.m3.tally.TimerImpl.record(TimerImpl.java:55)
	at com.uber.m3.tally.TimerImpl.recordStopwatch(TimerImpl.java:69)
	at com.uber.m3.tally.Stopwatch.stop(Stopwatch.java:48)
	at io.temporal.serviceclient.GrpcMetricsInterceptor$MetricsClientCall$1.onClose(GrpcMetricsInterceptor.java:124)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:413)
	at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:742)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:721)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:740)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:149)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondWorkflowTaskCompleted(WorkflowServiceGrpc.java:2673)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.lambda$sendReply$0(WorkflowWorker.java:369)
	at io.temporal.internal.common.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:109)
	at io.temporal.internal.common.GrpcRetryer.retryWithResult(GrpcRetryer.java:127)
	at io.temporal.internal.common.GrpcRetryer.retry(GrpcRetryer.java:106)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.sendReply(WorkflowWorker.java:362)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:313)
	at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:275)
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:73)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)


While we are working on a fix on the SDK side, you could use the workaround suggested in https://github.com/temporalio/sdk-java/issues/200
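
The exception comes from Prometheus insisting that all meters with the same name carry the same set of tag keys, while the SDK reports temporal_request_latency with different key sets depending on the call. I have not reproduced the exact workaround from that issue here, but one possible stopgap on the Micrometer side is to drop the variable tag keys listed in the error so every registration ends up with an identical key set, at the cost of losing those dimensions. A rough sketch only:

    import io.micrometer.core.instrument.config.MeterFilter;
    import io.micrometer.prometheus.PrometheusConfig;
    import io.micrometer.prometheus.PrometheusMeterRegistry;

    PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
    // Drop the tag keys named in the exception ([Namespace, TaskQueue, WorkflowType]) so that
    // every meter named temporal_request_latency registers with the same key set.
    // Note: this loses per-namespace / per-task-queue granularity; see the linked issue for
    // the workaround actually suggested there.
    registry.config().meterFilter(MeterFilter.ignoreTags("Namespace", "TaskQueue", "WorkflowType"));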

We have a Java SDK metrics sample: https://github.com/temporalio/samples-java/tree/main/src/main/java/io/temporal/samples/metrics

It is also covered in the Observability section of the new Application development guide in the Temporal documentation.