Setting PROMETHEUS_ENDPOINT in docker-compose results in warning logs

Jocelyn · July 30, 2020, 5:50pm

Based on https://github.com/temporalio/temporal/issues/319, I’ve added PROMETHEUS_ENDPOINT: '0.0.0.0:9090' to the Temporal docker-compose file. Launching it results in:

{"level":"warn","ts":"2020-07-30T17:38:51.115Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [operation stats_type namespace]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-07-30T17:38:51.115Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [stats_type namespace operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}

Is this expected behavior from Temporal?

Also, the Helm example shows additional metrics tags. Should those be added by default in docker-compose setups?

Thank you

markmark · August 1, 2020, 12:10am

Hey Jocelyn, thank you for trying this out, and for your question!

We haven’t tested a Prometheus docker-compose setup yet.

Without having looked into this in depth, I think you should be able to provide a Prometheus configuration file that tells Prometheus which endpoints to scrape (e.g. ‘temporal:9090’). Here is the relevant section of the Prometheus docs:

https://prometheus.io/docs/introduction/first_steps/#configuring-prometheus

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
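The docs example above has Prometheus scrape itself on localhost:9090. Adapted to docker-compose, the target would instead be the Temporal container. A minimal sketch, assuming the Temporal service is named `temporal-server` in the compose file, that Prometheus runs on the same compose network, and that `PROMETHEUS_ENDPOINT` is `0.0.0.0:9090`; the job name and scrape interval are illustrative, and the default metrics path `/metrics` is left implicit.

```yaml
# prometheus.yml (sketch): scrape Temporal's metrics endpoint from inside the compose network.
# Assumes the Temporal service is called "temporal-server" and that it was started with
# PROMETHEUS_ENDPOINT set to '0.0.0.0:9090'.
global:
  scrape_interval: 15s                  # illustrative value

scrape_configs:
  - job_name: temporal                  # hypothetical job name
    static_configs:
      - targets: ['temporal-server:9090']
```

Using the compose service name rather than `localhost` matters because, inside a container, `localhost` refers to the Prometheus container itself.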

Separately, the error message you included makes me wonder whether we have a bug around using Prometheus from inside our docker-compose setup, where all service roles are collapsed into one.

I will create a bug report so we can investigate and, if this is a real problem, track it. (You are welcome to create it too, if you prefer to have it under your name.) https://github.com/temporalio/temporal/issues/new/choose

If you happen to have a docker-compose file with a repro that you are able to share with us, it would help us reproduce the problem.

Thank you!
Mark.

Jocelyn · August 6, 2020, 9:04pm

Hi Mark, and thank you for your reply. I configured Prometheus with a config similar to your suggestion and also configured Grafana with the Temporal dashboards. That part seems to be working fine, and I can see some graph data such as memory allocation or goroutines; I’m just not sure all the metrics are working properly.
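One way to run Prometheus for this kind of setup is as another service in the same compose file, so it can reach `temporal-server:9090` over the compose network without that port being published on the host. A sketch follows; `prom/prometheus` is the official image, but the tag and the local `./prometheus.yml` path are assumptions, and the mounted file is expected to list `temporal-server:9090` as a scrape target.

```yaml
# Hypothetical Prometheus service added to a compose file whose Temporal service is "temporal-server".
services:
  # ...existing postgres, temporal-server and temporal-web services unchanged...
  prometheus:
    image: prom/prometheus:v2.20.0      # image tag is an assumption
    restart: unless-stopped
    ports:
      - '9090:9090'                     # Prometheus UI; temporal-server's 9090 is not published, so no host clash
    volumes:
      # Assumed local file whose scrape_configs list 'temporal-server:9090'
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    depends_on:
      - temporal-server
```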

I’ve just tested again with the latest release of Temporal, and I’m still getting the same warning logs. Here is the docker-compose file to reproduce:

version: '3.7'
services:
  postgres:
    image: postgres:12.3
    restart: unless-stopped
    ports:
      - '5432:5432'
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: placeholder
    volumes:
      - postgres-data-volume:/var/lib/postgresql/data
  temporal-server:
    image: temporalio/server:0.28.0
    restart: unless-stopped
    ports:
      - '7233:7233'
    environment:
      AUTO_SETUP: 'true'
      DB: 'postgres'
      DB_PORT: '5432'
      POSTGRES_USER: 'temporal'
      POSTGRES_PWD: 'placeholder'
      POSTGRES_SEEDS: 'postgres'
      DYNAMIC_CONFIG_FILE_PATH: 'config/dynamicconfig/development.yaml'
      PROMETHEUS_ENDPOINT: '0.0.0.0:9090'
    depends_on:
      - postgres
  temporal-web:
    image: temporalio/web:0.28.0
    restart: unless-stopped
    ports:
      - '8088:8088'
    environment:
      TEMPORAL_GRPC_ENDPOINT: temporal-server:7233
    depends_on:
      - temporal-server
volumes:
  postgres-data-volume:

Running this results in a few warnings related to Prometheus:

{"level":"warn","ts":"2020-08-06T20:10:45.028Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:45.087Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:45.192Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:45.481Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:45.534Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:49.293Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [namespace stats_type operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:49.293Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [stats_type operation namespace]} has different label names or a different help string","logging-call-at":"metrics.go:135"}

These warnings are logged from https://github.com/temporalio/temporal/blob/ca33ba46a6352fc17ff82dda1a0e4eda228a68ec/common/service/config/metrics.go#L135

Adding just the tags doesn’t seem to fix the issue. I tried it by mounting a modified config_template.yaml as a volume, with tag types frontend, matching, history, and worker:

version: '3.7'
services:
  postgres:
    image: postgres:12.3
    restart: unless-stopped
    ports:
      - '5432:5432'
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: placeholder
    volumes:
      - postgres-data-volume:/var/lib/postgresql/data
  temporal-server:
    image: temporalio/server:0.28.0
    restart: unless-stopped
    ports:
      - '7233:7233'
    environment:
      AUTO_SETUP: 'true'
      DB: 'postgres'
      DB_PORT: '5432'
      POSTGRES_USER: 'temporal'
      POSTGRES_PWD: 'placeholder'
      POSTGRES_SEEDS: 'postgres'
      DYNAMIC_CONFIG_FILE_PATH: 'config/dynamicconfig/development.yaml'
      PROMETHEUS_ENDPOINT: '0.0.0.0:9090'
    volumes:
      - ./config/config_template.yaml:/etc/temporal/config/config_template.yaml:ro
    depends_on:
      - postgres
  temporal-web:
    image: temporalio/web:0.28.0
    restart: unless-stopped
    ports:
      - '8088:8088'
    environment:
      TEMPORAL_GRPC_ENDPOINT: temporal-server:7233
    depends_on:
      - temporal-server
volumes:
  postgres-data-volume:

FYI, I haven’t created a bug report for now. Thank you,
Jocelyn.

Jocelyn · August 12, 2020, 7:02pm

Opened a GitHub ticket: https://github.com/temporalio/temporal/issues/673

Pedro_Almeida · November 11, 2020, 9:44am

Hello,

Is this issue still present in the 1.2.1 release? I could be wrong, but I believe the fix for the ticket was merged into master and is not included in the 1.2.1 release?

Once it is solved, and judging by the fix, will adding this to docker-compose be enough to use it?

"PROMETHEUS_ENDPOINT=0.0.0.0:9090"

Thanks!

alex · November 11, 2020, 11:09pm

Hi Pedro.
I’ve just published the new 1.3.0 release, and this is fixed there. Please note that you will need to remove the metrics section from every service and add a single one to the global section of your config file.
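For reference, a single global metrics block might look roughly like the sketch below. The key names (`metrics`, `prometheus`, `timerType`, `listenAddress`) are assumed from the Temporal config template of that era and should be checked against the 1.3.0 `config_template.yaml`. Our unconfirmed reading is that, with every role running in one container, each per-service block tried to bind the same address, which would explain the earlier `bind: address already in use` warnings.

```yaml
# Sketch: one metrics block under "global" instead of one per service (key names assumed).
global:
  metrics:
    prometheus:
      timerType: "histogram"          # report timers as Prometheus histograms
      listenAddress: "0.0.0.0:9090"   # assumed to correspond to PROMETHEUS_ENDPOINT
# Under "services:", delete the "metrics:" block from frontend, history, matching and worker.
```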

Pedro_Almeida · November 12, 2020, 4:31pm

Thank you, Alex.

Using the 1.3.0 image, we can scrape the metrics with Prometheus. However, we noticed that most of the available Grafana dashboards show no data, so we began looking for issues.

One thing we noticed is these warnings in the log:

temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.760Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_latency\", help: \"service_latency histogram\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.761Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_latency\", help: \"service_latency histogram\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.761Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_requests\", help: \"service_requests counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.761Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_requests\", help: \"service_requests counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.802Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.802Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"info","ts":"2020-11-12T16:02:25.600Z","msg":"Get dynamic config","name":"limit.blobSize.error","value":"2097152","default-value":"2097152","logging-call-at":"config.go:79"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:25.605Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_requests\", help: \"service_requests counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:25.606Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_latency\", help: \"service_latency histogram\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:25.614Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [operation stats_type]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:25.617Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_errors_execution_already_started\", help: \"service_errors_execution_already_started counter\", constLabels: {}, variableLabels: [operation namespace]} has different label names or a different help string","logging-call-at":"metrics.go:135"}

Note that we are not using any development_config.yaml while using docker-compose.

Looking into the Grafana dashboards, we noticed that most of the graphs showing empty results query metrics that are not available in Prometheus, for example:

temporal_activity_execution_latency_bucket
temporal_workflow_completed

Metrics for which we do have data:

activity_end_to_end_latency_bucket
persistence_latency_bucket

Should the Grafana dashboards available at https://github.com/temporalio/dashboards/tree/master/dashboards be used in this setup?

I can send you a file with all the metrics being scraped from the Prometheus endpoint of the Temporal server (I don’t think I can attach files here). Let me know if that would help.

docker-compose.yml

version: '3'
services:
  mysql:
    image: mysql:5.7
    ports:
      - "3306:3306"
    environment:
      - "MYSQL_ROOT_PASSWORD=root"
  temporal:
    image: temporalio/auto-setup:${SERVER_TAG:-1.3.0}
    ports:
      - "7233:7233"
      - "9100:9100"
    volumes:
      - ${DYNAMIC_CONFIG_DIR:-…/config/dynamicconfig}:/etc/temporal/config/dynamicconfig
    environment:
      - "DB=mysql"
      - "MYSQL_USER=root"
      - "MYSQL_PWD=root"
      - "MYSQL_SEEDS=mysql"
      - "DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development.yaml"
      - "PROMETHEUS_ENDPOINT=0.0.0.0:9100"
    depends_on:
      - mysql
  temporal-admin-tools:
    image: temporalio/admin-tools:${SERVER_TAG:-1.3.0}
    stdin_open: true
    tty: true
    environment:
      - "TEMPORAL_CLI_ADDRESS=temporal:7233"
    depends_on:
      - temporal
  temporal-web:
    image: temporalio/web:${WEB_TAG:-1.1.1}
    environment:
      - "TEMPORAL_GRPC_ENDPOINT=temporal:7233"
      - "TEMPORAL_PERMIT_WRITE_API=true"
    ports:
      - "8088:8088"
    depends_on:
      - temporal

Thanks again for all the help.

alex · November 12, 2020, 5:39pm

Yes, we are aware of these errors about different label names and are working on them right now. Hopefully they will be fixed in the next release.

As for dashboards, @samar should have more context.

samar · November 17, 2020, 11:12pm

The two examples you shared of metrics with empty results are emitted by the SDK. Here is the definition for temporal_activity_execution_latency. Are you sure your workflow/activity workers are correctly configured for metric scraping?

The dashboards we shared at https://github.com/temporalio/dashboards/blob/master/dashboards are still a work in progress, so at this point you should use them as examples for building your own. We are still iterating on the monitoring experience for Temporal and will soon create supported dashboards that work out of the box for others to import.

The SDK dashboards rely on the metrics emitted by workflow/activity workers through the client SDK, so make sure the workers are configured correctly; otherwise these dashboards won’t have any data to show.

Pedro_Almeida · November 18, 2020, 12:03pm

Hello @samar,

Thanks for the feedback, although I’m not sure which worker configuration I need to check.

Is it done when starting the workers or the server? Are there any examples or docs I can check?

Thanks again for the help

samar · November 19, 2020, 6:46pm

You need to configure MetricsScope when initializing the client used by the worker, and then set up Prometheus to scrape metrics from your workers. Here is an example that shows how to configure MetricsScope when initializing the Temporal client.
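On the Prometheus side, SDK metrics such as temporal_activity_execution_latency only appear once the worker processes are scraped in addition to the server. A sketch of a scrape config with two jobs, assuming the server is the `temporal` service with `PROMETHEUS_ENDPOINT=0.0.0.0:9100` (as in the compose file above) and a hypothetical `worker` service exposing metrics on port 9090; the real worker port depends on how the reporter behind MetricsScope is configured.

```yaml
# prometheus.yml sketch: scrape both the Temporal server and the SDK workers.
scrape_configs:
  - job_name: temporal-server           # server-side metrics, e.g. persistence_latency
    static_configs:
      - targets: ['temporal:9100']      # matches PROMETHEUS_ENDPOINT=0.0.0.0:9100
  - job_name: temporal-workers          # SDK metrics, e.g. temporal_activity_execution_latency
    static_configs:
      - targets: ['worker:9090']        # hypothetical worker host:port
```

Keeping the server and workers in separate jobs makes it easy to tell in Grafana which side a missing metric should come from.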

Pedro_Almeida · November 23, 2020, 11:32am

Thanks @samar,

I was able to expose the metrics from the worker this way.
