Setting PROMETHEUS_ENDPOINT in docker-compose results in warning logs

Based on https://github.com/temporalio/temporal/issues/319, I’ve added PROMETHEUS_ENDPOINT: '0.0.0.0:9090' to the Temporal docker-compose file. Launching it results in:

{"level":"warn","ts":"2020-07-30T17:38:51.115Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [operation stats_type namespace]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-07-30T17:38:51.115Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [stats_type namespace operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}

Is this expected behavior from Temporal?

Also, the Helm example shows additional metrics tags. Should those be added by default in docker-compose setups?

Thank you

Hey Jocelyn, thank you for trying this out, and for your question!

We haven’t tested a Prometheus docker-compose setup yet.

Without having looked into this in depth, I think you should be able to provide a Prometheus configuration file that tells Prometheus which endpoints to scrape (e.g. ‘temporal:9090’). Here is the relevant section in the Prometheus docs:

https://prometheus.io/docs/introduction/first_steps/#configuring-prometheus

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
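
That snippet is the example from the Prometheus docs, where Prometheus scrapes itself. For the docker-compose case, a minimal sketch might look like the following, assuming the Temporal service is named temporal in the compose file and PROMETHEUS_ENDPOINT is set to 0.0.0.0:9090. The job name and scrape interval are arbitrary choices for illustration, not values from this thread:

```yaml
# prometheus.yml (sketch; adjust names to match your compose file)
global:
  scrape_interval: 15s   # assumed value, pick whatever suits your setup
scrape_configs:
  - job_name: temporal   # arbitrary label for this scrape job
    static_configs:
      # 'temporal' is the assumed compose service name;
      # 9090 is the port given in PROMETHEUS_ENDPOINT
      - targets: ['temporal:9090']
```

Inside a compose network the service name resolves as a host name, so no port needs to be published to the host for Prometheus to reach it.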

Separate from that, the error message you included makes me wonder whether we have a bug around using Prometheus from inside our docker-compose, in a setup where all service roles are collapsed into one.

I will create a bug report for us to investigate and, if this is a real problem, track it. (You are welcome to create it yourself if you prefer to have it under your name.) https://github.com/temporalio/temporal/issues/new/choose

(If you happen to have a docker-compose file with a repro that you are able to share with us, it would help us reproduce the problem.)

Thank you!
Mark.

Hi Mark, and thank you for your reply. I did configure Prometheus with a config similar to your suggestion and also configured Grafana with the Temporal dashboards. That part seems to be working fine, and I can see some graph data such as memory allocation and goroutines. I’m just not sure all the metrics are working properly.

I’ve just tested again with the latest release of Temporal and am still getting the same warning logs. Here is the docker-compose file to reproduce:

version: '3.7'
services:
  postgres:
    image: postgres:12.3
    restart: unless-stopped
    ports:
      - '5432:5432'
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: placeholder
    volumes:
      - postgres-data-volume:/var/lib/postgresql/data
  temporal-server:
    image: temporalio/server:0.28.0
    restart: unless-stopped
    ports:
      - '7233:7233'
    environment:
      AUTO_SETUP: 'true'
      DB: 'postgres'
      DB_PORT: '5432'
      POSTGRES_USER: 'temporal'
      POSTGRES_PWD: 'placeholder'
      POSTGRES_SEEDS: 'postgres'
      DYNAMIC_CONFIG_FILE_PATH: 'config/dynamicconfig/development.yaml'
      PROMETHEUS_ENDPOINT: '0.0.0.0:9090'
    depends_on:
      - postgres
  temporal-web:
    image: temporalio/web:0.28.0
    restart: unless-stopped
    ports:
      - '8088:8088'
    environment:
      TEMPORAL_GRPC_ENDPOINT: temporal-server:7233
    depends_on:
      - temporal-server
volumes:
  postgres-data-volume:

Running this results in a few warnings related to Prometheus:

{"level":"warn","ts":"2020-08-06T20:10:45.028Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:45.087Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:45.192Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:45.481Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:45.534Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:49.293Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [namespace stats_type operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-08-06T20:10:49.293Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [stats_type operation namespace]} has different label names or a different help string","logging-call-at":"metrics.go:135"}

These point to https://github.com/temporalio/temporal/blob/ca33ba46a6352fc17ff82dda1a0e4eda228a68ec/common/service/config/metrics.go#L135

Adding just the tags doesn’t seem to fix the issue. I tried it by mounting a modified config_template.yaml as a volume, with tags of type frontend, matching, history and worker:

version: '3.7'
services:
  postgres:
    image: postgres:12.3
    restart: unless-stopped
    ports:
      - '5432:5432'
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: placeholder
    volumes:
      - postgres-data-volume:/var/lib/postgresql/data
  temporal-server:
    image: temporalio/server:0.28.0
    restart: unless-stopped
    ports:
      - '7233:7233'
    environment:
      AUTO_SETUP: 'true'
      DB: 'postgres'
      DB_PORT: '5432'
      POSTGRES_USER: 'temporal'
      POSTGRES_PWD: 'placeholder'
      POSTGRES_SEEDS: 'postgres'
      DYNAMIC_CONFIG_FILE_PATH: 'config/dynamicconfig/development.yaml'
      PROMETHEUS_ENDPOINT: '0.0.0.0:9090'
    volumes:
      - ./config/config_template.yaml:/etc/temporal/config/config_template.yaml:ro
    depends_on:
      - postgres
  temporal-web:
    image: temporalio/web:0.28.0
    restart: unless-stopped
    ports:
      - '8088:8088'
    environment:
      TEMPORAL_GRPC_ENDPOINT: temporal-server:7233
    depends_on:
      - temporal-server
volumes:
  postgres-data-volume:

FYI, I haven’t created a bug report for now. Thank you,
Jocelyn.

Opened a GitHub ticket: https://github.com/temporalio/temporal/issues/673

Hello,

Is this issue still present in the 1.2.1 release? I could be wrong, but I believe the fix for the ticket was merged into master and is not included in the 1.2.1 release?

Once it is fixed, and judging from the fix, will adding this to docker-compose be enough to use it?

  • "PROMETHEUS_ENDPOINT=0.0.0.0:9090"

Thanks!

Hi Pedro.
I’ve just published the new 1.3.0 release, and this is fixed there. Please note that you will need to remove the metrics section from every service and add a single one to the global section of your config file.
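
A rough sketch of that change, assuming the key names match the config_template.yaml shipped with the release (verify against your own copy before relying on it):

```yaml
# Before (assumed pre-1.3.0 layout): one metrics block per service
# services:
#   frontend:
#     metrics:
#       prometheus:
#         timerType: "histogram"
#         listenAddress: "0.0.0.0:9090"

# After (1.3.0): a single metrics block under global
global:
  metrics:
    prometheus:
      timerType: "histogram"          # assumed value
      listenAddress: "0.0.0.0:9090"   # same address as PROMETHEUS_ENDPOINT
```

With one block per service, every service role running in the same process would try to bind the same address, which would be consistent with the "address already in use" warnings reported earlier in this thread.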

Thank you Alex.

Using the 1.3.0 image, we can scrape the metrics with Prometheus. But we noticed that most of the available Grafana dashboards show no data, so we began looking for issues.

One thing we noticed is the warnings in the log:

temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.760Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_latency\", help: \"service_latency histogram\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.761Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_latency\", help: \"service_latency histogram\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.761Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_requests\", help: \"service_requests counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.761Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_requests\", help: \"service_requests counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.802Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:21.802Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"info","ts":"2020-11-12T16:02:25.600Z","msg":"Get dynamic config","name":"limit.blobSize.error","value":"2097152","default-value":"2097152","logging-call-at":"config.go:79"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:25.605Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_requests\", help: \"service_requests counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:25.606Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_latency\", help: \"service_latency histogram\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:25.614Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [operation stats_type]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
temporal_1 | {"level":"warn","ts":"2020-11-12T16:02:25.617Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_errors_execution_already_started\", help: \"service_errors_execution_already_started counter\", constLabels: {}, variableLabels: [operation namespace]} has different label names or a different help string","logging-call-at":"metrics.go:135"}

Note that we are not using any development_config.yaml while using docker-compose.

Looking at the Grafana dashboards, we noticed that most of the graphs showing empty results query metrics that are not available in Prometheus. For example:

temporal_activity_execution_latency_bucket
temporal_workflow_completed

Metrics that we do have data for:

activity_end_to_end_latency_bucket
persistence_latency_bucket

Should the Grafana dashboards available at https://github.com/temporalio/dashboards/tree/master/dashboards be used in this setup?

I can send you a file with all the metrics being scraped from the Prometheus endpoint of the Temporal server (I don’t think I can attach files here). Let me know if that helps.

docker-compose.yml

version: '3'
services:
  mysql:
    image: mysql:5.7
    ports:
      - "3306:3306"
    environment:
      - "MYSQL_ROOT_PASSWORD=root"
  temporal:
    image: temporalio/auto-setup:${SERVER_TAG:-1.3.0}
    ports:
      - "7233:7233"
      - "9100:9100"
    volumes:
      - ${DYNAMIC_CONFIG_DIR:-…/config/dynamicconfig}:/etc/temporal/config/dynamicconfig
    environment:
      - "DB=mysql"
      - "MYSQL_USER=root"
      - "MYSQL_PWD=root"
      - "MYSQL_SEEDS=mysql"
      - "DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development.yaml"
      - "PROMETHEUS_ENDPOINT=0.0.0.0:9100"
    depends_on:
      - mysql
  temporal-admin-tools:
    image: temporalio/admin-tools:${SERVER_TAG:-1.3.0}
    stdin_open: true
    tty: true
    environment:
      - "TEMPORAL_CLI_ADDRESS=temporal:7233"
    depends_on:
      - temporal
  temporal-web:
    image: temporalio/web:${WEB_TAG:-1.1.1}
    environment:
      - "TEMPORAL_GRPC_ENDPOINT=temporal:7233"
      - "TEMPORAL_PERMIT_WRITE_API=true"
    ports:
      - "8088:8088"
    depends_on:
      - temporal

Thanks again for all the help.

Yes, we are aware of these errors about different label names and are working on them right now. Hopefully they will be fixed in the next release.

As for dashboards, @samar should have more context.

The two examples you shared of metrics with empty results are emitted from the SDK. Here is the definition for temporal_activity_execution_latency. Are you sure your workflow/activity workers are correctly configured for metric scraping?

The dashboards we shared at https://github.com/temporalio/dashboards/blob/master/dashboards are still a work in progress, so at this point you should use them as examples to build your own dashboards. We are still iterating on the monitoring experience for Temporal and will soon create supported dashboards that work out of the box for others to import.

The SDK dashboards rely on the metrics emitted from workflow/activity workers, so make sure the workers are configured correctly. Otherwise these dashboards won’t have any data to show.

Hello @samar,

Thanks for the feedback, although I’m not sure which worker config I need to check.

Is it set when starting the workers or the server? Are there any examples or docs to check?

Thanks again for the help

You need to configure MetricsScope when initializing the client used by the worker, and then set up Prometheus to scrape metrics from your workers. Here is an example which shows how to configure MetricsScope when initializing the Temporal client.
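
On the Prometheus side, a sketch of the scrape configuration could look like this. It assumes the worker exposes its metrics at a hypothetical worker:9091 address (the host name and port depend entirely on how your worker process sets up its reporter), next to the server endpoint from the docker-compose.yml posted above:

```yaml
scrape_configs:
  - job_name: temporal-server      # arbitrary job name
    static_configs:
      # server metrics: PROMETHEUS_ENDPOINT=0.0.0.0:9100 on the 'temporal' service
      - targets: ['temporal:9100']
  - job_name: temporal-worker      # arbitrary job name
    static_configs:
      # hypothetical worker host and port; replace with your worker's address
      - targets: ['worker:9091']
```

The server job alone only yields server-side metrics; SDK metrics such as temporal_activity_execution_latency appear once the worker target is being scraped.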

Thanks @samar,

I was able to expose the metrics from the worker this way.
