Setting PROMETHEUS_ENDPOINT in docker-compose results in warning logs

Based on https://github.com/temporalio/temporal/issues/319, I’ve added PROMETHEUS_ENDPOINT: '0.0.0.0:9090' to the Temporal docker-compose file. Launching it results in:

{"level":"warn","ts":"2020-07-30T17:38:51.115Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [operation stats_type namespace]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"warn","ts":"2020-07-30T17:38:51.115Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [stats_type namespace operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}

Is this expected behavior from Temporal?

Also, the Helm example shows additional metrics tags; should those be added by default in docker-compose setups?

Thank you


Hey Jocelyn, thank you for trying this out, and for your question!

We haven’t tested the Prometheus docker-compose setup yet.

Without having looked into this in depth, I think you should be able to provide a Prometheus configuration file that tells Prometheus which endpoints to scrape (e.g. 'temporal:9090'). Here is the relevant section in the Prometheus docs:

https://prometheus.io/docs/introduction/first_steps/#configuring-prometheus

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
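
For the docker-compose case, a minimal prometheus.yml would look roughly like the sketch below (the job name and the 'temporal:9090' target are placeholders; substitute the service name and PROMETHEUS_ENDPOINT port you actually use):

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: temporal
    static_configs:
      # 'temporal' is the compose service name, 9090 is the port from PROMETHEUS_ENDPOINT
      - targets: ['temporal:9090']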

Separate from that, the error message you included makes me wonder if we have a bug around using Prometheus from inside our docker-compose setup, where all service roles are collapsed into one process.

I will create a bug report for us to investigate and, if this is a real problem, track it. (You are welcome to create it yourself if you prefer to have it under your name: https://github.com/temporalio/temporal/issues/new/choose)

(If you happen to have a docker-compose file with a repro that you are able to share with us, it would help us reproduce the problem.)

Thank you!
Mark.

Hi Mark, and thank you for your reply. I did configure Prometheus with a config similar to your suggestion and also set up Grafana with the Temporal dashboards. That part seems to be working fine and I can see some graph data such as memory allocation and goroutines; I’m just not sure all the metrics are working properly.
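
For reference, Prometheus and Grafana run as extra services in the same compose file, roughly like the sketch below (the image tags and mount paths here are placeholders rather than my exact setup):

  prometheus:
    image: prom/prometheus:v2.20.0
    ports:
      - '9090:9090'
    volumes:
      # scrape config pointing at temporal-server:9090
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    depends_on:
      - temporal-server
  grafana:
    image: grafana/grafana:7.1.3
    ports:
      - '3000:3000'
    depends_on:
      - prometheus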

I’ve just tested again with the latest release of Temporal and I’m still getting the same warning logs. Here is the docker-compose file to reproduce it:

version: '3.7'
services:
  postgres:
    image: postgres:12.3
    restart: unless-stopped
    ports:
      - '5432:5432'
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: placeholder
    volumes:
      - postgres-data-volume:/var/lib/postgresql/data
  temporal-server:
    image: temporalio/server:0.28.0
    restart: unless-stopped
    ports:
      - '7233:7233'
    environment:
      AUTO_SETUP: 'true'
      DB: 'postgres'
      DB_PORT: '5432'
      POSTGRES_USER: 'temporal'
      POSTGRES_PWD: 'placeholder'
      POSTGRES_SEEDS: 'postgres'
      DYNAMIC_CONFIG_FILE_PATH: 'config/dynamicconfig/development.yaml'
      PROMETHEUS_ENDPOINT: '0.0.0.0:9090'
    depends_on:
      - postgres
  temporal-web:
    image: temporalio/web:0.28.0
    restart: unless-stopped
    ports:
      - '8088:8088'
    environment:
      TEMPORAL_GRPC_ENDPOINT: temporal-server:7233
    depends_on:
      - temporal-server
volumes:
  postgres-data-volume:

Running this results in a few warnings related to Prometheus:

"level":"warn","ts":"2020-08-06T20:10:45.028Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
"level":"warn","ts":"2020-08-06T20:10:45.087Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
"level":"warn","ts":"2020-08-06T20:10:45.192Z","msg":"error in prometheus reporter","error":"listen tcp 0.0.0.0:9090: bind: address already in use","logging-call-at":"metrics.go:135"}
"level":"warn","ts":"2020-08-06T20:10:45.481Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
"level":"warn","ts":"2020-08-06T20:10:45.534Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"forwarded\", help: \"forwarded counter\", constLabels: {}, variableLabels: [operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
"level":"warn","ts":"2020-08-06T20:10:49.293Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [namespace stats_type operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
"level":"warn","ts":"2020-08-06T20:10:49.293Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"history_size\", help: \"history_size histogram\", constLabels: {}, variableLabels: [stats_type operation namespace]} has different label names or a different help string","logging-call-at":"metrics.go:135"}

The logging call points to https://github.com/temporalio/temporal/blob/ca33ba46a6352fc17ff82dda1a0e4eda228a68ec/common/service/config/metrics.go#L135

Adding just the tags doesn’t seem to fix the issue either. I tried it by mounting a modified config_template.yaml as a volume, with tag types frontend, matching, history, and worker (the relevant metrics section is sketched after the compose file below):

version: '3.7'
services:
  postgres:
    image: postgres:12.3
    restart: unless-stopped
    ports:
      - '5432:5432'
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: placeholder
    volumes:
      - postgres-data-volume:/var/lib/postgresql/data
  temporal-server:
    image: temporalio/server:0.28.0
    restart: unless-stopped
    ports:
      - '7233:7233'
    environment:
      AUTO_SETUP: 'true'
      DB: 'postgres'
      DB_PORT: '5432'
      POSTGRES_USER: 'temporal'
      POSTGRES_PWD: 'placeholder'
      POSTGRES_SEEDS: 'postgres'
      DYNAMIC_CONFIG_FILE_PATH: 'config/dynamicconfig/development.yaml'
      PROMETHEUS_ENDPOINT: '0.0.0.0:9090'
    volumes:
      - ./config/config_template.yaml:/etc/temporal/config/config_template.yaml:ro
    depends_on:
      - postgres
  temporal-web:
    image: temporalio/web:0.28.0
    restart: unless-stopped
    ports:
      - '8088:8088'
    environment:
      TEMPORAL_GRPC_ENDPOINT: temporal-server:7233
    depends_on:
      - temporal-server
volumes:
  postgres-data-volume:
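
The tags themselves went under each service’s metrics section in config_template.yaml, roughly like the sketch below (frontend entry only; in the real template the listen address comes from the PROMETHEUS_ENDPOINT templating rather than a literal, and the same block repeats for matching, history, and worker with their own type tag):

services:
  frontend:
    metrics:
      tags:
        type: frontend
      prometheus:
        timerType: histogram
        listenAddress: 0.0.0.0:9090
  # ... matching, history, and worker use the same metrics block with type: matching / history / worker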

FYI, I haven’t created a bug report for now. Thank you,
Jocelyn.


Opened GitHub ticket https://github.com/temporalio/temporal/issues/673