I’m new to Temporal and we run it in production. I’ve noticed from the beginning that the Prometheus server fills up its assigned storage rather fast, going from 10 to 20 GB in about a week.
I can’t seem to find where to throttle/configure that.
Can someone point me in the right direction or give some advice please?
Temporal produces a lot of metrics out of the box. You can either increase your scrape interval or, if certain metrics are unwanted or unneeded, drop them using Prometheus scrape_config settings. Depending on how you run your production Prometheus setup, this might be in your Prometheus config/rules or in a Kubernetes ServiceMonitor resource for Temporal.
At the same time, I’d encourage keeping all the metrics for as long as is reasonable, for debugging issues in production.
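For reference, here is a rough sketch of what the two options above (a longer scrape interval and dropping metrics) could look like in a plain Prometheus config. The job name, target address, interval value, and metric-name regex are all placeholders I made up for illustration, not actual Temporal defaults, so swap in whatever matches your setup:

```yaml
# prometheus.yml fragment (a sketch; every name and value here is a placeholder)
scrape_configs:
  - job_name: temporal                  # hypothetical job name
    scrape_interval: 60s                # scrape less often than your global default
    static_configs:
      - targets: ["temporal-frontend:9090"]   # hypothetical host:port of the metrics endpoint
    metric_relabel_configs:
      # Applied after the scrape and before ingestion: any series whose
      # metric name matches the regex is dropped and never stored.
      - source_labels: [__name__]
        regex: "example_unneeded_metric_.*"   # placeholder pattern, not a real Temporal metric
        action: drop
```

If you use the Prometheus Operator instead, the corresponding fields on a ServiceMonitor endpoint should be `interval` and `metricRelabelings`. It is probably worth checking which metric names take up the most series before deciding what to drop.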