According to SDK metrics | Temporal Documentation, none of the SDK metrics are available in the Python SDK. Is that true? If not, how should I scrape the SDK metrics emitted from a Python worker process?
Thanks!
Python is still in beta and the documentation on metrics hasn't been updated yet. We are working on this. In the meantime, every metric on that page that TypeScript supports, Python also supports. We are also in the process of documenting how to expose the metrics. See the comment at [Observability] Metrics is missing in Python · Issue #1465 · temporalio/documentation · GitHub for how to configure a Prometheus endpoint.
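For anyone landing here later, here is a minimal sketch of what that issue comment describes, using the SDK's runtime telemetry options. The bind address and server target are placeholders, and exact names may differ between SDK versions:

    # Minimal sketch, assuming a recent temporalio release with temporalio.runtime.
    from temporalio.client import Client
    from temporalio.runtime import PrometheusConfig, Runtime, TelemetryConfig

    async def connect_with_metrics() -> Client:
        # Create a runtime whose SDK metrics are exported on a Prometheus scrape endpoint.
        runtime = Runtime(
            telemetry=TelemetryConfig(
                metrics=PrometheusConfig(bind_address="0.0.0.0:9000")  # placeholder port
            )
        )
        # Workers created from this client will expose SDK metrics at
        # http://<host>:9000/metrics.
        return await Client.connect("localhost:7233", runtime=runtime)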
Thanks a lot, that’s exactly what I’m looking for!
Regarding Prometheus: is there a way to hitch application-specific metrics onto the SDK metrics endpoint? Absent that, does guidance (or example Python code) exist on how to expose application-specific metrics in any fashion? It's not clear to me how to go about this given multiple worker processes.
Thank you!
We do not have a way to attach application metrics to the SDK metrics endpoint, but you can expose a separate endpoint. You can set up Prometheus metrics the way you would in any other Python application; just don't reuse the port the SDK metrics are on, and make sure you scrape both.
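As an illustration, a rough sketch of exposing application metrics on a separate port with the official prometheus_client library (the metric name and port below are made up):

    # Sketch: application metrics on their own port, separate from the SDK endpoint.
    from prometheus_client import Counter, start_http_server

    ORDERS_PROCESSED = Counter("orders_processed_total", "Orders handled by activities")

    def start_app_metrics_server() -> None:
        # Pick a port that does not clash with the SDK's Prometheus bind address,
        # and configure Prometheus to scrape both endpoints.
        start_http_server(9100)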
Thanks for the reply!
I’m using the synchronous/multiprocess worker model as described here: samples-python/hello_activity_multiprocess.py at main · temporalio/samples-python · GitHub. If others have exposed Prometheus metrics from workers of this configuration, I’d love to see some example code. I’d also be interested to see example code making use of a push gateway, if that’s the advisable way to export metrics from a multiprocess Temporal worker.
It would be super useful to see an example of this in samples-python, as it seems likely lots of users of the Python SDK are going to want to expose prometheus metrics
I can add a sample showing custom Prometheus metrics in Python, but it would be unrelated to Temporal. It’d just be showing how to use Prometheus in Python. As for multiprocess, we just use concurrent.futures — Launching parallel tasks — Python 3.11.1 documentation. Whatever way Prometheus works with that (or not) is unrelated to Temporal.
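For reference, a rough sketch of that worker configuration, modeled on the linked hello_activity_multiprocess sample (the task queue name is a placeholder and the workflow/activity lists are left empty here):

    # Sketch of a worker running synchronous activities in subprocesses.
    import multiprocessing
    from concurrent.futures import ProcessPoolExecutor
    from temporalio.client import Client
    from temporalio.worker import SharedStateManager, Worker

    async def run_worker(client: Client) -> None:
        with multiprocessing.Manager() as manager:
            worker = Worker(
                client,
                task_queue="my-task-queue",  # placeholder task queue name
                workflows=[],                # your workflow classes here
                activities=[],               # your synchronous activity functions here
                activity_executor=ProcessPoolExecutor(5),
                shared_state_manager=SharedStateManager.create_from_multiprocessing(manager),
            )
            await worker.run()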
This turned out to be simpler than I originally thought. Although this is somewhat off-topic, for posterity I’ll share what I did.
The Python Prometheus client has a provision for multiprocess services, see: GitHub - prometheus/client_python: Prometheus instrumentation library for Python applications. I have a situation where I’m operating Temporal workers on the same hosts as multiprocess Python app servers which already expose a scrape endpoint. By configuring my Temporal workers to use the same PROMETHEUS_MULTIPROC_DIR env var value as the app server, the app server effectively exposes metrics emitted by my Temporal workers.
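As an illustration, under multiprocess mode an application metric defined inside an activity needs nothing special: with PROMETHEUS_MULTIPROC_DIR set before prometheus_client is imported, each worker process writes its samples to files in that directory and they are aggregated at scrape time (the metric and activity below are made up):

    from prometheus_client import Counter
    from temporalio import activity

    GREETINGS_COMPOSED = Counter("greetings_composed_total", "Greetings composed by the activity")

    @activity.defn
    def compose_greeting(name: str) -> str:
        # Incremented in the activity subprocess; picked up via the multiproc dir.
        GREETINGS_COMPOSED.inc()
        return f"Hello, {name}!"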
Soon I intend to move the workers to their own hosts, and so I wrote a tiny scrape endpoint to expose worker metrics which I will run separately from the workers:
import atexit
import http.server
import os
import sys

import fs  # pyfilesystem2, used to clean up the multiprocess bookkeeping files
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    generate_latest,
    multiprocess,
)

# `logger` and `settings` are application-specific; substitute your own.


def start_telemetry_server():
    """Runs an HTTP server which exposes user-defined metrics and blocks
    forever.
    """
    # see: https://github.com/prometheus/client_python#multiprocess-mode-eg-gunicorn
    multiproc_dpath = os.environ.get("PROMETHEUS_MULTIPROC_DIR")
    if multiproc_dpath is None:
        logger.critical(
            "Refusing to start telemetry server. PROMETHEUS_MULTIPROC_DIR "
            "environment variable must be defined and set to the directory in "
            "which prometheus bookkeeping files are stored."
        )
        sys.exit(1)

    # Remove stale per-process bookkeeping files when the server exits.
    @atexit.register
    def cleanup_prom_bookkeeping():
        fs.open_fs(multiproc_dpath).glob("*").remove()

    # Aggregate samples from every worker process found in the multiproc dir.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)

    class MetricsServer(http.server.BaseHTTPRequestHandler):
        # pylint: disable=invalid-name
        def do_GET(self):
            data = generate_latest(registry)
            self.send_response(200)
            self.send_header("Content-type", CONTENT_TYPE_LATEST)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    all_addresses = ""  # bind on all interfaces
    listen_port = settings.AUX_PROMETHEUS_PORT
    server = http.server.HTTPServer((all_addresses, listen_port), MetricsServer)
    server.serve_forever()
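To run it, a standalone entry point along these lines is enough (just illustrative), as long as PROMETHEUS_MULTIPROC_DIR points at the same directory the workers use:

    if __name__ == "__main__":
        start_telemetry_server()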
It would be helpful to see this added to the Prometheus sample being worked on, in addition to the Temporal metrics.