Python SDK metrics?

Siyang_Xie · September 21, 2022, 10:44pm

According to the “SDK metrics” page of the Temporal documentation, none of the SDK metrics are available in the Python SDK. Is that true? If not, how should I scrape the SDK metrics emitted from a Python worker process?

Thanks!

Chad_Retz · September 22, 2022, 1:19am

The Python SDK is still in beta, and the documentation on metrics hasn’t been updated yet. We are working on this. In the meantime, every metric on that page that TypeScript supports, Python also supports. We are also in the process of documenting how to expose the metrics. See the comment on the GitHub issue “[Observability] Metrics is missing in Python” (temporalio/documentation #1465) for how to configure a Prometheus endpoint.

Siyang_Xie · September 22, 2022, 2:16am

Thanks a lot, that’s exactly what I’m looking for!

pleases.borderland · December 7, 2022, 3:33am

Regarding Prometheus: is there a way to hitch application-specific metrics onto the SDK metrics endpoint? If not, is there guidance (or example Python code) on exposing application-specific metrics some other way? It’s not clear to me how to go about this with multiple worker processes.

Thank you!

Chad_Retz · December 7, 2022, 1:00pm

We do not have a way to add them to the SDK metrics endpoint, but you can expose a different endpoint. You can set up Prometheus metrics the way you would in any other Python application; just don’t reuse the port the SDK metrics are on, and make sure you scrape both.
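
A minimal sketch with `prometheus_client`, assuming the SDK metrics are bound to port 9000 and port 8000 is free for application metrics. Both port numbers, the `orders_processed` counter, and `start_app_metrics_server` are hypothetical:

```python
from prometheus_client import CollectorRegistry, Counter, start_http_server

# Assumption: the SDK's Prometheus endpoint is bound to port 9000, so the
# application's own metrics are served on a different port.
SDK_METRICS_PORT = 9000
APP_METRICS_PORT = 8000

# A registry reserved for application-specific metrics.
app_registry = CollectorRegistry()

# Hypothetical application metric; prometheus_client appends "_total" to
# counter names, so Prometheus sees "orders_processed_total".
ORDERS_PROCESSED = Counter(
    "orders_processed",
    "Orders processed by this worker's activities",
    registry=app_registry,
)


def start_app_metrics_server(port: int = APP_METRICS_PORT) -> None:
    """Serve application metrics on a port separate from the SDK's."""
    if port == SDK_METRICS_PORT:
        raise ValueError("application metrics must not reuse the SDK metrics port")
    # prometheus_client serves the registry from a background daemon thread,
    # so the worker process keeps running normally after this call.
    start_http_server(port, registry=app_registry)
```

Activity code then calls `ORDERS_PROCESSED.inc()`, and Prometheus gets one scrape target per port on each worker host. This only covers metrics updated in the process that started the server; activities running in separate processes need the client library’s multiprocess mode.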

pleases.borderland · December 7, 2022, 7:51pm

Thanks for the reply!

I’m using the synchronous/multiprocess worker model described in hello_activity_multiprocess.py in the temporalio/samples-python repository on GitHub. If others have exposed Prometheus metrics from workers with this configuration, I’d love to see some example code. I’d also be interested in example code that uses the Prometheus Pushgateway, if that’s the advisable way to export metrics from a multiprocess Temporal worker.

It would be super useful to see an example of this in samples-python, since it seems likely that lots of Python SDK users will want to expose Prometheus metrics.

Chad_Retz · December 7, 2022, 7:55pm

I can add a sample showing custom Prometheus metrics in Python, but it would be unrelated to Temporal; it would just show how to use Prometheus in Python. As for multiprocess, we just use concurrent.futures (see “concurrent.futures — Launching parallel tasks” in the Python documentation). However Prometheus works with that (or doesn’t) is unrelated to Temporal.

pleases.borderland · December 8, 2022, 7:43pm

This turned out to be simpler than I originally thought. Although it’s somewhat off-topic, I’ll share what I did for posterity.

The Python Prometheus client (prometheus/client_python on GitHub) supports multiprocess applications. My Temporal workers run on the same hosts as multiprocess Python app servers that already expose a scrape endpoint. By configuring the workers with the same PROMETHEUS_MULTIPROC_DIR environment variable value as the app server, the app server effectively exposes the metrics emitted by my Temporal workers.
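
A minimal sketch of the worker side of this setup, assuming synchronous activities run in a `ProcessPoolExecutor` as in the multiprocess sample. `record_activity`, `run_in_pool`, `collect_all_processes`, and the `activity_runs` counter are hypothetical names; a temporary directory stands in for the shared `PROMETHEUS_MULTIPROC_DIR`, and the explicit `"fork"` start method is an assumption that keeps the sketch self-contained on Linux. Each pool process writes its own bookkeeping file, and `MultiProcessCollector` sums them when the endpoint is scraped:

```python
import multiprocessing
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor

# prometheus_client picks its multiprocess value backend when first imported,
# so the directory must be set before that import. In a deployment, set the
# variable externally to the directory shared with the scrape endpoint; the
# temporary directory here is only a stand-in.
os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", tempfile.mkdtemp())

from prometheus_client import CollectorRegistry, Counter, generate_latest  # noqa: E402
from prometheus_client import multiprocess  # noqa: E402

# Hypothetical application metric; exposed as "activity_runs_total".
ACTIVITY_RUNS = Counter("activity_runs", "Activity executions in this worker")


def record_activity(name: str) -> str:
    """Stand-in for a synchronous activity executed in a pool process."""
    ACTIVITY_RUNS.inc()  # written to this process's own *.db file
    return f"done: {name}"


def run_in_pool(names: list[str]) -> list[str]:
    """Run the stand-in activity across several worker processes."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX host
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(record_activity, names))


def collect_all_processes() -> bytes:
    """Merge every process's bookkeeping file, as the scrape endpoint does."""
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry)
```

The client_python documentation recommends emptying this directory between runs, since stale files from earlier processes are otherwise included in the totals.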

Soon I intend to move the workers to their own hosts, so I wrote a tiny scrape endpoint for the worker metrics, which I will run in its own process, separate from the workers:

import atexit
import glob
import http.server
import logging
import os
import sys

from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    generate_latest,
    multiprocess,
)

logger = logging.getLogger(__name__)


def start_telemetry_server(listen_port: int):
    """Runs an HTTP server which exposes user-defined metrics and blocks
    forever.

    listen_port is the port Prometheus scrapes (the application's
    AUX_PROMETHEUS_PORT setting); it must differ from the SDK metrics port.
    """
    # see: https://github.com/prometheus/client_python#multiprocess-mode-eg-gunicorn
    multiproc_dpath = os.environ.get("PROMETHEUS_MULTIPROC_DIR")
    if multiproc_dpath is None:
        logger.critical(
            "Refusing to start telemetry server. PROMETHEUS_MULTIPROC_DIR "
            "environment variable must be defined and set to the directory in "
            "which prometheus bookkeeping files are stored."
        )
        sys.exit(1)

    @atexit.register
    def cleanup_prom_bookkeeping():
        # Remove the per-process *.db files written into the shared directory.
        for path in glob.glob(os.path.join(multiproc_dpath, "*.db")):
            os.remove(path)

    # Aggregate the metrics written by every worker process.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)

    class MetricsServer(http.server.BaseHTTPRequestHandler):
        # pylint: disable=invalid-name
        def do_GET(self):
            data = generate_latest(registry)
            self.send_response(200)
            self.send_header("Content-type", CONTENT_TYPE_LATEST)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()

            self.wfile.write(data)

    all_addresses = ""  # bind to every interface
    server = http.server.HTTPServer((all_addresses, listen_port), MetricsServer)
    server.serve_forever()

Nathan_Price · September 16, 2024, 6:59pm

It would be helpful to see application metrics added to the Prometheus sample, in addition to the Temporal SDK metrics.
