via the application’s worker
However, as far as the deployment itself goes, the HPA is currently only scaling on memory (or CPU) utilization, so it looks something like
Many people use “available slots” to know when to scale workers. See Developer's guide - Worker performance | Temporal Documentation. You would benchmark your workers, set their max-concurrent-activities to a number you know the resources can handle, and then watch whether available slots get too low.
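For example, a rough sketch of a Python worker with a benchmarked concurrency cap (the task queue name, the activity, and the value 50 are placeholders, not recommendations):

from temporalio import activity
from temporalio.client import Client
from temporalio.worker import Worker

@activity.defn
async def my_activity(name: str) -> str:
    # Placeholder activity just to make the sketch runnable
    return f"Hello, {name}!"

async def run_worker() -> None:
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="my-task-queue",       # placeholder task queue name
        activities=[my_activity],
        max_concurrent_activities=50,     # benchmark-derived cap; available slots = cap minus currently running activities
    )
    await worker.run()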
Thank you for the reply! That sounds great! As far as the Python SDK for Temporal, how do I emit metrics like temporal_worker_task_slots_available or the other temporal_ metrics from my activity/worker? Is there a Python example of this?
Are those default metrics that are emitted any time I configure a metrics runtime (in this case, OTel), like
from temporalio.contrib.opentelemetry import TracingInterceptor
from temporalio.runtime import OpenTelemetryConfig, Runtime, TelemetryConfig
runtime = init_runtime_with_telemetry()
Yes. Here is a sample of configuring OTel metrics on the runtime (you only need to create one runtime and use it when creating your client). The interceptor is for tracing, by the way; only the runtime needs to be configured for metrics.
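A minimal sketch in the spirit of the samples-python open_telemetry example, assuming an OTLP gRPC collector at localhost:4317 and a local Temporal server (both are placeholders for your own endpoints):

from temporalio.client import Client
from temporalio.runtime import OpenTelemetryConfig, Runtime, TelemetryConfig

def init_runtime_with_telemetry() -> Runtime:
    # Point the SDK's built-in metrics (temporal_worker_task_slots_available, etc.)
    # at an OTLP gRPC collector endpoint.
    return Runtime(
        telemetry=TelemetryConfig(metrics=OpenTelemetryConfig(url="http://localhost:4317"))
    )

async def main() -> None:
    # Create the runtime once and reuse it for every client you create.
    runtime = init_runtime_with_telemetry()
    client = await Client.connect("localhost:7233", runtime=runtime)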
Awesome, so I tried configuring our OTel collector and our Temporal client to use the OTel runtime, but I’m unable to get the temporal_ metrics, at least for my remote worker. Is there anything else I would need to consider for a remote worker/activity? I also noticed that, for the activities that were not remote, it was writing temporal_ metrics under a service name called temporal-core-sdk despite defining a different service name on the provider: samples-python/open_telemetry/worker.py at 4303a9b15f4ddc4cd770bc0ba33afef90a25d3ae · temporalio/samples-python · GitHub
No, this should work without issue if you give it an OTLP gRPC endpoint.
That link is to a tracer provider and is unrelated to metrics. When configuring TelemetryConfig for metrics, you can set attach_service_name to override the default of temporal-core-sdk.
@Chad_Retz thank you for the reply! I’m a little stumped as to why I’m able to send custom metrics yet unable to send the default temporal_ metrics to our otel instrumentator+dd via our remote worker. Would you suggest revisiting the configs?
As for attach_service_name, would setting it to True overwrite the service_name from temporal-core-sdk to the service name we provide? Or would we need to set it to False? (I noticed it seems to default to True: sdk-python/temporalio/runtime.py at main · temporalio/sdk-python · GitHub)
Yes, make sure that the endpoint you are giving as metrics=OpenTelemetryConfig(url="http://whatever") accepts OTel metrics, and that you are creating that one global runtime and using it across all clients you create.
It defaults to True, which means the SDK attaches the service name; set it to False to stop that. You can set global_tags to set any tags for all metrics (including a service_name tag after setting attach_service_name to False).
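For example, something along these lines (the collector URL and “my-service” are placeholders for your own values):

from temporalio.runtime import OpenTelemetryConfig, Runtime, TelemetryConfig

runtime = Runtime(
    telemetry=TelemetryConfig(
        metrics=OpenTelemetryConfig(url="http://localhost:4317"),  # placeholder collector URL
        attach_service_name=False,                     # stop attaching the default temporal-core-sdk name
        global_tags={"service_name": "my-service"},    # tag applied to every metric
    )
)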
Thank you for the reply. I am using a global runtime and I’m able to write custom metrics using that endpoint (our endpoint is http://localhost:4317 since our app/worker is deployed with an OTel sidecar). It’s just not writing the default metrics. As a workaround, is there a way to access the default metrics programmatically so that I can manually record them in OTel as I would a custom metric? I would love to be able to read temporal_worker_task_slots_available in some way and manually record that metric into OTel. Or is there source code where temporal_worker_task_slots_available is computed?
Just to triple-check I’m understanding this correctly: if I set attach_service_name=False and set global_tags with a service_name tag (in this case "my-service"), those metrics would now be under service_name:my-service instead of service_name:temporal-core-sdk?
Can you clarify “write custom metrics”? Is this using my_runtime.metric_meter().[whatever] or some other tool? Does directly making metrics on the meter work? Are you sure you are setting this runtime in the client options?
This is deep in the Rust core, so consuming the metric via the metrics system may be the best way. Python now has the ability to use a metric buffer, where you manually consume metrics instead of exposing them via OTel/Prometheus. We don’t have a sample yet, but see this test.
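Roughly, based on that test, the sketch below shows the idea; the buffer size, the client address, and the substring filter on the slot gauge are assumptions rather than a prescribed pattern. Note that with a buffer, metrics go only to the buffer, not to OTel/Prometheus.

from temporalio.client import Client
from temporalio.runtime import MetricBuffer, Runtime, TelemetryConfig

# Route SDK metrics into an in-memory buffer instead of an OTel/Prometheus exporter.
buffer = MetricBuffer(10000)  # buffer size taken from the SDK test; adjust as needed
runtime = Runtime(telemetry=TelemetryConfig(metrics=buffer))

async def main() -> None:
    client = await Client.connect("localhost:7233", runtime=runtime)
    # Periodically drain the buffer and forward whichever metrics you care about.
    for update in buffer.retrieve_updates():
        if "worker_task_slots_available" in update.metric.name:
            print(update.metric.name, update.value, update.attributes)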
This is confusing “tracing” and “metrics”. global_tags applies to metrics and is unrelated to the TracerProvider, and yes, if you set it with a service_name tag, it should be on every metric.