Hello
Looking at https://github.com/temporalio/dashboards/blob/6094dd666f386e76a3c03e0049f02521210b6883/dashboards/sdk.json#L446 I see that there should be metrics named temporal_workflow_completed, temporal_workflow_failed, etc.
I cannot find any of these in the output from /metrics. I’m getting the following values:
build_age
build_information
client_errors
client_latency_bucket
client_latency_count
client_latency_sum
client_redirection_errors
client_redirection_latency_bucket
client_redirection_latency_count
client_redirection_latency_sum
client_redirection_requests
client_requests
event_blob_size_bucket
event_blob_size_count
event_blob_size_sum
gomaxprocs
history_size_bucket
history_size_count
history_size_sum
invalid_task_queue_name
memory_allocated
memory_gc_pause_ms_bucket
memory_gc_pause_ms_count
memory_gc_pause_ms_sum
memory_heap
memory_heapidle
memory_heapinuse
memory_num_gc
memory_stack
namespace_cache_callbacks_latency_bucket
namespace_cache_callbacks_latency_count
namespace_cache_callbacks_latency_sum
namespace_cache_prepare_callbacks_latency_bucket
namespace_cache_prepare_callbacks_latency_count
namespace_cache_prepare_callbacks_latency_sum
num_goroutines
persistence_latency_bucket
persistence_latency_count
persistence_latency_sum
persistence_requests
restarts
service_authorization_latency_bucket
service_authorization_latency_count
service_authorization_latency_sum
service_errors_context_timeout
service_errors_entity_not_found
service_errors_execution_already_started
service_latency_bucket
service_latency_count
service_latency_sum
service_requests
version_check_failed
version_check_latency_bucket
version_check_latency_count
version_check_latency_sum
version_check_request_failed
Is there something I missed turning on? I’ve set the environment variable PROMETHEUS_ENDPOINT: "0.0.0.0:9090" on the frontend node.
I’m using Docker Swarm to run my services. Is there anything from https://github.com/temporalio/helm-charts/blob/master/values.yaml that I should set as an environment variable to enable metrics for the workflows?
temporal_workflow_completed, temporal_workflow_failed: these metrics are reported by the SDK, not the server.
In the Go SDK:
package metrics

// Metrics keys
const (
	TemporalMetricsPrefix               = "temporal_"
	WorkflowCompletedCounter            = TemporalMetricsPrefix + "workflow_completed"
	WorkflowCanceledCounter             = TemporalMetricsPrefix + "workflow_canceled"
	WorkflowFailedCounter               = TemporalMetricsPrefix + "workflow_failed"
	WorkflowContinueAsNewCounter        = TemporalMetricsPrefix + "workflow_continue_as_new"
	WorkflowEndToEndLatency             = TemporalMetricsPrefix + "workflow_endtoend_latency" // measure workflow execution from start to close
	WorkflowTaskReplayLatency           = TemporalMetricsPrefix + "workflow_task_replay_latency"
	WorkflowTaskQueuePollEmptyCounter   = TemporalMetricsPrefix + "workflow_task_queue_poll_empty"
	WorkflowTaskQueuePollSucceedCounter = TemporalMetricsPrefix + "workflow_task_queue_poll_succeed"
	WorkflowTaskScheduleToStartLatency  = TemporalMetricsPrefix + "workflow_task_schedule_to_start_latency"
	WorkflowTaskExecutionLatency        = TemporalMetricsPrefix + "workflow_task_execution_latency"
)
In the Java SDK:
package io.temporal.internal.metrics;

public class MetricsType {
  public static final String TEMPORAL_METRICS_PREFIX = "temporal_";
  public static final String WORKFLOW_COMPLETED_COUNTER =
      TEMPORAL_METRICS_PREFIX + "workflow_completed";
  public static final String WORKFLOW_CANCELED_COUNTER =
      TEMPORAL_METRICS_PREFIX + "workflow_canceled";
  public static final String WORKFLOW_FAILED_COUNTER = TEMPORAL_METRICS_PREFIX + "workflow_failed";
  public static final String WORKFLOW_CONTINUE_AS_NEW_COUNTER =
      TEMPORAL_METRICS_PREFIX + "workflow_continue_as_new";
  /** measure workflow execution from start to close */
  public static final String WORKFLOW_E2E_LATENCY =
      TEMPORAL_METRICS_PREFIX + "workflow_endtoend_latency";
}
Thank you for your reply!
It turns out there are several things I did not understand when I wrote this question.
First of all - I was only picking up the metrics from the Temporal FRONTEND node. In order to get a complete set of metric data, one should also set the PROMETHEUS_ENDPOINT environment variable for the HISTORY, MATCHING and WORKER nodes (see the stack-file sketch below).
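In a Swarm stack file, that can look roughly like the fragment below. This is a hypothetical sketch - the service names are illustrative - but SERVICES and PROMETHEUS_ENDPOINT are the environment variables the Temporal server image reads to choose which service a container runs and where it exposes metrics.

services:
  temporal-frontend:
    image: temporalio/server
    environment:
      - SERVICES=frontend
      - PROMETHEUS_ENDPOINT=0.0.0.0:9090
  temporal-history:
    image: temporalio/server
    environment:
      - SERVICES=history
      - PROMETHEUS_ENDPOINT=0.0.0.0:9090
  temporal-matching:
    image: temporalio/server
    environment:
      - SERVICES=matching
      - PROMETHEUS_ENDPOINT=0.0.0.0:9090
  temporal-worker:
    image: temporalio/server
    environment:
      - SERVICES=worker
      - PROMETHEUS_ENDPOINT=0.0.0.0:9090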
Will ADMINTOOLS also give you metrics? I don’t know.
Secondly - I did not understand that “your own” workflow metrics are reported directly from your Temporal Client, as implemented by the SDK you’re using. There’s an example in samples-go that shows how to set the MetricsScope when creating the Temporal Client:
import (
	"github.com/uber-go/tally/prometheus"
	"go.temporal.io/sdk/client"
)

c, err := client.NewClient(client.Options{
	// newPrometheusScope is a helper from the samples-go metrics example (sketched below).
	MetricsScope: newPrometheusScope(prometheus.Configuration{
		ListenAddress: "0.0.0.0:9090", // address where the client serves /metrics
		TimerType:     "histogram",
	}),
})
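For reference, the newPrometheusScope helper is defined in the samples-go metrics example; the sketch below is reconstructed from that sample (tally v3 import paths assumed), so check the repo for the current version:

import (
	"log"
	"time"

	prom "github.com/prometheus/client_golang/prometheus"
	"github.com/uber-go/tally"
	"github.com/uber-go/tally/prometheus"
)

// newPrometheusScope builds a tally scope backed by a Prometheus reporter,
// which serves the /metrics endpoint at the configured ListenAddress.
func newPrometheusScope(c prometheus.Configuration) tally.Scope {
	reporter, err := c.NewReporter(
		prometheus.ConfigurationOptions{
			Registry: prom.NewRegistry(),
			OnError: func(err error) {
				log.Println("error in prometheus reporter", err)
			},
		},
	)
	if err != nil {
		log.Fatalln("error creating prometheus reporter", err)
	}
	scopeOpts := tally.ScopeOptions{
		CachedReporter: reporter,
		Separator:      prometheus.DefaultSeparator,
	}
	// Flush metrics to the reporter once per second.
	scope, _ := tally.NewRootScope(scopeOpts, time.Second)
	return scope
}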
So - between setting the environment variable correctly on all the Temporal server nodes and reporting metrics from the Temporal Client via the SDK - I’m now getting all the metrics. I think?
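As a quick sanity check, something like curl -s http://localhost:9090/metrics | grep temporal_workflow against the client’s metrics port (whatever ListenAddress you configured above) should show the SDK counters once a workflow has actually run.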
(This post is for other people trying to get this working. Is any of this collected in the documentation yet?)
Admin tools provides tooling for accessing Temporal; I do not expect this pod to emit metrics.