Missing workflow data from /metrics

johanforssell · March 22, 2021, 12:53pm

Hello

Looking at https://github.com/temporalio/dashboards/blob/6094dd666f386e76a3c03e0049f02521210b6883/dashboards/sdk.json#L446 I see that there should exist metrics named temporal_workflow_completed, temporal_workflow_failed, etc.

I cannot find any of these in the output from /metrics. These are the only metric names I get:

build_age
build_information
client_errors
client_latency_bucket
client_latency_count
client_latency_sum
client_redirection_errors
client_redirection_latency_bucket
client_redirection_latency_count
client_redirection_latency_sum
client_redirection_requests
client_requests
event_blob_size_bucket
event_blob_size_count
event_blob_size_sum
gomaxprocs
history_size_bucket
history_size_count
history_size_sum
invalid_task_queue_name
memory_allocated
memory_gc_pause_ms_bucket
memory_gc_pause_ms_count
memory_gc_pause_ms_sum
memory_heap
memory_heapidle
memory_heapinuse
memory_num_gc
memory_stack
namespace_cache_callbacks_latency_bucket
namespace_cache_callbacks_latency_count
namespace_cache_callbacks_latency_sum
namespace_cache_prepare_callbacks_latency_bucket
namespace_cache_prepare_callbacks_latency_count
namespace_cache_prepare_callbacks_latency_sum
num_goroutines
persistence_latency_bucket
persistence_latency_count
persistence_latency_sum
persistence_requests
restarts
service_authorization_latency_bucket
service_authorization_latency_count
service_authorization_latency_sum
service_errors_context_timeout
service_errors_entity_not_found
service_errors_execution_already_started
service_latency_bucket
service_latency_count
service_latency_sum
service_requests
version_check_failed
version_check_latency_bucket
version_check_latency_count
version_check_latency_sum
version_check_request_failed

Is there something I missed turning on? I’ve set the environment variable PROMETHEUS_ENDPOINT: "0.0.0.0:9090" on the frontend node.

I’m using Docker Swarm to run my services. Is there anything from https://github.com/temporalio/helm-charts/blob/master/values.yaml I should set as env to trigger metrics on the workflows?

Wenquan_Xing · March 22, 2021, 6:44pm

The temporal_workflow_completed and temporal_workflow_failed metrics are reported by the SDK:

temporalio/sdk-go/blob/d87457aa1f2fe2dd2ae54ca2cb8f896b7203d4f5/internal/common/metrics/constants.go#L31


package metrics

// Metrics keys
const (
	TemporalMetricsPrefix = "temporal_"

	WorkflowCompletedCounter     = TemporalMetricsPrefix + "workflow_completed"
	WorkflowCanceledCounter      = TemporalMetricsPrefix + "workflow_canceled"
	WorkflowFailedCounter        = TemporalMetricsPrefix + "workflow_failed"
	WorkflowContinueAsNewCounter = TemporalMetricsPrefix + "workflow_continue_as_new"
	WorkflowEndToEndLatency      = TemporalMetricsPrefix + "workflow_endtoend_latency" // measure workflow execution from start to close

	WorkflowTaskReplayLatency           = TemporalMetricsPrefix + "workflow_task_replay_latency"
	WorkflowTaskQueuePollEmptyCounter   = TemporalMetricsPrefix + "workflow_task_queue_poll_empty"
	WorkflowTaskQueuePollSucceedCounter = TemporalMetricsPrefix + "workflow_task_queue_poll_succeed"
	WorkflowTaskScheduleToStartLatency  = TemporalMetricsPrefix + "workflow_task_schedule_to_start_latency"
	WorkflowTaskExecutionLatency        = TemporalMetricsPrefix + "workflow_task_execution_latency"

temporalio/sdk-java/blob/f0bbf84c496454d9909caefe6ca2784f76b8e592/temporal-sdk/src/main/java/io/temporal/internal/metrics/MetricsType.java#L24


package io.temporal.internal.metrics;

public class MetricsType {
 public static final String TEMPORAL_METRICS_PREFIX = "temporal_";
 public static final String WORKFLOW_COMPLETED_COUNTER =
     TEMPORAL_METRICS_PREFIX + "workflow_completed";
 public static final String WORKFLOW_CANCELED_COUNTER =
     TEMPORAL_METRICS_PREFIX + "workflow_canceled";
 public static final String WORKFLOW_FAILED_COUNTER = TEMPORAL_METRICS_PREFIX + "workflow_failed";
 public static final String WORKFLOW_CONTINUE_AS_NEW_COUNTER =
     TEMPORAL_METRICS_PREFIX + "workflow_continue_as_new";
 /** measure workflow execution from start to close */
 public static final String WORKFLOW_E2E_LATENCY =
     TEMPORAL_METRICS_PREFIX + "workflow_endtoend_latency";

johanforssell · March 26, 2021, 1:37pm

Thank you for your reply!

It turns out there were several things I did not understand when I wrote this question.

First of all - I was only picking up the metrics from the Temporal FRONTEND node. To get a complete set of metric data, you should also set the PROMETHEUS_ENDPOINT environment variable for the HISTORY, MATCHING and WORKER nodes.

Will ADMINTOOLS also give you metrics? I don’t know.

Secondly - I did not understand that you need to report “your own” metrics directly from your Temporal Client, as implemented by the SDK you’re using. The metrics example in samples-go shows how to set the MetricsScope when creating the Temporal Client.

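	// newPrometheusScope is a helper from the samples-go example, not an SDK function.
	// The scope makes the SDK expose its temporal_* metrics on ListenAddress.
	c, err := client.NewClient(client.Options{
		MetricsScope: newPrometheusScope(prometheus.Configuration{
			ListenAddress: "0.0.0.0:9090",
			TimerType:     "histogram",
		}),
	})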

So - between setting the env correctly for all the Temporal server nodes and reporting from the Temporal Client when using the SDK - I’m now getting all the metrics. I think?

(This post is for other people trying to get this working. Is this collected in the documentation in any way yet?)

Wenquan_Xing · March 26, 2021, 8:29pm

Admin tools only provides the tooling for accessing Temporal; I do not expect this pod to emit metrics.

Wenquan_Xing · March 26, 2021, 8:30pm

I believe so.
