Guidance on Logging Temporal Trace Errors with OpenTelemetry & Datadog

tudor_bostan · April 1, 2025, 3:42pm

Hi everyone,

I’m currently integrating Temporal with OpenTelemetry and Datadog for distributed tracing. I’ve implemented a custom sampler in Go that is intended to always sample and log error spans. The sampler checks for error indicators—specifically, an "error" attribute or an "otel.status_code" attribute set to Error—and logs the span details when an error is detected.

My goal is to ensure that all error traces are properly logged and visible in Datadog’s temporal log view. However, despite the sampler correctly sampling these error spans, I’m not seeing the expected error logs in Datadog.

Has anyone encountered this issue or could share insights on best practices for logging error spans in the context of Temporal and OpenTelemetry? Are there specific configurations or recommended approaches—either within Temporal or Datadog’s exporter settings—that might be necessary to capture these error details?

Any guidance or pointers to relevant documentation would be greatly appreciated!

Thanks in advance for your help.

package sampling

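import (
	"fmt"
	"log"

	"go.opentelemetry.io/otel/codes"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/trace"
)

// CustomSampler always samples spans that carry an error indicator and
// defers to defaultSampler for everything else.
type CustomSampler struct {
	defaultSampler sdktrace.Sampler
}

// NewCustomSampler wraps CustomSampler in ParentBased with every parent case
// overridden, so the error check runs regardless of the parent's decision.
func NewCustomSampler(defaultRatio float64) sdktrace.Sampler {
	errorAwareSampler := &CustomSampler{
		defaultSampler: sdktrace.TraceIDRatioBased(defaultRatio),
	}
	return sdktrace.ParentBased(
		errorAwareSampler,
		sdktrace.WithRemoteParentSampled(errorAwareSampler),
		sdktrace.WithRemoteParentNotSampled(errorAwareSampler),
		sdktrace.WithLocalParentSampled(errorAwareSampler),
		sdktrace.WithLocalParentNotSampled(errorAwareSampler),
	)
}

// ShouldSample is called when a span starts, so p.Attributes holds only the
// attributes supplied at span creation.
func (s *CustomSampler) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
	for _, attr := range p.Attributes {
		if attr.Key == "error" || (attr.Key == "otel.status_code" && attr.Value.AsString() == codes.Error.String()) {
			log.Printf("sampling error span %q (trace %s), matched attribute %q", p.Name, p.TraceID, attr.Key)
			return sdktrace.SamplingResult{
				Decision:   sdktrace.RecordAndSample,
				Attributes: p.Attributes,
				// Carry the parent's tracestate forward, as the built-in samplers do.
				Tracestate: trace.SpanContextFromContext(p.ParentContext).TraceState(),
			}
		}
	}
	return s.defaultSampler.ShouldSample(p)
}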

func (s *CustomSampler) Description() string {
	return fmt.Sprintf("CustomSampler(errors=100%%, other=%s)", s.defaultSampler.Description())
}

Kevin_Woo · May 7, 2025, 10:00pm

My goal is to ensure that all error traces are properly logged and visible in Datadog’s temporal log view. However, despite the sampler correctly sampling these error spans, I’m not seeing the expected error logs in Datadog.

Which errors do you expect?

Would you be able to provide a repro sample?

tudor_bostan · May 14, 2025, 5:16pm

Hi @Kevin_Woo,
Regarding the Temporal Otel trace configuration — I expect to see all error traces, not just a percentage of them. For example, if I set the samplingRate to 10% and there are 10 errors, I still want to see all 10 error traces, not just one.
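
The expectation described here can be illustrated with a minimal, dependency-free sketch. The function shouldKeep and the way it maps an ID onto the ratio are hypothetical stand-ins, not the OpenTelemetry SDK's actual algorithm, and the sketch assumes the error flag is already known when the decision is made:

```go
package main

import "fmt"

// shouldKeep is a hypothetical stand-in for a sampling decision: error spans
// are always kept, everything else is kept only when its ID maps below ratio.
func shouldKeep(isError bool, id uint64, ratio float64) bool {
	if isError {
		return true
	}
	// Map the ID onto [0, 1) and compare it with the configured ratio.
	return float64(id%1000)/1000 < ratio
}

func main() {
	const ratio = 0.1 // the 10% sampling rate from the example above
	keptErrors := 0
	for id := uint64(0); id < 10; id++ {
		if shouldKeep(true, id*97, ratio) {
			keptErrors++
		}
	}
	fmt.Printf("kept %d of 10 error traces\n", keptErrors)
}
```

With this rule all ten error traces are kept regardless of the ratio, while non-error traces are still thinned out by it.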

Kevin_Woo · May 28, 2025, 12:51am

Can you share how you’re using your sampler with the Temporal SDK and how the trace logs are configured to be sent to Datadog?

Note that Datadog contributed an interceptor (and a sample, samples-go/datadog) that generates deterministic span IDs to handle long-running workflows and workflow Replay across machines. OTel is not able to be deterministic.
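
To make "deterministic span IDs" concrete, here is a rough sketch of the general idea only. It is not the Datadog interceptor's actual code; the helper name deterministicSpanID, the choice of FNV-1a, and the key format are all assumptions made for illustration:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// deterministicSpanID is a hypothetical helper: it hashes a stable key (for
// example workflow ID, run ID and event ID joined together) into a 64-bit ID,
// so any worker that replays the same event derives the same span ID.
func deterministicSpanID(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	first := deterministicSpanID("workflow-123/run-abc/event-7")
	replayed := deterministicSpanID("workflow-123/run-abc/event-7")
	fmt.Println("same ID on replay:", first == replayed)
}
```

A randomly generated ID would differ between the original execution and a replay on another machine, which is the problem a deterministic scheme avoids.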

tudor_bostan · May 28, 2025, 11:47am

func InitializeGlobalGrpcTracerProvider(ctx context.Context, cfg *Config) (*Provider, error) {
	if err := validateConfig(cfg); err != nil {
		return nil, fmt.Errorf("validate trace provider config: %w", err)
	}

	exp, err := newOtlpTraceExporter(ctx, cfg)
	if err != nil {
		return nil, fmt.Errorf("create otlp trace grpc exporter: %w", err)
	}

	bsp := sdktrace.NewBatchSpanProcessor(exp,
		sdktrace.WithBatchTimeout(defaultBatchTimeout),
		sdktrace.WithMaxQueueSize(defaultMaxQueueSize),
		sdktrace.WithMaxExportBatchSize(defaultMaxExportBatchSize),
	)

	sampler := cfg.Sampler
	if sampler == nil {
		sampler = sdktrace.AlwaysSample()
	}

	res, err := newResource(ctx, cfg)
	if err != nil {
		return nil, fmt.Errorf("create resource: %w", err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSpanProcessor(bsp),
		sdktrace.WithSampler(sampler),
		sdktrace.WithResource(res),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))

	return &Provider{
		tp:     tp,
		config: cfg,
	}, nil
}

// getSamplingRate converts a string sampling ratio to a float64 rate.
// The input can be either:
// - A ratio between 0 and 1 (e.g., "0.1" for 10% sampling)
// - A percentage between 0 and 100 (e.g., "10" for 10% sampling)
// Returns defaultSamplingRatio if the input is invalid.
func getSamplingRate(samplingRatio string) float64 {
	if samplingRatio == "" {
		return defaultSamplingRatio
	}

	v, err := strconv.ParseFloat(samplingRatio, 64)
	if err != nil {
		config.Logger.Debug("invalid sampling ratio",
			zap.String("value", samplingRatio),
			zap.Error(err))
		return defaultSamplingRatio
	}

	switch {
	case v == 0:
		config.Logger.Debug("sampling ratio cannot be zero")
		return defaultSamplingRatio
	case v > 0 && v <= 1:
		// Input is already a ratio (including 1), use as is
		return v
	case v > 1 && v <= 100:
		// Input is a percentage, convert to ratio
		return v / percentageFactor
	default:
		config.Logger.Debug("sampling ratio out of range",
			zap.Float64("value", v))
		return defaultSamplingRatio
	}
}

Kevin_Woo · May 28, 2025, 8:15pm

Thanks, taking a look. Another question came up while thinking this through: are you trying to capture SDK errors, or just errors thrown from your Workflows and Activities?

I don’t believe the SDKs are set up to emit spans, so you’ll only get results from the code you instrument, but I’ll double-check this.

Kevin_Woo · May 28, 2025, 9:48pm

Also, are you using Datadog’s OTel SDK, or just plain OTel?

I suspect your ShouldSample() method is not matching errors correctly. Instead of being sampled 100% of the time, they fall through to the TraceIDRatioBased sampler.

Datadog matches errors on its own set of attribute keys; I think errors are keyed specifically by error.message rather than just error.

For OTel specifically, I believe the key is exception.message (v1.33.0).

Try checking strings.Contains(string(attr.Key), "error") || strings.Contains(string(attr.Key), "exception") to see if that gets you into that condition. (attr.Key is an attribute.Key, so it needs the string conversion.)
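
A small standalone sketch of that broader check, using plain strings in place of attribute.Key so it runs without the OTel module. The helper name looksLikeErrorKey and the sample keys are illustrative assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeErrorKey reports whether an attribute key contains "error" or
// "exception". In a real sampler the argument would be string(attr.Key).
func looksLikeErrorKey(key string) bool {
	return strings.Contains(key, "error") || strings.Contains(key, "exception")
}

func main() {
	keys := []string{"error", "error.message", "exception.message", "http.method"}
	for _, key := range keys {
		fmt.Printf("%-18s matches: %v\n", key, looksLikeErrorKey(key))
	}
}
```

Substring matching is deliberately loose: it is useful for finding out which keys actually arrive in ShouldSample, and can be tightened to exact keys once those are known.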
