Datadog Tracing

We use Datadog for tracing most of our application today. They happen to have good coverage for our libraries here: dd-trace-go/contrib at v1 · DataDog/dd-trace-go · GitHub

But while they claim to be OpenTracing compatible, they aren't in practice: Datadog spans aren't linkable to OpenTracing spans and vice versa. This significantly reduces the benefit of tracing for us. I left a comment here to confirm/clarify: question: opentracing.SpanFromContext(request.Context()) returns nil inside a datadog span · Issue #813 · DataDog/dd-trace-go · GitHub
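
For illustration, a minimal sketch of the mismatch (assuming dd-trace-go v1's tracer package and opentracing-go; the operation name is made up):

package main

import (
	"context"
	"fmt"

	"github.com/opentracing/opentracing-go"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func main() {
	tracer.Start()
	defer tracer.Stop()

	// Start a Datadog span; dd-trace-go stores it under its own context key.
	span, ctx := tracer.StartSpanFromContext(context.Background(), "example.operation")
	defer span.Finish()

	// The OpenTracing helper looks under a different context key, so it finds nothing.
	fmt.Println(opentracing.SpanFromContext(ctx) == nil) // prints true
}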

One possible alternative is to use the ContextPropagator to propagate the span context separately from the opentracing one, but I do not know how/where to create the spans. I also don’t see a place to finish the spans.
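
To make the trade-off concrete, here is a rough sketch of what such a propagator could look like. The names (NewDDContextPropagator, ddHeaderKey, ddSpanCtxKey) are hypothetical; it assumes the SDK's workflow.ContextPropagator interface, the default data converter, and dd-trace-go's TextMapCarrier. It carries the span context across, but note there is still no hook in it to start or finish spans:

package temporal

import (
	"context"

	commonpb "go.temporal.io/api/common/v1"
	"go.temporal.io/sdk/converter"
	"go.temporal.io/sdk/workflow"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

// ddHeaderKey is a hypothetical header name for the propagated span context.
const ddHeaderKey = "_dd-trace-context"

type ddSpanCtxKey struct{}

type ddContextPropagator struct{}

func NewDDContextPropagator() workflow.ContextPropagator { return &ddContextPropagator{} }

// Inject writes the current Datadog span context (if any) into the Temporal header.
func (p *ddContextPropagator) Inject(ctx context.Context, writer workflow.HeaderWriter) error {
	span, ok := tracer.SpanFromContext(ctx)
	if !ok {
		return nil
	}
	carrier := tracer.TextMapCarrier{}
	if err := tracer.Inject(span.Context(), carrier); err != nil {
		return err
	}
	payload, err := converter.GetDefaultDataConverter().ToPayload(map[string]string(carrier))
	if err != nil {
		return err
	}
	writer.Set(ddHeaderKey, payload)
	return nil
}

// InjectFromWorkflow has no Datadog span to read on the workflow side, so it is a no-op here.
func (p *ddContextPropagator) InjectFromWorkflow(ctx workflow.Context, writer workflow.HeaderWriter) error {
	return nil
}

// Extract restores the propagated span context into the activity's Go context.
func (p *ddContextPropagator) Extract(ctx context.Context, reader workflow.HeaderReader) (context.Context, error) {
	err := reader.ForEachKey(func(key string, payload *commonpb.Payload) error {
		if key != ddHeaderKey {
			return nil
		}
		carrier := map[string]string{}
		if err := converter.GetDefaultDataConverter().FromPayload(payload, &carrier); err != nil {
			return err
		}
		spanCtx, err := tracer.Extract(tracer.TextMapCarrier(carrier))
		if err != nil {
			return err
		}
		// The parent span context is available here, but the propagator gives no
		// hook to start a child span around the activity or to finish it.
		ctx = context.WithValue(ctx, ddSpanCtxKey{}, spanCtx)
		return nil
	})
	return ctx, err
}

func (p *ddContextPropagator) ExtractToWorkflow(ctx workflow.Context, reader workflow.HeaderReader) (workflow.Context, error) {
	return ctx, nil
}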

Maybe workflow interceptors? But there’s no activity interceptor for golang yet.

Any thoughts on the “easiest” way to proceed?

I think the correct answer is to fix the Golang interceptors and add the activity interceptors. Then adding support for custom tracing would be trivial.

hey @maxim

Trying to understand the work that needs to be done here. If I implemented a version of this for Datadog, would I miss activity traces? sdk-go/tracing_interceptor.go at b1c3a91d252ed4177afaba9f6550dea40c434180 · temporalio/sdk-go · GitHub

I believe we fixed the Golang interceptors already. Let me check.

Yes, I think they have all the needed features.

We support both OpenTracing and OpenTelemetry, and if neither of those works for you, AFAIK you can provide your own OpenTelemetry context propagator (not to be confused with Temporal context propagators) to connect the Datadog spans to OTel spans and back.

Yeah, Datadog specifically has some weirdness in their implementation. I've personally updated my codebase to use OpenTracing, and that addressed my problems even before the interceptors fully worked.
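
For reference, one way to wire that up: a sketch assuming dd-trace-go's ddtrace/opentracer bridge and the NewInterceptor from go.temporal.io/sdk/contrib/opentracing; the service name and newTracedClient are placeholders.

package main

import (
	"github.com/opentracing/opentracing-go"
	"go.temporal.io/sdk/client"
	temporalot "go.temporal.io/sdk/contrib/opentracing"
	"go.temporal.io/sdk/interceptor"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/opentracer"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func newTracedClient() (client.Client, error) {
	// opentracer.New returns an opentracing.Tracer backed by Datadog and starts
	// the underlying dd-trace tracer.
	ddTracer := opentracer.New(tracer.WithServiceName("my-service"))
	opentracing.SetGlobalTracer(ddTracer)

	// Wrap it with Temporal's OpenTracing interceptor.
	tracingInterceptor, err := temporalot.NewInterceptor(temporalot.TracerOptions{Tracer: ddTracer})
	if err != nil {
		return nil, err
	}

	return client.NewClient(client.Options{
		Interceptors: []interceptor.ClientInterceptor{tracingInterceptor},
	})
}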

Got it working; it seems to work well with Datadog. Here's a Datadog interceptor implementation if someone needs it.

package temporal

import (
	"context"
	"fmt"

	"go.temporal.io/sdk/interceptor"
	"go.temporal.io/sdk/log"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

type TextMap struct {
	Map map[string]string
}

var _ tracer.TextMapReader = new(TextMap)
var _ tracer.TextMapWriter = new(TextMap)

func newTextMap() *TextMap {
	return &TextMap{Map: map[string]string{}}
}

func (tm *TextMap) Set(key, val string) {
	tm.Map[key] = val
}
func (tm *TextMap) ForeachKey(handler func(key, val string) error) error {
	for k, v := range tm.Map {
		if err := handler(k, v); err != nil {
			return err
		}
	}
	return nil
}

type spanContextKey struct{}

const defaultHeaderKey = "_tracer-data"

type ddTracer struct {
	interceptor.BaseTracer
	options TracerOptions
}

func NewTracer(options TracerOptions) interceptor.Tracer {
	return &ddTracer{options: options}
}

// NewTracingInterceptor creates an interceptor, for setting on client options,
// that implements Datadog tracing for workflows and activities.
func NewTracingInterceptor(options TracerOptions) interceptor.Interceptor {
	t := NewTracer(options)
	return interceptor.NewTracingInterceptor(t)
}

// TracerOptions are options provided to NewTracingInterceptor or NewTracer.
type TracerOptions struct {
	// DisableSignalTracing can be set to disable signal tracing.
	DisableSignalTracing bool

	// DisableQueryTracing can be set to disable query tracing.
	DisableQueryTracing bool
}

func (t *ddTracer) Options() interceptor.TracerOptions {
	return interceptor.TracerOptions{
		DisableSignalTracing: t.options.DisableSignalTracing,
		DisableQueryTracing:  t.options.DisableQueryTracing,
		SpanContextKey:       spanContextKey{},
		HeaderKey:            defaultHeaderKey,
	}
}

func (t *ddTracer) UnmarshalSpan(m map[string]string) (interceptor.TracerSpanRef, error) {
	textMap := &TextMap{Map: m}
	spanCtx, err := tracer.Extract(textMap)
	if err != nil {
		return nil, err
	}
	return &tracerSpanRef{SpanContext: spanCtx}, nil
}

func (t *ddTracer) MarshalSpan(span interceptor.TracerSpan) (map[string]string, error) {
	textMap := newTextMap()
	if err := tracer.Inject(span.(*tracerSpan).Span.Context(), textMap); err != nil {
		return nil, err
	}
	return textMap.Map, nil
}

func (t *ddTracer) SpanFromContext(ctx context.Context) interceptor.TracerSpan {
	span, found := tracer.SpanFromContext(ctx)
	if !found {
		return nil
	}
	return &tracerSpan{Span: span}
}

func (t *ddTracer) ContextWithSpan(ctx context.Context, span interceptor.TracerSpan) context.Context {
	return tracer.ContextWithSpan(ctx, span.(*tracerSpan).Span)
}

func (t *ddTracer) StartSpan(opts *interceptor.TracerStartSpanOptions) (interceptor.TracerSpan, error) {
	// Determine the parent span context, if any
	var parent ddtrace.SpanContext
	switch p := opts.Parent.(type) {
	case nil:
		// No parent; start a root span
	case *tracerSpan:
		parent = p.Span.Context()
	case *tracerSpanRef:
		parent = p.SpanContext
	default:
		return nil, fmt.Errorf("unrecognized parent type %T", p)
	}

	span := tracer.StartSpan(opts.Operation+":"+opts.Name, tracer.ChildOf(parent), tracer.StartTime(opts.Time))

	// Set tags
	for k, v := range opts.Tags {
		span.SetTag(k, v)
	}

	return &tracerSpan{Span: span}, nil
}

func (t *ddTracer) GetLogger(logger log.Logger, ref interceptor.TracerSpanRef) log.Logger {
	var spanCtx ddtrace.SpanContext
	switch p := ref.(type) {
	case *tracerSpan:
		spanCtx = p.Span.Context()
	case *tracerSpanRef:
		spanCtx = p.SpanContext
	default:
		return logger
	}
	return log.With(logger, "dd.trace_id", spanCtx.TraceID(), "dd.span_id", spanCtx.SpanID())
}

type tracerSpanRef struct{ ddtrace.SpanContext }

type tracerSpan struct{ ddtrace.Span }

func (t *tracerSpan) Finish(opts *interceptor.TracerFinishSpanOptions) {
	// Will ignore if error is nil
	t.Span.Finish(tracer.WithError(opts.Error))
}

Use like:

	return client.NewClient(client.Options{
		Interceptors: []interceptor.ClientInterceptor{
			NewTracingInterceptor(TracerOptions{}),
		},
	})

When there are errors with a workflow (e.g. a determinism error), I do get strange traces that fill up the trace view. It seems like a retry of a workflow invocation creates strange traces with odd start and end times.

Maybe when the workflow is invoked again it uses the workflow's original start time as the trace start time, even though it's actually starting much later.
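
If that's the suspicion, one hypothetical way to check is to tweak the StartSpan shown above to tag each span with the start time the SDK hands the interceptor, so retried invocations that reuse an earlier timestamp stand out (requires "time" in the imports; the tag name is made up):

	span := tracer.StartSpan(opts.Operation+":"+opts.Name, tracer.ChildOf(parent), tracer.StartTime(opts.Time))
	// Record the SDK-supplied start time for comparison against the span's wall-clock start.
	span.SetTag("temporal.interceptor_start_time", opts.Time.Format(time.RFC3339Nano))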

Is there support for tracing in the Python SDK yet?