Inconsistent Sleep Durations in Temporal Workflows

PierreVdw · November 13, 2023, 5:13am

Hello Temporal community,

I’m working on a Temporal workflow, and I’ve noticed some inconsistencies in sleep durations when I use different methods for introducing delays. Specifically, I’m expecting a sleep duration of 33 milliseconds, but the actual sleep times sometimes vary significantly.

Here’s a brief summary of what I’ve observed:

I’m using workflow.Sleep(ctx, 33*time.Millisecond) to introduce a delay, but I’m seeing sleep times that occasionally deviate significantly from the expected 33 milliseconds.
I’ve also tried workflow.AwaitWithTimeout(ctx, 33*time.Microsecond) and workflow.Await with a timer, but I still encounter similar inconsistencies in sleep durations.

Here are some code snippets:

for {
	start := workflow.Now(ctx)
	fmt.Printf("[%s] sleeping start\n", start)
	workflow.Sleep(ctx, 33*time.Millisecond)
	end := workflow.Now(ctx)
	fmt.Printf("[%s] done sleeping (took %v ms)\n", end, end.Sub(start).Milliseconds())
}

Output:
[2023-11-13 04:42:32.960026124 +0000 UTC] sleeping start
[2023-11-13 04:42:33.003890779 +0000 UTC] done sleeping (took 43 ms)
[2023-11-13 04:42:33.003890779 +0000 UTC] sleeping start
[2023-11-13 04:42:34.006123603 +0000 UTC] done sleeping (took 1002 ms)
[2023-11-13 04:42:34.006123603 +0000 UTC] sleeping start
[2023-11-13 04:42:35.010955423 +0000 UTC] done sleeping (took 1004 ms)
[2023-11-13 04:42:35.010955423 +0000 UTC] sleeping start
[2023-11-13 04:42:36.007020711 +0000 UTC] done sleeping (took 996 ms)
[2023-11-13 04:42:36.007020711 +0000 UTC] sleeping start
[2023-11-13 04:42:37.009073729 +0000 UTC] done sleeping (took 1002 ms)

func (a *DialerActivities) PreventCpsThrottle(ctx context.Context, cpsLimit int) error {
	timeToSleep := time.Second / time.Duration(cpsLimit)
	time.Sleep(timeToSleep)

	return nil
}

for {
	start := workflow.Now(ctx)
	_ = workflow.ExecuteActivity(ctx, activity.Dialer.PreventCpsThrottle, 30).Get(ctx, nil)
	end := workflow.Now(ctx)
	fmt.Printf("[%s] done sleeping (took %v ms)\n", end, end.Sub(start).Milliseconds())
}

Output:
[2023-11-13 04:43:53.361941691 +0000 UTC] sleeping start
[2023-11-13 04:43:53.409735008 +0000 UTC] done sleeping (took 47 ms)
[2023-11-13 04:43:53.409735008 +0000 UTC] sleeping start
[2023-11-13 04:43:53.491488656 +0000 UTC] done sleeping (took 81 ms)
[2023-11-13 04:43:53.491488656 +0000 UTC] sleeping start
[2023-11-13 04:43:53.591618227 +0000 UTC] done sleeping (took 100 ms)
[2023-11-13 04:43:53.591618227 +0000 UTC] sleeping start
[2023-11-13 04:43:53.691441616 +0000 UTC] done sleeping (took 99 ms)
[2023-11-13 04:43:53.691441616 +0000 UTC] sleeping start
[2023-11-13 04:43:53.79112727 +0000 UTC] done sleeping (took 99 ms)

for i := 0; i < 5; i++ {
	start := workflow.Now(ctx)
	fmt.Printf("[%s] sleeping start\n", start)

	timer := workflow.NewTimer(ctx, time.Duration(33*time.Microsecond))
	workflow.Await(ctx, func() bool {
		return timer.IsReady()
	})

	end := workflow.Now(ctx)
	fmt.Printf("[%s] done sleeping (took %v ms)\n", end, end.Sub(start).Milliseconds())
}

Output:
[2023-11-13 05:03:26.887772415 +0000 UTC] sleeping start
[2023-11-13 05:03:26.906969428 +0000 UTC] done sleeping (took 19 ms)
[2023-11-13 05:03:26.906969428 +0000 UTC] sleeping start
[2023-11-13 05:03:27.905998517 +0000 UTC] done sleeping (took 999 ms)
[2023-11-13 05:03:27.905998517 +0000 UTC] sleeping start
[2023-11-13 05:03:28.907473299 +0000 UTC] done sleeping (took 1001 ms)
[2023-11-13 05:03:28.907473299 +0000 UTC] sleeping start
[2023-11-13 05:03:29.907984135 +0000 UTC] done sleeping (took 1000 ms)
[2023-11-13 05:03:29.907984135 +0000 UTC] sleeping start
[2023-11-13 05:03:30.909456751 +0000 UTC] done sleeping (took 1001 ms)

I’m curious about why this might be happening and if there are any insights or best practices for achieving more consistent timing in Temporal workflows. Is there something I should be aware of regarding how Temporal handles sleep and timers that could explain these variations?

Any guidance or suggestions would be greatly appreciated. Thank you for your help!

taonic · November 13, 2023, 6:12am

Hi @PierreVdw, thanks for your post. Yes, I’m able to reproduce this behaviour. Let me dig a little deeper and circle back. Cheers!

PierreVdw · November 13, 2023, 6:14am

Hi @taonic, thank you. I look forward to your feedback.

taonic · November 15, 2023, 9:51am

Hi @PierreVdw sorry about the wait.

After some conversations with our team, I was made aware that sub-second timer (your 1st and 3rd scenario) cannot be achieved reliably due to some implementation details at the queue level. Besides, given how Activity and Workflow Tasks are scheduled, polled, executed and responded between the worker and server there will always be some level of network latency contribute to the overall sleeping time as observed in your 2nd scenario.

To manage rate limit, we typically recommend setting TaskQueueActivitiesPerSecond. You can also refer to other posts for more context on this topic.

maxim · November 15, 2023, 7:22pm

To expand on @taonic’s answer. The latency between workflow steps depends on a specific deployment of the Temporal cluster. Your latencies are on the high side. The same test executed against the Temporal cloud would give much lower latencies. But they it still be higher than the 30ms sleep.

PierreVdw · November 16, 2023, 6:19am

Hi @taonic and @maxim,

We had a hunch that this would be due to how activities and workflows work behind the scenes as well as the possible network latency. Just thought we’d post here to confirm.

Thank you kindly for the feedback and time you took to investigate this topic.

Topic		Replies	Views
Workflow.Sleep caused Non-deterministic error Community Support go-sdk , general-impl	10	673	September 27, 2023
Non-Determinism and Sleep: Is it ok to calculate sleep duration at runtime? Community Support workflow-implementat	3	327	May 22, 2024
Finding the Time Elapsed in Testing a Workflow that has workflow.Sleep Community Support go-sdk	2	1242	February 12, 2022
Configuring await() to work like sleep() Community Support java-sdk	1	795	January 15, 2021
Temporal sleep feature Community Support java-sdk , general-impl	1	2925	April 21, 2023

Inconsistent Sleep Durations in Temporal Workflows

Related topics