Hi everyone,
I’ve been building out a complex workflow orchestration using the Temporal Go SDK, and while the durable execution model is working great for our long-running processes, I’ve hit a strange performance snag during local development and high-concurrency testing.
I’ve noticed that whenever my local environment is under significant resource pressure (specifically when I’m running background automation scripts or testing the roblox executor environment for its multi-threading capabilities), my Temporal workers start failing their heartbeats. The most frustrating part is that activities that should be resilient are timed out by the server and retried unnecessarily. When I discussed these bottlenecks with a few other developers, our working theory was that the high CPU concurrency from my background tasks causes the worker process to miss its heartbeat window, so the Temporal server assumes the worker has crashed.
Has anyone else noticed their local workers hitting silent connection timeouts or experiencing activity eviction when other high-performance tasks are running in the background? I’m seeing occasional “Activity Task Heartbeat Timeout” errors in my logs that only seem to appear when the system is under heavy load, which makes me think the gRPC stream is being deprioritized by the OS during these spikes.
I’m trying to figure out whether there’s a way to make the worker more resilient, perhaps by increasing the HeartbeatTimeout for specific activities or by adjusting the worker’s thread priority so the heartbeat signal stays consistent in “noisy” multitasking environments. If you’ve managed to stabilize worker performance while running heavy workloads on a single machine, I’d really value your advice!
Best regards!