Temporal Worker Crashes: JS OOM Error

Hi! I’m new to Temporal, and I’m running into a worker crash that I don’t know how to debug. Looking for some advice. I have a basic Temporal worker running:

import { activities } from "@mobsol/activities";
import { getConnectionOptions, namespace, taskQueue } from "@mobsol/common";
import { prisma } from "@mobsol/db";
import { NativeConnection, Worker } from "@temporalio/worker";

async function run() {
  const connection = await NativeConnection.connect(getConnectionOptions());
  // Mock DB connection initialization in Worker
  const worker = await Worker.create({
    workflowsPath: require.resolve("@mobsol/workflows/"),
    activities: activities(prisma),
    connection,
    namespace,
    taskQueue,
  });

  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});

However, it eventually crashes with a JavaScript heap out-of-memory error:

worker:dev: <--- Last few GCs --->
worker:dev:
worker:dev: [71985:0x118008000]  1422142 ms: Mark-Compact 3570.5 (4127.0) -> 3544.1 (4115.4) MB, 492.79 / 0.00 ms  (average mu = 0.151, current mu = 0.084) allocation failure; scavenge might not succeed
worker:dev: [71985:0x118008000]  1422670 ms: Mark-Compact 3571.8 (4128.4) -> 3544.7 (4116.0) MB, 480.33 / 0.00 ms  (average mu = 0.121, current mu = 0.089) allocation failure; scavenge might not succeed
worker:dev:
worker:dev:
worker:dev: <--- JS stacktrace --->
worker:dev:
worker:dev: FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
worker:dev: ----- Native stack trace -----
worker:dev:
worker:dev:  1: 0x104444ff8 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev:  2: 0x1045c4588 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev:  3: 0x1045c4538 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev:  4: 0x104769b64 v8::internal::Heap::CallGCPrologueCallbacks(v8::GCType, v8::GCCallbackFlags, v8::internal::GCTracer::Scope::ScopeId) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev:  5: 0x10476c5ec v8::internal::Heap::ComputeMutatorUtilization(char const*, double, double) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev:  6: 0x10476c2d4 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev:  7: 0x10476b65c v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev:  8: 0x10476a0f8 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags)::$_6::operator()() const [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev:  9: 0x104769ed0 void heap::base::Stack::SetMarkerAndCallbackImpl<v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags)::$_6>(heap::base::Stack*, void*, void const*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 10: 0x104358028 PushAllRegistersAndIterateStack [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 11: 0x104768980 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 12: 0x104760ed0 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 13: 0x104761674 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 14: 0x1047490dc v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 15: 0x10473e020 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 16: 0x10473df14 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::AllocationType) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 17: 0x104971874 v8::internal::Handle<v8::internal::NameDictionary> v8::internal::HashTable<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::New<v8::internal::Isolate>(v8::internal::Isolate*, int, v8::internal::AllocationType, v8::internal::MinimumCapacity) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 18: 0x104971de4 v8::internal::Handle<v8::internal::NameDictionary> v8::internal::HashTable<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::EnsureCapacity<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NameDictionary>, int, v8::internal::AllocationType) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 19: 0x1049729b8 v8::internal::Handle<v8::internal::NameDictionary> v8::internal::Dictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::Add<v8::internal::Isolate, (v8::internal::AllocationType)0>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NameDictionary>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, v8::internal::InternalIndex*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 20: 0x104a4b6f8 v8::internal::Runtime_AddDictionaryProperty(int, unsigned long*, v8::internal::Isolate*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 21: 0x10424b954 Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 22: 0x12a962874
worker:dev: 23: 0x1041f6b18 Builtins_AsyncFunctionEnter [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 24: 0x129a99f90
worker:dev: 25: 0x12a878498
worker:dev: 26: 0x1041f7b10 Builtins_AsyncFunctionAwaitResolveClosure [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 27: 0x1042b12d8 Builtins_PromiseFulfillReactionJob [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 28: 0x1041e6654 Builtins_RunMicrotasks [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 29: 0x1041be794 Builtins_JSRunMicrotasksEntry [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 30: 0x1046e41b4 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 31: 0x1046e4978 v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 32: 0x104706028 v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 33: 0x104705e60 v8::internal::MicrotaskQueue::PerformCheckpointInternal(v8::Isolate*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 34: 0x104358b90 node::InternalCallbackScope::Close() [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 35: 0x1043590e8 node::InternalMakeCallback(node::Environment*, v8::Local<v8::Object>, v8::Local<v8::Object>, v8::Local<v8::Function>, int, v8::Local<v8::Value>*, node::async_context) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 36: 0x1043706b8 node::AsyncWrap::MakeCallback(v8::Local<v8::Function>, int, v8::Local<v8::Value>*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 37: 0x10448d7b8 node::worker::MessagePort::OnMessage(node::worker::MessagePort::MessageProcessingMode) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 38: 0x107a8e4a8 uv__async_io [/opt/homebrew/Cellar/libuv/1.48.0/lib/libuv.1.dylib]
worker:dev: 39: 0x107a9e164 uv__io_poll [/opt/homebrew/Cellar/libuv/1.48.0/lib/libuv.1.dylib]
worker:dev: 40: 0x107a8e93c uv_run [/opt/homebrew/Cellar/libuv/1.48.0/lib/libuv.1.dylib]
worker:dev: 41: 0x10435996c node::SpinEventLoopInternal(node::Environment*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 42: 0x10448c7fc node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 43: 0x10448c550 node::NodeMainInstance::Run() [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 44: 0x1043fbe40 node::Start(int, char**) [/opt/homebrew/Cellar/node/21.7.1/bin/node]
worker:dev: 45: 0x18859a0e0 start [/usr/lib/dyld]

Does a worker have access to all the RAM on my system? Is there a setting I should change to increase the memory limit? What data is passed from a workflow or activity to the worker, and what could cause this issue? Thanks for any tips you have!

Hello, and welcome!

Does a worker have access to all the RAM on my system? Is there a setting I should change to increase the memory limit?

Not automatically. Node.js, like most garbage-collected runtimes, deals with two fundamental categories of memory: the heap, which is managed by the garbage collector, and non-heap memory, which is everything in memory that is not managed by the garbage collector.

Now, Node.js sets a limit on the maximum size of the heap (1). You can specify that limit yourself (e.g. --max-old-space-size=8192 for a limit of 8 GB). If you don’t, Node.js determines the limit automatically based on your machine’s memory size, which may or may not be appropriate for your needs. In your case, max-old-space-size appears to be around 4 GB. Try setting --max-old-space-size to roughly 80% of your machine’s available memory (expressed in MB).
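
If you want to confirm what limit is actually in effect for the worker process, Node’s built-in v8 module reports it. A minimal sketch (the log line is just an illustration; getHeapStatistics is a standard Node API):

import { getHeapStatistics } from "node:v8";

// heap_size_limit (in bytes) reflects --max-old-space-size plus the other heap spaces.
const limitMB = Math.round(getHeapStatistics().heap_size_limit / (1024 * 1024));
console.log(`V8 heap limit: ~${limitMB} MB`);

// Then start the worker with a larger limit, e.g.:
//   node --max-old-space-size=8192 worker.js
// or set NODE_OPTIONS="--max-old-space-size=8192" if the worker is launched by a script runner.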

If that’s not enough and you still face out-of-memory issues after tuning Node’s max heap size, can you please provide some context on what your workflows and activities are doing? Worker has a few settings that can affect the memory usage of the process, notably maxCachedWorkflows and maxConcurrentActivityTaskExecutions. Based on the code you posted, you are currently using the default values. Those defaults are definitely not appropriate for some use cases and may therefore need some tuning; a sketch follows below.
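
To illustrate, here’s a rough sketch of what tuning those options could look like on your existing Worker.create call. The numbers below are placeholders, not recommendations; the right values depend entirely on your workload:

  const worker = await Worker.create({
    workflowsPath: require.resolve("@mobsol/workflows/"),
    activities: activities(prisma),
    connection,
    namespace,
    taskQueue,
    // Fewer cached workflow runs means less memory retained between tasks,
    // at the cost of more workflow replays when an evicted run receives a new task.
    maxCachedWorkflows: 100,
    // Caps how many activities this process executes concurrently.
    maxConcurrentActivityTaskExecutions: 20,
  });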


(1) I’m simplifying this a bit. To be exact, the heap can be further subdivided, and there are limits for each component of the heap. However, the Old Space generally represents >90% of the total heap size, and it’s very rare that the other limits need to be hand-tuned. Just remember that, since there are other things that need to fit in memory, max-old-space-size can’t be set to 100% of your machine’s available memory, hence my recommendation to set it to 80% of your machine’s memory.
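
For example, a quick way to compute that 80% figure on the machine the worker runs on, using Node’s built-in os module (the 0.8 factor is just the rule of thumb above, not a hard requirement):

import { totalmem } from "node:os";

// ~80% of physical memory, expressed in MB, as a starting point for --max-old-space-size
const suggestedMB = Math.floor((totalmem() * 0.8) / (1024 * 1024));
console.log(`--max-old-space-size=${suggestedMB}`);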

Thanks for the in-depth write-up. I will try to adjust the Node.js heap limits. One question I have about this is: what gets stored on a worker as it processes activities / workflows / updates / etc.? I have a long-running simulation workflow that manages ~10 vehicle workflows (think a ride-sharing / food-delivery type scenario) and also manages a number of shorter-lived request workflows (the riders / food requests).

At some point during the simulation, the worker runs out of memory. However, if I restart the worker, the simulation continues. This makes sense to me, because Temporal workflows are durable. What I don’t understand is why the worker doesn’t immediately run out of memory again (it instead keeps running for a while). What is being stored in the worker that gets cleared when I restart it? That seems to indicate something is being stored there that could have been cleared / the worker didn’t have to die? I hope that makes sense. I’m assuming this is something about Temporal that I don’t understand.

Thanks again for your response above – I really appreciate the detailed write-up and the welcome to the community. Please let me know if you have any insight into the behavior I’m observing.