WorkflowNotFoundException is thrown even though the workflow exists

I’m not sure what’s going on. When I run the workflow and try to signal it via

  final DslWorkflow interpreter =
        workflowClient.newWorkflowStub(DslWorkflow.class, workflowId.toString());

This will throw a

"io.temporal.client.WorkflowNotFoundException: workflowId='0ce55361-1f23-4954-8090-10bede3b8258', runId='', workflowType='DslWorkflow'}
	at io.temporal.internal.sync.WorkflowStubImpl.signal(WorkflowStubImpl.java:98)
	at io.temporal.internal.sync.WorkflowInvocationHandler$SyncWorkflowInvocationHandler.signalWorkflow(WorkflowInvocationHandler.java:297)
	at io.temporal.internal.sync.WorkflowInvocationHandler$SyncWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:274)
	at io.temporal.internal.sync.WorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:178)
	at com.sun.proxy.$Proxy105.callback(Unknown Source)

I then verified with tctl describe that the workflow does exist:

{
  "executionConfig": {
    "taskQueue": {
      "name": "CI_CD_WORKFLOW_TASK_QUEUE",
      "kind": "Normal"
    },
    "workflowExecutionTimeout": "8400s",
    "workflowRunTimeout": "8400s",
    "defaultWorkflowTaskTimeout": "10s"
  },
  "workflowExecutionInfo": {
    "execution": {
      "workflowId": "0ce55361-1f23-4954-8090-10bede3b8258",
      "runId": "5f91cd0f-c9b2-4435-83e2-d78d77d054a4"
    },
    "type": {
      "name": "DslWorkflow"
    },
    "startTime": "2021-11-15T22:57:33.928437239Z",
    "closeTime": "2021-11-16T01:17:33.931636812Z",
    "status": "TimedOut",
    "historyLength": "10",
    "memo": {

    },
    "autoResetPoints": {

    },
    "stateTransitionCount": "10"
  },
  "pendingActivities": [
    {
      "activityId": "af0cd598-069f-3546-bc54-1fa0bd2738b5",
      "activityType": {
        "name": "ActivityStart"
      },
      "state": "Started",
      "lastHeartbeatTime": "2021-11-16T00:57:35.065434434Z",
      "lastStartedTime": "2021-11-16T00:57:35.065434434Z",
      "attempt": 2,
      "maximumAttempts": 2,
      "expirationTime": "2021-11-16T01:17:34.010943982Z",
      "lastFailure": {
        "message": "activity timeout",
        "source": "Server",
        "failureType": "Failure_TimeoutFailureInfo: StartToClose"
      },
      "lastWorkerIdentity": "6@ci-cd-orchestrator-74559dd8bd-fkpcs"
    }
  ]
}

So I’m a bit confused about why it can’t be found. The workflow timed out mainly because it never received the signal, but the code that sends the signal got an exception saying the workflow can’t be found.

Here is the history

   1  WorkflowExecutionStarted   {WorkflowType:{Name:DslWorkflow}, ParentInitiatedEventId:0, TaskQueue:{Name:CI_CD_WORKFLOW_TASK_QUEUE,
                                 Kind:Normal}, WorkflowExecutionTimeout:2h20m0s, WorkflowRunTimeout:2h20m0s, WorkflowTaskTimeout:10s,
                                 Initiator:Unspecified, OriginalExecutionRunId:5f91cd0f-c9b2-4435-83e2-d78d77d054a4,
                                 Identity:6@ci-cd-orchestrator-74559dd8bd-wsqmk, FirstExecutionRunId:5f91cd0f-c9b2-4435-83e2-d78d77d054a4, Attempt:1,
                                 WorkflowExecutionExpirationTime:2021-11-16 01:17:33.928 +0000 UTC, FirstWorkflowTaskBackoff:0s}
   2  WorkflowTaskScheduled      {TaskQueue:{Name:CI_CD_WORKFLOW_TASK_QUEUE,
                                 Kind:Normal}, StartToCloseTimeout:10s,
                                 Attempt:1}
   3  WorkflowTaskStarted        {ScheduledEventId:2,
                                 RequestId:ad46cc09-a038-4caa-902a-34c338c63b57}
   4  WorkflowTaskCompleted      {ScheduledEventId:2, StartedEventId:3,
                                }
   5  ActivityTaskScheduled      {ActivityId:af0cd598-069f-3546-bc54-1fa0bd2738b5, ActivityType:{Name:ActivityStart},
                                 TaskQueue:{Name:CI_CD_WORKFLOW_TASK_QUEUE, Kind:Normal},
                                 Input:
                                 ScheduleToCloseTimeout:2h20m0s, ScheduleToStartTimeout:2h20m0s,
                                 StartToCloseTimeout:2h0m0s, HeartbeatTimeout:0s, WorkflowTaskCompletedEventId:4, RetryPolicy:{InitialInterval:1s,
                                 BackoffCoefficient:2, MaximumInterval:1m40s, MaximumAttempts:2, NonRetryableErrorTypes:[]}}
   6  WorkflowExecutionSignaled  {SignalName:updateMetaData,
                                 Input:
   7  WorkflowTaskScheduled      {TaskQueue:,
                                 Kind:Sticky}, StartToCloseTimeout:10s, Attempt:1}
   8  WorkflowTaskStarted        {ScheduledEventId:7,
                                 Identity:be86960c-96c2-46b0-89ae-d7c5adc5979d,
                                 RequestId:ac85efa1-1fac-4fde-a3e1-c79cc8b0e681}
   9  WorkflowTaskCompleted      {ScheduledEventId:7, StartedEventId:8,
                                 Identity:6@ci-cd-orchestrator-74559dd8bd-wsqmk}
  10  WorkflowExecutionTimedOut  {RetryState:Timeout}

I took out some of the inputs/outputs. Note: event 6 is a signal sent from within one of the activities executed by the workflow. The code that signals the workflow from outside the workflow/activity is the one that gets the exception.

Are you 100% sure that the same namespace is used when signaling from outside?

I initialize a singleton WorkflowClient, which is used in the worker and is mapped to one namespace. The strange part is that if the code were pointing to the wrong namespace, the signal method would always fail. But in my case, I’m seeing it pass at times and fail at others.

The request that comes into the service that runs that worker doesn’t contain the namespace. The namespace is injected by the service so it’s always fixed.

I can print out the namespace of the WorkflowClient, and if it happens again I’ll know for sure, but I’m not sure that is the issue.

Are there any other things I can look into?

Can WorkflowNotFoundException be thrown when you signal a workflow that is canceled, completed, or timed out?

The reason I’m asking is that when I use tctl to signal it, I notice the error code is NotFound.

Error Details: rpc error: code = NotFound desc = workflow execution already completed
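That message lines up with the describe output earlier in the thread: startTime plus the 8400s workflowExecutionTimeout lands within a few milliseconds of closeTime, so any signal sent after that instant would presumably hit a closed execution. Here is a minimal sketch of that arithmetic using only java.time. The class and method names are made up for illustration, and the values are copied from the describe output.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the Temporal SDK.
class TimeoutCheck {
    // Values copied from the tctl describe output.
    static final Instant START = Instant.parse("2021-11-15T22:57:33.928437239Z");
    static final Instant CLOSE = Instant.parse("2021-11-16T01:17:33.931636812Z");
    static final Duration EXECUTION_TIMEOUT = Duration.ofSeconds(8400); // 2h20m

    // Deadline implied by startTime + workflowExecutionTimeout.
    static Instant deadline() {
        return START.plus(EXECUTION_TIMEOUT);
    }

    // True if a signal sent at the given instant would arrive at or after closeTime.
    static boolean arrivesAfterClose(Instant signalTime) {
        return !signalTime.isBefore(CLOSE);
    }

    public static void main(String[] args) {
        System.out.println("deadline = " + deadline());
        System.out.println("close    = " + CLOSE);
        System.out.println("gap (ms) = " + Duration.between(deadline(), CLOSE).toMillis());
    }
}
```

Comparing the timestamp of a failed signal attempt from the service logs against closeTime in this way would show whether the failures only happen after the workflow has timed out.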

Yes, WorkflowNotFoundException is thrown if there is no open workflow with that ID.
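Given that, one option is for the calling code to treat "not found" as "workflow already closed" and handle it separately from other failures. Below is a rough sketch of such a wrapper in plain Java. SafeSignal and trySignal are hypothetical names, not SDK APIs. The exception type is passed in as a parameter so the sketch compiles without the SDK on the classpath; in real code you would pass WorkflowNotFoundException.class.

```java
import java.util.function.Consumer;

// Hypothetical helper, not part of the Temporal SDK.
class SafeSignal {
    // Runs the signal call. Returns true if it completed, false if it threw
    // the given "not found" exception type (no open workflow with that ID).
    // Any other runtime exception is rethrown unchanged.
    static boolean trySignal(Runnable signalCall,
                             Class<? extends RuntimeException> notFoundType,
                             Consumer<RuntimeException> onClosed) {
        try {
            signalCall.run();
            return true;
        } catch (RuntimeException e) {
            if (notFoundType.isInstance(e)) {
                onClosed.accept(e); // e.g. log it and skip the signal
                return false;
            }
            throw e; // a different failure, so let it propagate
        }
    }
}
```

With the stub from the original post, usage might look like SafeSignal.trySignal(() -> interpreter.callback(payload), WorkflowNotFoundException.class, e -> log.warn("workflow already closed", e)), where callback is the signal method seen in the stack trace, and payload and log are placeholders.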