List filters not working in the Web UI, tctl, or TypeScript SDK after introducing Elasticsearch visibility store

I opened a thread here with details of the initial issues, but its title no longer makes sense now that we have progressed beyond that point (feel free to take a look at that one and archive it).

Instead, I will summarise where we are now. We need to resolve this, as it's a major blocker for us: we can't adopt Temporal without advanced visibility.

This week we attempted to add Elasticsearch as the visibility store (since it's recommended here “for any use case that spawns more than a few Workflow Executions.” - surely that would be most people?).

However, we are now hitting an issue where the Web UI, tctl, and the TypeScript SDK will not list anything when a query is used. The persistence store does work, and certain commands work fine with it, as I will summarise. So we need some ideas on what we are doing wrong with our Elasticsearch setup.

  • We are using the Helm charts to deploy (Temporal server v1.21.1 I believe but the admin output suggests v1.21.2)

  • We are using external Elasticsearch

  • This is the output from tctl admin describe (some details hidden):

{
  "supportedClients": {
    "temporal-cli": "\u003c2.0.0",
    "temporal-go": "\u003c2.0.0",
    "temporal-java": "\u003c2.0.0",
    "temporal-php": "\u003c2.0.0",
    "temporal-server": "\u003c2.0.0",
    "temporal-typescript": "\u003c2.0.0",
    "temporal-ui": "\u003c3.0.0"
  },
  "serverVersion": "1.21.2",
  "membershipInfo": {
    "currentHost": {
      "identity": "***:7233"
    },
    "reachableMembers": [
      "***:6933",
      "***:6935",
      "***:6939",
      "***:6934"
    ],
    "rings": [
      {
        "role": "frontend",
        "memberCount": 1,
        "members": [
          {
            "identity": "***:7233"
          }
        ]
      },
      {
        "role": "history",
        "memberCount": 1,
        "members": [
          {
            "identity": "***:7234"
          }
        ]
      },
      {
        "role": "matching",
        "memberCount": 1,
        "members": [
          {
            "identity": "***:7235"
          }
        ]
      },
      {
        "role": "worker",
        "memberCount": 1,
        "members": [
          {
            "identity": "***:7239"
          }
        ]
      }
    ]
  },
  "clusterId": "f76485bd-2a88-4de0-95fb-f67fffbdd29e",
  "clusterName": "active",
  "historyShardCount": 512,
  "persistenceStore": "postgres",
  "visibilityStore": "elasticsearch",
  "versionInfo": {
    "current": {
      "version": "1.21.2",
      "releaseTime": "2023-07-15T02:00:00Z"
    },
    "recommended": {
      "version": "1.21.2",
      "releaseTime": "2023-07-15T02:00:00Z"
    },
    "alerts": [
      {
        "message": "🪐 A new release is available!",
        "severity": "Low"
      }
    ],
    "lastUpdateTime": "2023-09-15T14:01:30.099277938Z"
  },
  "failoverVersionIncrement": "10",
  "initialFailoverVersion": "1"
}
  • If I do tctl --address <our-temporal-cluster>:7233 --ns workflow-service-local workflow describe --workflow_id onboarding-owrNJQbjO7ogpUCYu1pIO (which is a workflow I definitely started via app code), I get:
{
  "executionConfig": {
    "taskQueue": {
      "name": "onboarding-local",
      "kind": "Normal"
    },
    "defaultWorkflowTaskTimeout": "10s"
  },
  "workflowExecutionInfo": {
    "execution": {
      "workflowId": "onboarding-owrNJQbjO7ogpUCYu1pIO",
      "runId": "2384a4f9-1e13-4ab0-ba6c-4bc6ff39cf29"
    },
    "type": {
      "name": "onboardingWorkflow"
    },
    "startTime": "2023-09-16T09:17:27.287372185Z",
    "closeTime": "2023-09-16T09:18:22.204908200Z",
    "status": "Completed",
    "historyLength": "11",
    "memo": {

    },
    "searchAttributes": {
      "indexedFields": {
        "BuildIds": "[\"unversioned\",\"unversioned:@temporalio/worker@1.8.2+dfc0e48fcf9fef5275a9f0336af1ea3398b7f4246c70877a36520a4013f0861c\"]",
        "FeasibilityCheckId": "[\"feasibility-101\"]"
      }
    },
    "autoResetPoints": {
      "points": [
        {
          "binaryChecksum": "@temporalio/worker@1.8.2+dfc0e48fcf9fef5275a9f0336af1ea3398b7f4246c70877a36520a4013f0861c",
          "runId": "2384a4f9-1e13-4ab0-ba6c-4bc6ff39cf29",
          "firstWorkflowTaskCompletedId": "5",
          "createTime": "2023-09-16T09:17:27.635288581Z",
          "resettable": true
        }
      ]
    },
    "stateTransitionCount": "6"
  },
  "pendingActivities": [
    {
      "activityId": "1",
      "activityType": {
        "name": "placeHolderActivity"
      },
      "state": "Scheduled",
      "attempt": 1,
      "scheduledTime": "2023-09-16T09:18:22.204830005Z",
      "expirationTime": "0001-01-01T00:00:00Z"
    }
  ]
}

If I do tctl --address <our-temporal-cluster>:7233 --ns workflow-service-local workflow show --workflow_id onboarding-owrNJQbjO7ogpUCYu1pIO -r 2384a4f9-1e13-4ab0-ba6c-4bc6ff39cf29 --output_filename myhistory.json, the output of the JSON is:

{
 "events": [
  {
   "eventId": "1",
   "eventTime": "2023-09-16T09:17:27.287372185Z",
   "eventType": "WorkflowExecutionStarted",
   "taskId": "1048576",
   "workflowExecutionStartedEventAttributes": {
    "workflowType": {
     "name": "onboardingWorkflow"
    },
    "taskQueue": {
     "name": "onboarding-local",
     "kind": "Normal"
    },
    "input": {

    },
    "workflowTaskTimeout": "10s",
    "originalExecutionRunId": "2384a4f9-1e13-4ab0-ba6c-4bc6ff39cf29",
    "identity": "158768@L-VKRA2PMX",
    "firstExecutionRunId": "2384a4f9-1e13-4ab0-ba6c-4bc6ff39cf29",
    "attempt": 1,
    "firstWorkflowTaskBackoff": "0s",
    "searchAttributes": {
     "indexedFields": {
      "FeasibilityCheckId": {
       "metadata": {
        "encoding": "anNvbi9wbGFpbg==",
        "type": "S2V5d29yZA=="
       },
       "data": "WyJmZWFzaWJpbGl0eS0xMDEiXQ=="
      }
     }
    },
    "header": {

    }
   }
  },
  {
   "eventId": "2",
   "eventTime": "2023-09-16T09:17:27.287419338Z",
   "eventType": "WorkflowExecutionSignaled",
   "taskId": "1048577",
   "workflowExecutionSignaledEventAttributes": {
    "signalName": "feasibility-check-set",
    "input": {
     "payloads": [
      {
       "metadata": {
        "encoding": "anNvbi9wbGFpbg=="
       },
       "data": "eyJydW5TdGF0dXMiOiJyZXF1ZXN0ZWQifQ=="
      }
     ]
    },
    "identity": "158768@L-VKRA2PMX",
    "header": {

    }
   }
  },
  {
   "eventId": "3",
   "eventTime": "2023-09-16T09:17:27.287422522Z",
   "eventType": "WorkflowTaskScheduled",
   "taskId": "1048578",
   "workflowTaskScheduledEventAttributes": {
    "taskQueue": {
     "name": "onboarding-local",
     "kind": "Normal"
    },
    "startToCloseTimeout": "10s",
    "attempt": 1
   }
  },
  {
   "eventId": "4",
   "eventTime": "2023-09-16T09:17:27.338072835Z",
   "eventType": "WorkflowTaskStarted",
   "taskId": "1048582",
   "workflowTaskStartedEventAttributes": {
    "scheduledEventId": "3",
    "identity": "157044@L-VKRA2PMX",
    "requestId": "5567872e-2db3-4ea5-b5b6-5ffe27e219d1",
    "historySizeBytes": "470"
   }
  },
  {
   "eventId": "5",
   "eventTime": "2023-09-16T09:17:27.635281941Z",
   "eventType": "WorkflowTaskCompleted",
   "taskId": "1048586",
   "workflowTaskCompletedEventAttributes": {
    "scheduledEventId": "3",
    "startedEventId": "4",
    "identity": "157044@L-VKRA2PMX",
    "workerVersioningId": {
     "workerBuildId": "@temporalio/worker@1.8.2+dfc0e48fcf9fef5275a9f0336af1ea3398b7f4246c70877a36520a4013f0861c"
    },
    "sdkMetadata": {
     "coreUsedFlags": [
      2,
      1
     ]
    },
    "meteringMetadata": {

    }
   }
  },
  {
   "eventId": "6",
   "eventTime": "2023-09-16T09:18:22.099776704Z",
   "eventType": "WorkflowExecutionSignaled",
   "taskId": "1048589",
   "workflowExecutionSignaledEventAttributes": {
    "signalName": "feasibility-check-set",
    "input": {
     "payloads": [
      {
       "metadata": {
        "encoding": "anNvbi9wbGFpbg=="
       },
       "data": "eyJydW5TdGF0dXMiOiJjb21wbGV0ZWQifQ=="
      }
     ]
    },
    "identity": "158943@L-VKRA2PMX",
    "header": {

    }
   }
  },
  {
   "eventId": "7",
   "eventTime": "2023-09-16T09:18:22.099782291Z",
   "eventType": "WorkflowTaskScheduled",
   "taskId": "1048590",
   "workflowTaskScheduledEventAttributes": {
    "taskQueue": {
     "name": "157044@L-VKRA2PMX-05683f1fc9044b7c95fa277973476379",
     "kind": "Sticky"
    },
    "startToCloseTimeout": "10s",
    "attempt": 1
   }
  },
  {
   "eventId": "8",
   "eventTime": "2023-09-16T09:18:22.123643347Z",
   "eventType": "WorkflowTaskStarted",
   "taskId": "1048594",
   "workflowTaskStartedEventAttributes": {
    "scheduledEventId": "7",
    "identity": "157044@L-VKRA2PMX",
    "requestId": "a29409b9-7eb0-43e4-8b7a-de4ec0be793a",
    "historySizeBytes": "939"
   }
  },
  {
   "eventId": "9",
   "eventTime": "2023-09-16T09:18:22.204748365Z",
   "eventType": "WorkflowTaskCompleted",
   "taskId": "1048598",
   "workflowTaskCompletedEventAttributes": {
    "scheduledEventId": "7",
    "startedEventId": "8",
    "identity": "157044@L-VKRA2PMX",
    "workerVersioningId": {
     "workerBuildId": "@temporalio/worker@1.8.2+dfc0e48fcf9fef5275a9f0336af1ea3398b7f4246c70877a36520a4013f0861c"
    },
    "sdkMetadata": {

    },
    "meteringMetadata": {

    }
   }
  },
  {
   "eventId": "10",
   "eventTime": "2023-09-16T09:18:22.204830005Z",
   "eventType": "ActivityTaskScheduled",
   "taskId": "1048599",
   "activityTaskScheduledEventAttributes": {
    "activityId": "1",
    "activityType": {
     "name": "placeHolderActivity"
    },
    "taskQueue": {
     "name": "onboarding-local",
     "kind": "Normal"
    },
    "header": {

    },
    "scheduleToCloseTimeout": "0s",
    "scheduleToStartTimeout": "0s",
    "startToCloseTimeout": "7200s",
    "heartbeatTimeout": "0s",
    "workflowTaskCompletedEventId": "9",
    "retryPolicy": {
     "initialInterval": "1s",
     "backoffCoefficient": 1,
     "maximumInterval": "100s"
    }
   }
  },
  {
   "eventId": "11",
   "eventTime": "2023-09-16T09:18:22.204908200Z",
   "eventType": "WorkflowExecutionCompleted",
   "taskId": "1048600",
   "workflowExecutionCompletedEventAttributes": {
    "result": {
     "payloads": [
      {
       "metadata": {
        "encoding": "YmluYXJ5L251bGw="
       }
      }
     ]
    },
    "workflowTaskCompletedEventId": "9"
   }
  }
 ]
}
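The payload fields in the exported history are base64-encoded, so it can help to decode them when checking that a search attribute was really recorded on the workflow. Below is a minimal sketch; `JsonPayload`, `decodeBase64`, and `decodePayload` are hypothetical helper names (not part of any Temporal SDK), and it assumes a runtime with the `atob` and `TextDecoder` globals (Node 16+ or a browser).

```typescript
// Shape of a payload as it appears in the exported history JSON.
interface JsonPayload {
  metadata?: { [key: string]: string };
  data?: string;
}

// Decode a base64 string into UTF-8 text.
function decodeBase64(value: string): string {
  const binary = atob(value);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new TextDecoder().decode(bytes);
}

// Decode every metadata value and, for json/plain payloads, parse the data as JSON.
function decodePayload(payload: JsonPayload): { metadata: { [key: string]: string }; value: unknown } {
  const metadata: { [key: string]: string } = {};
  const rawMetadata = payload.metadata ?? {};
  for (const key of Object.keys(rawMetadata)) {
    metadata[key] = decodeBase64(rawMetadata[key]);
  }
  if (payload.data === undefined) {
    return { metadata, value: undefined };
  }
  const text = decodeBase64(payload.data);
  const value: unknown = metadata["encoding"] === "json/plain" ? JSON.parse(text) : text;
  return { metadata, value };
}

// The FeasibilityCheckId search attribute from event 1 of the history.
const decoded = decodePayload({
  metadata: { encoding: "anNvbi9wbGFpbg==", type: "S2V5d29yZA==" },
  data: "WyJmZWFzaWJpbGl0eS0xMDEiXQ==",
});
console.log(decoded);
```

For the search attribute in event 1, this yields the encoding `json/plain`, the type `Keyword`, and the value `["feasibility-101"]`, which matches what workflow describe reports.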
  • In the app code, I can start the workflow fine, and then if I explicitly use the workflow ID, I can get the handle and signal to complete the workflow etc.

  • However, no list filter query works. The Web UI just shows an empty list. I have checked the console and no errors stick out to me; I can see the query string being added as expected. It worked fine before, when we didn't have a compatible visibility store added.

  • Similarly, if I run tctl --address <our-temporal-cluster>:7233 --ns workflow-service-local workflow l -q "ExecutionStatus='Running'", or query for a closed ExecutionStatus or a custom search attribute, it always returns an empty list.

  • I looked at logs from the frontend and matching pods but couldn’t see any relevant errors. One error I did spot in the history logs is:

{"level":"error","ts":"2023-09-15T14:00:54.689Z","msg":"Unable to process new range","shard-id":81,"address":"172.16.65.134:7234","component":"timer-queue-processor","error":"shard status unknown","logging-call-at":"queue_base.go:316","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/service/history/queues.(*queueBase).processNewRange\n\t/home/builder/temporal/service/history/queues/queue_base.go:316\ngo.temporal.io/server/service/history/queues.(*scheduledQueue).processEventLoop\n\t/home/builder/temporal/service/history/queues/queue_scheduled.go:218"}
  • The issue is definitely in some Elasticsearch configuration or setup we have done, but right now we are confused as to what it could be. Is there anything else we can look at, or anything we can provide to help debug? This is a big blocker for us, as we need to make sure we have advanced visibility (I'm guessing Elasticsearch is the way to go?). Any help would be greatly appreciated.
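For reproducing the failing queries from code, the filter strings can be assembled programmatically and then passed to tctl or to an SDK list call. This is only a sketch: `buildListFilter` and `clause` are hypothetical helpers, not part of any Temporal SDK, and they cover only equality conditions joined with AND, in the same form as the tctl query above.

```typescript
type FilterValue = string | number | boolean;

// Render one `Attribute=value` clause; strings are single-quoted as in the tctl query.
function clause(attribute: string, value: FilterValue): string {
  if (typeof value === "string") {
    if (value.indexOf("'") !== -1) {
      // Quote escaping rules are not covered by this sketch, so refuse rather than guess.
      throw new Error(`Unsupported quote in value for ${attribute}`);
    }
    return `${attribute}='${value}'`;
  }
  return `${attribute}=${value}`;
}

// Join all conditions with AND, in the order the keys were given.
function buildListFilter(conditions: { [attribute: string]: FilterValue }): string {
  return Object.keys(conditions)
    .map((attribute) => clause(attribute, conditions[attribute]))
    .join(" AND ");
}

// Status plus the custom search attribute used by the workflow in this thread.
const query = buildListFilter({
  ExecutionStatus: "Completed",
  FeasibilityCheckId: "feasibility-101",
});
console.log(query);
// ExecutionStatus='Completed' AND FeasibilityCheckId='feasibility-101'
```

Running the same string through tctl and the SDK makes it easier to tell a query problem apart from a visibility store problem: if every filter returns an empty list, the store itself is the more likely culprit.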

@tihomir Sorry to @ you in the thread, but I've seen you have answered some similar questions before. Any ideas on where we could look to debug this would be appreciated. We are somewhat blocked at the moment in implementing Temporal in our new service with advanced visibility (which is a must for us).

We may attempt to use a supported non-Elasticsearch option otherwise and see if that works…

Can you show your generated static config? Bash into one of your service host pods (like frontend, matching, history) and share the content of your /etc/temporal/config/docker.yaml.

log:
  stdout: true
  level: "debug,info"

persistence:
  defaultStore: default
  visibilityStore: visibility
  numHistoryShards: 512
  datastores:
    default:
      sql:
        pluginName: "postgres"
        driverName: "postgres"
        databaseName: "temporal"
        connectAddr: 
        connectProtocol: "tcp"
        user: 
        password: 
        maxConnLifetime: 1h
        maxConns: 20
        secretName: ""
    visibility:
      sql:
        pluginName: "postgres"
        driverName: "postgres"
        databaseName: "temporal_visibility"
        connectAddr: ""
        connectProtocol: "tcp"
        user: ""
        password: ""
        maxConnLifetime: 1h
        maxConns: 20
        secretName: ""

global:
  membership:
    name: temporal
    maxJoinDuration: 30s
    broadcastAddress: 

  pprof:
    port: 7936

  metrics:
    tags:
      type: history
    prometheus:
      timerType: histogram
      listenAddress: "0.0.0.0:9090"

services:
  frontend:
    rpc:
      grpcPort: 7233
      membershipPort: 6933
      bindOnIP: "0.0.0.0"

  history:
    rpc:
      grpcPort: 7234
      membershipPort: 6934
      bindOnIP: "0.0.0.0"

  matching:
    rpc:
      grpcPort: 7235
      membershipPort: 6935
      bindOnIP: "0.0.0.0"

  worker:
    rpc:
      grpcPort: 7239
      membershipPort: 6939
      bindOnIP: "0.0.0.0"
clusterMetadata:
  enableGlobalDomain: false
  failoverVersionIncrement: 10
  masterClusterName: "active"
  currentClusterName: "active"
  clusterInformation:
    active:
      enabled: true
      initialFailoverVersion: 1
      rpcName: "temporal-frontend"
      rpcAddress: ""
dcRedirectionPolicy:
  policy: "noop"
  toDC: ""
archival:
  status: "disabled"

publicClient:
  hostPort: "temporal-frontend:7233"

dynamicConfigClient:
  filepath: "/etc/temporal/dynamic_config/dynamic_config.yaml"
  pollInterval: "10s"

In addition, we did try changing the visibility pluginName to postgres12, as we saw in other posts and in some release notes. Note: I have removed some addresses and usernames/passwords.

From your config:

datastores:
    visibility:
      sql:

You don't seem to be using ES here; you have defined SQL-based visibility instead. And yes, for this to work you would need to set pluginName to postgres12, something like:

persistence:
  defaultStore: default
  visibilityStore: visibility
  datastores:
    default:
      sql:
        pluginName: "postgres12"
        ...
    visibility:
      sql:
        pluginName: "postgres12"
        databaseName: "temporal_visibility"
        ...
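The same check can be run against a parsed copy of the static config. This is a rough sketch under a few assumptions: `inspectVisibility` and the interfaces are hypothetical names, the config has already been parsed from YAML into a plain object, an Elasticsearch datastore is assumed to appear under an `elasticsearch` key, and the plugin warning only covers the Postgres case discussed here.

```typescript
// Minimal shapes for the parts of the static config inspected here.
interface Datastore {
  sql?: { pluginName?: string; databaseName?: string };
  elasticsearch?: object; // assumed key for an Elasticsearch datastore
}

interface Persistence {
  visibilityStore?: string;
  datastores?: { [name: string]: Datastore };
}

interface VisibilityReport {
  backend: "sql" | "elasticsearch" | "missing";
  warnings: string[];
}

// Report which backend visibilityStore points at and flag a non-postgres12 plugin.
function inspectVisibility(persistence: Persistence): VisibilityReport {
  const stores: { [name: string]: Datastore } = persistence.datastores ?? {};
  const name = persistence.visibilityStore;
  const store: Datastore | undefined = name !== undefined ? stores[name] : undefined;
  if (store === undefined) {
    return { backend: "missing", warnings: ["visibilityStore does not name a configured datastore"] };
  }
  if (store.elasticsearch !== undefined) {
    return { backend: "elasticsearch", warnings: [] };
  }
  if (store.sql === undefined) {
    return { backend: "missing", warnings: ["datastore has neither sql nor elasticsearch settings"] };
  }
  const warnings: string[] = [];
  if (store.sql.pluginName !== "postgres12") {
    warnings.push(`sql pluginName is "${store.sql.pluginName ?? ""}", expected "postgres12"`);
  }
  return { backend: "sql", warnings };
}

// The relevant part of the config posted above.
const report = inspectVisibility({
  visibilityStore: "visibility",
  datastores: {
    visibility: { sql: { pluginName: "postgres", databaseName: "temporal_visibility" } },
  },
});
console.log(report);
```

For the config posted above, the report shows an SQL backend with one warning about the plugin name, which is the mismatch described in this reply.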

We attempted this with ES before but failed, so we switched to trying Postgres, which is why the config is in this state now.

Ok so pluginName needs to be set on default (persistence) and visibility. We will try again and I will update on progress.


Tried the above but we still get Select failed: pq: relation "executions_visibility" does not exist

I'm guessing the “executions_visibility” table is missing. We will see what we can do about that.

@tihomir I followed another forum post where the “executions_visibility” table was missing. We redeployed everything from scratch and it seems to be working now. I need to test it more thoroughly. Is there anything here I could contribute in terms of docs? This was painful to set up using the Helm charts.

Is there a place where we can see the full schema for the persistence and visibility stores, so we can see which tables are meant to be created?

Temporal Cluster deployment guide | Temporal Documentation points to the auto-setup.sh but before I go look around, wondered if it was documented?

For anyone checking in future, the schema was here: https://github.com/temporalio/temporal/tree/main/schema/postgresql/v12

To help others in future, I will try to document tomorrow the steps we took to get this working with the Helm charts. Unfortunately we abandoned the Elasticsearch angle, and I'm not entirely sure it would work the same way as this, as we were having additional problems with it. Postgres is fine for us at the moment.

./temporal-sql-tool --pl postgres12 --db temporal_visibility create
./temporal-sql-tool --db temporal_visibility setup-schema -v 0.0
./temporal-sql-tool --pl postgres12 --db temporal_visibility update-schema -d ./schema/postgresql/v12/visibility/versioned

./temporal-sql-tool --database temporal create-database
./temporal-sql-tool --db temporal setup-schema -v 0.0
./temporal-sql-tool --pl postgres12 --db temporal update-schema -d schema/postgresql/v12/temporal/versioned

We ran these steps, made sure everything is in line with the schemas, and it seems to be working for us.

As for Elasticsearch, we gave up and left it for now.