ElasticSearch not exactly matching workflow ids

Hey all, I’ve been working on setting up ElasticSearch and I’ve hit an issue that I’m not sure is a bug or intended behavior. The visibility APIs (such as ListOpenWorkflowExecutionsByWorkflowID) generate an ES query that uses “match” when adding the WorkflowID constraint. This produces non-exact search results: if you use the web UI to search for an exact workflow ID, you get much more than you asked for. I suspect this is not the desired behavior for the ES visibility store, since the SQL visibility store seems to use the equals operator (exact match) when adding the workflow ID to the query. Note that this also causes the “visibility” canary to fail, since the result of its call to ListOpenWorkflowExecutionsByWorkflowID, searching for its own workflow ID, is often wrong.

Assuming the Temporal code is correct, there must be some special configuration in the ElasticSearch index mapping that makes those “match” queries behave like exact queries (normally you’d use a “term” query for this). When you first set up ElasticSearch with Temporal, you wind up with a default mapping (inferred from the contents of the document?) that looks like this for the workflow ID:

"WorkflowId": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

Do I need to manually change this mapping definition for WorkflowId to something that will result in an exact match? If so, to what? Thanks!
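
For reference, here is a minimal sketch of the two query shapes I am comparing, written as plain Python dictionaries that serialize to ES query bodies. This is not taken from the Temporal code; the helper names are hypothetical, and targeting the `WorkflowId.keyword` subfield is an assumption based on the default mapping shown above.

```python
import json


def match_query(workflow_id: str) -> dict:
    # Full-text query: ES analyzes the input into tokens before searching,
    # so documents sharing any token with the input can be returned.
    return {"query": {"match": {"WorkflowId": workflow_id}}}


def term_query(workflow_id: str) -> dict:
    # Exact query: the input is not analyzed. It targets the "keyword"
    # subfield from the default mapping (assumed to exist in the index).
    return {"query": {"term": {"WorkflowId.keyword": workflow_id}}}


if __name__ == "__main__":
    wid = "search_attributes_a0acbc16-b84c-4692-8612-d44af09310dc"
    print(json.dumps(match_query(wid), indent=2))
    print(json.dumps(term_query(wid), indent=2))
```

One caveat with the keyword subfield as mapped above: `ignore_above: 256` means values longer than 256 characters are not indexed there, so a term query would not find workflow IDs of that length.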

@skrul,
Thanks for reporting this issue. I need to reproduce it on our end to understand the problem better. I will respond in a couple of days once I have a repro.

Hey @skrul,
Looks like I need more information in order to reproduce this. I tried the following scenario.

Let’s say you get the following output from listing workflow executions:

tctl wf list
       WORKFLOW TYPE       |                      WORKFLOW ID                       |                RUN ID                |    TASK QUEUE     | START TIME | EXECUTION TIME | END TIME
  SearchAttributesWorkflow | search_attributes_a0acbc16-b84c-4692-8612-d44af09310dc | 5f89941a-1484-4918-9e9f-e8cf6e98f333 | search-attributes | 16:26:09   | 16:26:09       | 16:26:12

If I run the following query to list workflows matching that one workflow ID, I get one row back:

tctl wf list -w search_attributes_a0acbc16-b84c-4692-8612-d44af09310dc

       WORKFLOW TYPE       |                      WORKFLOW ID                       |                RUN ID                |    TASK QUEUE     | START TIME | EXECUTION TIME | END TIME
  SearchAttributesWorkflow | search_attributes_a0acbc16-b84c-4692-8612-d44af09310dc | 5f89941a-1484-4918-9e9f-e8cf6e98f333 | search-attributes | 16:26:09   | 16:26:09       | 16:26:12

Now if I run a query that matches just a substring of the workflow ID, it does not return anything:

tctl wf list -w search_attributes
  WORKFLOW TYPE | WORKFLOW ID | RUN ID | TASK QUEUE | START TIME | EXECUTION TIME | END TIME
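
One possible explanation for the empty result, offered as an assumption rather than a confirmed diagnosis: if WorkflowId is indexed as an analyzed text field, a match query compares whole tokens, not substrings. The sketch below imitates that in plain Python. The `analyze` function is only a rough stand-in for the ES standard analyzer (lowercase, split on characters other than letters, digits, and underscores), and the second workflow ID is made up for illustration.

```python
import re


def analyze(text: str) -> set[str]:
    # Rough approximation of the standard analyzer: lowercase the input and
    # split on anything that is not a letter, digit, or underscore.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def match_hits(query: str, ids: list[str]) -> list[str]:
    # "match" with OR semantics: a document hits if it shares any token.
    query_tokens = analyze(query)
    return [wid for wid in ids if query_tokens & analyze(wid)]


def term_hits(query: str, ids: list[str]) -> list[str]:
    # "term" on a keyword field: the whole value must be identical.
    return [wid for wid in ids if wid == query]


IDS = [
    "search_attributes_a0acbc16-b84c-4692-8612-d44af09310dc",
    "search_attributes_ffffffff-b84c-0000-0000-000000000000",  # hypothetical
]

if __name__ == "__main__":
    print(match_hits(IDS[0], IDS))               # both IDs share the token "b84c"
    print(term_hits(IDS[0], IDS))                # only the exact ID
    print(match_hits("search_attributes", IDS))  # no whole token matches
```

Under these assumptions, "search_attributes" finds nothing because the first token of each ID is the longer "search_attributes_a0acbc16" style string, while a full workflow ID can still pull in other workflows that happen to share a hyphen-separated segment.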

Another thing I would like to point out is that an API like ListOpenWorkflowExecutionsByWorkflowID is mainly used when you are running with Cassandra/MySQL-based visibility stores. If you are running with Enhanced Visibility backed by ElasticSearch, I would rather use the ListWorkflowExecutions API, which is much more powerful and supports SQL-like syntax:

tctl wf list -q 'WorkflowId = "search_attributes_a0acbc16-b84c-4692-8612-d44af09310dc"'
       WORKFLOW TYPE       |                      WORKFLOW ID                       |                RUN ID                |    TASK QUEUE     | START TIME | EXECUTION TIME | END TIME
  SearchAttributesWorkflow | search_attributes_a0acbc16-b84c-4692-8612-d44af09310dc | 5f89941a-1484-4918-9e9f-e8cf6e98f333 | search-attributes | 16:26:09   | 16:26:09       | 16:26:12

I’m not even aware of a mechanism by which we allow wildcard queries. It would be great if you could provide some examples.