Version 1.20.1 had slow query in temporal_visibility

After upgrading temporal to version 1.20.1, I found that there was a slow log when querying temporal_visibility.
This SQL statement was obtained from “show processlist”.

SELECT ev.namespace_id, ev.run_id, ev.workflow_type_name, ev.workflow_id, ev.start_time, ev.execution_time, ev.status, ev.close_time, ev.history_length, ev.memo, ev.encoding, ev.task_queue, ev.search_attributes FROM executions_visibility ev LEFT JOIN custom_search_attributes USING (namespace_id, run_id) WHERE namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb' AND TemporalNamespaceDivision is null ORDER BY coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) DESC, start_time DESC, run_id LIMIT 1000

There are approximately 3 million rows of data under this namespace.

And the result of explain:

mysql> explain SELECT ev.namespace_id, ev.run_id, ev.workflow_type_name, ev.workflow_id, ev.start_time, ev.execution_time, ev.status, ev.close_time, ev.history_length, ev.memo, ev.encoding, ev.task_queue, ev.search_attributes FROM executions_visibility ev LEFT JOIN custom_search_attributes USING (namespace_id, run_id) WHERE namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb' AND TemporalNamespaceDivision is null ORDER BY coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) DESC, start_time DESC, run_id LIMIT 1000\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ev
   partitions: NULL
         type: ref
possible_keys: PRIMARY,default_idx,by_execution_time,by_workflow_id,by_workflow_type,by_status,by_history_length,by_task_queue,by_temporal_change_version,by_binary_checksums,by_batcher_user,by_temporal_scheduled_start_time,by_temporal_scheduled_by_id,by_temporal_schedule_paused,by_temporal_namespace_division
          key: PRIMARY
      key_len: 256
          ref: const
         rows: 1549921
     filtered: 10.00
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: custom_search_attributes
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY,by_bool_01,by_bool_02,by_bool_03,by_datetime_01,by_datetime_02,by_datetime_03,by_double_01,by_double_02,by_double_03,by_int_01,by_int_02,by_int_03,by_keyword_01,by_keyword_02,by_keyword_03,by_keyword_04,by_keyword_05,by_keyword_06,by_keyword_07,by_keyword_08,by_keyword_09,by_keyword_10,by_keyword_list_01,by_keyword_list_02,by_keyword_list_03
          key: PRIMARY
      key_len: 512
          ref: const,temporal_visibility.ev.run_id
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
2 rows in set, 1 warning (0.01 sec)

And the result of explain analyze:

mysql> explain analyze SELECT ev.namespace_id, ev.run_id, ev.workflow_type_name, ev.workflow_id, ev.start_time, ev.execution_time, ev.status, ev.close_time, ev.history_length, ev.memo, ev.encoding, ev.task_queue, ev.search_attributes FROM executions_visibility ev LEFT JOIN custom_search_attributes USING (namespace_id, run_id) WHERE namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb' AND TemporalNamespaceDivision is null ORDER BY coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) DESC, start_time DESC, run_id LIMIT 1000\G;
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1000 row(s)  (cost=210064.29 rows=1000) (actual time=20665.083..20675.554 rows=1000 loops=1)
    -> Nested loop left join  (cost=210064.29 rows=1549921) (actual time=20665.081..20675.468 rows=1000 loops=1)
        -> Sort: coalesce(close_time,cast(_utf8mb4'9999-12-31 23:59:59' as datetime)) DESC, ev.start_time DESC, ev.run_id  (cost=16324.17 rows=1549921) (actual time=20665.033..20665.540 rows=1000 loops=1)
            -> Filter: ((ev.namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb') and (ev.TemporalNamespaceDivision is null))  (actual time=0.038..8314.438 rows=3205388 loops=1)
                -> Index lookup on ev using PRIMARY (namespace_id='f458436c-e0b6-47d6-ac5d-a633ef7d84bb')  (actual time=0.035..6668.868 rows=3205388 loops=1)
        -> Filter: (custom_search_attributes.namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb')  (cost=0.25 rows=1) (actual time=0.010..0.010 rows=0 loops=1000)
            -> Single-row covering index lookup on custom_search_attributes using PRIMARY (namespace_id='f458436c-e0b6-47d6-ac5d-a633ef7d84bb', run_id=ev.run_id)  (cost=0.25 rows=1) (actual time=0.010..0.010 rows=0 loops=1000)

1 row in set (20.96 sec)

It took 20 seconds to execute the SQL.

If force the use of the default_idx, the query speed will become much faster.

mysql> explain analyze SELECT ev.namespace_id, ev.run_id, ev.workflow_type_name, ev.workflow_id, ev.start_time, ev.execution_time, ev.status, ev.close_time, ev.history_length, ev.memo, ev.encoding, ev.task_queue, ev.search_attributes FROM executions_visibility ev USE INDEX(default_idx) LEFT JOIN custom_search_attributes USING (namespace_id, run_id) WHERE namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb' AND TemporalNamespaceDivision is null ORDER BY coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) DESC, start_time DESC, run_id LIMIT 1000\G;
*************************** 1. row ***************************
EXPLAIN: -> Limit: 1000 row(s)  (cost=115154.45 rows=1000) (actual time=0.073..18.884 rows=1000 loops=1)
    -> Nested loop left join  (cost=115154.45 rows=154992) (actual time=0.072..18.796 rows=1000 loops=1)
        -> Filter: ((ev.namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb') and (ev.TemporalNamespaceDivision is null))  (cost=60907.21 rows=154992) (actual time=0.057..12.063 rows=1000 loops=1)
            -> Index lookup on ev using default_idx (namespace_id='f458436c-e0b6-47d6-ac5d-a633ef7d84bb')  (cost=60907.21 rows=1549921) (actual time=0.054..11.499 rows=1000 loops=1)
        -> Filter: (custom_search_attributes.namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb')  (cost=0.25 rows=1) (actual time=0.007..0.007 rows=0 loops=1000)
            -> Single-row covering index lookup on custom_search_attributes using PRIMARY (namespace_id='f458436c-e0b6-47d6-ac5d-a633ef7d84bb', run_id=ev.run_id)  (cost=0.25 rows=1) (actual time=0.006..0.006 rows=0 loops=1000)

1 row in set (0.02 sec)


mysql> explain SELECT ev.namespace_id, ev.run_id, ev.workflow_type_name, ev.workflow_id, ev.start_time, ev.execution_time, ev.status, ev.close_time, ev.history_length, ev.memo, ev.encoding, ev.task_queue, ev.search_attributes FROM executions_visibility ev USE INDEX(default_idx) LEFT JOIN custom_search_attributes USING (namespace_id, run_id) WHERE namespace_id = 'f458436c-e0b6-47d6-ac5d-a633ef7d84bb' AND TemporalNamespaceDivision is null ORDER BY coalesce(close_time, cast('9999-12-31 23:59:59' as datetime)) DESC, start_time DESC, run_id LIMIT 1000\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ev
   partitions: NULL
         type: ref
possible_keys: default_idx
          key: default_idx
      key_len: 256
          ref: const
         rows: 1549921
     filtered: 10.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: custom_search_attributes
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY,by_bool_01,by_bool_02,by_bool_03,by_datetime_01,by_datetime_02,by_datetime_03,by_double_01,by_double_02,by_double_03,by_int_01,by_int_02,by_int_03,by_keyword_01,by_keyword_02,by_keyword_03,by_keyword_04,by_keyword_05,by_keyword_06,by_keyword_07,by_keyword_08,by_keyword_09,by_keyword_10,by_keyword_list_01,by_keyword_list_02,by_keyword_list_03
          key: PRIMARY
      key_len: 512
          ref: const,temporal_visibility.ev.run_id
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index
2 rows in set, 1 warning (0.01 sec)

Currently, I have to downgrade back to version 1.18.5 to ensure that the server runs smoothly.

@rj.huangsir Thanks for reporting this issue.

I’m investigating this, and it seems that MySQL is not smart enough to match with the correct index.
In your test, you used default_idx to make it faster, but there’s actually an exact match for the search you’re doing using by_temporal_namespace_division.

I don’t know why MySQL can’t find the right index. I’ll investigate more, and see what we can do to make this better.

@rj.huangsir I opened a ticket with MySQL and they were able to confirm it’s a optimization issue on their side, but it seems it’s not high severity issue for them: https://bugs.mysql.com/bug.php?id=111037

We’ll keep looking into this if we can go around this somehow. However, I wanted to point out that this is not an issue in PostgreSQL nor SQLite. Also, if performance is concern for you, I’d recommend using Elasticsearch for advanced visibility.