How is the task & task queue range_hash calculated?

Hi, Temporal community,

I’m currently using an un-sharded Vitess (MySQL) for my persistence store, and I’ve been looking at horizontal scalability via sharding there.

Just to fact-check my understanding: I take range_id to be a monotonically increasing number per shard that protects against multiple hosts processing a shard at once. That understanding comes primarily from this 2017 Cadence meetup video: Cadence Meetup: Cadence Architecture (YouTube).

But that video didn’t explain what a range_hash is, how it relates to range_id (if it does at all), or how it relates to other entities such as namespaces, workflows, tasks, and queues.

In the context of sharding tables based on the first column in their composite primary key, is there an explanation of how range_hash is calculated and how it is used by tasks & task queues?

Thank you so much!

range_hash is not related to shard range_id. It is a hash calculated from namespace + taskqueue_name + taskqueue_type.

Thank you @Yimin_Chen, that makes a lot of sense. I figured it was something more stable than range_id, which could change often.

I have a few follow-up questions about task_queue_id and how it relates to range_hash. My assumption is that each partition in a task queue has a different task_queue_id. Do task queue partitions or task_queue_id affect range_hash?

That is right. Each partition has a different task_queue_id. The range_hash is the hash value of the task_queue_id.
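To illustrate how the two answers above fit together, here is a minimal Python sketch. It is not the server's code and rests on several assumptions: that task_queue_id is built by concatenating the namespace ID bytes, the task queue (partition) name, and a one-byte task queue type; that the hash is 32 bits wide; and that `zlib.crc32` is an acceptable stand-in for whatever hash function the server actually uses. The partition name `orders#1` and the helper names are hypothetical.

```python
import uuid
import zlib


def task_queue_id(namespace_id: uuid.UUID, name: str, queue_type: int) -> bytes:
    """Build an illustrative task_queue_id (assumed layout, not confirmed here):
    16 bytes of namespace ID + UTF-8 queue name + 1 byte of queue type."""
    return namespace_id.bytes + name.encode("utf-8") + bytes([queue_type])


def range_hash(tq_id: bytes) -> int:
    """Hash the task_queue_id into an unsigned 32-bit value.
    crc32 is only a stand-in for the real hash function."""
    return zlib.crc32(tq_id) & 0xFFFFFFFF


# Hypothetical namespace and queue names, for illustration only.
ns = uuid.UUID("11111111-2222-3333-4444-555555555555")
root_id = task_queue_id(ns, "orders", 1)
partition_id = task_queue_id(ns, "orders#1", 1)  # a different partition name

# Different partitions give different IDs, so each gets its own range_hash.
print(range_hash(root_id), range_hash(partition_id))
```

The point for sharding is that range_hash depends only on the task_queue_id, so it stays the same for a given partition across ownership changes. All rows for that partition share one range_hash value.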

Great, thank you for your time, @Yimin_Chen!