Hi, I am new to Temporal and am trying to learn its concepts by building a proof of concept. I was wondering about queries in workflows and where they would be useful. The only use case I could think of is exposing some internal state of a workflow for debugging purposes. I don't feel confident about exposing data via workflow queries that other workflows will consume routinely. My concern is that Temporal would have to load the workflow instance to execute the query method/function that produces the query output, which adds unnecessary load to the worker cluster. I would rather provide an internal API deployed outside the worker cluster and scale it separately.
Am I thinking about queries correctly, or am I missing something? Are there any use cases for queries besides debugging that I am not thinking of?
Apologies if this has been discussed before. I searched the forum but couldn’t find anything related.
There can be many use cases for queries; this post shows how they can be used in a “human in the loop” scenario.
Use cases revolve around getting/polling the current workflow state. Doing that with queries is fully consistent (it relies on the primary Temporal database).
Visibility queries, for example querying Elasticsearch directly or using the ListWorkflowExecutions API, are eventually consistent.
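To make the consistency difference concrete, here is a toy Python model of it. `ToyCluster` and its methods are hypothetical and are not Temporal's storage code; the sketch only assumes that writes land in a primary history immediately while a secondary visibility index is updated later. A query reads the primary history, while a list call reads the possibly lagging index.

```python
from collections import deque


class ToyCluster:
    """Hypothetical toy model of primary store vs. visibility index (not Temporal's code)."""

    def __init__(self):
        self.history = {}        # primary store: workflow_id -> list of recorded statuses
        self.visibility = {}     # secondary index: workflow_id -> last indexed status
        self._pending = deque()  # index updates not yet applied (simulated lag)

    def record(self, workflow_id, status):
        # Writes hit the primary store immediately...
        self.history.setdefault(workflow_id, []).append(status)
        # ...but the visibility index is only updated later.
        self._pending.append((workflow_id, status))

    def sync_visibility(self):
        # Apply all pending index updates: the "eventually" in eventual consistency.
        while self._pending:
            workflow_id, status = self._pending.popleft()
            self.visibility[workflow_id] = status

    def query(self, workflow_id):
        # Query path: read the latest state from the primary history.
        return self.history[workflow_id][-1]

    def list_executions(self, status):
        # Visibility path: filter on the possibly stale index.
        return sorted(wid for wid, s in self.visibility.items() if s == status)


cluster = ToyCluster()
cluster.record("order-1", "PENDING")
cluster.sync_visibility()
cluster.record("order-1", "APPROVED")
print(cluster.query("order-1"))             # APPROVED
print(cluster.list_executions("APPROVED"))  # [] (index has not caught up yet)
cluster.sync_visibility()
print(cluster.list_executions("APPROVED"))  # ['order-1']
```

In this model the query always sees the latest write, while the list call only sees it after the index catches up.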
Thanks for your reply. Is there documentation on when Temporal fetches query results and where they are persisted? I did a quick test, and it seems that if the worker cluster is not running, the UI can't execute any queries on a running workflow. That led me to assume that a worker cluster must be running for a workflow to serve queries, and that query results aren't persisted anywhere.
In the “human in the loop” scenario, it may be OK to query the workflow directly, since queries are infrequent. However, when tens of thousands of other workflows might be interested in data produced by a workflow, would it be advisable to store that data outside the workflow/Temporal, in a data store that scales separately?
when temporal fetches the results of query; where are they persisted
Querying is a client request, like starting, canceling, terminating, or signaling a workflow. It is processed when one of your workers picks up the request from a task queue. To process a query, a worker that does not already have the workflow's state cached replays the workflow history, bringing your workflow state to the point at which the query is processed, and returns the requested workflow state data. So yes, both the Temporal cluster and your workers have to be up to process a query. Note that you can also query closed workflow executions (up to the configured retention period).
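The replay step can be sketched with a toy Python model. `ApprovalWorkflow`, `handle_query`, and the event tuples are hypothetical illustrations, not Temporal SDK APIs; the sketch only assumes that workflow state is rebuilt purely from recorded history events and that a query handler reads that rebuilt state without changing it.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalWorkflow:
    """Hypothetical workflow whose state is rebuilt only from history events."""

    approvals: list = field(default_factory=list)
    done: bool = False

    def apply(self, event):
        # Deterministically apply one recorded history event to the state.
        kind, payload = event
        if kind == "approved":
            self.approvals.append(payload)
        elif kind == "completed":
            self.done = True

    def status(self):
        # Query handler: read-only view of the current state.
        return {"approvals": list(self.approvals), "done": self.done}


def handle_query(history, query_name):
    # A worker with no cached state starts from a fresh instance...
    wf = ApprovalWorkflow()
    # ...replays every recorded event to rebuild the state...
    for event in history:
        wf.apply(event)
    # ...and then runs the requested query handler against it.
    handlers = {"status": wf.status}
    return handlers[query_name]()


history = [("approved", "alice"), ("approved", "bob")]
print(handle_query(history, "status"))      # {'approvals': ['alice', 'bob'], 'done': False}
print(handle_query(history[:1], "status"))  # {'approvals': ['alice'], 'done': False}
```

Because the answer is derived from history, nothing about the query result itself needs to be stored; it is recomputed each time a worker handles the request.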
would it be advisable to store that data outside the workflow/temporal into a data store that scales separately
Temporal workflow history and visibility data are available during workflow execution, and for up to the defined retention period (1-30 days) after the workflow execution has closed. If you need to persist this data longer, you would need to look into the Archival feature.
IMO the right approach depends on whether you need fully consistent or eventually consistent data. Since queries rely on execution history, they could impact performance when querying workflows that have very large histories.
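The cost concern can be illustrated with another toy Python model. `ToyWorker` and its per-workflow cache are hypothetical, not Temporal SDK code; the sketch assumes a worker may keep rebuilt state in memory, so only a worker without that state pays the full replay cost, which grows with the length of the history.

```python
class ToyWorker:
    """Hypothetical worker that caches rebuilt workflow state (toy model, not the SDK)."""

    def __init__(self):
        self.cache = {}           # workflow_id -> (events_applied, running_total)
        self.events_replayed = 0  # counts replay work so the cost is visible

    def query_total(self, workflow_id, history):
        applied, total = self.cache.get(workflow_id, (0, 0))
        # Replay only the events not yet reflected in the cached state.
        for amount in history[applied:]:
            total += amount
            self.events_replayed += 1
        self.cache[workflow_id] = (len(history), total)
        return total


history = [1] * 10_000  # a workflow with a very large history
worker = ToyWorker()
print(worker.query_total("wf-1", history), worker.events_replayed)  # 10000 10000
print(worker.query_total("wf-1", history), worker.events_replayed)  # 10000 10000 (cached)

fresh = ToyWorker()  # e.g. the state was evicted, or another worker handles the query
print(fresh.query_total("wf-1", history), fresh.events_replayed)    # 10000 10000
```

Here every cold query replays all 10,000 events, which is the overhead to weigh against serving the data from a separately scaled store.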
You define workflow queries in your workflow code. Temporal does not expose workflow queries unless you define them.
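As a rough illustration of "only what you define is exposed", here is a toy Python registry. The `query` decorator, `OrderWorkflow`, and `dispatch_query` are hypothetical stand-ins; in the real Python SDK the corresponding marker is the `@workflow.query` decorator, but this sketch does not use the SDK. Workflow fields that no query handler reads stay internal.

```python
def query(fn):
    """Hypothetical decorator marking a method as a query handler (toy stand-in)."""
    fn._is_query = True
    return fn


class OrderWorkflow:
    def __init__(self):
        self.status = "PENDING"
        self._note = "internal only"  # not exposed: no query handler returns it

    @query
    def get_status(self):
        return self.status


def dispatch_query(workflow, name):
    # Only methods explicitly marked as queries can be invoked by name.
    handler = getattr(workflow, name, None)
    if handler is None or not getattr(handler, "_is_query", False):
        raise LookupError(f"query {name!r} is not defined")
    return handler()


wf = OrderWorkflow()
print(dispatch_query(wf, "get_status"))  # PENDING
```

Asking for `_note` or any undefined name raises `LookupError` in this model.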
OK, thanks for clarifying.