Indexing workflows for search / filtering

This is not quite a support topic. I just wanted to bounce some ideas around, and figured others may have similar questions or ideas.

To search / filter workflows, the current recommendation, if I understand correctly, is to run Temporal with Kafka and Elasticsearch.

This makes sense for high-traffic applications, since both Kafka and Elasticsearch were built for scale.

For simpler use cases and/or lower-traffic applications, though, this feels like overkill. Neither Kafka nor Elasticsearch is readily available on every cloud provider, and both can be quite costly to run (with or without a managed service provider).

There seem to be a few longer-term options:

  • Possibly removing Kafka (https://github.com/temporalio/temporal/pull/295)
  • Supporting search / filtering (search attributes) with SQL persistence as well (PostgreSQL can handle this quite well)
  • Similar to the previous point, supporting more search engines (e.g. MeiliSearch)

For the short term, the suggestion I have seen is essentially to index the workflow through activities, i.e. either making it part of your domain or building a dedicated service to capture / index workflows.

Baking workflows into a specific domain, however, does not always feel like the right solution. Workflows can cross various domain boundaries, and I feel the beauty of it is that you can focus on the overall / high-level business logic (orchestration logic).
And building a dedicated service to capture workflows feels like the kind of boilerplate that a workflow engine like Temporal aims to reduce.
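To make the short-term option concrete, here is a minimal sketch of what the body of an "index this workflow" activity could look like. Everything in it is an assumption for illustration: the `workflow_index` table, the column names, and the helper functions are hypothetical, it uses SQLite only so it runs standalone, and it deliberately leaves out the Temporal SDK wiring.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a small side table that mirrors workflow state."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflow_index ("
        " workflow_id TEXT PRIMARY KEY,"
        " workflow_type TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return conn


def index_workflow(conn, workflow_id, workflow_type, status):
    """Hypothetical activity body: insert or overwrite one row per workflow."""
    conn.execute(
        "INSERT OR REPLACE INTO workflow_index"
        " (workflow_id, workflow_type, status) VALUES (?, ?, ?)",
        (workflow_id, workflow_type, status),
    )
    conn.commit()


def find_workflows(conn, status):
    """Return the IDs of all indexed workflows with the given status."""
    rows = conn.execute(
        "SELECT workflow_id FROM workflow_index"
        " WHERE status = ? ORDER BY workflow_id",
        (status,),
    )
    return [row[0] for row in rows]
```

A workflow would call something like `index_workflow` at each state change it wants to be searchable by, which is exactly the boilerplate in question: every workflow has to remember to do it.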

I’d be interested in knowing what y’all think, and how others are approaching this.

In hindsight, this may have been better posted at https://community.temporal.io/c/discuss/7

Moved to discuss category.

We do plan to implement the following feature requests at some point:

  • Remove Kafka
  • Support advanced search attributes with MySQL/PostgreSQL.

The open question is what we should do with workflow indexing in the future.
The current approach is that Temporal has its own SQL-like predicate parser and converts queries to a technology-specific format. Initially, it was implemented because Elasticsearch didn’t provide open source SQL parser support.
I’m not sure if we should stay with this approach and extend the parser every time we get a new storage engine. This would keep predicates technology-independent. At the same time, it is kind of limiting, as it would reduce all the existing indexing engines to the lowest common denominator.
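As a rough illustration of the trade-off (not Temporal's actual parser), the sketch below takes already-parsed predicates as hypothetical `(field, operator, value)` tuples, all ANDed together, and translates them into two backend formats: a parameterized SQL `WHERE` clause and an Elasticsearch `bool`/`filter` query body. The function names and the tiny operator set are assumptions.

```python
# Operators this toy translator understands, mapped to Elasticsearch range keys.
RANGE_OPS = {"<": "lt", ">": "gt"}


def _check(field, op):
    # Field names are interpolated into SQL below, so only accept identifiers.
    if not field.isidentifier():
        raise ValueError(f"invalid field name: {field!r}")
    if op != "=" and op not in RANGE_OPS:
        raise ValueError(f"unsupported operator: {op!r}")


def to_sql(predicates):
    """Translate predicates into a WHERE clause plus bound parameters."""
    clauses, params = [], []
    for field, op, value in predicates:
        _check(field, op)
        clauses.append(f"{field} {op} ?")
        params.append(value)
    return " AND ".join(clauses), params


def to_elasticsearch(predicates):
    """Translate the same predicates into an Elasticsearch bool/filter query."""
    filters = []
    for field, op, value in predicates:
        _check(field, op)
        if op == "=":
            filters.append({"term": {field: value}})
        else:
            filters.append({"range": {field: {RANGE_OPS[op]: value}}})
    return {"bool": {"filter": filters}}
```

Every new storage engine needs another `to_*` function, and any operator that one backend cannot express has to stay out of the shared operator set. That is the lowest-common-denominator problem in miniature.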

Another option is to perform only the ingestion part and let everyone query the data store directly. But that brings all sorts of complexities, especially around multi-tenant clusters. For example, the predicate parser automatically adds the namespace to every query predicate.
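The namespace point can be sketched as follows, assuming a query is represented as a list of hypothetical `(field, operator, value)` tuples that are ANDed together. The field name `NamespaceId` and the function are illustrative, not Temporal's real implementation.

```python
NAMESPACE_FIELD = "NamespaceId"  # hypothetical field name, for illustration only


def scope_to_namespace(predicates, namespace_id):
    """Force a query to stay inside one namespace.

    Callers may not filter on the namespace field themselves; the
    server-side layer always prepends its own namespace predicate.
    """
    for field, _op, _value in predicates:
        if field == NAMESPACE_FIELD:
            raise ValueError("callers may not filter on the namespace field")
    return [(NAMESPACE_FIELD, "=", namespace_id)] + list(predicates)
```

If users queried the data store directly, nothing would sit in the request path to apply this kind of scoping, so tenant isolation would have to be enforced in the store itself.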
