What's the best way to do simple analytics on historical data?

We want to run some simple analytics on historical events, for example counting the number of events where some key in the output JSON has the value ‘x’.
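To make the goal concrete, here is a rough sketch of the kind of count we mean. It assumes the history events have already been obtained as plain dictionaries with the result stored as a JSON string under an `output` field; that field name, the event shape, and the function name `count_matching_events` are hypothetical and not taken from the Cadence schema.

```python
import json
from typing import Any, Iterable


def count_matching_events(events: Iterable[dict], key: str, value: Any) -> int:
    """Count events whose JSON output contains `key` with exactly `value`.

    Assumption: each event is a dict, and its result lives under a
    hypothetical "output" field as a JSON string (or an already parsed dict).
    """
    count = 0
    for event in events:
        raw = event.get("output")  # hypothetical field name
        if raw is None:
            continue  # event carries no output at all
        if isinstance(raw, (str, bytes)):
            try:
                output = json.loads(raw)
            except ValueError:
                continue  # skip outputs that are not valid JSON
        else:
            output = raw
        # Only JSON objects can hold the key we are looking for.
        if isinstance(output, dict) and key in output and output[key] == value:
            count += 1
    return count


# Made-up sample data, for illustration only.
sample_events = [
    {"output": '{"status": "x"}'},
    {"output": '{"status": "y"}'},
    {"output": "not json"},
    {},
]
matches = count_matching_events(sample_events, "status", "x")
```

The hard part is not this loop but getting the events into such a form in the first place, which is what the rest of this question is about.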

For Cadence, we tried querying Cassandra directly, but the history appears to be serialized as a Thrift binary object. We found the Thrift definitions, but we're not sure which data structure to decode it into. We can query from S3, but since we’ve set up archival with a retention of 14 days, the events only arrive in S3 after 14 days.

What’s the best way to do this: decoding the Thrift in Cassandra, or somehow getting the events into S3 before the 14-day retention period ends (with a delay of up to 24 hours)?


The server itself should expose an API that fetches the workflow history from either the DB (Cassandra, MySQL, etc.) or the archive (S3, etc.).

I would suggest opening an issue on the Cadence repository.


Yes, an API would be great. I’ve created an issue: API to query Historical data · Issue #3874 · uber/cadence · GitHub