We want to run some simple analytics on historical events, for example counting the number of events where some key in the output JSON has the value ‘x’.
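To make the goal concrete, here is a minimal Python sketch of the kind of count we have in mind. It assumes the events have already been decoded into plain dicts; the field name `output`, the key `status`, and the function `count_matching_events` are hypothetical placeholders for illustration, not actual Cadence names.

```python
import json
from typing import Any, Iterable


def count_matching_events(events: Iterable[dict], key: str, value: Any) -> int:
    """Count events whose output JSON has `key` equal to `value`."""
    count = 0
    for event in events:
        # "output" is an assumed field name for the event's result payload.
        output = event.get("output")
        # The payload may be stored as a JSON string or bytes; decode it if so.
        if isinstance(output, (str, bytes)):
            try:
                output = json.loads(output)
            except ValueError:
                # Skip payloads that are not valid JSON.
                continue
        if isinstance(output, dict) and output.get(key) == value:
            count += 1
    return count


# Made-up sample data, only to show the expected input shape.
sample_events = [
    {"event_id": 1, "output": '{"status": "x"}'},
    {"event_id": 2, "output": '{"status": "y"}'},
    {"event_id": 3, "output": {"status": "x"}},
    {"event_id": 4, "output": "not json"},
]
print(count_matching_events(sample_events, "status", "x"))
```

The counting itself is trivial; the open question is how to get the events into a decoded form like this in the first place.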
For Cadence, we tried querying Cassandra directly, but the event data appears to be serialized as a Thrift binary object. We found the Thrift definitions but are not sure which data structure to decode it with. We can query the events from S3, but since we set up archival with a retention period of 14 days, the events only arrive in S3 after 14 days.
What is the best way to do this: decoding the Thrift data in Cassandra, or somehow getting events written to S3 before the 14-day retention period ends (a delay of up to 24 hours would be fine)?