How to best store and access most recent data?

nwoolmer · August 30, 2024, 11:46am

I think the suggestion is the following.

Create a separate table, WAL, with a dummy timestamp. Set this dummy timestamp as the designated timestamp. Store your ‘true’ timestamp in a separate column. Set deduplication keys on dummy timestamp and sensor_id (or similar key).

Before dropping an old partition, use INSERT INTO SELECT to copy the latest row for each sensor in this partition into the second table. These will be deduplicated automatically, and deduplicated inserts are quite efficient, versus UPDATE.

Then drop the partition you no longer require.

When you need to query the data, simply query both tables and combine the result based on the latest timestamp (i.e using max()). We can help you prepare the correct queries to do this.

We will shortly be releasing deduplication for variable sized types. This will allow you to user a VARCHAR for sensor_id instead of a SYMBOL or LONG, should you need to.

We are also WIP for materialised views, which I think are on the roadmap for Q4. This will resolve your problem long term, and remove these manual steps. You would simply define a view based on the LATEST BY query, and it will maintain this for you.