Periodic snapshot of "cache" table - Best Approach?

Hello,

I recently received some excellent advice for caching the latest data to a separate table, to make pulling the most recent values for every entity in my database much more efficient.

I now have a need to keep a monthly “snapshot” of that cache table, which will make looking back at historical values from any period in time much more efficient.

What would be the best way to achieve this?

  • A monthly cron job that does a table copy and names the new table based on the date? I assume with this approach that, when I wanted to query “latest” values at a particular point in time, I’d have to find the appropriate snapshot table. At least the LATEST ON query would have to search back no more than a month (rather than potentially many months or years to find the latest values).

  • A monthly job that inserts everything from the current cache/snapshot table into another, single table that is partitioned by month? Would that work, though, if some of the sensor values have timestamps much older than the current month? Presumably the older sensor values would be relegated to older partitions, and a query to look for the latest values at that point in time would still have to search back through older partitions? It seems to me this approach would be no different from, and no better than, downsampling the original table that contains all sensor values going back for all time.

Happy to provide more explanation if required.
Thanks.

Hi @leem,

Glad you have made progress! Fwiw, one of the enhancements following the Materialised Views release will be LATEST BY support. It may be that this could then be combined with another materialised view to keep the monthly snapshots. Watch this space!

Your snapshots can all go into a single additional table. You will need to use a dummy designated timestamp again, with its dates set within the month of the snapshot.
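
As a rough sketch (the table and column names below are placeholders, not your actual schema), the snapshot table could look something like this:

```sql
-- Hypothetical snapshot table. snapshot_ts is the dummy designated timestamp,
-- set to a date inside the month the snapshot belongs to.
CREATE TABLE cache_snapshots (
  sensor_id   SYMBOL,
  value       DOUBLE,
  original_ts TIMESTAMP,   -- real reading timestamp carried over from the cache table
  snapshot_ts TIMESTAMP    -- dummy timestamp identifying the snapshot month
) TIMESTAMP(snapshot_ts) PARTITION BY MONTH WAL;
```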

You could try to set the dummy timestamp in such a way that the ordering itself is useful (a micro-optimisation), but for now I would suggest assuming you need to sort with ORDER BY whenever you want the data in a different timestamp order.

To avoid LATEST BY scanning further back, simply select the data from the snapshot partition using a subquery with a WHERE clause (e.g. WHERE dummy_ts IN '2024-09') and then apply the LATEST BY on the result.
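
Something along these lines, still using the hypothetical cache_snapshots table and sensor_id key from the sketch above (LATEST ON ... PARTITION BY is the current form of LATEST BY):

```sql
-- Restrict the scan to the September 2024 snapshot first,
-- then take the latest row per sensor within that interval.
SELECT *
FROM (
  SELECT * FROM cache_snapshots
  WHERE snapshot_ts IN '2024-09'
)
LATEST ON snapshot_ts PARTITION BY sensor_id;
```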

The simplest way to copy the cache table into the snapshot table is INSERT INTO SELECT. For this use case, it should be more than quick enough.
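
For example, assuming the cache table is called latest_cache with columns sensor_id, value and ts (again, placeholder names):

```sql
-- Copy the current cache contents into the snapshot table,
-- stamping every row with the same dummy snapshot timestamp.
INSERT INTO cache_snapshots
SELECT sensor_id, value, ts AS original_ts,
       CAST('2024-09-01T00:00:00.000000Z' AS TIMESTAMP) AS snapshot_ts
FROM latest_cache;
```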

It is probably still a good idea to set dedup keys on the historical snapshot table, and perhaps to use now() or systimestamp() for the dummy timestamp, to reduce the risk of accidentally overwriting an old snapshot because you forgot to update a hard-coded timestamp.
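
With the placeholder table above, deduplication could be enabled like this (note that dedup requires a WAL table, which the sketch already is):

```sql
-- With dedup on (snapshot_ts, sensor_id), re-running the same monthly insert
-- replaces that month's rows instead of duplicating them.
ALTER TABLE cache_snapshots DEDUP ENABLE UPSERT KEYS(snapshot_ts, sensor_id);
```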

Let me know if any of this needs more explaining!

Sounds like a good plan. Thanks!
