Hello,
I configured QuestDB in my environment with the following parameter:
cairo.writer.data.append.page.size=8M
There are days (depending on the writer) when this is enough and other days when it is excessive.
My partitions are split by day, and before we put a 24/7 dashboard into production, the storage footprint of the .d files was freed at 01:00.
So the .d files that were “empty” shrank to about 2 MB, freeing up 6 MB of disk space each.
Now, with some 24/7 readers active (which, keep in mind, only read the last 5 minutes of data), the .d files are never freed.
Is there something I can do to recover this space?
This sounds like a bug. We have a VACUUM TABLE statement which cleans up unused versions of columns. Unfortunately, we do not have a VACUUM INDEX statement that would allow you to force a compaction (or clean up a deleted index, if it hadn’t been fully removed).
Not indexes, apologies!
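For reference, here is a minimal sketch of the VACUUM TABLE statement; sensors_area_1 is a hypothetical table name used for illustration:

-- Ask QuestDB to clean up unused column file versions and reclaim disk space.
VACUUM TABLE sensors_area_1;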
Please could you give an indication of how big the issue is? For example, if you only have a few tables and partitions, then the impact might not be big. But if this affected, say, 10,000 partitions, then it would add up to real money (i.e. 60 GB).
As a workaround, here is something to try, and then something to mitigate the impact.
Try disabling your readers for 5 minutes either side of your partition rollover, i.e. at 01:00, and see if the partition gets compacted (or if it is already too late!)
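If you prefer to check from SQL rather than the filesystem, something like the following should list per-partition disk usage (just a sketch; sensors_area_1 is again a hypothetical table name):

-- Lists each partition with its row count and on-disk size.
SHOW PARTITIONS FROM sensors_area_1;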
Consider migrating your storage to ZFS, which has built-in compression. The empty space would be highly compressible, so this can mitigate the impact until there is a fix.
For reference, with ZFS we usually expect about a 5x reduction in the storage space required for average time-series data.
We’d also be interested to hear more about your use case and what your system does, if you’re happy to share!
P.S. What version of QuestDB are you using, what OS, and what filesystem?
Hello,
The data comes from different types of sensors, so there is no out-of-order insertion.
In our specific case:
The sensor data needs to be “synchronized”, so we have a single table for each area. That gives us X areas, each area with Y columns (depending on how many sensors it has); a rough sketch is shown below.
We use Grafana to display the data (24/7, via the PostgreSQL data source).
We have to retain the data for 90+ days.
So in the end, the storage loss is 6 MB * X * Y * 90 days.
In our specific case it’s not a big problem, but it would be nice if it could be resolved.
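Roughly, each area table looks something like this (table and column names are simplified placeholders):

-- One table per area, partitioned by day; one column per sensor (Y in total).
CREATE TABLE area_1 (
  ts TIMESTAMP,
  sensor_1 DOUBLE,
  sensor_2 DOUBLE,
  sensor_y DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY;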
For various policy reasons, we use Windows Server 2019 + NTFS.
QuestDB 8.2.1
If we can, we will try to schedule a shutdown of the Grafana service tonight, so we can test whether the partition gets compacted.
Normally, the compaction happens on partition rollover, i.e. when the latest partition becomes old and a new one begins. So in that case, did your latest ‘rolled over’ partition not shrink either?
Older ones probably won’t shrink as they won’t have been considered as part of the rollover.
Re: page sizes, as long as the page size is large enough for your ‘average’ write, you shouldn’t notice a performance drop. I had a use case with many small tables and lowered my page sizes to 64 KB or even lower; it didn’t make sense to write 8 MB out for just a handful of rows.
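For example, in server.conf a smaller append page size would look something like this (the value is illustrative, following the 64 KB mentioned above, not a tuned recommendation for your workload):

cairo.writer.data.append.page.size=64K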
If you still have partitions with version numbers, e.g. 2024-12-18T15.1710, then these may be improved with VACUUM TABLE (though potentially not while under reads). It is worth a try!