Reclaiming disk space from the writer page size

Hello,
I configured QuestDB in my environment with the following parameter:

cairo.writer.data.append.page.size=8M

Depending on the day (and on the writer), this is sometimes enough and sometimes excessive.

My partitions are 1 day each, and before we put a 24/7 dashboard into production, the storage footprint of the .d files was freed at 01:00.
The .d files that were “empty” shrank to about 2 MB, freeing up 6 MB of disk space each.

Now, with some 24/7 readers active (which, keep in mind, only read the last 5 minutes of data), the .d files are never freed.
Is there something I can do to recover this space?

Hi @P923 ,

This sounds like a bug. We have a VACUUM TABLE statement which cleans up unused versions of columns. Unfortunately, we do not have a VACUUM INDEX statement that would allow you to force a compaction (or clean up a deleted index, if it hadn’t been fully removed).

Not indexes, apologies!
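For reference, the statement is a one-liner (table name illustrative):

VACUUM TABLE my_table;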

Please could you give an indication of how big the issue is? For example, if you only have a few tables and partitions, then the impact might not be big. But if this affected, say, 10,000 partitions, then it would add up to real money (i.e. 60 GB).

As a workaround, here is something to try, and then something to mitigate.

  1. Try disabling your readers for 5 minutes either side of your partition rollover, i.e. at 01:00, and see if the partition gets compacted (or whether by then it is already too late!)
  2. Consider migrating your storage to zfs, which has built-in compression. The empty space would be highly compressible, so this can mitigate the impact until there is a fix (see the example below).

For reference, with zfs, we usually expect about a 5x reduction in storage space required, for average time-series data.
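If you do try zfs, enabling compression on the dataset that holds your QuestDB root is a one-liner (pool/dataset names are illustrative):

zfs set compression=lz4 tank/questdb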

We’d also be interested to hear more about your use case and what your system does, if you’re happy to share!

P.S. What version of QuestDB are you using, and on what OS and filesystem?

Issue to track: Unable to reclaim wasted space in data files · Issue #5257 · questdb/questdb · GitHub

Hello,
The data comes from different types of sensors, so there is no out-of-order insertion.

In our specific case:

  • The sensor data needs to be “synchronized”, so we have a single table for each area. That gives us X areas, each area with Y columns (depending on how many sensors it has).
  • We use Grafana to show the data (24/7, via the PostgreSQL data source).
  • We have to retain the data for 90+ days.

So in the end, the storage loss is roughly 6 MB * X * Y * 90 days.
In our specific case it’s not a big problem, but if it can be resolved it would be nice.
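As a purely hypothetical illustration: with X = 10 areas of Y = 20 columns each, that would be 6 MB * 10 * 20 * 90 = 108 GB over the retention window.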

For various policy reasons, we use Windows Server 2019 + NTFS.
QuestDB 8.2.1

If we can, tonight we will try to schedule a shutdown of the Grafana service, so we can test whether the partition gets compacted.
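A sketch of what we could schedule on Windows, assuming Grafana runs as a service literally named “Grafana” (the task and service names here are just placeholders):

schtasks /Create /SC DAILY /TN "StopGrafana" /TR "net stop Grafana" /ST 00:55
schtasks /Create /SC DAILY /TN "StartGrafana" /TR "net start Grafana" /ST 01:05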

Thank you,

Thank you for the info!

Re: Grafana, we do have a native Grafana plugin, which you could try out versus the PostgreSQL source: QuestDB plugin for Grafana | Grafana Labs

Re: a fix, Windows may make this more difficult, but this likely happens on Linux too. We will see what we can do 🙂

I’ve tested the workaround with a table partitioned by hour.
Stopping the readers at minute 58 didn’t make the partition shrink.

Is there something in the debug log that can help understand why it didn’t happen?

Another solution would be to reduce the page size.
What happens if it is too small?
What are the performance impacts?

Thank you

Normally, the compaction happens on partition rollover, i.e. the latest partition becomes old and a new one begins. So in that case, did your latest ‘rolled over’ partition not shrink either?

Older ones probably won’t shrink as they won’t have been considered as part of the rollover.

Re: page sizes, as long as the page size is large enough for your ‘average’ write, you shouldn’t notice a performance drop. I had a use case with many small tables and lowered my page sizes to 64 KB or even lower - it didn’t make sense to write 8 MB out for just a handful of rows.
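For example, in server.conf (the exact value is illustrative - size it to your typical append batch):

cairo.writer.data.append.page.size=64K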

Hello,
Unfortunately not.

  • Partition created 2024-12-18T14
  • Partition created 2024-12-18T15.1710
  • Readers stopped at T15, minute 58
  • Partition created 2024-12-18T16.14234

and T15.1710 was not compacted.

If you still have partitions with version numbers, i.e. 2024-12-18T15.1710, then these may be improved with VACUUM TABLE - but potentially not under reads. It is worth a try!

I’ve launched the VACUUM TABLE:
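VACUUM TABLE GRPT;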

2024-12-18T16:24:43.407274Z D i.q.c.TxnScoreboard open clean [file=E:\qdbroot\db\GRPT~45_txn_scoreboard, fd=788865235367350]
2024-12-18T16:24:43.407433Z I i.q.c.VacuumColumnVersions enumerating files at E:\qdbroot\db\GRPT~45\2024-12-18T15.1710
2024-12-18T16:24:43.407758Z I i.q.c.O3PartitionPurgeJob processed [table=TableToken{tableName=GRPT, dirName=GRPT~45, tableId=45, isWal=true, isSystem=false}]
2024-12-18T16:24:43.407964Z I i.q.c.VacuumColumnVersions enumerating files at E:\qdbroot\db\GRPT~45\2024-12-18T16.14234

But it had no effect on the disk size.