Is it true that QuestDB uses mmap for file I/O?

puzpuzpuz · June 27, 2024, 4:13pm

Sharing a question from the community

Q: I saw QuestDB mentioned in a list of databases from the “Are You Sure You Want to Use MMAP in Your Database Management System?” paper. Does it use mmap for both reads and writes?

puzpuzpuz · June 27, 2024, 4:18pm

A: As is often the case, the reality is a bit more complicated than what’s in the papers.

As of v8.0, QuestDB uses mmap to read column and metadata files. To avoid paying the cost of repeated mmap calls, such as TLB shootdowns, we reuse the mmapped memory across query executions and use mremap when column files grow and a reader needs to switch to the newest transaction. We also use pread calls when merging out-of-order (O3) data. Finally, our parallel CSV import uses io_uring, the Linux asynchronous syscall and I/O mechanism, to speed up reading of the CSV rows.
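To illustrate the read side, here is a minimal Python sketch of the idea, not QuestDB's actual code. The `ColumnReader` class, the `col.d` file name, and the 8-byte row layout are all hypothetical. As far as I know, Python's standard `mmap` module cannot grow a read-only mapping in place, so the sketch closes and re-creates the mapping when the file grows, as a stand-in for mremap. It also shows a plain pread of a single row for comparison.

```python
import mmap
import os
import struct
import tempfile

LONG = struct.Struct("<q")  # assumed layout: one 8-byte little-endian value per row


class ColumnReader:
    """Hypothetical reader that keeps one mapping alive across queries."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)
        self.mapping = None
        self.mapped_size = 0
        self.map_calls = 0  # counts how often we had to (re)map

    def refresh(self):
        # Re-map only when the file size changed since the last query.
        # A native implementation could call mremap here instead.
        size = os.fstat(self.fd).st_size
        if size != self.mapped_size:
            if self.mapping is not None:
                self.mapping.close()
            self.mapping = mmap.mmap(self.fd, size, access=mmap.ACCESS_READ)
            self.mapped_size = size
            self.map_calls += 1

    def sum_rows(self):
        # A "table scan": values are decoded straight from the mapped pages.
        self.refresh()
        total = 0
        for offset in range(0, self.mapped_size, LONG.size):
            total += LONG.unpack_from(self.mapping, offset)[0]
        return total

    def close(self):
        if self.mapping is not None:
            self.mapping.close()
        os.close(self.fd)


def read_row_pread(fd, row):
    # pread copies the requested bytes into a fresh user-space buffer.
    data = os.pread(fd, LONG.size, row * LONG.size)
    return LONG.unpack(data)[0]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "col.d")
        with open(path, "wb") as f:
            for value in (1, 2, 3):
                f.write(LONG.pack(value))

        reader = ColumnReader(path)
        first = reader.sum_rows()   # first query: maps the file
        second = reader.sum_rows()  # second query: reuses the mapping

        with open(path, "ab") as f:  # a writer appends one more row
            f.write(LONG.pack(4))

        third = reader.sum_rows()   # file grew: mapping is refreshed
        row = read_row_pread(reader.fd, 3)
        reader.close()
        return first, second, third, reader.map_calls, row


if __name__ == "__main__":
    print(demo())
```

Three queries end up costing only two mapping calls here, since the second query reuses the existing mapping.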

Regarding writes, we use both mmap and pwrite depending on the task. For instance, WAL (Write-Ahead Log) append-only data is written via mmap, in sliding pages. On the other hand, when merging O3 data we prefer writing data via pwrite.
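The two write styles can be sketched roughly as follows. This is an illustration under my own assumptions, not how QuestDB implements its WAL: `SlidingAppender`, `write_at`, the window size, and the `wal.d` file name are made up for the example. The appender maps one fixed-size window over the tail of the file and slides it forward when it fills up, while `write_at` places bytes at an explicit offset with pwrite.

```python
import mmap
import os
import tempfile

# Window size for the sketch; mmap offsets must be multiples of this value.
WINDOW = mmap.ALLOCATIONGRANULARITY


class SlidingAppender:
    """Hypothetical append-only writer using one sliding mmap window."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self.mapping = None
        self.window_start = 0
        self.pos = 0  # absolute append offset in the file
        self._map_window(0)

    def _map_window(self, start):
        if self.mapping is not None:
            self.mapping.flush()
            self.mapping.close()
        # The file must be large enough to back the whole window.
        os.ftruncate(self.fd, start + WINDOW)
        self.mapping = mmap.mmap(
            self.fd, WINDOW, access=mmap.ACCESS_WRITE, offset=start
        )
        self.window_start = start

    def append(self, data):
        written = 0
        while written < len(data):
            room = self.window_start + WINDOW - self.pos
            if room == 0:
                # Current window is full: slide to the next one.
                self._map_window(self.window_start + WINDOW)
                room = WINDOW
            n = min(room, len(data) - written)
            rel = self.pos - self.window_start
            self.mapping[rel:rel + n] = data[written:written + n]
            self.pos += n
            written += n

    def close(self):
        self.mapping.flush()
        self.mapping.close()
        os.ftruncate(self.fd, self.pos)  # trim the unused tail of the window
        os.close(self.fd)


def write_at(path, offset, data):
    # Positional write, no mapping involved. A production version would
    # loop until all bytes are written, since pwrite may write fewer.
    fd = os.open(path, os.O_RDWR)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.d")
        appender = SlidingAppender(path)
        appender.append(b"a" * (WINDOW + 10))  # forces one window slide
        appender.close()

        write_at(path, 5, b"XYZ")  # overwrite three bytes in place

        with open(path, "rb") as f:
            content = f.read()
        return len(content) == WINDOW + 10, content[:10]


if __name__ == "__main__":
    print(demo())
```

The sliding window keeps the mapped region small no matter how large the file gets, which suits sequential appends. The pwrite path suits a merge that already knows exactly where each chunk belongs.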

In general, there is no point in using only a single I/O mechanism in a database. For example, mmap works well for reads that involve table scans, as it avoids the memory-copying overhead of pread on subsequent executions while still benefiting from the OS page cache.