Sharing a question from the community
Q: I saw QuestDB mentioned in a list of databases from the “Are You Sure You Want to Use MMAP in Your Database Management System?” paper. Does it use mmap for both reads and writes?
A: As is often the case, the reality is a bit more complicated than what’s in the papers.
As of v8.0, QuestDB uses mmap to read column and metadata files. To avoid paying the costs that come with repeated mmap calls, such as TLB shootdowns, we reuse the mmapped memory across query executions and use mremap when column files grow and a reader needs to switch to the newest transaction. We also use pread calls when merging out-of-order (O3) data. Finally, our parallel CSV import uses io_uring, the Linux asynchronous syscall and I/O mechanism, to speed up reading of the CSV rows.
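To make the read path above more concrete, here is a minimal Python sketch of a reader that maps a column file once, reuses the mapping across queries, and grows the mapping when the file grows. It is not QuestDB's code. The names ColumnReader, append_rows and demo, and the one-int64-per-row layout, are assumptions made for illustration. Python's mmap.resize() stands in for mremap: it is only available on systems that have mremap (such as Linux), it needs a writable mapping, and it also resizes the underlying file, so a real reader would map read-only and call mremap directly.

```python
import mmap
import os
import struct
import tempfile

ROW = struct.Struct("<q")  # assumed layout: one little-endian int64 per row


def append_rows(path, values):
    """Stand-in writer: append int64 values to the column file."""
    with open(path, "ab") as f:
        for v in values:
            f.write(ROW.pack(v))


class ColumnReader:
    """Keeps one mapping alive across queries instead of mapping per query."""

    def __init__(self, path):
        # O_RDWR only because Python's resize() requires a writable mapping.
        self.fd = os.open(path, os.O_RDWR)
        self.map = mmap.mmap(self.fd, 0)  # map the whole file once

    def refresh(self):
        """Grow the existing mapping if the file grew since the last query."""
        size = os.fstat(self.fd).st_size
        if size > len(self.map):
            self.map.resize(size)  # relies on mremap being available

    def total(self):
        """A toy 'query': sum every row straight from the mapped pages."""
        return sum(
            ROW.unpack_from(self.map, off)[0]
            for off in range(0, len(self.map), ROW.size)
        )

    def close(self):
        self.map.close()
        os.close(self.fd)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "col.d")
    append_rows(path, [1, 2, 3])
    reader = ColumnReader(path)
    first = reader.total()      # first query
    second = reader.total()     # second query reuses the same mapping
    append_rows(path, [4, 5])   # the "writer" commits more rows
    reader.refresh()            # the reader switches to the newer state
    third = reader.total()
    reader.close()
    return first, second, third
```

Calling demo() runs two queries over the same mapping, appends rows, refreshes, and runs a third query that sees the new rows without unmapping and mapping again.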
Regarding writes, we use both mmap and pwrite, depending on the task. For instance, WAL (Write-Ahead Log) append-only data is written via mmap, in sliding pages. On the other hand, when merging O3 data we prefer writing data via pwrite.
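The two write styles above can be sketched roughly as follows. This is an illustration under stated assumptions, not how QuestDB implements either path: SlidingAppender, write_merged and demo are hypothetical names, the window size is simply the platform's mapping granularity, and the file is pre-extended with ftruncate before each window is mapped.

```python
import mmap
import os
import tempfile

WINDOW = mmap.ALLOCATIONGRANULARITY  # assumed window size; keeps offsets aligned


class SlidingAppender:
    """Appends bytes through a fixed-size mmap window that slides forward."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self.base = 0  # file offset where the current window starts
        self.pos = 0   # write position inside the window
        self.win = self._map(self.base)

    def _map(self, offset):
        # Extend the file so the window is fully backed, then map that range.
        os.ftruncate(self.fd, offset + WINDOW)
        return mmap.mmap(self.fd, WINDOW, offset=offset)

    def append(self, data):
        view = memoryview(data)
        while len(view):
            if self.pos == WINDOW:  # window is full: slide to the next one
                self.win.close()
                self.base += WINDOW
                self.pos = 0
                self.win = self._map(self.base)
            n = min(len(view), WINDOW - self.pos)
            self.win[self.pos:self.pos + n] = view[:n]
            self.pos += n
            view = view[n:]

    def close(self):
        size = self.base + self.pos
        self.win.flush()
        self.win.close()
        os.ftruncate(self.fd, size)  # trim the unused tail of the last window
        os.close(self.fd)
        return size


def write_merged(path, offset, data):
    """Positional write, in the spirit of placing merged data with pwrite."""
    fd = os.open(path, os.O_RDWR)
    try:
        written = 0
        while written < len(data):  # pwrite may write fewer bytes than asked
            written += os.pwrite(fd, data[written:], offset + written)
    finally:
        os.close(fd)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "wal.d")
    payload = bytes(i % 251 for i in range(WINDOW * 2 + 123))
    appender = SlidingAppender(path)
    appender.append(payload)  # spans three windows
    appender.append(b"tail")
    size = appender.close()
    with open(path, "rb") as f:
        appended_ok = f.read() == payload + b"tail"
    write_merged(path, 8, b"XXXX")  # overwrite four bytes at offset 8
    with open(path, "rb") as f:
        patched = f.read()
    return size == len(payload) + 4, appended_ok, patched[8:12], len(patched) == size
```

The appender only ever keeps one window mapped, which suits sequential appends, while the pwrite helper writes to an arbitrary offset without mapping anything.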
In general, there is no point in using only a single I/O mechanism in a database. For example, mmap works well for reads that involve table scans, as it avoids the overhead of pread memory copying in subsequent executions while also benefiting from the OS page cache.
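The copying difference mentioned above can be seen in a small sketch. Here scan_with_pread, scan_with_mmap and demo are hypothetical helpers, and the "scan" is just a byte sum. The pread version copies every chunk into a fresh buffer on each call, while the mmap version reads through a view of the mapped pages and can be repeated without copying into a separate user-space buffer. It does not measure performance.

```python
import mmap
import os
import tempfile


def scan_with_pread(fd, size, chunk=64 * 1024):
    """Each pread copies bytes from the page cache into a new buffer."""
    total, offset = 0, 0
    while offset < size:
        buf = os.pread(fd, min(chunk, size - offset), offset)
        if not buf:  # unexpected end of file
            break
        total += sum(buf)
        offset += len(buf)
    return total


def scan_with_mmap(mapped):
    """Reads through a view of the mapping, with no intermediate buffer."""
    with memoryview(mapped) as view:
        return sum(view)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "col.d")
    data = bytes(i % 256 for i in range(200_000))
    with open(path, "wb") as f:
        f.write(data)

    fd = os.open(path, os.O_RDONLY)
    try:
        via_pread = scan_with_pread(fd, len(data))
        mapped = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        try:
            first = scan_with_mmap(mapped)
            second = scan_with_mmap(mapped)  # repeat scan, same mapping
        finally:
            mapped.close()
    finally:
        os.close(fd)
    return via_pread == sum(data), first == via_pread, second == first
```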