We got a full disk on our server yesterday. We expanded the disk and restarted, but now we are getting an error with one of the tables. The database is unstable: it worked for a bit, but the errors are showing up again in the log.
[path=/var/lib/questdb/db/trades~148/wal2/0]: IO error for operation on /var/lib/questdb/db/trades~148/wal2/0: No such file or directory (os error 2): No such file or directory (os error 2)]
When running QuestDB it is critical to avoid full disk events, as files can become inconsistent and recovery is tricky. It is worth planning ahead and monitoring storage so the server always has enough space available.
If some files were lost due to the full disk, there is no way to recover that data; it is gone at the OS level. What is happening is that QuestDB has metadata saying those files should exist and should be applied to the table, which is why it keeps complaining. We can fix the table and keep working from now on, minus the lost data.
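If you want to confirm which tables are affected before changing anything, QuestDB has a wal_tables() function that reports WAL status per table, including (if I remember correctly) a suspended flag:

select * from wal_tables();

Any table showing up as suspended there is one the WAL apply job has stopped processing.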
We can issue:
ALTER TABLE trades SET TYPE BYPASS WAL;
That way the table will ignore/purge the pending WAL files and stop complaining about the missing ones. Keep in mind that, if I recall correctly, SET TYPE conversions are applied on the next database restart rather than immediately.
Then we can convert the table back so WAL processing resumes from now on:
ALTER TABLE trades SET TYPE WAL;
If you had deduplication on, the configuration should still be valid, but it never hurts to double check via:
select * from tables() where table_name = 'your_table';
select * from table_columns('your_table');
The dedup flag in tables() and the upsertKey column in table_columns() should tell you whether deduplication is still active. Otherwise you can always re-enable it.
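Re-enabling it is a single statement. The column names below are just placeholders for whatever upsert keys your table actually uses; note that, if I remember correctly, the designated timestamp always has to be one of the keys:

-- replace timestamp and symbol with your table's actual upsert key columns
ALTER TABLE trades DEDUP ENABLE UPSERT KEYS(timestamp, symbol);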
If you are using deduplication and you are ingesting from a replay-able source, such as a Kafka broker or an API that lets you go back in time, you can re-ingest all the data from before the problem started and end up with the complete dataset, with no duplicates.
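As a small illustration of why re-ingestion is safe, assume a hypothetical trades table with columns (timestamp, symbol, price) and upsert keys (timestamp, symbol). Sending the same row twice leaves a single copy:

-- hypothetical schema: trades(timestamp, symbol, price), dedup keys (timestamp, symbol)
INSERT INTO trades VALUES ('2024-01-15T10:00:00.000000Z', 'BTC-USD', 42000.0);
INSERT INTO trades VALUES ('2024-01-15T10:00:00.000000Z', 'BTC-USD', 42000.0);

With deduplication on, the second insert overwrites the matching row instead of duplicating it, so replaying an overlapping window from your source ends up idempotent.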