Error in Restore snapshot

I am following the QuestDB "Backup and restore" instructions to migrate a database to a new machine. I get an error when restarting the service: it starts the restore process and gets through a few of my data tables, but fails at the sys.telemetry_wal table. I'm not finding much guidance on how to debug this. I've tried re-creating the snapshot with SNAPSHOT PREPARE (the original DB has not changed) and copying it to the new location, but that had no effect. Any suggestions on how to proceed?

2024-08-02T18:15:48.156951Z I i.q.c.DatabaseSnapshotAgentImpl rebuilding symbol files [table=/home/user/.questdb/db/telemetry_config, column=version, count=2]
2024-08-02T18:15:48.157281Z I i.q.c.DatabaseSnapshotAgentImpl rebuilding symbol files [table=/home/user/.questdb/db/telemetry_config, column=os, count=1]
2024-08-02T18:15:48.157669Z I i.q.c.DatabaseSnapshotAgentImpl rebuilding symbol files [table=/home/user/.questdb/db/telemetry_config, column=package, count=1]
2024-08-02T18:15:48.158035Z I i.q.c.DatabaseSnapshotAgentImpl recovered _meta file [src=/home/user/.questdb/snapshot/db/sys.telemetry_wal/_meta, dst=/home/user/.questdb/db/sys.telemetry_wal/_meta]
2024-08-02T18:15:48.158135Z I i.q.c.DatabaseSnapshotAgentImpl recovered _txn file [src=/home/user/.questdb/snapshot/db/sys.telemetry_wal/_txn, dst=/home/user/.questdb/db/sys.telemetry_wal/_txn]
2024-08-02T18:15:48.158227Z I i.q.c.DatabaseSnapshotAgentImpl recovered _cv file [src=/home/user/.questdb/snapshot/db/sys.telemetry_wal/_cv, dst=/home/user/.questdb/db/sys.telemetry_wal/_cv]
2024-08-02T18:15:48.159431Z I i.q.c.p.WriterPool closed
2024-08-02T18:15:48.159472Z I i.q.c.p.ReaderPool closed
2024-08-02T18:15:48.159500Z I i.q.c.p.SequencerMetadataPool closed
2024-08-02T18:15:48.159545Z I i.q.c.p.TableMetadataPool closed
2024-08-02T18:15:48.159574Z I i.q.c.p.WalWriterPool closed
java.lang.AssertionError
at io.questdb@8.1.0/io.questdb.std.LongList.setQuick(LongList.java:420)
at io.questdb@8.1.0/io.questdb.cairo.TxReader.unsafeLoadPartitions(TxReader.java:543)
at io.questdb@8.1.0/io.questdb.cairo.TxReader.unsafeLoadAll(TxReader.java:423)
at io.questdb@8.1.0/io.questdb.cairo.TxWriter.unsafeLoadAll(TxWriter.java:450)
at io.questdb@8.1.0/io.questdb.cairo.TxWriter.ofRW(TxWriter.java:241)
at io.questdb@8.1.0/io.questdb.cairo.DatabaseSnapshotAgentImpl.rebuildTableFiles(DatabaseSnapshotAgentImpl.java:179)
at io.questdb@8.1.0/io.questdb.cairo.DatabaseSnapshotAgentImpl.lambda$recoverSnapshot$0(DatabaseSnapshotAgentImpl.java:565)
at io.questdb@8.1.0/io.questdb.std.FilesFacadeImpl.iterateDir(FilesFacadeImpl.java:285)
at io.questdb@8.1.0/io.questdb.cairo.DatabaseSnapshotAgentImpl.recoverSnapshot(DatabaseSnapshotAgentImpl.java:554)
at io.questdb@8.1.0/io.questdb.cairo.CairoEngine.<init>(CairoEngine.java:137)
at io.questdb@8.1.0/io.questdb.Bootstrap.newCairoEngine(Bootstrap.java:366)
at io.questdb@8.1.0/io.questdb.ServerMain.<init>(ServerMain.java:85)
at io.questdb@8.1.0/io.questdb.ServerMain.<init>(ServerMain.java:79)
at io.questdb@8.1.0/io.questdb.ServerMain.main(ServerMain.java:172)

Hi @ilana8

Nothing jumps out immediately, but something may once other folks on the team take a look.

For now, I’d like to point to this reference document that we have in addition to the SNAPSHOT guide:

Perhaps the method described there, along with the additional context, will point to the cause.

We’ll return if anything jumps out - please update us with new findings.

I removed the sys.telemetry_wal directory from snapshot/db, since that seemed to be causing the problem. On the next restart, it recovered and rebuilt all the tables, and I see that it runs

2024-08-02T19:07:19.040981Z I i.q.g.e.QueryProgress exe [id=3, sql=CREATE TABLE IF NOT EXISTS "sys.telemetry_wal" (created timestamp, event short, tableId int, walId int, seqTxn long, rowCount long,physicalRowCount long,latency float) timestamp(created) partition by MONTH BYPASS WAL, principal=admin, cache=false, jit=false]

along the way as well. DB is up and seems to be fine. I don’t know what sys.telemetry_wal does, but hopefully it won’t be an issue later.
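
For anyone hitting the same error, the workaround above can be sketched as a small Python helper. This is only a sketch under assumptions, not an official QuestDB procedure: it assumes the server is stopped, that the snapshot lives under /home/user/.questdb/snapshot/db as in the logs, and the function name and the "set aside" location are hypothetical. It moves the directory aside instead of deleting it, so it can be put back if needed.

```python
import shutil
from pathlib import Path
from typing import Optional


def set_aside_snapshot_table(
    snapshot_db: Path, table_dir_name: str, backup_root: Path
) -> Optional[Path]:
    """Move one table directory out of the snapshot so the restore skips it.

    Hypothetical helper. Returns the new location of the directory,
    or None if the directory was not present in the snapshot.
    """
    src = snapshot_db / table_dir_name
    if not src.is_dir():
        return None

    backup_root.mkdir(parents=True, exist_ok=True)
    dst = backup_root / table_dir_name
    if dst.exists():
        # Refuse to overwrite an earlier copy that was set aside.
        raise FileExistsError(f"backup target already exists: {dst}")

    # shutil.move also works when src and dst are on different filesystems.
    shutil.move(str(src), str(dst))
    return dst
```

With the paths from the logs, the call would look like set_aside_snapshot_table(Path("/home/user/.questdb/snapshot/db"), "sys.telemetry_wal", Path("/home/user/.questdb/snapshot_set_aside")), run before restarting the service. The other table directories in the snapshot are left untouched.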


The telemetry table records basic information about what the database is doing. This is very coarse information, and should not interfere with normal processing. This could be an old bug, or something new introduced in our recent SNAPSHOT updates.

We should be able to resolve this soon, possibly in the next version.

Glad you have a resolution for now!