Avoid query timeouts after ILP ingestion

Hi

I’m testing QuestDB on windows.

I’ve created a loop as follows

some pseudocode

for each symbol in symbols
1) get last bar date stored in QuestDB table (run this query: SELECT MAX(time_stamp) FROM minute_bars WHERE symbol = '{symbol}')
2) fetch minute bars asynchronously from data vendor over tcp socket
3) when we recieve a minute bar send it to questdb using ILP (here I setup a TCP socket connection to QuestDB and write lines like "minute_bars,ticker=SPY high=380.50,low=370.03,open=375.02,close=372.01,volume=10000i,oi=1000i,num_trds=100i $unix_timestamp\n" as I receive the bars)
4) wait for the asynchonous process to finish (ie wait till we have all data from the vendor)
end

This works fine on the first iteration of the loop. The problem is the second iteration of the loop the SQL query times out (ie. step 1). Looking at the logs questdb is busy processing all the minutes bars I sent over ILP and the SQL query in step 1 fails.

I can fix this by changing the config to
query.timeout.sec=240

However I feel I’m not doing something right here. I’m just sending a load of minute bars to quest DB (about 10 years worth), then running a query on the database. I feel like it shouldn’t choke. Is there a better design (or config settings) for this kind of task?

I’m a bit surprised the SQL query times out - just cause there’s new table rows being ingested.

Hi @AUK1939,

Please could you share server logs, and machine specs? Also table schema.

It is possible that you are submitting single row transactions, or have another performance hit like one of your symbol capacities not being set appropriately.

We can help debug this!

Hi @nwoolmer

I think you might have already hit the nail on the head here - I’m sending the records line by line over an ILP tcp socket - I take it this is not optimal? Is there an alternative approach?

The computer spec is Intel i9 13th gen, 128Gigg DDR5, 4TB M2 NVME SSD.

The table schema is

CREATE TABLE minute_bars (
	time_stamp TIMESTAMP,
    symbol SYMBOL,
    high DOUBLE,
    low DOUBLE,
	open DOUBLE,
	close DOUBLE,
	volume LONG,
	oi LONG, 
	num_trds LONG,
	is_daily_close BOOLEAN
) TIMESTAMP(time_stamp) PARTITION BY DAY WAL
DEDUP UPSERT KEYS(time_stamp, symbol);

Here are the relevant parts of the log (let me know if you need the whole thing). Here you can see lot’s of writes and a select query (which times out later on in the logs)

2024-11-18T04:41:16.476643Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-10T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=139001, srcOooHi=139646, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-10T04:01:00.000000Z, newSize=646, maxTs=2024-08-07T11:15:00.000000Z, pCount=46, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.476649Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-11T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=139647, srcOooHi=140272, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-11T04:01:00.000000Z, newSize=626, maxTs=2024-08-07T11:15:00.000000Z, pCount=47, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.476654Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-14T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=140273, srcOooHi=140960, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-14T04:01:00.000000Z, newSize=688, maxTs=2024-08-07T11:15:00.000000Z, pCount=48, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.476660Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-15T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=140961, srcOooHi=141617, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-15T04:01:00.000000Z, newSize=657, maxTs=2024-08-07T11:15:00.000000Z, pCount=49, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.476665Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-16T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=141618, srcOooHi=142248, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-16T04:01:00.000000Z, newSize=631, maxTs=2024-08-07T11:15:00.000000Z, pCount=50, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.476671Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-17T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=142249, srcOooHi=142960, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-17T04:03:00.000000Z, newSize=712, maxTs=2024-08-07T11:15:00.000000Z, pCount=51, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.476676Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-18T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=142961, srcOooHi=143607, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-18T04:01:00.000000Z, newSize=647, maxTs=2024-08-07T11:15:00.000000Z, pCount=52, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.476681Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-21T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=143608, srcOooHi=144270, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-21T04:01:00.000000Z, newSize=663, maxTs=2024-08-07T11:15:00.000000Z, pCount=53, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.476687Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-22T00:00:00.000000Z, partitionIndex=-17397, last=false, append=false, ro=false, srcOooLo=144271, srcOooHi=144861, srcOooMax=144862, o3RowCount=37588, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-22T04:01:00.000000Z, newSize=591, maxTs=2024-08-07T11:15:00.000000Z, pCount=54, flattenTs=false, memUsed=966.602 MiB, rssMemUsed=623.701 MiB]
2024-11-18T04:41:16.546842Z I i.q.c.TableWriter switched partition [path=\minute_bars~6\2024-10-22.62]
2024-11-18T04:41:16.548420Z I i.q.c.w.ApplyWal2TableJob job finished [table=minute_bars~6, seqTxn=64, transactions=1, rows=37588, time=99ms, rate=378316rows/s, physicalWrittenRowsMultiplier=1.0]
2024-11-18T04:41:16.548490Z I i.q.c.p.WriterPool << [table=`minute_bars~6`, thread=64]
2024-11-18T04:41:16.638206Z I tcp-line-server disconnected [ip=127.0.0.1, fd=13469017440541, src=queue]
2024-11-18T04:41:16.638403Z I i.q.c.l.t.TableUpdateDetails closing table writer [tableName=TableToken{tableName=minute_bars, dirName=minute_bars~6, tableId=6, isWal=true, isSystem=false}]
2024-11-18T04:41:16.638434Z I i.q.c.l.t.TableUpdateDetails closing table parsers [tableName=TableToken{tableName=minute_bars, dirName=minute_bars~6, tableId=6, isWal=true, isSystem=false}]
2024-11-18T04:41:16.638669Z I i.q.c.w.WalWriter committed data block [wal=\minute_bars~6\wal1\10, segmentTxn=4, seqTxn=65, rowLo=144862, roHi=158915, minTs=2024-10-22T16:50:00.000000Z, maxTs=2024-11-15T20:00:00.000000Z]
2024-11-18T04:41:16.638997Z I i.q.c.p.WriterPool >> [table=`minute_bars~6`, thread=64]
2024-11-18T04:41:16.646121Z I i.q.c.TableWriter processing WAL [path=\minute_bars~6\wal1\10, roLo=144862, roHi=158915, seqTxn=65, tsMin=2024-10-22T16:50:00.000000Z, tsMax=2024-11-15T20:00:00.000000Z, commitToTs=294247-01-10T04:00:54.775807Z]
2024-11-18T04:41:16.664580Z I i.q.g.e.QueryProgress exe [id=25, sql=`SELECT MAX(time_stamp) FROM minute_bars WHERE symbol = 'GOOGL'`, principal=admin, cache=false, jit=true]
2024-11-18T04:41:16.664837Z I i.q.c.TableReader open partition [path=\minute_bars~6\2024-10-22.62, rowCount=591, partitionIndex=4401, partitionCount=4402]
2024-11-18T04:41:16.673263Z I i.q.c.TableWriter WAL dedup sorted commit index [table=TableToken{tableName=minute_bars, dirName=minute_bars~6, tableId=6, isWal=true, isSystem=false}, totalRows=14053, lagRows=0]
2024-11-18T04:41:16.673569Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-22T00:00:00.000000Z, partitionIndex=17604, last=true, append=true, ro=false, srcOooLo=144862, srcOooHi=144950, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=591, o3Ts=2024-10-22T16:50:00.000000Z, newSize=680, maxTs=2024-10-22T16:49:00.000000Z, pCount=1, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
2024-11-18T04:41:16.673631Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-23T00:00:00.000000Z, partitionIndex=-17609, last=false, append=false, ro=false, srcOooLo=144951, srcOooHi=145666, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-23T04:01:00.000000Z, newSize=716, maxTs=2024-10-22T16:49:00.000000Z, pCount=2, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
2024-11-18T04:41:16.673641Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-24T00:00:00.000000Z, partitionIndex=-17609, last=false, append=false, ro=false, srcOooLo=145667, srcOooHi=146360, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-24T04:01:00.000000Z, newSize=694, maxTs=2024-10-22T16:49:00.000000Z, pCount=3, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
2024-11-18T04:41:16.673647Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-25T00:00:00.000000Z, partitionIndex=-17609, last=false, append=false, ro=false, srcOooLo=146361, srcOooHi=147099, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-25T04:01:00.000000Z, newSize=739, maxTs=2024-10-22T16:49:00.000000Z, pCount=4, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
2024-11-18T04:41:16.673654Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-28T00:00:00.000000Z, partitionIndex=-17609, last=false, append=false, ro=false, srcOooLo=147100, srcOooHi=147848, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-28T04:01:00.000000Z, newSize=749, maxTs=2024-10-22T16:49:00.000000Z, pCount=5, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
2024-11-18T04:41:16.673660Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-29T00:00:00.000000Z, partitionIndex=-17609, last=false, append=false, ro=false, srcOooLo=147849, srcOooHi=148617, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-29T04:01:00.000000Z, newSize=769, maxTs=2024-10-22T16:49:00.000000Z, pCount=6, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
2024-11-18T04:41:16.673666Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-30T00:00:00.000000Z, partitionIndex=-17609, last=false, append=false, ro=false, srcOooLo=148618, srcOooHi=149520, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-30T04:01:00.000000Z, newSize=903, maxTs=2024-10-22T16:49:00.000000Z, pCount=7, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
2024-11-18T04:41:16.673672Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-10-31T00:00:00.000000Z, partitionIndex=-17609, last=false, append=false, ro=false, srcOooLo=149521, srcOooHi=150376, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-10-31T04:01:00.000000Z, newSize=856, maxTs=2024-10-22T16:49:00.000000Z, pCount=8, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
2024-11-18T04:41:16.673677Z I i.q.c.TableWriter o3 partition task [table=minute_bars, partitionTs=2024-11-01T00:00:00.000000Z, partitionIndex=-17609, last=false, append=false, ro=false, srcOooLo=150377, srcOooHi=151287, srcOooMax=158915, o3RowCount=14053, o3LagRowCount=0, srcDataMax=0, o3Ts=2024-11-01T04:01:00.000000Z, newSize=911, maxTs=2024-10-22T16:49:00.000000Z, pCount=9, flattenTs=false, memUsed=964.699 MiB, rssMemUsed=622.923 MiB]
....
2024-11-18T04:43:13.636761Z I i.q.c.TableReader open partition [path=\minute_bars~6\2007-05-01, rowCount=391, partitionIndex=2, partitionCount=4402]
2024-11-18T04:43:13.655539Z I i.q.c.TableReader open partition [path=\minute_bars~6\2007-04-30, rowCount=391, partitionIndex=1, partitionCount=4402]
2024-11-18T04:43:13.678141Z I i.q.c.TableReader open partition [path=\minute_bars~6\2007-04-27, rowCount=317, partitionIndex=0, partitionCount=4402]
2024-11-18T04:43:13.705650Z E i.q.g.e.QueryProgress err [id=25, sql=`SELECT MAX(time_stamp) FROM minute_bars WHERE symbol = 'GOOGL'`, principal=admin, cache=false, jit=true, time=117041070300, msg=timeout, query aborted, errno=-1, pos=0]

Quick update on this. I tried to send the lines in batches over ILP

I tried with batchsizes of 100, 1000, 10000 and 100000 but the problem always persists; the subsequent select query after the ILP insert fails due to a timeout

What did work for me is sending the data over many sockets (in batches of 100k). So the ingestion happens in parallel

Hey @AUK1939 ,

The transaction sizes didn’t seem too bad, actually. Its strange that sending in parallel resolved this. With that machine spec, I wouldn’t expect a major slowdown in either case.

Let us know if you hit any trouble again, and we can try to debug it further!

Hi @nwoolmer

Actually what made the difference is the PARTITION size. Setting the partition size to WEEK, MONTH or YEAR works - it avoids the query time out problem.

I’m not sure if the batching of lines and or using many sockets is helping.

Ah in this case, perhaps you had too many partitions. For example, if you use DAY partitions for 25 years of data, that’s >9000 partitions. The query time may then by dominated by how long it takes to open and close all the files!

With monthly partitions, you’re already reducing this by a factor of 30x or so!

Partition sizing is a trade off between what’s good for ingestion, and what’s good for querying. If you ingest out-of-order (O3) data into larger partitions, then it will cost more CPU and IOPS to handle, as this is a copy-on-write process. But for appending data, you shouldn’t notice much difference.

1 Like