Long list capacity overflow

Hi,
I’m running a SQL SAMPLE BY 15m query over about 2 billion rows of time-series data and getting ERROR: long list capacity overflow after about 2 minutes. I assume I’m missing something in server.conf, as most settings are currently commented out. Can someone suggest a fix?

Thanks
Dom

from log files:

2024-09-25T18:40:38.683865Z E i.q.g.e.QueryProgress err [id=41, sql=SELECT link_id, speed_in_kph, last_update FROM 'linkspeed' SAMPLE BY 15m;, principal=admin, cache=false, >
2024-09-25T18:40:38.768785Z E i.q.c.p.PGConnectionContext error [msg=long list capacity overflow, errno=-1]

It looks as if an aggregate function is missing from your SQL, for example:

SELECT last(speed_in_kph) FROM 'linkspeed' SAMPLE BY 15m

What is the name of your designated timestamp column, and what is the intent behind the SQL you’re running?

Agh, thank you, schoolboy error: I missed the avg(speed_in_kph). The timestamp is last_update and the table is partitioned by that column. I’m bucketing the minute-level values into 5-minute average buckets.

though now I do seem to have hit a resource limitation:

qdb=> SELECT link_id, avg(speed_in_kph), last_update FROM 'linkspeed' SAMPLE BY 15m;
ERROR: global RSS memory limit exceeded [usage=30175843264, RSS_MEM_LIMIT=30280613040, size=4294967296, memoryTag=18]
LINE 1: SELECT link_id, avg(speed_in_kph), last_update FROM 'linkspe…

Hi @djfsheehan ,

If you can’t increase your resource limits, then you probably need to reduce the time range of your query. SAMPLE BY is a grouping query, so it builds a hashmap in RAM, and with large queries that map can grow pretty large.

If you have a huge amount of data (years) and a small stride (15m), you can run out of memory. Reducing the time range will help.
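
For example, you could restrict the query to one month at a time with an interval filter on the designated timestamp. A rough sketch, reusing the column names from your query (the month value is just a placeholder):

SELECT link_id, avg(speed_in_kph), last_update
FROM 'linkspeed'
WHERE last_update IN '2024-06'   -- one month slice; change the interval per run
SAMPLE BY 15m;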

Thanks Nick, that makes sense. When you say limits, do you mean just host memory, or is there also a corresponding config setting that limits RAM use?

Host RAM, i.e. on the box or assigned to the container.

Thanks for confirming. I’ve got 2 billion data points, so I’ll test with more memory and also consider batching month by month.

Hi,
I am trying the same sample query above on a host with 128 GB and, sure enough, I don’t exceed the global RSS memory limit, and I’m getting impressive performance.

However, when I reduce the sample stride to 5m I get the first issue again: ERROR: long list capacity overflow. This is a big query (2 billion rows to sample). Is there a config setting I should add or change, or is this a hard limit? Alternatively, I can sample month by month and stitch together.

Thanks again for any help.

Hi @djfsheehan, how many buckets will this use?

For example, if you have 1 year of data, that’s 525,600 minutes. So 5m buckets means 105,120 buckets. If this errors then you should stitch together smaller time ranges!
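
If you want to check, a rough bucket count is the span of your data divided by the stride. Something like this should give an estimate, assuming last_update is your designated timestamp:

-- rough count of 5m buckets across the whole table
SELECT datediff('m', min(last_update), max(last_update)) / 5 AS approx_5m_buckets
FROM 'linkspeed';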

A colleague has suggested using a GROUP BY directly:

This is an example on demo:

SELECT timestamp_floor('15m', timestamp) ts, side, symbol, avg(price) 
FROM trades GROUP BY ts, side, symbol

In your case, it could be something like:

SELECT timestamp_floor('5m', last_update) ts, link_id, avg(speed_in_kph) avg_speed
FROM 'linkspeed'
GROUP BY ts, link_id

or similar!

It’s possible that your query is struggling with an implicit ORDER BY.

I’ll try this out - thank you v. much!

I currently have 3 months of data points, arriving roughly every 2 to 3 minutes. I did succeed in sampling into 10m buckets, but I’d prefer 5m. I’ll find a reasonable max batch size and stitch as needed.
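
Something like this is what I have in mind for the stitching, one month per run into a results table (column types are a guess and the month interval is a placeholder; I’ll match them to the real linkspeed schema):

-- results table for the stitched 5m averages; adjust types to the source schema
CREATE TABLE IF NOT EXISTS linkspeed_5m (
  ts TIMESTAMP,
  link_id INT,
  avg_speed DOUBLE
) TIMESTAMP(ts) PARTITION BY MONTH;

-- run once per month, changing the interval each time
INSERT INTO linkspeed_5m
SELECT last_update ts, link_id, avg(speed_in_kph) avg_speed
FROM 'linkspeed'
WHERE last_update IN '2024-07'
SAMPLE BY 5m;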