Long list capacity overflow

Hi,
I’m running a SQL SAMPLE BY 15m query over about 2 billion rows of time-series data and getting ERROR: long list capacity overflow after about 2 minutes. I assume I’m missing something in server.conf, as most settings are currently commented out. Can someone suggest a fix?

Thanks
Dom

from log files:

2024-09-25T18:40:38.683865Z E i.q.g.e.QueryProgress err [id=41, sql=SELECT link_id, speed_in_kph, last_update FROM 'linkspeed' SAMPLE BY 15m;, principal=admin, cache=false, >
2024-09-25T18:40:38.768785Z E i.q.c.p.PGConnectionContext error [msg=long list capacity overflow, errno=-1]

It looks as if an aggregate function is missing from your SQL, for example:

SELECT last(speed_in_kph) FROM 'linkspeed' SAMPLE BY 15m

What is the name of your designated timestamp column, and what is the intent behind the SQL you’re running?

Agh, thank you, schoolboy error: I missed the avg(speed_in_kph). The timestamp is last_update and the table is partitioned by that column. I’m bucketing the minute-level values into 5-minute average buckets.

though now I do seem to have hit a resource limitation:

qdb=> SELECT link_id, avg(speed_in_kph), last_update FROM 'linkspeed' SAMPLE BY 15m;
ERROR: global RSS memory limit exceeded [usage=30175843264, RSS_MEM_LIMIT=30280613040, size=4294967296, memoryTag=18]
LINE 1: SELECT link_id, avg(speed_in_kph), last_update FROM 'linkspe…

Hi @djfsheehan ,

If you can’t increase your resource limits, then you probably need to reduce the time range of your query. SAMPLE BY is a grouping query, so it builds a hashmap in RAM, and with large queries that map can grow pretty large.

If you have a huge amount of data (years) and a small stride (15m), you can run out of memory. Reducing the time range will help.
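
For example, you could restrict the query to one month at a time with an interval filter on the designated timestamp. A rough sketch, reusing the column names from your query (the month value is just a placeholder):

SELECT link_id, avg(speed_in_kph), last_update
FROM 'linkspeed'
WHERE last_update IN '2024-06'   -- one month slice; change the interval per run
SAMPLE BY 15m;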

Thanks Nick, that makes sense. When you say limits, do you mean just host memory, or is there also a corresponding config setting that limits RAM use?

Host RAM, i.e. on the box or assigned to the container.

Thanks for confirming. I’ve got 2 billion data points, so I’ll test with more memory and also consider batching month by month.

Hi,
I am trying the same sample query above on a host with 128 GB and, sure enough, I don’t exceed the global RSS memory limit, and I’m getting impressive performance.

However, when I reduce the sample stride to 5m I get the first issue again: ERROR: long list capacity overflow. This is a big query (2 billion rows to sample). Is there a config setting I should add or change, or is this a hard limit? Alternatively, I can sample month by month and stitch together.

Thanks again for any help.

Hi @djfsheehan, how many buckets will this use?

For example, if you have 1 year of data, that’s 525,600 minutes. So 5m buckets means 105,120 buckets. If this errors then you should stitch together smaller time ranges!
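
If you want to check, a rough bucket count is the span of your data divided by the stride. Something like this should give an estimate, assuming last_update is your designated timestamp:

-- rough count of 5m buckets across the whole table
SELECT datediff('m', min(last_update), max(last_update)) / 5 AS approx_5m_buckets
FROM 'linkspeed';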

A colleague has suggested using a GROUP BY directly:

This is an example on demo:

SELECT timestamp_floor('15m', timestamp) ts, side, symbol, avg(price) 
FROM trades GROUP BY ts, side, symbol

In your case, it could be something like:

SELECT timestamp_floor('5m', last_update) ts, link_id, avg(speed_in_kph) avg_speed
FROM 'linkspeed'
GROUP BY ts, link_id

or similar!

It’s possible that your query is struggling with an implicit ORDER BY.

I’ll try this out - thank you v. much!

I currently have 3 months of data points, arriving roughly every 2 to 3 minutes. I did succeed in sampling into 10m buckets, but I’d prefer 5m. I’ll find a reasonable max batch size and stitch as needed.
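
Something like this is what I have in mind for the stitching, one month per run into a results table (column types are a guess and the month interval is a placeholder; I’ll match them to the real linkspeed schema):

-- results table for the stitched 5m averages; adjust types to the source schema
CREATE TABLE IF NOT EXISTS linkspeed_5m (
  ts TIMESTAMP,
  link_id INT,
  avg_speed DOUBLE
) TIMESTAMP(ts) PARTITION BY MONTH;

-- run once per month, changing the interval each time
INSERT INTO linkspeed_5m
SELECT last_update ts, link_id, avg(speed_in_kph) avg_speed
FROM 'linkspeed'
WHERE last_update IN '2024-07'
SAMPLE BY 5m;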