Hi Jarym. I see a few issues with your query. The first one is that QuestDB does not allow following
on the window frame, but even if it did, you are sorting first by timestamp, then by m desc, and then defining the window, so the window would not have the frames you think it would have.
We can solve this in a couple of ways, which I am not sure will be super efficient at scale. The simplest (but probably less efficient) is using a CROSS JOIN
. And the second is using window functions and a CASE
.
For both scenarios I created a table t like this. You can create it yourself and since I am using a seed for the pseudo random generation we should have the same data.
create table t AS (SELECT timestamp_sequence(to_timestamp('2024-06-21', 'yyyy-MM-dd'), 10000000L) as ts
, concat('bla', rnd_int(1,10,0)) as id
, rnd_int(1,100,0) as m
from long_sequence(18, 19000L, 21000L)
) timestamp(ts);
And then I ran this
WITH
t_pos AS (
select *, row_number() over(order by ts) as pos from t
)
select a.ts, a.id, a.m, min(b.m) from t_pos a CROSS JOIN t_pos b
where b.pos >= a.pos AND b.pos - a.pos <= 6;
Which is telling my query to first add a row_number to every row, then do a CROSS JOIN
against the same table, but filter only rows where the position is 6 rows ahead of me, and with that we do a MIN
grouping by all the columns, so we get the minimum value. This works fine
The query above is straight forward, but a CROSS JOIN on a large dataset might be too much.
So I thought of a second option. I first get the next 6 values, by doing a window function by descending timestamp, then I do a second query with a CASE
to find the smallest one for each row.
WITH t_future AS
(
select *,
first_value(m) OVER (ORDER BY ts DESC ROWS 1 PRECEDING) as m1,
first_value(m) OVER (ORDER BY ts DESC ROWS 2 PRECEDING) as m2,
first_value(m) OVER (ORDER BY ts DESC ROWS 3 PRECEDING) as m3,
first_value(m) OVER (ORDER BY ts DESC ROWS 4 PRECEDING) as m4,
first_value(m) OVER (ORDER BY ts DESC ROWS 5 PRECEDING) as m5,
first_value(m) OVER (ORDER BY ts DESC ROWS 6 PRECEDING) as m6
FROM t
)
SELECT ts, id, m,
CASE
WHEN m <= m1 AND m <= m2 AND m <= m3 AND m <= m4 AND m <= m5 AND m <= m6 THEN m
WHEN m1 <= m AND m1 <= m2 AND m1 <= m3 AND m1 <= m4 AND m1 <= m5 AND m1 <= m6 THEN m1
WHEN m2 <= m AND m2 <= m1 AND m2 <= m3 AND m2 <= m4 AND m2 <= m5 AND m2 <= m6 THEN m2
WHEN m3 <= m AND m3 <= m1 AND m3 <= m2 AND m3 <= m4 AND m3 <= m5 AND m3 <= m6 THEN m3
WHEN m4 <= m AND m4 <= m1 AND m4 <= m2 AND m4 <= m3 AND m4 <= m5 AND m4 <= m6 THEN m4
WHEN m5 <= m AND m5 <= m1 AND m5 <= m2 AND m5 <= m3 AND m5 <= m4 AND m5 <= m6 THEN m5
ELSE m6
END::int as min
FROM t_future;
Ands this also works as expected
Now, to make things interesting I changed my table to have 100K entries and… well… CROSS JOIN takes over 200 seconds on my laptop, while the case query took 77 milliseconds, so I guess we have a winner .
Did this help?