Window function to get future rows?

Jarym · July 3, 2024, 5:46pm

Hi, I’d like to query the maximum value in the following 6 timesteps so I tried this:

select m,
first_value(m) over (order by timestamp, m desc rows between current row and 6 following)
from t

but I get the error frame end supports _number_ PRECEDING and CURRENT ROW only.

Can someone point me to the correct way to do this in QuestDB?

javier · July 4, 2024, 11:44am

Hi Jarym. I see a few issues with your query. The first one is that QuestDB does not allow following on the window frame, but even if it did, you are sorting first by timestamp, then by m desc, and then defining the window, so the window would not have the frames you think it would have.

We can solve this in a couple of ways, which I am not sure will be super efficient at scale. The simplest (but probably less efficient) is using a CROSS JOIN. And the second is using window functions and a CASE.

For both scenarios I created a table t like this. You can create it yourself and since I am using a seed for the pseudo random generation we should have the same data.

create table t AS (SELECT timestamp_sequence(to_timestamp('2024-06-21', 'yyyy-MM-dd'), 10000000L) as ts
, concat('bla', rnd_int(1,10,0)) as id
, rnd_int(1,100,0) as m 
from long_sequence(18, 19000L, 21000L)
) timestamp(ts);

And then I ran this

WITH

t_pos AS (

select *, row_number() over(order by ts) as pos from t

)

select a.ts, a.id, a.m, min(b.m) from t_pos a CROSS JOIN t_pos b

where b.pos >= a.pos AND b.pos - a.pos <= 6;

Which is telling my query to first add a row_number to every row, then do a CROSS JOIN against the same table, but filter only rows where the position is 6 rows ahead of me, and with that we do a MIN grouping by all the columns, so we get the minimum value. This works fine

The query above is straight forward, but a CROSS JOIN on a large dataset might be too much.

So I thought of a second option. I first get the next 6 values, by doing a window function by descending timestamp, then I do a second query with a CASE to find the smallest one for each row.


WITH t_future AS
(
select *, 
first_value(m) OVER (ORDER BY ts DESC ROWS 1 PRECEDING) as m1,
first_value(m) OVER (ORDER BY ts DESC ROWS 2 PRECEDING) as m2,
first_value(m) OVER (ORDER BY ts DESC ROWS 3 PRECEDING) as m3,
first_value(m) OVER (ORDER BY ts DESC ROWS 4 PRECEDING) as m4,
first_value(m) OVER (ORDER BY ts DESC ROWS 5 PRECEDING) as m5,
first_value(m) OVER (ORDER BY ts DESC ROWS 6 PRECEDING) as m6
FROM t
)
SELECT ts, id, m,
CASE
  WHEN  m <= m1 AND m <= m2 AND m <= m3 AND m <= m4 AND m <= m5 AND m <= m6 THEN m
  WHEN m1 <= m AND m1 <= m2 AND m1 <= m3 AND m1 <= m4 AND m1 <= m5 AND m1 <= m6 THEN m1
  WHEN m2 <= m AND m2 <= m1 AND m2 <= m3 AND m2 <= m4 AND m2 <= m5 AND m2 <= m6 THEN m2
  WHEN m3 <= m AND m3 <= m1 AND m3 <= m2 AND m3 <= m4 AND m3 <= m5 AND m3 <= m6 THEN m3
  WHEN m4 <= m AND m4 <= m1 AND m4 <= m2 AND m4 <= m3 AND m4 <= m5 AND m4 <= m6 THEN m4
  WHEN m5 <= m AND m5 <= m1 AND m5 <= m2 AND m5 <= m3 AND m5 <= m4 AND m5 <= m6 THEN m5
  ELSE m6
END::int as min
FROM t_future;

Ands this also works as expected

Now, to make things interesting I changed my table to have 100K entries and… well… CROSS JOIN takes over 200 seconds on my laptop, while the case query took 77 milliseconds, so I guess we have a winner .

Did this help?

Jarym · July 9, 2024, 8:05pm

Thank you for taking the time to explain - I’ll give this a go!