As the title says, when using a window function, is there a way to find out the elapsed time?
Looking at this query:
SELECT timestamp, symbol, price,
first_value(timestamp::long) OVER (
PARTITION BY SYMBOL ORDER BY timestamp
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
) as previous_ts
from trades
WHERE symbol IN ('AVAX-BTC', 'UNI-USD') AND timestamp in '2024-07-02'
In this query I am using a window function, I am going to explain that part in detail
first_value(timestamp::long) OVER (
PARTITION BY SYMBOL
ORDER BY timestamp
ROWS
BETWEEN 1 PRECEDING
AND CURRENT ROW EXCLUDE CURRENT ROW
) as previous_ts
first_value(timestamp::long) OVER (
← A window function is usually an aggregated function over a frame/window of rows relative to the current row. In this case I am going to request the aggregated function first_value(timestamp::long)
, which will give me the timestamp in the first row in the window definition.
You have probably noticed I am casting the timestamp to a long value. This is only because at the moment the first_value
function operates only in numeric columns. There is a github issue I filled on this a while ago and at some point it will support timestamps, but for the time being we can just use the cast https://github.com/questdb/questdb/issues/4395
So, the first_value(timestamp::long)
will give me the timestamp of the first row of a window relative to the current row. All we need to do now is to define a window which goes back only 1 row before, and with this we have the timestamp from the previous row. We do it with this code
OVER (
PARTITION BY SYMBOL
ORDER BY timestamp
ROWS
BETWEEN 1 PRECEDING
AND CURRENT ROW EXCLUDE CURRENT ROW
)
PARTITION BY SYMBOL
means that the window will be relative to the current row but also as long as the symbol is the same. If you want to calculate the value without taking into consideration the SYMBOL
you could omit this. You can also add several values comma separated, for example we could have PARTITION BY SYMBOL, side
and we would get elapsed time relative for the symbol and for the buy or sell operations independently
ORDER BY timestamp
means when we are defining the window, the rows relative to the current one will be sorted by ascending timestamp. In this case that’s what we need, but you could also order by descending timestamp. ORDER BY can have some implications when using window functions, but more often than not you want ORDER BY timestamp
. Specifically for elapsed time this is what you need
ROWS
indicates the window is going to be defined by a number of rows relative to the current one. We could also use RANGE
if you wanted to scope the window to a time range instead (like, for example, moving average over the last minute, independently of how many rows I have within the minute). In this case for elapsed time since last record, ROWS
is the right strategy
BETWEEN 1 PRECEDING
defined the upper bound of the window. We want the value from the previous row, so we say 1 PRECEDING
. You could use UNBOUNDED PRECEDING
to specify since the beginning of any rows within this query and partition. Or you could say BETWEEN 2 PRECEDING
, for example, if you wanted to get the timestamp of 2 rows before the current one. In this case, we use 1.
AND CURRENT ROW EXCLUDE CURRENT ROW
This means we want to end the window at the current row, but we want to exclude the value of the current row. In this particular case, the first row for each symbol will have the result of first_value(timestamp:long)
returning a null, because the window goes from 1 row before to 1 row before, and since we are at the first row, there are no values.
If you omit the EXCLUDE CURRENT ROW
, then the first row for each symbol would return the timestamp of the row itself. Depending on what you want, you can do one or the other.
We could also write
ROWS 1 PRECEDING EXCLUDE CURRENT ROW
or
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
and they would be equivalent to
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
I am using the longer form as I find it more explicit and easier to understand.
Full syntax available at https://questdb.io/docs/reference/function/window/#frame-clause—range-or-rows
So… by now we have a query that returns, for each row, the timestamp of the row before (partitioned by symbol) with the column name previous_ts
. We can now calculate the elapsed time by wrapping this into a subquery, as in
WITH trades_with_prev AS (
SELECT timestamp, symbol, price,
first_value(timestamp::long) OVER (
PARTITION BY SYMBOL ORDER BY TIMESTAMP
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
) as previous_ts
from trades
WHERE symbol IN ('AVAX-BTC', 'UNI-USD') AND timestamp in '2024-07-02'
) SELECT *, (timestamp::long - previous_ts) AS elapsed_micros from trades_with_prev;
As you see, I am doing this (timestamp::long - previous_ts) AS elapsed_micros
. I case the timestamp for the row to a long, representing the epoch in microseconds, so when I subtract the previous timestamp I get the difference in microseconds. Of course you could divide by 1000 if you prefer millis (
timestamp
::
long
- previous_ts)/1000 AS elapsed_millis.
Or you could have cast the previous_ts as a timestamp and use the built-in datediff
function, as in datediff('u',
timestamp
, previous_ts::
timestamp
)
.
No matter how you do it, you will end up with the elapsed time.
I hope this helped!!