How to find elapsed time using window functions

system · July 3, 2024, 3:52pm

As the title says, when using a window function, is there a way to find out the elapsed time?

javier · July 3, 2024, 3:54pm

Looking at this query:

SELECT timestamp, symbol, price, 
   first_value(timestamp::long) OVER (
    PARTITION BY SYMBOL ORDER BY timestamp 
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
    ) as previous_ts
    from trades
    WHERE symbol IN ('AVAX-BTC', 'UNI-USD') AND timestamp in '2024-07-02'

In this query I am using a window function, I am going to explain that part in detail

first_value(timestamp::long) OVER (
    PARTITION BY SYMBOL 
    ORDER BY timestamp 
    ROWS 
         BETWEEN 1 PRECEDING 
         AND CURRENT ROW EXCLUDE CURRENT ROW
    ) as previous_ts

first_value(timestamp::long) OVER ( ← A window function is usually an aggregated function over a frame/window of rows relative to the current row. In this case I am going to request the aggregated function first_value(timestamp::long), which will give me the timestamp in the first row in the window definition.

You have probably noticed I am casting the timestamp to a long value. This is only because at the moment the first_value function operates only in numeric columns. There is a github issue I filled on this a while ago and at some point it will support timestamps, but for the time being we can just use the cast https://github.com/questdb/questdb/issues/4395

So, the first_value(timestamp::long) will give me the timestamp of the first row of a window relative to the current row. All we need to do now is to define a window which goes back only 1 row before, and with this we have the timestamp from the previous row. We do it with this code

OVER (
    PARTITION BY SYMBOL 
    ORDER BY timestamp 
    ROWS 
         BETWEEN 1 PRECEDING 
         AND CURRENT ROW EXCLUDE CURRENT ROW
    )

PARTITION BY SYMBOL means that the window will be relative to the current row but also as long as the symbol is the same. If you want to calculate the value without taking into consideration the SYMBOL you could omit this. You can also add several values comma separated, for example we could have PARTITION BY SYMBOL, side and we would get elapsed time relative for the symbol and for the buy or sell operations independently

ORDER BY timestamp means when we are defining the window, the rows relative to the current one will be sorted by ascending timestamp. In this case that’s what we need, but you could also order by descending timestamp. ORDER BY can have some implications when using window functions, but more often than not you want ORDER BY timestamp. Specifically for elapsed time this is what you need

ROWS indicates the window is going to be defined by a number of rows relative to the current one. We could also use RANGE if you wanted to scope the window to a time range instead (like, for example, moving average over the last minute, independently of how many rows I have within the minute). In this case for elapsed time since last record, ROWS is the right strategy

BETWEEN 1 PRECEDING defined the upper bound of the window. We want the value from the previous row, so we say 1 PRECEDING. You could use UNBOUNDED PRECEDING to specify since the beginning of any rows within this query and partition. Or you could say BETWEEN 2 PRECEDING, for example, if you wanted to get the timestamp of 2 rows before the current one. In this case, we use 1.

AND CURRENT ROW EXCLUDE CURRENT ROW This means we want to end the window at the current row, but we want to exclude the value of the current row. In this particular case, the first row for each symbol will have the result of first_value(timestamp:long) returning a null, because the window goes from 1 row before to 1 row before, and since we are at the first row, there are no values.

If you omit the EXCLUDE CURRENT ROW, then the first row for each symbol would return the timestamp of the row itself. Depending on what you want, you can do one or the other.

We could also write

   ROWS 1 PRECEDING  EXCLUDE CURRENT ROW

or

 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING

and they would be equivalent to

ROWS BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW

I am using the longer form as I find it more explicit and easier to understand.

Full syntax available at https://questdb.io/docs/reference/function/window/#frame-clause—range-or-rows

So… by now we have a query that returns, for each row, the timestamp of the row before (partitioned by symbol) with the column name previous_ts. We can now calculate the elapsed time by wrapping this into a subquery, as in

WITH trades_with_prev AS (
SELECT timestamp, symbol, price, 
   first_value(timestamp::long) OVER (
    PARTITION BY SYMBOL ORDER BY TIMESTAMP
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW
    ) as previous_ts
    from trades
    WHERE symbol IN ('AVAX-BTC', 'UNI-USD') AND timestamp in '2024-07-02'
) SELECT *, (timestamp::long - previous_ts) AS elapsed_micros from trades_with_prev;

As you see, I am doing this (timestamp::long - previous_ts) AS elapsed_micros . I case the timestamp for the row to a long, representing the epoch in microseconds, so when I subtract the previous timestamp I get the difference in microseconds. Of course you could divide by 1000 if you prefer millis (timestamp::long - previous_ts)/1000 AS elapsed_millis.

Or you could have cast the previous_ts as a timestamp and use the built-in datediff function, as in datediff('u', timestamp, previous_ts::timestamp).

No matter how you do it, you will end up with the elapsed time.

I hope this helped!!