Values generated with timestamp_sequence() behave strangely on comparison/conversion

I have a table (of fake data) with a timestamp column. These values behave exactly as I expect.

SELECT
    timestamp,
    to_timezone(timestamp, 'EST') as timezone_converted,
    dateadd('d', 1, timestamp) as day_added
FROM
    events;

https://imgur.com/XJDnJaX

I have generated a reference series, with one value per day (I basically just want to figure out the number of events per day), starting from today and counting backwards, as follows:

    SELECT
        timestamp_sequence(date_trunc('day', now()), -86400000000L) as reference_date
    FROM
        long_sequence(300)

So far so good:

(imgur screenshot of the generated series)

When I try to operate on the generated dates, I have no idea what’s going on.

WITH d AS (
    SELECT
        timestamp_sequence(date_trunc('day', now()), -86400000000L) as reference_date
    FROM
        long_sequence(300)
)
SELECT 
    reference_date,
    to_timezone(reference_date, 'EST') as timezone_converted,
    dateadd('d', 1, reference_date) as day_added
FROM
    d
ORDER BY
    reference_date;

(imgur screenshot of the query results)

The series is generated exactly the same way, and it should contain values one day apart going back over the past 300 days. But somehow, operating on it changes the values?

Here is the kind of thing I really want to do:

WITH d AS (
    SELECT
        timestamp_sequence(date_trunc('day', now()), -86400000000L) as reference_date
    FROM
        long_sequence(300)
)
SELECT
    d.reference_date,
    COALESCE(SUM(e.amount), 0) AS total_amount
FROM
    d
LEFT JOIN
    events e
ON
    date_trunc('day', e.timestamp) = d.reference_date 
GROUP BY
    d.reference_date
ORDER BY
    d.reference_date;

https://imgur.com/GJUpEfx

I have no idea what’s going on. What am I misunderstanding? I get the same results (dates going back to 1777) using BETWEEN ... AND .... Is there something about the way the generated timestamps are represented that I need to know?

Apologies for the imgur links; my account is new and my options are limited.

Hi @mkoo21 ,

timestamp_sequence() is a function for generating timestamps, yes, but it generates the next timestamp on each access; it does not materialize a temporary table. Each access increments the value.

So the order of calls to the function seems to be timezone_converted -> day_added -> reference_date, then a call that is not printed, and the pattern repeats. For example:

15 -> 16 -> 17 -> (18)
then
19 -> 20 -> 21 -> (22)
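
As a quick way to check this on a given version, here is a stripped-down variant of the original query (a sketch only; the step and row count are arbitrary):

WITH d AS (
    SELECT
        timestamp_sequence(date_trunc('day', now()), 1000000L) as ts
    FROM
        long_sequence(3)
)
SELECT
    ts as plain,
    -- if each reference advances the sequence, this will not be exactly one day ahead of "plain"
    dateadd('d', 1, ts) as day_added
FROM
    d;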

I would suggest first inserting the timestamps into a table and then running the second query against that table. Once they are stored in a table, the timestamps are preserved.
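
For instance, something along these lines (a sketch; the table name reference_days is just a placeholder):

-- materialize the sequence once, then query the stored values
CREATE TABLE reference_days (reference_date TIMESTAMP);

INSERT INTO reference_days
SELECT timestamp_sequence(date_trunc('day', now()), -86400000000L)
FROM long_sequence(300);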

Let me know if you have any more questions or trouble with this!

EDIT: Here is a related issue :slight_smile: ORDER BY in timestamp_sequence() doubles distance between timestamps · Issue #4909 · questdb/questdb · GitHub


Thank you, that would explain it. Am I correct in reading that QuestDB doesn’t currently support temp tables à la CREATE TEMP TABLE t (...)? What would be the recommended way to set up this temp table?

I have tried using the sequence as a subquery, and setting up a (permanent) temporary table… but it seems like I cannot do DELETE FROM t either.

Of course, if there is a smarter way to do per-day summary statistics than timestamp_sequence I am open to suggestions as well.

You can use CREATE TABLE ... AS (SELECT ...), i.e. something a little hacky like this:

CREATE TABLE tmp AS (
    SELECT
        timestamp_sequence(date_trunc('day', now()), -86400000000L / 2) as reference_date
    FROM
        long_sequence(300)
    ORDER BY reference_date
) timestamp(reference_date)

I halved the step because it’s called twice per iteration, as a workaround for the issue linked above.

After that query is run, run your second query separately.
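
For example, the aggregation from the original post, pointed at the materialized table (a sketch that reuses the events table and amount column from above):

SELECT
    t.reference_date,
    COALESCE(SUM(e.amount), 0) AS total_amount
FROM
    tmp t
LEFT JOIN
    events e
ON
    date_trunc('day', e.timestamp) = t.reference_date
GROUP BY
    t.reference_date
ORDER BY
    t.reference_date;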

When you want to drop or clear the table, use:

TRUNCATE TABLE tmp;

or:

DROP TABLE tmp;

TRUNCATE will clear all data; DROP will remove the table entirely.

If you have absolute dates, you can also use something like this:

SELECT timestamp FROM trades
SAMPLE BY 1d FROM '2024-01-01' TO '2024-11-25'
FILL(0)

You have to point it at a real table and select its designated timestamp, but you can generate timestamps with it this way. If you use now() or similar in the FROM-TO bounds, then you will need to pass an aggregate function too, like count.
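
For instance, something like this (a sketch; whether expressions such as now() are accepted in the FROM-TO bounds may depend on your version):

-- relative upper bound, so an aggregate (count) is included
SELECT timestamp, count()
FROM trades
SAMPLE BY 1d FROM '2024-01-01' TO now()
FILL(0)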

I’ve found some edge cases around the second approach that are still to be resolved, but it’s a decent workaround if you can use exact dates (i.e. passing them from Grafana or another tool).

Can you expand a little on ‘per-day summary statistics’? If you want to aggregate per day, you can use SAMPLE BY.

SELECT timestamp, count FROM trades SAMPLE BY 1d
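
Applied to the events table from the original post, that might look something like this (a sketch; it assumes timestamp is the designated timestamp and amount is the column to total):

-- one row per day, with empty days filled with 0
SELECT timestamp, sum(amount) AS total_amount
FROM events
SAMPLE BY 1d
FILL(0)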

This works! I was trying to avoid missing rows, but SAMPLE BY with a FILL does the trick. I do need a specific number of rows, but I think I can fill that gap outside of the query. Thank you very much for your help.

Awesome! Glad we ended up somewhere simpler :slight_smile: