I’m evaluating moving a few large tables’ worth of time-series data from Postgres into QuestDB. The data is strongly relational: for example, each row displayed in a spreadsheet view also shows data from 3-5 of its foreign key relationships. Only the data from these time-series tables would migrate to QuestDB; the rest would stay in Postgres.
Is there a supported or recommended way to JOIN, import, or otherwise emulate relationships like these?
Hi @Toruitas,
There is no ‘remote table’ concept in QuestDB at the moment.
On the application side, you can connect to both databases using Postgres connections and then join the data locally. You could then put an API in front of this so the two-database setup is hidden from the user.
The corresponding foreign keys can be stored in whatever format is appropriate - LONG, UUID, SYMBOL, etc.
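For illustration, here’s a rough sketch of what such a table could look like, with the foreign key stored as a SYMBOL. The table and column names, and the connection details (QuestDB’s default Postgres wire protocol settings), are just assumptions for the example:

```python
# Hypothetical schema: "sensor_id" holds the key that points back to a
# "sensors" table in Postgres. Connection details assume QuestDB defaults
# (Postgres wire protocol on port 8812, user admin / password quest).
import psycopg2

ddl = """
CREATE TABLE IF NOT EXISTS readings (
    sensor_id SYMBOL,     -- key into the Postgres "sensors" table
    value     DOUBLE,
    ts        TIMESTAMP
) TIMESTAMP(ts) PARTITION BY DAY WAL;
"""

with psycopg2.connect(host="localhost", port=8812,
                      user="admin", password="quest", dbname="qdb") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)
```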
For data import, you can use COPY SQL or send multiple CSVs in parallel over REST. Alternatively, you can read from the Postgres database and submit data through one of the other ingestion APIs.
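As a sketch of the parallel-CSV route against the REST /imp endpoint (file names and the target table name are placeholders), something like this could work:

```python
# Upload several CSV files to QuestDB's HTTP /imp endpoint in parallel.
# File names and the target table name are placeholders.
import concurrent.futures
import requests

QUESTDB_IMP = "http://localhost:9000/imp"
csv_files = ["readings_part1.csv", "readings_part2.csv"]

def upload(path: str) -> str:
    with open(path, "rb") as f:
        resp = requests.post(
            QUESTDB_IMP,
            params={"name": "readings"},   # target table
            files={"data": (path, f)},
        )
    resp.raise_for_status()
    return resp.text  # textual import report

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for report in pool.map(upload, csv_files):
        print(report)
```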
The next release (imminent) will include a function to read Parquet files. If your setup allows for Parquet export, you could try exporting to a Parquet file, creating a new table, and then ingesting it using an INSERT INTO SELECT statement, something like:
INSERT INTO my_table SELECT * FROM read_parquet('file.parquet')
Let us know how you get on and if you have any more questions!
Could you elaborate a bit more on your suggested local joins approach? Does this mean creating a temporary table in either Postgres or QuestDB, or something a bit closer to the application, like loading data from both DBs into one Pandas dataframe?
Thanks for your tips!
Yes, loading it into the application (assuming the dataset fits in memory) and joining with Pandas/Polars/your library of choice.
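A minimal Pandas sketch could look like the following. The table and column names, and the Postgres credentials, are assumptions; QuestDB is queried over its Postgres wire protocol (port 8812 by default):

```python
# Application-side join: time-series rows from QuestDB, relational
# metadata from Postgres, merged in memory with pandas.
import pandas as pd
import psycopg2

qdb = psycopg2.connect(host="localhost", port=8812,
                       user="admin", password="quest", dbname="qdb")
pg = psycopg2.connect(host="localhost", port=5432,
                      user="app", password="secret", dbname="app")

readings = pd.read_sql("SELECT sensor_id, value, ts FROM readings", qdb)
sensors = pd.read_sql("SELECT id, name, location FROM sensors", pg)

joined = readings.merge(sensors, left_on="sensor_id", right_on="id", how="left")
print(joined.head())

qdb.close()
pg.close()
```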
You can use SAMPLE BY to downsample data from QuestDB first and reduce how much data you need to transfer. For example, you might have data every 1 second, but only really need it every 30 seconds. So you can SAMPLE BY 30s and transfer 30x less data.
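A rough example of that (again, table and column names are assumptions):

```python
# Downsample on the QuestDB side before transferring: SAMPLE BY 30s keeps
# one averaged row per sensor per 30-second bucket.
import pandas as pd
import psycopg2

query = """
SELECT ts, sensor_id, avg(value) AS avg_value
FROM readings
SAMPLE BY 30s;
"""

conn = psycopg2.connect(host="localhost", port=8812,
                        user="admin", password="quest", dbname="qdb")
downsampled = pd.read_sql(query, conn)
conn.close()
print(downsampled.head())
```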
If data does not fit in memory, then for now you’d need to transfer the data (or sub-sampled data) into another persistent store for querying. We are actively working on better options for the future.
Thank you very much! That’s very helpful, enough to get started with.
Let me know if you have any further questions!