What's the best way to upload an ILP file into QuestDB?

There was a question with the same title on Stack Overflow 1 year and 7 months ago, but that question is closed.

I used the script from Javier:

import socket
import sys

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

def send_utf8(msg):
    print(msg)
    sock.sendall(msg.encode())

if __name__ == '__main__':
    try:
        sock.connect(('localhost', 9009))
        with open("YOUR_FILE") as infile:
            for line in infile:
                # print(line)
                send_utf8(line)
    except socket.error as e:
        sys.stderr.write(f'Got error: {e}')
    finally:
        # always close the socket, even if connect() or a send fails
        sock.close()

It works, but after a few seconds I get an error:

Got error: [Errno 32] Broken pipe

I tried to slow the script down with a sleep(0.01), which got a little more data into QuestDB, but after some minutes the error comes back. I also tried something with “signal” that I found on Stack Overflow, but I didn’t get it to work for my whole 18 GB file. Can somebody help me extend the script so it runs through the whole file?

Hi @bob3150 ,

One way to achieve this is using the Time-Series Benchmark Suite (TSBS). GitHub - questdb/tsbs: Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data

This is a CLI tool which allows you to benchmark ingestion and querying performance for QuestDB, and some other databases.

You can use the tsbs_load_questdb tool to load an ILP file into QuestDB. See instructions here: tsbs/docs/questdb.md at a090657be3c89811c968bb3ef6fa17f2d3489eb3 ¡ questdb/tsbs ¡ GitHub

You should be able to skip the ‘generate data’ steps, as you already have a file to send. Also, you can skip the gzip steps.

Alternatively, you can modify your original approach and instead send the lines in POST requests to the /write HTTP endpoint, i.e. at localhost:9000/write. This endpoint accepts ILP text and inserts it into your table.
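
As a minimal sketch of that approach (assuming the requests library is installed; the file name is the same placeholder as in your script, and the 10,000-line batch size is an arbitrary choice):

import requests  # pip install requests

WRITE_URL = "http://localhost:9000/write"  # QuestDB's ILP-over-HTTP endpoint
BATCH_SIZE = 10_000                        # lines per POST request; tune as needed

def send_batch(lines):
    # /write accepts a plain-text ILP payload; a rejected batch raises an HTTP error here,
    # whereas over raw TCP a bad line simply drops the connection (broken pipe)
    resp = requests.post(WRITE_URL, data="".join(lines).encode("utf-8"))
    resp.raise_for_status()

if __name__ == "__main__":
    batch = []
    with open("YOUR_FILE") as infile:
        for line in infile:
            batch.append(line)
            if len(batch) >= BATCH_SIZE:
                send_batch(batch)
                batch.clear()
    if batch:  # flush the final partial batch
        send_batch(batch)

Because HTTP returns a status per request, you get an actual error message when a batch is rejected, instead of a silent disconnect.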

Hi,

I tried the TSBS tool with my data:

./tsbs_load_questdb --file ~/Downloads/test.lp --workers 4
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
2024/09/05 01:04:19 parse error: line does not have 3 tuples, has 4

and got a parse error. Do you know what I can do?

The lines of the .lp file look like the following:

Fßllstände Endlager=3.0931365489959717 1642082238000000000

I experimented a little more and found out that the problem depends on the number of lines:

60,000 lines work:

head -n 60000 ~/Downloads/biogas.lp >> ~/Downloads/test.lp
./tsbs_load_questdb --file ~/Downloads/test.lp --workers 4
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s

Summary:
loaded 60000 metrics in 0.007sec with 4 workers (mean rate 9171857.68 metrics/sec)
loaded 60000 rows in 0.007sec with 4 workers (mean rate 9171857.68 rows/sec)

60,001 lines do not work, and there is no problem with line 60,001 itself → I also tried deleting line 1 from the 60,001-line file, and then the file works.

head -n 60001 ~/Downloads/biogas.lp >> ~/Downloads/test.lp
./tsbs_load_questdb --file ~/Downloads/test.lp --workers 4
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
2024/09/06 18:08:35 parse error: line does not have 3 tuples, has 4

Does anyone know what the problem could be?

Hi @bob3150 ,

Perhaps you could post the line that’s broken, and the surrounding lines on each side? It’s hard to say without a bit more context.

It definitely has nothing to do with broken lines, because when I increase the batch size I get more lines into QuestDB, but then there is another error, “broken pipe”:

rm -r ~/Downloads/test.lp; head -n 100000 ~/Downloads/biogas.lp >> ~/Downloads/test.lp
./tsbs_load_questdb --batch-size 100000 --file ~/Downloads/test.lp --workers 4        
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s

Summary:
loaded 100000 metrics in 0.049sec with 4 workers (mean rate 2039129.18 metrics/sec)
loaded 100000 rows in 0.049sec with 4 workers (mean rate 2039129.18 rows/sec)

rm -r ~/Downloads/test.lp; head -n 200000 ~/Downloads/biogas.lp >> ~/Downloads/test.lp
./tsbs_load_questdb --batch-size 100000 --file ~/Downloads/test.lp --workers 4        
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
2024/09/13 08:43:48 Error writing: write tcp 127.0.0.1:64833->127.0.0.1:9009: write: broken pipe

The data looks like the following lines:

Fßllstände Endlager=3.0931365489959717 1642082238000000000
Fßllstände Gassack=73.64004516601562 1642308316000000000
Fßllstände Gassack=73.4375 1642308346000000000
Fßllstände Kellerwasser=199.78839111328125 1642095254000000000
Gasanalyse Fermenter\ H2=33 1642366800000000000
Gasanalyse Fermenter\ H2S=251 1642082400000000000
Stromproduktion MAN\ 1=326.504638671875 1642256144000000000
Stromproduktion MAN\ 2=328.58795166015625 1642347282000000000

Perhaps you can try it with the Influx loader, tsbs_load_influxdb. This uses the ILP/HTTP endpoint, and I wouldn’t expect you to get the same TCP broken-pipe errors with it. You’ll need to adjust the port to 9000, which backs the HTTP endpoints.
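
Something along these lines should work; the flags are an assumption based on the loader's defaults, so it is worth double-checking them with --help:

./tsbs_load_influxdb --file ~/Downloads/test.lp --urls http://localhost:9000 --workers 4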

Thanks, I tried it with localhost:9000/write and that way I got my data from InfluxDB into QuestDB. But now I am really disappointed with the speed when using Grafana (it is much slower despite much better hardware), and the second point is disk space: InfluxDB uses approx. 800 MB for my data and QuestDB uses 50 GB.
InfluxDB 2.7.10 with Grafana 11.2.0 is running on an Intel(R) Celeron(R) N5095 @ 2.00GHz
QuestDB 8.1.1 with Grafana 11.2.0 is running on a MacBook Pro M3, installed via Homebrew

Do you know what the main problems could be?

Let’s work through it!

Please can you post your schema, query, and how much data you have in the table?

InfluxDB usually uses a sparse schema. It’s likely that we need to reshape the data into a dense, column-wise schema in order to resolve your issue.

If you use tools like Telegraf to transfer data, you can do this transformation inline.

That’s the schema of the table:

CREATE TABLE 'Fßllstände' (
  Endlager DOUBLE,
  timestamp TIMESTAMP,
  Gassack DOUBLE,
  Kellerwasser DOUBLE,
  Mischerwaage DOUBLE,
  Nachfermenter DOUBLE,
  "Brunnen Endlager" DOUBLE,
  Zisterne DOUBLE,
  Dieseltank DOUBLE,
  Fermenter DOUBLE,
  Sickergrube DOUBLE,
  "Ölgrube lang" DOUBLE,
  "Ölgrube rund" DOUBLE
) timestamp (timestamp) PARTITION BY DAY WAL;

When I run SELECT * FROM "Fßllstände" it counts 27,577,299 rows.

The query in Grafana is:

SELECT timestamp as Zeitstempel,
  avg(Endlager) as Endlager,
  avg(Gassack) as Gassack,
  avg(Kellerwasser) as Kellerwasser,
  avg(Mischerwaage) as Mischerwaage,
  avg(Nachfermenter) as Nachfermenter,
  avg("Brunnen Endlager") as "Brunnen Endlager",
  avg(Zisterne) as Zisterne,
  avg(Dieseltank) as Dieseltank,
  avg(Fermenter) as Fermenter,
  avg(Sickergrube) as Sickergrube,
  avg("Ölgrube lang") as "Ölgrube lang",
  avg("Ölgrube rund") as "Ölgrube rund"
FROM "Fßllstände"
WHERE $__timeFilter(timestamp)
SAMPLE BY $__sampleByInterval ALIGN TO CALENDAR

This request takes 2.4 s the first time; if I repeat the request several times the time decreases to 0.4 s, but after some minutes it takes 2-3 s again.

The same request on InfluxDB with the small hardware always takes 0.4 s.

And I also get an error in QuestDB that one table is suspended.

As I was trying things out today, I was thinking about the reshaping because of the null values in the table, but I don’t really know how to solve this.

How can I handle this with Telegraf? Can I extract the data live from InfluxDB and transfer it to QuestDB in the correct structure?

Hi @bob3150

Query speed
Your schema looks fine, assuming those rows are dense (mostly filled, non-null values). From your description, you are saying that we are at least as fast as Influx (which version by the way?), but sometimes the query slows down?

We use mmap to read table columns. The OS handles caching the column data in-memory. After some time of not being used, it may choose to evict some pages from the cache.

On the first query to that table, all columns will be cold and not in memory, leading to the 2.4 s query time. After this query, the data is hot, and you get a 0.4 s query time. Then you leave the table for a while, meaning the now-hot data slowly becomes cold again, and the OS drops it from memory. You re-query, and it's slow again, as it re-reads data from disk.

You can query the data more frequently to keep it in cache. You can also run your query using touch() ahead of time, to warm up the cache: Touch function | QuestDB
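
As a rough sketch, you could warm the columns for roughly the range your dashboard queries (the 7-day window below is only an illustrative value):

SELECT touch(
  SELECT * FROM "Fßllstände"
  WHERE timestamp > dateadd('d', -7, now())
);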

You can also provision more RAM so that the OS is under less pressure to free up RAM for other applications.

Suspended table

Tables can become suspended for a number of reasons. Please can you post a screenshot of the suspended message? (you can click on the suspended table to get expanded information).

This may say that you have too little RAM, or too little storage. Depending on the issue, you can resolve this in different ways.

  • If it's caused by lack of storage/hardware, simply add more storage and then use ALTER TABLE RESUME WAL: ALTER TABLE RESUME WAL | QuestDB
  • If it's a corrupted transaction (often caused by power loss during ingestion), you may need to skip it. You can do this using ALTER TABLE RESUME WAL FROM, as in the sketch after this list: ALTER TABLE RESUME WAL | QuestDB
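
Roughly, for your table that would look like the following (the transaction number is hypothetical; wal_tables() shows which sequencer transaction a suspended table stopped at):

-- inspect suspended WAL tables and the transaction they stopped at
SELECT * FROM wal_tables();

-- resume after fixing the underlying issue (e.g. freed-up disk space)
ALTER TABLE 'Fßllstände' RESUME WAL;

-- or skip a corrupted transaction (1234 is a hypothetical transaction number)
ALTER TABLE 'Fßllstände' RESUME WAL FROM TRANSACTION 1234;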

If you expect the database to run on unstable hardware, you may need to run the database in sync mode to reduce the risk of data loss, at the cost of performance: Storage model | QuestDB

We also have WIP from the community to automatically skip corrupted transactions on restart: Add an auto resume for suspended (WAL) tables ¡ Issue #4829 ¡ questdb/questdb ¡ GitHub

The version of InfluxDB is 2.7.10. I have sent a photo with the suspend message and the sparse filling of the table. I think densifying the table would be an important step → I think it would reduce the number of rows by roughly a factor of 10 (and probably save disk space and speed up requests).
Regarding RAM, that could be it: I only have 16 GB and a lot of other stuff running on my MacBook during testing.

Hi @bob3150 ,

It certainly seems to be an issue with sparse data. There are two routes to solve this.

  1. Reingest using Telegraf to transform the data into a dense format. @javier could help with this another time.

  2. Transform and insert it into a new table in QuestDB.

For this second route, it looks like your data is grouped into 1s intervals. It also looks like you have only one valid entry for each column per timestamp.

Therefore, we should be able to resolve this with a SAMPLE BY. Simply create a new table with the same schema, then try a query such as:

INSERT INTO new_table_name
SELECT first(Endlager), timestamp, first(Gassack), first(Kellerwasser), first(Mischerwaage), first(Nachfermenter), first("Brunnen Endlager"), first(Zisterne), first(Dieseltank), first(Fermenter), first(Sickergrube), first("Ölgrube lang"), first("Ölgrube rund")
FROM Fßllstände
SAMPLE BY 1s

Essentially, select the first value for each of your columns for each 1s period, and then insert that as a single row into the new table.
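
If you prefer to do it in one statement, a CREATE TABLE AS sketch along the same lines should also work ('Fuellstaende_dense' is just a placeholder name for the new table):

CREATE TABLE 'Fuellstaende_dense' AS (
  SELECT timestamp,
    first(Endlager) AS Endlager, first(Gassack) AS Gassack, first(Kellerwasser) AS Kellerwasser,
    first(Mischerwaage) AS Mischerwaage, first(Nachfermenter) AS Nachfermenter,
    first("Brunnen Endlager") AS "Brunnen Endlager", first(Zisterne) AS Zisterne,
    first(Dieseltank) AS Dieseltank, first(Fermenter) AS Fermenter, first(Sickergrube) AS Sickergrube,
    first("Ölgrube lang") AS "Ölgrube lang", first("Ölgrube rund") AS "Ölgrube rund"
  FROM "Fßllstände"
  SAMPLE BY 1s
) TIMESTAMP(timestamp) PARTITION BY DAY WAL;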

For reference, the reason QuestDB is taking extra space is that QuestDB stores null values as sentinels, so a null value takes the same space as a normal value. Since you have sparse rows, you are using many times more storage than normal, because all the nulls are being stored.

To load the data using tsbs_load_questdb, make sure to pre-process the ILP file a little. The reason why you might get a broken pipe is that some column names are not allowed, for instance names containing a - sign. The same happens when values are not supported, for instance unsigned integers marked explicitly. In my case I had to do a sed to fix those issues. Using the tsbs loader is much better than using a Python script because it first scans the rows and then stores them in batches, all while using multiple cores. Besides that, I had to modify the code a little bit so that it doesn’t split lines on an escaped space (“\ ”). File scan.go, lines 55-59:

for i := 0; i < len(that); i++ {
	if that[i] == byte(' ') && i != 0 && that[i-1] != byte('\\') {
		tuples++
	}
	// On the middle element, we split by comma to count number of fields added.
	if tuples == 2 && that[i] == byte(',') {
		metrics++
	}
}
Hopefully that helps other people who have to deal with a problem similar to the one I did.