I've been deep in Kafka + DuckDB development the past few weeks, but here are some late-night...
🆕 Created a Kafka topic with 8 partitions on my M3 MacBook Air (one per core)
⚡ Inserted 50M messages from DuckDB (47 bytes each) in 11.62s (4.3 million msgs/sec)
👀 Scanned those messages directly from Kafka in DuckDB in 17.54s (2.85 million msgs/sec)
🐤 Inserted them into a local DuckDB table in 29.63s (1.69 million msgs/sec)
More CPU cores and more Kafka partitions make it even faster.
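As a quick sanity check, the quoted rates can be recomputed from the message count and the timings. The sketch below does that and also converts each rate to payload megabytes per second, assuming the 47 bytes count only the message payload and not any Kafka framing overhead. The step labels and the `throughput` helper are names made up for this illustration.

```python
# Figures taken from the benchmark above.
MESSAGES = 50_000_000
MESSAGE_BYTES = 47  # assumption: payload size only, no Kafka framing overhead

timings_s = {
    "produce from DuckDB": 11.62,
    "scan from Kafka": 17.54,
    "insert into DuckDB table": 29.63,
}


def throughput(seconds):
    """Return (messages per second, payload megabytes per second)."""
    msgs_per_sec = MESSAGES / seconds
    mb_per_sec = msgs_per_sec * MESSAGE_BYTES / 1e6
    return msgs_per_sec, mb_per_sec


results = {step: throughput(seconds) for step, seconds in timings_s.items()}

for step, (msgs_per_sec, mb_per_sec) in results.items():
    print(f"{step}: {msgs_per_sec / 1e6:.2f}M msgs/sec, "
          f"about {mb_per_sec:.0f} MB/s of payload")
```

With messages this small, the message rate says more than the byte rate: the per-message cost dominates, not raw bandwidth.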
A lot of this speed comes from DuckDB’s ability to work zero-copy with the Kafka client library: instead of copying messages around, the extension just attaches them to DuckDB vectors as auxiliary data.
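The idea can be illustrated with a toy analogy in Python. This is not the extension's code and it does not use DuckDB's or the Kafka client's API. `StringVector`, `aux_buffers`, and the fake message layout are all hypothetical names invented for the sketch. It only shows the general pattern: the vector stores views into the message buffer and holds a reference to the buffer so the memory stays valid for as long as the vector lives.

```python
from dataclasses import dataclass, field


@dataclass
class StringVector:
    """Toy stand-in for a columnar vector: it holds views, not copies."""
    values: list = field(default_factory=list)       # memoryview slices into payloads
    aux_buffers: list = field(default_factory=list)  # buffer owners kept alive with the vector

    def append_zero_copy(self, message_buffer, start, length):
        # Slicing a memoryview creates a new view over the same bytes.
        view = memoryview(message_buffer)[start:start + length]
        self.values.append(view)
        # Keep the owning buffer referenced so the view never dangles.
        self.aux_buffers.append(message_buffer)


# Hypothetical fetched message: a 4-byte header followed by the payload.
message = bytearray(b"HDR:hello kafka")

vec = StringVector()
vec.append_zero_copy(message, 4, len(message) - 4)

copied = bytes(vec.values[0])  # an explicit copy, for comparison
message[4:9] = b"HELLO"        # same-length in-place change to the buffer

# The view reflects the change because it shares memory with the buffer.
shares_memory = bytes(vec.values[0]) == b"HELLO kafka"
# The copy is a separate allocation and still holds the old bytes.
copy_unchanged = copied == b"hello kafka"
print(shares_memory, copy_unchanged)
```

The saving is that no per-message allocation or byte copy happens on the hot path; the cost is that the buffers cannot be released until every vector referencing them is done.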
🐤 DuckDB + Kafka is fast 🚀 — and this is just the beginning!
Next up: benchmarking Avro, Protobuf, and JSON Schema registry support. The code is written — it’s been quite the adventure.
This will be Query.Farm’s first commercially licensed DuckDB extension. The new website is launching soon.
https://query.farm - Meet with us to get a demo.
Originally posted on LinkedIn.