Blog
Writing about data engineering, infrastructure, and the tools I build.
Streaming Data into DuckDB with Arrow and Python Generators
How to use Python generators and Apache Arrow's RecordBatchReader to stream large datasets into DuckDB without loading everything into memory at once.
Cron Expressions For DuckDB
Introducing a DuckDB extension that interprets cron expressions to generate scheduled timestamps directly in SQL, built with the Rust croner crate.
Enhancing DuckDB with Unix Pipe Integration: Introducing the shellfs Extension
Introducing shellfs, a DuckDB extension that enables seamless integration with Unix pipes for both input and output, allowing command-line programs to be used directly within DuckDB queries.
Turn The Tape Recorder On And Keep It Running
How to build a kitchen-sink observability event stream using AWS EventBridge, Firehose, and Apache Iceberg to capture and query every event in your distributed system.
The Bare Minimum of Metadata For Any Data
A practical guide to the essential metadata columns every data platform should store alongside its data, enabling traceability back to source files and loader versions using Apache Parquet's efficient encoding.
DuckDB is Strategically Important
Why DuckDB is becoming the go-to tool for data tasks in 2024, along with key areas where improvements are still needed, including Iceberg support, Parquet encodings, and remote data throughput.
Images, Emails and Content Addressable Storage
An investigation into how to store a large number of referenced URLs while considering WebP and AVIF recompression for images.
Cost-effective S3 for Subscription Data Distribution
When you have a lot of data that customers can download once they have purchased access, but you don't want to pay their internet transfer fees, distribution can be tricky. This architecture makes it easy.
Anomaly Detection for Cities and Airports
Cities and airports stream their radio traffic to the internet: you can listen to police or fire trucks being dispatched, or to airplanes being given headings and altitudes. By analyzing the duration of transmissions over time with machine learning, we may be able to automatically detect emergencies or abnormal events.
Web Chat Survey
A look at various web chat front-end implementations.
Short Updates
Quick posts originally shared on LinkedIn.
I built a thing for runners and walkers who are tired of checking five weather apps before...
I'm teaching a hands-on DuckDB Extension Development Workshop in Amsterdam on January 30th
Announcing the ADBC Scanner Extension for DuckDB
The DuckDB extension workshop in Amsterdam this January is already completely full—it filled up...
Deep Dive into DuckDB Extensions — Let's Workshop in Person!
Big Update to the DuckDB crypto Extension!
Excited to share something very cool with the DuckDB + Arrow Flight community.
I've been deep in Kafka + DuckDB development the past few weeks, but here are some late-night...
Excited to announce the Inflector Extension for DuckDB by Query.Farm!
Introducing the JSONata Extension for DuckDB by Query.Farm!