Blog

Writing about data engineering, infrastructure, and the tools I build.

VGI Injector: A Tiny HTTPS Download-and-Execute Binary in Zig

I needed a self-contained binary that downloads a program over HTTPS and execs it, small enough to run in a FROM scratch container with nothing else. Go and Rust couldn't get small enough. Zig could.

TIME Data Type Compatibility Across Databases

A survey of the TIME data type across 14 databases, comparing supported ranges, maximum values, and whether the special value 24:00:00 is accepted.

Telemetry for DuckDB Extensions Without the Pain

I open-sourced the telemetry client I use across Query.Farm's DuckDB extensions. It's two files, one function call, and it only collects platform and version info.

Releasing vgi-rpc: An RPC Framework Built on Apache Arrow

I built an RPC framework for Python that uses Apache Arrow IPC as the wire format and Python Protocol classes as the interface definition. No .proto files, no codegen — just type annotations.

Acronym-Aware Case Conversions in the DuckDB Inflector Extension

The Inflector extension for DuckDB now supports configurable acronyms, so case conversions preserve terms like HTML, API, and URL as fully uppercase — configured through a native DuckDB setting.

Streaming Data into DuckDB with Arrow and Python Generators

How to use Python generators and Apache Arrow's RecordBatchReader to stream large datasets into DuckDB without loading everything into memory at once.

Cron Expressions For DuckDB

Introducing a DuckDB extension that interprets cron expressions to generate scheduled timestamps directly in SQL, built with the Rust croner crate.

Enhancing DuckDB with Unix Pipe Integration: Introducing the shellfs Extension

Introducing shellfs, a DuckDB extension that enables seamless integration with Unix pipes for both input and output, allowing command-line programs to be used directly within DuckDB queries.

Turn The Tape Recorder On And Keep It Running

How to build a kitchen-sink observability event stream using AWS EventBridge, Firehose, and Apache Iceberg to capture and query every event in your distributed system.

The Bare Minimum of Metadata For Any Data

A practical guide to the essential metadata columns every data platform should store alongside its data, enabling traceability back to source files and loader versions using Apache Parquet's efficient encoding.

Short Updates

Quick posts originally shared on LinkedIn.

I built a thing for runners and walkers who are tired of checking five weather apps before...

I'm teaching a hands-on DuckDB Extension Development Workshop in Amsterdam on January 30th

Announcing the ADBC Scanner Extension for DuckDB

The DuckDB extension workshop in Amsterdam this January is already completely full—it filled up...

Big Update to the DuckDB crypto Extension!

Deep Dive into DuckDB Extensions — Let's Workshop in Person!

Excited to share something very cool with the DuckDB + Arrow Flight community.

I've been deep in Kafka + DuckDB development the past few weeks, but here are some late-night...

Excited to announce the Inflector Extension for DuckDB by Query.Farm!

Help support Query.Farm!
