All Articles

Multi-dimensional Indexing

I’ve had a particular itch for a while to work on multi-dimensional indexing strategies; I have a project that I’ve been slowly working on since the start of the year that would benefit if I had a workable solution this problem.

Use case: Sensor data often contains a timestamp, a latitude and, a longitude. I want to be able to query it by one or more than one of those dimensions.

The standard strategy to accomplish this is to map the multi-dimensional data into one dimension and then apply the traditional indexing (i.e., using a B-Tree).

Another approach commonly used is the R-Tree; there are upsides and downsides to the R-Tree. This post won’t be discussing them at great length.

Z-order curves can perform the mapping to one dimension. A Z-order value is produced by interleaving bits of the various dimensions (in my example, these are time, latitude, and longitude). The process to produced these values is also called Morton encoding.

The downside of using a Z-order curve is that there are large jumps along the curve. You can see the consequences of these jumps by viewing

Hilbert Curve Packing

In response to the jumps are suggested Hilbert curves. Hilbert curves don’t have the same jump behavior that Z-order curves exhibit.

Of course, someone thought of this before.

I found this Ph.D. thesis: The Application of Space-Filling Curves to the Storage and Retrieval of Multi-dimensional Data by Jonathan Lawder. I also found the expired patent on the technique.

Looking into PostgreSQL, I haven’t seen any index types implemented using a Hilbert curve mapping of multi-dimensional values.

The use of the Hilbert curve isn’t without some conditions:

  1. The size of each of the encoded dimensions in bits must be equal.

    • If one of the dimensions has a smaller cardinality than the other dimensions, you’re likely to be wasting bits.
  2. The values of the dimensions encoded in the Hilbert curve must be an integer and > 0.

    • If floats (such as a latitude or longitude) are indexed, they must have changed into an integer.

Existing Implementations

Then there is also this project:

Java utilities for transforming distance along N-dimensional Hilbert Curve to a point and back. Also supports range splitting queries on the Hilbert Curve

https://github.com/davidmoten/hilbert-curve

Java library to create and search random access files (including in S3) using the space-filling Hilbert index (sparse)

https://github.com/davidmoten/sparse-hilbert-index

Those sound very close to what I’m looking for, but I’m interested in them implemented in Rust.