Thoughts on Purgeable Memory and Caching

I’ve used and written distributed caching systems ranging from two-node Redis clusters to thousands of instances providing terabytes of cache capacity.

A few months ago I read about the https://github.com/skeeto/purgeable repo. It implements allocations on Linux that the kernel can reclaim when there is memory pressure. This sounds perfect for implementing a distributed cache: if memory is available it will be used, and if there is memory pressure pages will be evicted as needed. It would be a fantastic way to utilize resources that aren’t being used.

Of course there is no free lunch. The purgeable memory allocations are performed using mmap(), which normally isn’t a big deal: glibc itself uses mmap() for large allocations. But since caches typically store small objects, giving each object its own mapping would burden the kernel with managing potentially millions of mapped memory ranges in a single process.

Linux controls the maximum number of maps a single process can have with the vm.max_map_count sysctl. The kernel default is 65,530 (some distributions raise it), which would be too low if every object in a cache had a distinct allocation.
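To make the gap concrete, here is a minimal Python sketch that reads the limit from /proc/sys/vm/max_map_count and estimates how far a one-mapping-per-object cache would overshoot it. The function names, the `reserved` allowance, and the 10 million object figure are assumptions made up for illustration.

```python
from pathlib import Path
from typing import Optional

MAX_MAP_COUNT_PATH = Path("/proc/sys/vm/max_map_count")


def read_max_map_count(path: Path = MAX_MAP_COUNT_PATH) -> Optional[int]:
    """Return the current vm.max_map_count, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def mappings_over_limit(num_objects: int, limit: int, reserved: int = 0) -> int:
    """Number of mappings by which a one-mapping-per-object cache exceeds the limit.

    `reserved` is a hypothetical allowance for mappings the process already
    holds (binary, shared libraries, stacks, heap).
    """
    return max(0, num_objects + reserved - limit)


if __name__ == "__main__":
    limit = read_max_map_count()
    if limit is None:
        print("vm.max_map_count is not readable on this system")
    else:
        # 10 million cached objects is an illustrative figure, not a measurement.
        over = mappings_over_limit(10_000_000, limit)
        print(f"limit={limit}, a 10M-object cache would exceed it by {over}")
```

The limit can be raised with sysctl, but the kernel still has to track every mapping, so raising it only moves the problem.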

If the cache is primarily dealing with small objects, as found in the paper “A large scale analysis of hundreds of in-memory cache clusters at Twitter”, this could be a problem. The next step would be to build some type of allocator that places small objects inside the larger purgeable_alloc() allocations.
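As a rough illustration of what such an allocator has to track, here is a bump-style sketch in Python. It only does the offset bookkeeping and never touches real memory; in practice the backing bytes would come from a purgeable_alloc() call. The `Block` class and its methods are hypothetical names, and a bump allocator is just one of several possible designs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Block:
    """Bookkeeping for one large allocation that is carved into small objects."""
    capacity: int                                        # size of the large allocation in bytes
    used: int = 0                                        # bump pointer: next free offset
    live: Dict[int, int] = field(default_factory=dict)   # offset -> length of live objects

    def alloc(self, size: int, align: int = 8) -> Optional[int]:
        """Reserve `size` bytes and return their offset, or None if the block is full."""
        if size <= 0:
            raise ValueError("size must be positive")
        start = (self.used + align - 1) // align * align  # round up to alignment
        if start + size > self.capacity:
            return None
        self.live[start] = size
        self.used = start + size
        return start

    def free(self, offset: int) -> None:
        """Forget an object. The hole is not reused until the block is empty."""
        del self.live[offset]
        if not self.live:
            self.used = 0  # nothing live remains, so the whole block can be reset

    def live_bytes(self) -> int:
        """Bytes still occupied by live objects."""
        return sum(self.live.values())
```

Holes left by freed objects stay unusable until every object in the block is gone, so even this simple design carries per-block state that has to be kept somewhere.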

Creating allocations within a larger allocation has some downsides:

  1. Before a small allocation can be accessed, the entire range of the large allocation has to be verified as not having been reclaimed by the kernel. If the allocation returned by purgeable_alloc() is large, this could cause many page faults.
  2. There is additional bookkeeping to perform to track how much of a particular purgeable_alloc() allocation is free.
  3. It will be difficult to compact allocations.

If there were a way to verify and lock a smaller byte range of a purgeable_alloc() based allocation, that could change the situation.
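The arithmetic behind that idea is simple, since the kernel manages mapped memory in whole pages. The Python sketch below computes the page-aligned range covering a single small object and compares it with the page count of the whole allocation. No such range-locking call exists in the library as far as I know; `page_span`, `pages_in`, and the sizes used are hypothetical.

```python
import mmap
from typing import Tuple

PAGE_SIZE = mmap.PAGESIZE  # page size of the machine running this script


def page_span(offset: int, length: int, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Return the half-open, page-aligned byte range [start, end) covering an object.

    `offset` and `length` locate the object inside the large allocation.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    start = offset // page_size * page_size                           # round down
    end = (offset + length + page_size - 1) // page_size * page_size  # round up
    return start, end


def pages_in(length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to hold `length` bytes."""
    return (length + page_size - 1) // page_size


if __name__ == "__main__":
    block_size = 64 * 1024 * 1024  # illustrative 64 MiB allocation
    start, end = page_span(offset=5000, length=200)
    print(f"whole block: {pages_in(block_size)} pages")
    print(f"one 200-byte object: {pages_in(end - start)} page(s)")
```

A 200-byte object touches one page, or two if it straddles a page boundary, so verifying only that range would avoid walking the whole allocation.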

I’d love to swap a purgeable memory allocator into a project like mini-redis.