Fixed-Size Memory Pool Allocator
Design a fixed-size memory pool allocator optimized for a low-latency trading system. The allocator manages blocks of a fixed size (e.g., 64 bytes for order objects).
Your allocator must satisfy the following requirements:
- $O(1)$ allocation and deallocation.
- No system calls (
malloc/free) during trading hours -- all memory is pre-allocated at startup. - A single large contiguous chunk is allocated once at initialization and partitioned into fixed-size blocks.
Describe the data structure you would use and implement the core alloc() and free() operations in C++.
Constraints: - Block size $\geq 8$ bytes (must be large enough to hold a pointer). - The pool holds up to $N$ blocks where
Example: ``` MemoryPool pool(64, 1000); // 1000 blocks of 64 bytes void* a = pool.alloc(); // returns pointer to first block void* b = pool.alloc(); // returns pointer to second block pool.free(a); // a's block is now available void* c = pool.alloc(); // returns a's former block (LIFO reuse) ```
Hints
- Think about what data structure gives you $O(1)$ push and pop -- and where you can store that structure without using extra memory.
- Since free blocks are not in use, their memory is available for bookkeeping. You can embed a singly-linked list pointer in the first bytes of each free block.
- At startup, walk through the contiguous allocation and thread each block into the free list. After that,
allocis just popping the head andfreeis just pushing onto the head.
Worked Solution
How to Think About It: The core challenge is achieving $O(1)$ alloc and free with zero runtime overhead. The standard library malloc is general-purpose -- it handles variable sizes, thread safety, fragmentation -- and all of that generality costs latency. On a trading hot path, you cannot afford even a single system call or lock acquisition. The key observation is that when all blocks are the same size, allocation becomes trivially simple: just maintain a stack (linked list) of free blocks. Pop to allocate, push to free. The clever trick is that free blocks are not in use, so you can store the linked list pointers inside the blocks themselves -- zero extra memory overhead.
Algorithm:
- At startup, allocate one large contiguous region of
block_size * num_blocksbytes using a singlemalloccall. - Partition this region into
num_blockschunks and thread them into a singly-linked free list. Each free block's firstsizeof(pointer)bytes stores a pointer to the next free block. alloc(): Pop the head of the free list and return it. If the list is empty, returnnullptr(pool exhausted).free(ptr): Push the returned block onto the head of the free list.
Both operations are a single pointer swap -- no branching, no searching, no system calls.
Code:
```cpp #include <cstdlib> #include <cstddef> #include <cstdint> #include <algorithm>
class MemoryPool { private: // When a block is free, its first bytes hold a pointer to the next free block. struct FreeBlock { FreeBlock* next; };
void* pool_; // base pointer to the contiguous allocation FreeBlock* free_head_; // head of the intrusive free list size_t block_size_; // size of each block (bytes) size_t num_blocks_; // total number of blocks
public: MemoryPool(size_t block_size, size_t num_blocks) : block_size_(std::max(block_size, sizeof(FreeBlock))), num_blocks_(num_blocks), free_head_(nullptr) { // One-time allocation -- the only malloc in the system. pool_ = std::malloc(block_size_ * num_blocks_);
// Thread all blocks into the free list. free_head_ = reinterpret_cast<FreeBlock*>(pool_); FreeBlock* current = free_head_; for (size_t i = 1; i < num_blocks_; ++i) { FreeBlock* next_block = reinterpret_cast<FreeBlock*>( static_cast<uint8_t*>(pool_) + i * block_size_ ); current->next = next_block; current = next_block; } current->next = nullptr; // sentinel }
~MemoryPool() { std::free(pool_); }
// O(1) allocation: pop the free list head. void* alloc() { if (!free_head_) return nullptr; // pool exhausted void* block = free_head_; free_head_ = free_head_->next; return block; }
// O(1) deallocation: push onto the free list head. void dealloc(void* ptr) { FreeBlock* block = reinterpret_cast<FreeBlock*>(ptr); block->next = free_head_; free_head_ = block; } }; ```
Key design details:
- Intrusive free list: The
nextpointer lives inside the free block itself, so there is zero memory overhead for bookkeeping. When the block is allocated and in use, the application data overwrites that pointer -- which is fine, because a block in use is not on the free list. - Contiguous layout: All blocks sit in one
malloc'd region. Sequential allocations from a fresh pool walk through memory linearly, which is cache-line friendly. - LIFO reuse:
freepushes to the head,allocpops from the head. This means recently freed (and therefore cache-warm) blocks get reused first. - No fragmentation: Every block is the same size, so there is no external fragmentation. The pool either has capacity or it does not.
Why this matters on a trading desk:
- A system
malloccan take$-00 \, \mu s$ due to lock contention, page faults, or heap fragmentation. A pool allocator runs in under0 \, ns$ -- just two pointer operations.- Deterministic latency is critical.
mallochas unpredictable worst-case timing; the pool allocator's worst case is the same as its best case.- In production, you would typically size the pool for peak capacity (e.g., max open orders), allocate at process startup, and never touch the system allocator again on the hot path.
Complexity: $O(1)$ time and $O(1)$ extra space for both
allocanddealloc. Initialization is $O(N)$ where $N$ is the number of blocks.Answer: Use an intrusive singly-linked free list threaded through a pre-allocated contiguous block array.
allocpops the list head;freepushes to the list head. Both are $O(1)$ with zero memory overhead, no system calls, and deterministic latency.Intuition
This problem illustrates a universal principle in low-latency systems: when you know your access pattern at design time, you can eliminate almost all runtime overhead. General-purpose allocators like
mallocsolve a hard problem -- managing variable-size allocations across many threads with bounded fragmentation. But on a trading hot path, you do not need any of that generality. All your objects are the same size, allocation is single-threaded, and you know the maximum capacity up front. By exploiting these constraints, you reduce allocation to a single pointer swap.The intrusive free list pattern appears everywhere in systems programming -- from Linux kernel slab allocators to game engine object pools. The key insight is that unused memory is free storage for metadata. Once you internalize this, you start seeing opportunities to eliminate overhead in all sorts of data structures. In a trading context, this same thinking applies to pre-allocated ring buffers for market data, lock-free queues for order routing, and memory-mapped I/O for logging -- the common thread is: do the expensive work once at startup, and make the hot path as close to bare metal as possible.
- Deterministic latency is critical.