Everything About Memory Allocators: Write A Simple Memory Allocator

Introduction
Memory Layout Fundamentals
Memory Management Basics
Building a Memory Allocator
Core Functions Implementation
- malloc()
- free()
- calloc()
- realloc()
Thread Safety Considerations
Testing and Usage
Sequence Diagrams
Further Reading
Conclusion

Introduction

Memory allocation is a fundamental aspect of systems programming that every developer should understand. This article provides an in-depth exploration of implementing a basic memory allocator in C, covering malloc(), free(), calloc(), and realloc(). While our implementation won’t match the sophistication of production allocators like ptmalloc or jemalloc, it will help you understand the core concepts.

Memory Layout Fundamentals

Before diving into implementation details, let’s understand how memory is organized in a Linux process. A program’s virtual address space typically consists of several segments:

Text Segment (Code Segment)
- Contains executable instructions
- Read-only
- Shared among processes running the same program
Data Segment
- Contains initialized global and static variables
- Split into read-only and read-write sections
BSS (Block Started by Symbol)
- Contains uninitialized global and static variables
- Automatically initialized to zero by the kernel
- Takes no actual space in the executable file
Heap
- Dynamically allocated memory
- Grows upward (toward higher addresses)
- Managed by malloc/free
- Starts after BSS
Stack
- Automatic variables
- Function call frames
- Grows downward (toward lower addresses)
- Automatically managed

The program break (brk) marks the end of the heap segment. Our memory allocator will primarily interact with this boundary using the sbrk() system call.

Memory Management Basics

The Program Break

The program break is a virtual address that marks the end of the process’s data segment. We can manipulate it using two system calls:

brk(void *addr) - Sets the program break to a specific address
sbrk(intptr_t increment) - Adjusts the program break by a specified increment

Key points about sbrk():

sbrk(0) returns the current program break address
sbrk(positive_value) increases heap size
sbrk(negative_value) decreases heap size
Returns (void*)-1 on error

Building a Memory Allocator

Core Data Structures

Our allocator needs to track allocated memory blocks. We’ll use a linked list approach with each block containing a header structure.

typedef char ALIGN[16];

union header {
    struct {
        size_t size;        // Size of the block
        unsigned is_free;   // Block status flag
        union header *next; // Next block in list
    } s;
    ALIGN stub;            // Force 16-byte alignment
};
typedef union header header_t;

// Global variables
header_t *head = NULL, *tail = NULL;
pthread_mutex_t global_malloc_lock;

The header structure serves several purposes:

Tracks block size
Maintains free/allocated status
Links blocks together
Ensures proper memory alignment

Memory Block Headers

Every allocated block has an associated header that precedes the actual memory block:

+----------------+------------------+
| Header         | Actual Memory    |
+----------------+------------------+
^                ^
|                |
Block Start      Returned Pointer

The header is hidden from the user but crucial for our allocator’s operation. When a user requests N bytes:

We allocate N + sizeof(header_t) bytes
Place the header at the start
Return a pointer to the memory after the header

Core Functions Implementation

malloc()

The malloc() implementation follows these steps:

Check for valid size request
Look for existing free block of sufficient size
If found, mark as used and return
If not found, request new memory from OS
Setup header and return pointer

Here’s the implementation:

void *malloc(size_t size) {
    if (!size) return NULL;
    
    pthread_mutex_lock(&global_malloc_lock);
    
    // Try to find a free block
    header_t *header = get_free_block(size);
    if (header) {
        header->s.is_free = 0;
        pthread_mutex_unlock(&global_malloc_lock);
        return (void*)(header + 1);
    }
    
    // Need new block from OS
    size_t total_size = sizeof(header_t) + size;
    void *block = sbrk(total_size);
    if (block == (void*)-1) {
        pthread_mutex_unlock(&global_malloc_lock);
        return NULL;
    }
    
    // Initialize new block
    header = block;
    header->s.size = size;
    header->s.is_free = 0;
    header->s.next = NULL;
    
    // Update list pointers
    if (!head) head = header;
    if (tail) tail->s.next = header;
    tail = header;
    
    pthread_mutex_unlock(&global_malloc_lock);
    return (void*)(header + 1);
}

free()

The free() implementation handles memory deallocation:

void free(void *block) {
    if (!block) return;
    
    pthread_mutex_lock(&global_malloc_lock);
    
    header_t *header = (header_t*)block - 1;
    void *programbreak = sbrk(0);
    
    // Check if block is at the end of heap
    if ((char*)block + header->s.size == programbreak) {
        // Remove from list
        if (head == tail) {
            head = tail = NULL;
        } else {
            header_t *tmp = head;
            while (tmp) {
                if (tmp->s.next == tail) {
                    tmp->s.next = NULL;
                    tail = tmp;
                    break;
                }
                tmp = tmp->s.next;
            }
        }
        
        // Return memory to OS
        sbrk(0 - sizeof(header_t) - header->s.size);
    } else {
        // Mark block as free
        header->s.is_free = 1;
    }
    
    pthread_mutex_unlock(&global_malloc_lock);
}

calloc()

calloc() allocates memory and initializes it to zero:

void *calloc(size_t num, size_t nsize) {
    size_t size;
    void *block;
    
    if (!num || !nsize) return NULL;
    
    size = num * nsize;
    // Check for multiplication overflow
    if (nsize != size / num) return NULL;
    
    block = malloc(size);
    if (!block) return NULL;
    
    // Zero out the memory
    memset(block, 0, size);
    return block;
}

realloc()

realloc() changes the size of an existing allocation:

void *realloc(void *block, size_t size) {
    if (!block || !size) return malloc(size);
    
    header_t *header = (header_t*)block - 1;
    if (header->s.size >= size) return block;
    
    void *ret = malloc(size);
    if (ret) {
        memcpy(ret, block, header->s.size);
        free(block);
    }
    return ret;
}

Thread Safety Considerations

Our implementation uses a global mutex to ensure thread safety:

pthread_mutex_t global_malloc_lock;

// Initialize in main or constructor
pthread_mutex_init(&global_malloc_lock, NULL);

However, this approach has limitations:

Single lock creates contention
sbrk() isn’t thread-safe
No protection against foreign sbrk() calls

Testing and Usage

To compile and use the allocator:

# Compile as shared library
gcc -o memalloc.so -fPIC -shared memalloc.c

# Use with existing programs
export LD_PRELOAD=$PWD/memalloc.so

Conclusion

This implementation demonstrates the fundamental concepts of memory allocation while keeping things simple enough to understand. Production allocators add many optimizations:

Multiple allocation strategies
Memory pooling
Thread-local caches
Sophisticated fragmentation handling
Memory mapping for large allocations

Key takeaways:

Memory allocation involves both userspace and kernel space interaction
Thread safety requires careful consideration
Memory layout understanding is crucial
Simple implementations help learn fundamentals
Production allocators are much more complex

Remember that this implementation is for educational purposes. Production systems should use established allocators that have been thoroughly tested and optimized.

Table of Contents