Table of Contents

  1. Introduction
  2. Background: Directory Storage Challenges
  3. Traditional Directory Storage Methods
  4. Binary Trees in File Systems
  5. Understanding H-trees
  6. Practical Implementation
  7. Performance Analysis
  8. Conclusion

Introduction

File systems are the backbone of data organization in modern computing, and their efficiency directly impacts system performance. One of the most critical aspects of file system design is directory indexing - how quickly and efficiently we can locate files within directories.

This post explains everything about H-trees, an innovative solution that revolutionized directory indexing in the EXT3 file system, offering up to many times faster directory searches compared to traditional methods.

Background: Directory Storage Challenges

Before diving into H-trees, it’s crucial to understand the fundamental challenge they were designed to solve. In traditional file systems, particularly in early versions like EXT2, directories were essentially implemented as linked lists of entries. While this approach worked well for small directories, it became increasingly inefficient as directory sizes grew.

Consider a directory with 10,000 files. In a linked list implementation, finding a specific file would require traversing the list from the beginning until the desired file is found. This linear search operation has a time complexity of O(n), making it highly inefficient for large directories.

Implementation of Traditional Directory Storage

Let’s look at how a basic directory entry structure might be implemented in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FILENAME 255

// traditional directory entry structure
struct dir_entry {
    unsigned long inode;       // Inode number
    unsigned short rec_len;    // Record length
    unsigned char name_len;    // Name length
    char name[MAX_FILENAME];   // Filename
    struct dir_entry* next;    // Pointer to next entry
};

struct directory {
    struct dir_entry* head;
    unsigned long size;
};

struct directory* init_directory() {
    struct directory* dir = malloc(sizeof(struct directory));
    dir->head = NULL;
    dir->size = 0;
    return dir;
}

void add_file(struct directory* dir, const char* filename, unsigned long inode) {
    struct dir_entry* entry = malloc(sizeof(struct dir_entry));
    entry->inode = inode;
    entry->name_len = strlen(filename);
    entry->rec_len = sizeof(struct dir_entry);
    strncpy(entry->name, filename, MAX_FILENAME);
    entry->next = dir->head;
    dir->head = entry;
    dir->size++;
}

struct dir_entry* find_file(struct directory* dir, const char* filename) {
    struct dir_entry* current = dir->head;
    while (current != NULL) {
        if (strcmp(current->name, filename) == 0) {
            return current;
        }
        current = current->next;
    }
    return NULL;
}

int main() {
    struct directory* test_dir = init_directory();

    add_file(test_dir, "file1.txt", 1001);
    add_file(test_dir, "file2.txt", 1002);
    add_file(test_dir, "file3.txt", 1003);

    const char* search_name = "file2.txt";
    struct dir_entry* result = find_file(test_dir, search_name);

    if (result != NULL) {
        printf("Found file: %s (inode: %lu)\n", result->name, result->inode);
    } else {
        printf("File not found: %s\n", search_name);
    }

    return 0;
}

The above code shows a simple linked list implementation of a directory. Each directory entry (dir_entry) contains metadata about a file (inode number, name, etc.) and a pointer to the next entry. The find_file function performs a linear search through the list, which is inefficient for large directories. The output shows that the file file2.txt is found with its corresponding inode number. To Compile and Run this code

gcc -o dir_implementation dir_implementation.c
./dir_implementation

Expected Output:

Found file: file2.txt (inode: 1002)

Let’s examine some key assembly code generated for the find_file function (x86_64):

find_file:
    push    rbp
    mov     rbp, rsp
    mov     QWORD PTR [rbp-24], rdi    # Store directory pointer
    mov     QWORD PTR [rbp-32], rsi    # Store filename pointer
    mov     rax, QWORD PTR [rbp-24]    # Load directory pointer
    mov     rax, QWORD PTR [rax]       # Load head pointer
    mov     QWORD PTR [rbp-8], rax     # Store current pointer
.L4:
    cmp     QWORD PTR [rbp-8], 0       # Check if current is NULL
    je      .L5                        # Jump if NULL
    mov     rdx, QWORD PTR [rbp-32]    # Load filename
    mov     rax, QWORD PTR [rbp-8]     # Load current entry
    add     rax, 16                    # Offset to name field
    mov     rsi, rdx                   # Second argument for strcmp
    mov     rdi, rax                   # First argument for strcmp
    call    strcmp                     # Compare strings
    test    eax, eax                   # Check result
    je      .L3                        # Jump if strings match
    mov     rax, QWORD PTR [rbp-8]     # Load current entry
    mov     rax, QWORD PTR [rax+280]   # Load next pointer
    mov     QWORD PTR [rbp-8], rax     # Update current pointer
    jmp     .L4                        # Continue loop

Binary Trees in File Systems

To address the limitations of linear directory storage, many file systems adopted binary tree structures. Binary trees offer O(log n) search complexity, making them significantly more efficient for large directories.

Let’s implement a basic binary tree structure for directory entries:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FILENAME 255

struct btree_node {
    unsigned long inode;
    char name[MAX_FILENAME];
    struct btree_node* left;
    struct btree_node* right;
};

struct btree_directory {
    struct btree_node* root;
    unsigned long size;
};

struct btree_directory* init_btree_directory() {
    struct btree_directory* dir = malloc(sizeof(struct btree_directory));
    dir->root = NULL;
    dir->size = 0;
    return dir;
}

struct btree_node* create_node(const char* name, unsigned long inode) {
    struct btree_node* node = malloc(sizeof(struct btree_node));
    strncpy(node->name, name, MAX_FILENAME);
    node->inode = inode;
    node->left = NULL;
    node->right = NULL;
    return node;
}

struct btree_node* insert_file_recursive(struct btree_node* node, const char* name,
                                       unsigned long inode) {
    if (node == NULL) {
        return create_node(name, inode);
    }

    int cmp = strcmp(name, node->name);
    if (cmp < 0) {
        node->left = insert_file_recursive(node->left, name, inode);
    } else if (cmp > 0) {
        node->right = insert_file_recursive(node->right, name, inode);
    }

    return node;
}

void insert_file(struct btree_directory* dir, const char* name, unsigned long inode) {
    dir->root = insert_file_recursive(dir->root, name, inode);
    dir->size++;
}

struct btree_node* find_file(struct btree_directory* dir, const char* name) {
    struct btree_node* current = dir->root;

    while (current != NULL) {
        int cmp = strcmp(name, current->name);
        if (cmp == 0) {
            return current;
        } else if (cmp < 0) {
            current = current->left;
        } else {
            current = current->right;
        }
    }

    return NULL;
}

int main() {
    struct btree_directory* dir = init_btree_directory();

    insert_file(dir, "file1.txt", 1001);
    insert_file(dir, "file2.txt", 1002);
    insert_file(dir, "file3.txt", 1003);

    const char* search_name = "file2.txt";
    struct btree_node* result = find_file(dir, search_name);

    if (result != NULL) {
        printf("Found file: %s (inode: %lu)\n", result->name, result->inode);
    } else {
        printf("File not found: %s\n", search_name);
    }

    return 0;
}

This code implements a binary tree for directory indexing. Each node (btree_node) contains file metadata and pointers to left and right child nodes. The find_file function performs a binary search, which is much faster than linear search for large directories. The output confirms that the file file2.txt is found.

Expected Output:

Found file: file2.txt (inode: 1002)

Understanding H-trees

H-trees represent a revolutionary approach to directory indexing that combines the benefits of binary search with disk-friendly block operations. Unlike traditional binary trees, H-trees are specifically designed to work efficiently with block devices and maintain compatibility with existing file system structures.

H-Tree Structure

Structure and Components

An H-tree consists of three main components:

  1. Directory Root Block
    • Contains initial directory entries (. and ..)
    • Houses the H-tree root structure
    • Maintains compatibility with non-H-tree aware systems
  2. Directory Index Blocks
    • Store arrays of hash-block number pairs
    • Enable binary search within blocks
    • Maintain order based on hash values
  3. Directory Entry Blocks
    • Hold actual directory entries
    • Organized as hash-ordered buckets
    • Compatible with traditional directory block format

Let’s implement these structures in C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define MAX_FILENAME 255
#define MAX_ENTRIES_PER_BLOCK ((BLOCK_SIZE - sizeof(struct block_header)) / sizeof(struct dir_entry))
#define MAX_INDEX_ENTRIES ((BLOCK_SIZE - sizeof(struct block_header)) / sizeof(struct index_entry))

uint32_t hash_filename(const char* name) {
    uint32_t hash = 0;
    while (*name) {
        hash = (hash << 5) + hash + *name;
        name++;
    }
    return hash;
}

struct block_header {
    uint32_t block_type;  // ROOT = 1, INDEX = 2, ENTRY = 3
    uint32_t entry_count;
    uint32_t free_space;
};

struct dir_entry {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t name_len;
    uint8_t file_type;
    char name[MAX_FILENAME];
};

struct index_entry {
    uint32_t hash;
    uint32_t block_number;
};

struct htree_block {
    struct block_header header;
    union {
        struct dir_entry entries[MAX_ENTRIES_PER_BLOCK];
        struct index_entry indices[MAX_INDEX_ENTRIES];
    } data;
};

struct htree_directory {
    struct htree_block* root_block;
    struct htree_block** index_blocks;
    struct htree_block** entry_blocks;
    uint32_t num_index_blocks;
    uint32_t num_entry_blocks;
};

struct htree_directory* init_htree_directory() {
    struct htree_directory* dir = malloc(sizeof(struct htree_directory));

    dir->root_block = malloc(sizeof(struct htree_block));
    dir->root_block->header.block_type = 1;
    dir->root_block->header.entry_count = 0;
    dir->root_block->header.free_space = BLOCK_SIZE - sizeof(struct block_header);

    dir->index_blocks = NULL;
    dir->entry_blocks = NULL;
    dir->num_index_blocks = 0;
    dir->num_entry_blocks = 0;

    return dir;
}

struct htree_block* add_entry_block(struct htree_directory* dir) {
    dir->num_entry_blocks++;
    dir->entry_blocks = realloc(dir->entry_blocks,
                               dir->num_entry_blocks * sizeof(struct htree_block*));

    struct htree_block* block = malloc(sizeof(struct htree_block));
    block->header.block_type = 3;
    block->header.entry_count = 0;
    block->header.free_space = BLOCK_SIZE - sizeof(struct block_header);

    dir->entry_blocks[dir->num_entry_blocks - 1] = block;
    return block;
}

struct htree_block* find_entry_block(struct htree_directory* dir, uint32_t hash) {
    for (uint32_t i = 0; i < dir->num_entry_blocks; i++) {
        struct htree_block* block = dir->entry_blocks[i];
        if (block->header.entry_count < MAX_ENTRIES_PER_BLOCK) {
            return block;
        }
    }
    return add_entry_block(dir);
}

void insert_file(struct htree_directory* dir, const char* name, uint32_t inode) {
    uint32_t hash = hash_filename(name);
    struct htree_block* block = find_entry_block(dir, hash);

    struct dir_entry* entry = &block->data.entries[block->header.entry_count];
    entry->inode = inode;
    entry->name_len = strlen(name);
    entry->rec_len = sizeof(struct dir_entry);
    entry->file_type = 1; // Regular file
    strncpy(entry->name, name, MAX_FILENAME);

    block->header.entry_count++;
    block->header.free_space -= sizeof(struct dir_entry);
}

int main() {
    struct htree_directory* dir = init_htree_directory();

    insert_file(dir, "file1.txt", 1001);
    insert_file(dir, "file2.txt", 1002);
    insert_file(dir, "file3.txt", 1003);

    printf("Directory statistics:\n");
    printf("Number of entry blocks: %u\n", dir->num_entry_blocks);
    printf("First block entries: %u\n",
           dir->entry_blocks[0]->header.entry_count);

    return 0;
}

To Compile and Run this code

gcc -o htree_implementation htree_implementation.c
./htree_implementation

Expected Output:

Directory statistics:
Number of entry blocks: 1
First block entries: 3

Let’s examine some key assembly code for the hash_filename function (x86_64):

hash_filename:
    push    rbp
    mov     rbp, rsp
    mov     QWORD PTR [rbp-24], rdi    # Store filename pointer
    mov     DWORD PTR [rbp-4], 0       # Initialize hash to 0
    jmp     .L2
.L3:
    mov     eax, DWORD PTR [rbp-4]     # Load current hash
    shl     eax, 5                     # hash << 5
    mov     edx, DWORD PTR [rbp-4]     # Load current hash again
    add     eax, edx                   # (hash << 5) + hash
    movzx   edx, BYTE PTR [rbp-25]     # Load current char
    add     eax, edx                   # Add char to hash
    mov     DWORD PTR [rbp-4], eax     # Store new hash
    add     QWORD PTR [rbp-24], 1      # Move to next char
.L2:
    mov     rax, QWORD PTR [rbp-24]    # Load current char pointer
    movzx   eax, BYTE PTR [rax]        # Load char
    test    al, al                     # Check if null
    jne     .L3                        # Continue if not null
    mov     eax, DWORD PTR [rbp-4]     # Load final hash
    pop     rbp
    ret

Performance Analysis

H-trees provide significant performance improvements over traditional directory storage methods. Let’s analyze the performance characteristics in detail and implement a benchmarking system to demonstrate the differences.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/time.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define MAX_FILENAME 255
#define NUM_FILES 10000
#define FILENAME_LENGTH 20
#define MAX_ENTRIES_PER_BLOCK ((BLOCK_SIZE - sizeof(struct block_header)) / sizeof(struct dir_entry))
#define MAX_INDEX_ENTRIES ((BLOCK_SIZE - sizeof(struct block_header)) / sizeof(struct index_entry))

struct htree_directory;
struct htree_block;
struct dir_entry;

struct block_header {
    uint32_t block_type;  // ROOT = 1, INDEX = 2, ENTRY = 3
    uint32_t entry_count;
    uint32_t free_space;
};

struct dir_entry {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t name_len;
    uint8_t file_type;
    char name[MAX_FILENAME];
};

struct index_entry {
    uint32_t hash;
    uint32_t block_number;
};

struct htree_block {
    struct block_header header;
    union {
        struct dir_entry entries[MAX_ENTRIES_PER_BLOCK];
        struct index_entry indices[MAX_INDEX_ENTRIES];
    } data;
};

struct htree_directory {
    struct htree_block* root_block;
    struct htree_block** index_blocks;
    struct htree_block** entry_blocks;
    uint32_t num_index_blocks;
    uint32_t num_entry_blocks;
};

struct benchmark_results {
    double insertion_time;
    double search_time;
    double memory_usage;
};

uint32_t hash_filename(const char* name) {
    uint32_t hash = 0;
    while (*name) {
        hash = (hash << 5) + hash + *name;
        name++;
    }
    return hash;
}

struct htree_directory* init_htree_directory() {
    struct htree_directory* dir = malloc(sizeof(struct htree_directory));
    if (!dir) return NULL;

    dir->root_block = malloc(sizeof(struct htree_block));
    if (!dir->root_block) {
        free(dir);
        return NULL;
    }

    dir->root_block->header.block_type = 1;
    dir->root_block->header.entry_count = 0;
    dir->root_block->header.free_space = BLOCK_SIZE - sizeof(struct block_header);

    dir->index_blocks = NULL;
    dir->entry_blocks = NULL;
    dir->num_index_blocks = 0;
    dir->num_entry_blocks = 0;

    return dir;
}

struct htree_block* add_entry_block(struct htree_directory* dir) {
    dir->num_entry_blocks++;
    dir->entry_blocks = realloc(dir->entry_blocks,
                               dir->num_entry_blocks * sizeof(struct htree_block*));
    if (!dir->entry_blocks) return NULL;

    struct htree_block* block = malloc(sizeof(struct htree_block));
    if (!block) {
        dir->num_entry_blocks--;
        return NULL;
    }

    block->header.block_type = 3;
    block->header.entry_count = 0;
    block->header.free_space = BLOCK_SIZE - sizeof(struct block_header);

    dir->entry_blocks[dir->num_entry_blocks - 1] = block;
    return block;
}

struct htree_block* find_entry_block(struct htree_directory* dir, uint32_t hash) {
    for (uint32_t i = 0; i < dir->num_entry_blocks; i++) {
        struct htree_block* block = dir->entry_blocks[i];
        if (block->header.entry_count < MAX_ENTRIES_PER_BLOCK) {
            return block;
        }
    }
    return add_entry_block(dir);
}

void insert_file(struct htree_directory* dir, const char* name, uint32_t inode) {
    uint32_t hash = hash_filename(name);
    struct htree_block* block = find_entry_block(dir, hash);
    if (!block) return;

    struct dir_entry* entry = &block->data.entries[block->header.entry_count];
    entry->inode = inode;
    entry->name_len = strlen(name);
    entry->rec_len = sizeof(struct dir_entry);
    entry->file_type = 1; // Regular file
    strncpy(entry->name, name, MAX_FILENAME);

    block->header.entry_count++;
    block->header.free_space -= sizeof(struct dir_entry);
}

struct dir_entry* find_file(struct htree_directory* dir, const char* name) {
    uint32_t hash = hash_filename(name);

    for (uint32_t i = 0; i < dir->num_entry_blocks; i++) {
        struct htree_block* block = dir->entry_blocks[i];
        for (uint32_t j = 0; j < block->header.entry_count; j++) {
            struct dir_entry* entry = &block->data.entries[j];
            if (strcmp(entry->name, name) == 0) {
                return entry;
            }
        }
    }
    return NULL;
}

double get_time() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

void generate_filename(char* buffer) {
    static const char charset[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    for (int i = 0; i < FILENAME_LENGTH - 4; i++) {
        buffer[i] = charset[rand() % (sizeof(charset) - 1)];
    }
    strcpy(buffer + FILENAME_LENGTH - 4, ".txt");
    buffer[FILENAME_LENGTH] = '\0';
}

void cleanup_htree_directory(struct htree_directory* dir) {
    if (!dir) return;

    if (dir->root_block) {
        free(dir->root_block);
    }

    if (dir->index_blocks) {
        for (uint32_t i = 0; i < dir->num_index_blocks; i++) {
            free(dir->index_blocks[i]);
        }
        free(dir->index_blocks);
    }

    if (dir->entry_blocks) {
        for (uint32_t i = 0; i < dir->num_entry_blocks; i++) {
            free(dir->entry_blocks[i]);
        }
        free(dir->entry_blocks);
    }

    free(dir);
}

struct benchmark_results run_htree_benchmark() {
    struct benchmark_results results = {0};
    struct htree_directory* dir = init_htree_directory();
    if (!dir) {
        printf("Failed to initialize H-tree directory\n");
        return results;
    }

    char** filenames = malloc(NUM_FILES * sizeof(char*));
    if (!filenames) {
        cleanup_htree_directory(dir);
        printf("Failed to allocate memory for filenames\n");
        return results;
    }

    for (int i = 0; i < NUM_FILES; i++) {
        filenames[i] = malloc(FILENAME_LENGTH + 1);
        if (!filenames[i]) {
            for (int j = 0; j < i; j++) {
                free(filenames[j]);
            }
            free(filenames);
            cleanup_htree_directory(dir);
            printf("Failed to allocate memory for filename %d\n", i);
            return results;
        }
        generate_filename(filenames[i]);
    }

    // Measure insertion time
    double start_time = get_time();
    for (int i = 0; i < NUM_FILES; i++) {
        insert_file(dir, filenames[i], i + 1000);
    }
    results.insertion_time = get_time() - start_time;

    // Measure search time (random access)
    start_time = get_time();
    for (int i = 0; i < 1000; i++) {
        int index = rand() % NUM_FILES;
        find_file(dir, filenames[index]);
    }
    results.search_time = get_time() - start_time;

    // Calculate memory usage
    results.memory_usage = sizeof(struct htree_directory) +
                          sizeof(struct htree_block) * (1 + dir->num_index_blocks + dir->num_entry_blocks);

    for (int i = 0; i < NUM_FILES; i++) {
        free(filenames[i]);
    }
    free(filenames);
    cleanup_htree_directory(dir);

    return results;
}

int main() {
    srand(time(NULL));

    printf("Running H-tree performance benchmark...\n");
    printf("Configuration:\n");
    printf("- Number of files: %d\n", NUM_FILES);
    printf("- Block size: %d bytes\n", BLOCK_SIZE);
    printf("- Max entries per block: %lu\n", MAX_ENTRIES_PER_BLOCK);

    struct benchmark_results results = run_htree_benchmark();

    printf("\nBenchmark Results:\n");
    printf("Insertion time for %d files: %.3f seconds\n", NUM_FILES, results.insertion_time);
    printf("Average search time (1000 random lookups): %.6f seconds\n",
           results.search_time / 1000.0);
    printf("Memory usage: %.2f MB\n", results.memory_usage / (1024.0 * 1024.0));
    printf("Average insertion time per file: %.6f ms\n",
           (results.insertion_time * 1000.0) / NUM_FILES);

    return 0;
}

This benchmark measures the performance of H-trees by inserting 10,000 files and performing 1,000 random lookups. The results show the insertion time, average search time, and memory usage. The output demonstrates the efficiency of H-trees for large directories.

To Compile and Run this code

gcc -O2 -Wall -o htree_benchmark htree_benchmark.c
./htree_benchmark

Expected Output:

➜  code git:(main) ✗ ./htree_benchmark
Running H-tree performance benchmark...
Configuration:
- Number of files: 10000
- Block size: 4096 bytes
- Max entries per block: 15

Benchmark Results:
Insertion time for 10000 files: 0.014 seconds
Average search time (1000 random lookups): 0.000041 seconds
Memory usage: 2.61 MB
Average insertion time per file: 0.001382 ms

Key assembly optimization for the hash filename (x86_64 with -O2):

hash_filename:
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-24], rdi
        mov     DWORD PTR [rbp-4], 0
        jmp     .L2
.L3:
        mov     eax, DWORD PTR [rbp-4]
        sal     eax, 5
        mov     edx, eax
        mov     eax, DWORD PTR [rbp-4]
        add     edx, eax
        mov     rax, QWORD PTR [rbp-24]
        movzx   eax, BYTE PTR [rax]
        movsx   eax, al
        add     eax, edx
        mov     DWORD PTR [rbp-4], eax
        add     QWORD PTR [rbp-24], 1
.L2:
        mov     rax, QWORD PTR [rbp-24]
        movzx   eax, BYTE PTR [rax]
        test    al, al
        jne     .L3
        mov     eax, DWORD PTR [rbp-4]
        pop     rbp
        ret

Conclusion

H-trees represent a significant advancement in file system directory indexing, offering several key advantages:

  • Performance: O(log n) search complexity while maintaining disk-friendly access patterns
  • Compatibility: Backward compatibility with older file systems through clever use of existing structures
  • Simplicity: Relatively simple implementation compared to full B-tree alternatives
  • Scalability: Efficient handling of directories with thousands of entries

The implementation demonstrated in this article shows how H-trees can be practically implemented while maintaining good performance characteristics. The benchmark results clearly show the efficiency gains possible with this approach, particularly for large directories.

Future file systems continue to build upon these concepts, with modern implementations like ext4 and btrfs incorporating similar techniques for efficient directory indexing. Understanding H-trees provides valuable insights into file system design and the tradeoffs involved in building efficient, reliable storage systems.

Remember that while our implementation is educational, real-world file system implementations must also consider factors such as:

  • Crash consistency
  • On-disk format compatibility
  • Cache efficiency
  • Concurrent access patterns
  • Recovery mechanisms

This makes actual file system implementation significantly more complex, but the core concepts demonstrated here remain fundamental to understanding modern file system design.