Elf Format Unpacked: A Deep Dive Into Executable And Linkable Format

Introduction
Understanding ELF: More Than Just Executables
The Anatomy of an ELF File
ELF in Action: From Compilation to Execution
Advanced Topics in ELF
Security Implications of ELF
Conclusion

Introduction

The Executable and Linkable Format (ELF) is a standard file format for executable files, object code, shared libraries, and core dumps. Originally developed by Unix System Laboratories (USL) as part of the Application Binary Interface (ABI), ELF has become the de facto standard binary format for Unix and Unix-like systems, including Linux. In this comprehensive guide, we’ll explore the intricacies of the ELF format, its structure, and its significance in modern computing.

Understanding ELF: More Than Just Executables

While many developers encounter ELF files primarily as executable binaries, the format’s versatility extends far beyond simple program execution. ELF serves as a unified format for various types of binary files, including:

Executable files
Object files
Shared libraries
Core dumps

This unified approach simplifies the toolchain for compilers, linkers, and debuggers, providing a consistent interface for working with binary files across different stages of program development and execution.

The Anatomy of an ELF File

An ELF file consists of several key components, each serving a specific purpose in the overall structure. Let’s break down these components and examine their roles:

1. ELF Header

The ELF header is the starting point of any ELF file. It contains crucial metadata about the file, including:

Magic number: A sequence that identifies the file as an ELF file
File class: 32-bit or 64-bit
Data encoding: Little-endian or big-endian
ELF version
Operating system and ABI information
File type (executable, shared object, etc.)
Machine architecture
Entry point address

Here’s a simplified C structure representing the ELF header for a 64-bit system:

#include <stdint.h>

typedef struct {
    unsigned char e_ident[16];  // ELF identification bytes
    uint16_t e_type;            // Object file type
    uint16_t e_machine;         // Machine type
    uint32_t e_version;         // Object file version
    uint64_t e_entry;           // Entry point address
    uint64_t e_phoff;           // Program header offset
    uint64_t e_shoff;           // Section header offset
    uint32_t e_flags;           // Processor-specific flags
    uint16_t e_ehsize;          // ELF header size
    uint16_t e_phentsize;       // Size of program header entry
    uint16_t e_phnum;           // Number of program header entries
    uint16_t e_shentsize;       // Size of section header entry
    uint16_t e_shnum;           // Number of section header entries
    uint16_t e_shstrndx;        // Section name string table index
} Elf64_Ehdr;

2. Program Header Table

The program header table is crucial for executable and shared object files. It contains information that the system needs to prepare the program for execution. Each entry in this table describes a segment or other information the system needs to prepare the program for execution.

A simplified C structure for a 64-bit program header entry might look like this:

typedef struct {
    uint32_t p_type;    // Type of segment
    uint32_t p_flags;   // Segment attributes
    uint64_t p_offset;  // Offset in file
    uint64_t p_vaddr;   // Virtual address in memory
    uint64_t p_paddr;   // Reserved
    uint64_t p_filesz;  // Size of segment in file
    uint64_t p_memsz;   // Size of segment in memory
    uint64_t p_align;   // Alignment of segment
} Elf64_Phdr;

3. Section Header Table

The section header table provides a detailed view of the sections within the ELF file. While not necessary for program execution, this table is crucial for linking and debugging. Each entry in the section header table describes a single section in the file.

Here’s a simplified C structure for a 64-bit section header entry:

typedef struct {
    uint32_t sh_name;      // Section name
    uint32_t sh_type;      // Section type
    uint64_t sh_flags;     // Section attributes
    uint64_t sh_addr;      // Virtual address in memory
    uint64_t sh_offset;    // Offset in file
    uint64_t sh_size;      // Size of section
    uint32_t sh_link;      // Link to other section
    uint32_t sh_info;      // Miscellaneous information
    uint64_t sh_addralign; // Address alignment boundary
    uint64_t sh_entsize;   // Size of entries, if section has table
} Elf64_Shdr;

4. Data

The actual contents of the file, including code and data, are stored in various sections. Some common sections include:

.text: Contains executable code
.data: Contains initialized data
.bss: Contains uninitialized data
.rodata: Contains read-only data
.symtab: Symbol table
.strtab: String table

ELF in Action: From Compilation to Execution

To better understand how ELF files are created and used, let’s walk through the process of compiling a simple C program and examining its ELF structure.

Step 1: Writing a Simple C Program

Let’s start with a basic “Hello, World!” program:

#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}

Step 2: Compilation

We’ll compile this program using GCC:

gcc -o hello hello.c

This command creates an executable ELF file named “hello”.

Step 3: Examining the ELF File

Now, let’s use some tools to examine the structure of our ELF file:

Using readelf to view the ELF header:

readelf -h hello

This command will display information about the ELF header, including the entry point, program header offset, and section header offset.

Viewing program headers:

readelf -l hello

This shows the program headers, which describe how the program should be loaded into memory.

Examining section headers:

readelf -S hello

This command displays information about the various sections in the ELF file.

Step 4: Disassembly

To see the actual machine code generated from our C program, we can use objdump:

objdump -d hello

This will show us the disassembled machine code, which might look something like this (simplified for clarity):

0000000000001149 <main>:
    1149:       55                      push   %rbp
    114a:       48 89 e5                mov    %rsp,%rbp
    114d:       48 8d 3d b0 0e 00 00    lea    0xeb0(%rip),%rdi
    1154:       e8 f7 fe ff ff          call   1050 <puts@plt>
    1159:       b8 00 00 00 00          mov    $0x0,%eax
    115e:       5d                      pop    %rbp
    115f:       c3                      ret

This assembly code represents the machine instructions that will be executed when our program runs.

Advanced Topics in ELF

Dynamic Linking

ELF’s support for dynamic linking is one of its most powerful features. Dynamic linking allows programs to use shared libraries, which are loaded at runtime. This mechanism saves memory and disk space, as multiple programs can share the same library code.

The dynamic linker uses information stored in the .dynamic section of the ELF file to resolve symbols and load necessary shared libraries. Here’s a simplified view of how dynamic linking works:

The program is loaded into memory.
The dynamic linker (usually /lib64/ld-linux-x86-64.so.2 on 64-bit Linux systems) is invoked.
The dynamic linker reads the .dynamic section to determine which shared libraries are needed.
It loads the required shared libraries into memory.
It performs any necessary relocations, resolving symbols between the program and the shared libraries.
Control is transferred to the program’s entry point.

Relocations

Relocations are a crucial part of both static and dynamic linking. They allow the linker to adjust memory addresses in the code and data sections to reflect the actual runtime locations of various symbols.

Here’s a simplified C structure representing a relocation entry:

typedef struct {
    uint64_t r_offset;  // Location to apply the relocation action
    uint64_t r_info;    // Symbol table index and type of relocation
} Elf64_Rel;

Symbol Tables

Symbol tables are essential for linking and debugging. They contain information about functions and variables in the program. Each symbol table entry typically includes:

Name (index into the string table)
Value (often an address)
Size
Type and binding information
Associated section

Here’s a simplified C structure for a symbol table entry:

typedef struct {
    uint32_t st_name;   // Symbol name (index into string table)
    uint8_t  st_info;   // Symbol type and binding
    uint8_t  st_other;  // Symbol visibility
    uint16_t st_shndx;  // Section index
    uint64_t st_value;  // Symbol value
    uint64_t st_size;   // Symbol size
} Elf64_Sym;

Security Implications of ELF

Understanding the ELF format is crucial for security professionals and system administrators. Many security features and exploit mitigation techniques are implemented at the ELF level:

Address Space Layout Randomization (ASLR): This technique randomizes the memory addresses used by a program, making it harder for attackers to predict where specific parts of a program will be in memory.
Stack Canaries: These are values placed on the stack to detect stack buffer overflows. They’re typically implemented by the compiler and runtime, but understanding ELF is crucial for analyzing how they work.
Relocation Read-Only (RELRO): This technique makes some binary sections read-only after relocations have been performed, reducing the attack surface for certain types of exploits.
Non-Executable Stack: This security feature, often implemented through the NX bit, prevents code execution from the stack, mitigating certain types of buffer overflow attacks.

Conclusion

The ELF format is a cornerstone of modern Unix-like operating systems, providing a flexible and powerful format for executable files, shared libraries, and object code. Its design allows for efficient loading and execution of programs, while also supporting advanced features like dynamic linking and position-independent code.

Understanding ELF is crucial for systems programmers, security professionals, and anyone working on low-level software development for Unix-like systems. As we’ve seen, ELF is not just a file format—it’s a complex ecosystem that ties together compilation, linking, loading, and execution in a cohesive and efficient manner.

As computing continues to evolve, the principles embodied in ELF—modularity, flexibility, and efficiency—will undoubtedly continue to influence the design of binary formats and system software for years to come.

Table of Contents