Real-time operating systems (RTOS) are crucial for applications that require deterministic behavior and guaranteed response times. Linux, with its PREEMPT_RT patch, has become a viable option for real-time computing. This blog post provides a comprehensive technical overview of how real-time support is implemented in Linux, focusing on the key components and mechanisms that enable deterministic behavior.

Table of Contents

  1. Introduction to Real-Time Systems
  2. The PREEMPT_RT Patch
  3. Kernel Preemption
  4. High-Resolution Timers
  5. Threaded Interrupt Handlers
  6. Real-Time Mutexes (rt_mutex)
  7. Priority Inheritance
  8. Sleeping Spinlocks
  9. RCU (Read-Copy-Update)
  10. local_lock Mechanism
  11. Scheduling Latency and Testing
  12. Recent Developments and Mainline Integration
  13. Conclusion

Introduction to Real-Time Systems

Real-time systems are characterized by their ability to respond to events within strict time constraints. These systems are essential in various domains, including industrial automation, robotics, automotive systems, and aerospace applications. The key requirements for a real-time operating system include:

  1. Determinism: The system’s behavior must be predictable, with guaranteed response times for critical tasks.
  2. Low latency: The time between an event occurring and the system responding should be minimized and bounded.
  3. Preemptibility: Higher-priority tasks should be able to interrupt lower-priority tasks quickly.

Traditional Linux kernels, while offering good average-case performance, were not designed with these strict real-time requirements in mind. The PREEMPT_RT patch aims to address these limitations and transform Linux into a fully preemptible kernel suitable for real-time applications.

The PREEMPT_RT Patch

The PREEMPT_RT patch, also known as the Real-Time Linux patch, is a set of modifications to the Linux kernel that enhances its real-time capabilities. The main goals of this patch are:

  1. To make the kernel fully preemptible
  2. To minimize non-deterministic behavior
  3. To reduce latency and increase predictability

The patch achieves these goals through several key mechanisms, which we’ll explore in detail in the following sections.

Kernel Preemption

At the heart of real-time Linux is the concept of a fully preemptible kernel. This means that kernel code can be interrupted at almost any point to allow higher-priority tasks to run. The PREEMPT_RT patch makes several key changes to achieve this:

1. Replacing Spinlocks with RT-Mutexes

Most spinlocks in the kernel are converted to mutexes that support priority inheritance. This helps prevent priority inversion, a common problem in real-time systems. Here’s a simplified example of how a spinlock might be converted to an RT-mutex:

// Non-RT kernel
spin_lock(&my_lock);
// Critical section
spin_unlock(&my_lock);

// RT kernel
rt_mutex_lock(&my_rt_mutex);
// Critical section
rt_mutex_unlock(&my_rt_mutex);

2. Preemptible Critical Sections

Even code running in critical sections (protected by spinlocks in the non-RT kernel) becomes preemptible, with only a few exceptions. This is achieved by reimplementing locking primitives using rtmutexes.

3. Threaded Interrupt Handlers

Interrupt handlers are moved to their own kernel threads, allowing them to be preempted like any other thread. This significantly reduces the time during which interrupts are disabled.

4. Forced Thread Context

The PREEMPT_RT patch forces all interrupt handlers to run in thread context, except for those marked with the IRQF_NO_THREAD flag. This behavior can be enabled in mainline Linux without the PREEMPT_RT patch using the threadirqs kernel command line option, but there are some differences in the resulting behavior.

High-Resolution Timers

High-resolution timers are a crucial component of real-time Linux. They allow for precise timed scheduling and remove the dependency on the periodic scheduler tick (jiffies). The implementation includes:

  1. Separate infrastructures for high-resolution kernel timers and timeouts.
  2. User-space POSIX timers with high resolution.

The high-resolution timer subsystem provides nanosecond precision, which is essential for many real-time applications. It replaces the old timer wheel with a red-black tree of timers, allowing for more efficient insertion and removal of timer events.

Threaded Interrupt Handlers

In the PREEMPT_RT patch, most interrupt handlers are converted to run in their own kernel threads. This has several advantages:

  1. Interrupt handlers can be preempted, reducing latency for higher-priority tasks.
  2. Priority inheritance can be applied to interrupt handlers, preventing priority inversion.
  3. Interrupt handlers can sleep, simplifying their implementation in many cases.

The implementation works as follows:

  1. When an interrupt occurs, a small hard IRQ handler is executed, which wakes up the corresponding threaded IRQ handler.
  2. The threaded IRQ handler then runs as a normal kernel thread, subject to scheduling like any other thread.

This approach significantly reduces the time spent in hard IRQ context, where the system is non-preemptible.

Real-Time Mutexes (rt_mutex)

The rt_mutex is a fundamental building block of the PREEMPT_RT patch. It implements priority inheritance to prevent priority inversion. The key features of rt_mutex include:

  1. Support for priority inheritance
  2. Ability to sleep while waiting for the mutex
  3. Deadlock detection

The implementation of rt_mutex is complex, involving several helper functions and data structures. Here’s a simplified overview of how priority inheritance is implemented:

void rt_mutex_adjust_prio_chain(struct task_struct *task,
                                int deadlock_detect,
                                struct rt_mutex *orig_lock,
                                struct rt_mutex_waiter *orig_waiter,
                                struct task_struct *top_task)
{
    // Implementation details...
}

This function is called whenever the priority of a task holding an rt_mutex needs to be adjusted due to a higher-priority task waiting for the mutex.

Priority Inheritance

Priority inheritance is a key mechanism for preventing priority inversion in real-time systems. In the context of the PREEMPT_RT patch, priority inheritance is implemented for:

  1. RT-mutexes
  2. Sleeping spinlocks
  3. RW-locks

The basic idea is that when a high-priority task is blocked waiting for a resource held by a lower-priority task, the priority of the lower-priority task is temporarily boosted to that of the waiting task. This ensures that the resource is released as quickly as possible.

Sleeping Spinlocks

In non-PREEMPT_RT Linux, spinlocks are busy-wait locks that disable preemption. In the PREEMPT_RT patch, most spinlocks are converted to “sleeping spinlocks,” which are essentially rt_mutexes. The key differences are:

  1. Non-RT spinlock:
    • Task spins (busy-waits) until the lock is available
    • Preemption is disabled while holding the lock
    • Cannot sleep
  2. RT sleeping spinlock:
    • Task sleeps if the lock is not immediately available
    • Preemption is still possible while holding the lock
    • Can sleep

The implementation looks something like this:

// Non-RT kernel
spin_lock(&lock);
// Critical section
spin_unlock(&lock);

// RT kernel
rt_spin_lock(&lock);
// Critical section
rt_spin_unlock(&lock);

In the RT kernel, rt_spin_lock() and rt_spin_unlock() are mapped to rt_mutex operations, allowing for preemption and priority inheritance.

RCU (Read-Copy-Update)

RCU is a synchronization mechanism that allows for efficient read-mostly access to shared data structures. In the context of the PREEMPT_RT patch, RCU undergoes several changes:

  1. Preemptible RCU: The RCU implementation is made preemptible, allowing for lower latencies.
  2. RCU callbacks: Processing of RCU callbacks is moved to a separate thread, further reducing latency in critical paths.

The PREEMPT_RT patch eliminates RCU handling from all intermediate states and processes it only in its own thread. This change significantly reduces the potential for RCU operations to introduce latency in real-time tasks.

local_lock Mechanism

The local_lock is a synchronization primitive introduced in the PREEMPT_RT patch. It provides a way to protect per-CPU data structures while remaining preemptible. The key features of local_lock are:

  1. Per-CPU protection: Each CPU has its own lock, reducing contention.
  2. Preemptible: Unlike raw spinlocks, local_lock allows preemption.
  3. Lightweight: Designed for low-overhead synchronization of per-CPU data.

The implementation of local_lock can be found in kernel/locking/local_lock_internal.h. Here’s a simplified example of its usage:

DEFINE_LOCAL_IRQ_LOCK(my_lock);

local_lock(&my_lock);
// Critical section accessing per-CPU data
local_unlock(&my_lock);

Scheduling Latency and Testing

Scheduling latency is a critical metric for real-time systems. It represents the delay between an event occurring (such as an interrupt) and the corresponding task being scheduled to run. Several tools are available for measuring and analyzing scheduling latency in real-time Linux systems:

cyclictest

cyclictest is a widely used tool for measuring scheduling latency. It works by creating a high-priority thread that sleeps for a specified interval and then measures the actual time that elapsed. The difference between the expected and actual wake-up times gives the scheduling latency.

Here’s an example of running cyclictest:

# cyclictest --smp -p98 -m
T: 0 (23124) P:98 I:1000 C: 645663 Min:      2 Act:    4 Avg:    4 Max:      23
T: 1 (23125) P:98 I:1500 C: 430429 Min:      2 Act:    5 Avg:    3 Max:      23
T: 2 (23126) P:98 I:2000 C: 322819 Min:      2 Act:    4 Avg:    3 Max:      15
T: 3 (23127) P:98 I:2500 C: 258247 Min:      2 Act:    5 Avg:    4 Max:      32

This output shows the minimum, average, and maximum latencies observed on each CPU core.

hackbench

hackbench is a benchmark tool that simulates a chat server workload. It’s useful for generating a realistic load on the system while measuring latency. It creates multiple pairs of threads or processes that pass data between themselves over sockets or pipes.

hwlatdetect

hwlatdetect is used to detect hardware-induced latencies, particularly those caused by System Management Interrupts (SMIs). It works by disabling all interrupts and measuring gaps in the CPU timestamp counter.

Recent Developments and Mainline Integration

The PREEMPT_RT patch has been in development for many years, and significant portions of it have been gradually integrated into the mainline Linux kernel. Some key milestones include:

  1. Real-Time Clock (RTC) and high-resolution timers were merged in kernel 2.6.21 (2007).
  2. Priority inheritance for mutexes was added in kernel 2.6.18 (2006).
  3. Threaded interrupt handlers were introduced in kernel 2.6.30 (2009).

A major development occurred with the release of Linux 5.15 in 2021, which included a significant portion of the PREEMPT_RT patch. This integration allows users to enable full preemption support by setting the CONFIG_PREEMPT_RT kernel configuration option.

The commit that enabled this feature can be found in the Linux kernel Git repository:

commit 31d113a5fa44010a5c845f8f0c45b82abt72d1a1
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon May 10 17:03:48 2021 +0200

    Merge branch 'for-5.15/preempt-rt-core'

    Enable PREEMPT_RT by default on 5.15-rt and adjust the rest of the
    configuration accordingly.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

This commit marks a significant milestone in the integration of real-time features into the mainline Linux kernel, making it easier for users to access real-time capabilities without applying external patches.

Conclusion

Real-time Linux, enabled by the PREEMPT_RT patch and its gradual integration into the mainline kernel, provides a powerful platform for developing and running time-critical applications. The key features, including fully preemptible kernels, high-resolution timers, threaded interrupt handlers, and priority inheritance, work together to minimize latency and improve determinism.

While achieving true hard real-time performance on a general-purpose operating system like Linux remains challenging, the PREEMPT_RT patch and related developments have significantly improved Linux’s capabilities in this area. As more of these features are integrated into the mainline kernel, we can expect to see wider adoption of Linux in real-time and embedded systems.

For developers and system administrators working with real-time Linux, understanding these underlying mechanisms is crucial for optimizing system performance and ensuring that timing constraints are met. By leveraging the tools and techniques discussed in this post, you can effectively measure, analyze, and improve the real-time behavior of your Linux-based systems.