BPFtrace and eBPF Tools Guide

Table of Contents

  1. Introduction
  2. System Layer Overview
  3. Tool Categories
  4. Detailed Tool Analysis
  5. Command Reference

Introduction

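This guide covers bpftrace and eBPF tools for Linux system analysis and performance monitoring, organized by the layer of the system stack each one observes. Most tools exist in two variants: a BCC version that accepts command-line options, and a bpftrace version (a .bt script) that usually takes none. The option examples in this guide follow the BCC versions. Flags differ between tools and releases, so run a BCC tool with -h to confirm what your installed version supports. Some distributions install the tools under suffixed names, for example opensnoop-bpfcc and opensnoop.bt on Debian and Ubuntu.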

System Layer Overview

The tools are organized across these main layers:

  1. Applications & Runtimes
  2. System Libraries
  3. System Call Interface
  4. Kernel Subsystems:
     - VFS (Virtual File System)
     - Network Stack (Sockets, TCP/UDP, IP)
     - Scheduler
     - Virtual Memory
     - Device Drivers

Tool Categories

Application Level Tools

| Tool | Purpose | Layer |
|------|---------|-------|
| opensnoop | Trace file opens | Application |
| statsnoop | Trace stat() syscalls | Application |
| syncsnoop | Trace sync operations | Application |
| bashreadline | Trace bash commands | Application |
| gethostlatency | DNS latency analysis | System Libraries |

System Call Interface Tools

| Tool | Purpose | Layer |
|------|---------|-------|
| syscount | Count syscalls | System Call |
| execsnoop | Trace new processes | System Call |
| killsnoop | Trace kill() syscalls | System Call |
| pidpersec | New processes per second | System Call |

File System Tools

| Tool | Purpose | Layer |
|------|---------|-------|
| vfscount | VFS operation counts | VFS |
| vfsstat | VFS operation stats | VFS |
| writeback | Trace file writeback | File Systems |
| xfsdist | XFS operation latency | File Systems |
| mdflush | Trace md RAID flush events | Volume Manager |

Block Device Tools

| Tool | Purpose | Layer |
|------|---------|-------|
| biosnoop | Trace block I/O | Block Device |
| biolatency | Block I/O latency | Block Device |
| bitesize | Block I/O size analysis | Block Device |

Network Tools

| Tool | Purpose | Layer |
|------|---------|-------|
| tcpconnect | Trace TCP connections | TCP/UDP |
| tcpaccept | Trace TCP accepts | TCP/UDP |
| tcpretrans | Trace TCP retransmits | TCP/UDP |
| tcpdrop | Trace TCP drops | TCP/UDP |

CPU/Scheduler Tools

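| Tool | Purpose | Layer |
|------|---------|-------|
| cpuwalk | Sample which CPUs are executing processes | Scheduler |
| runqlat | Run queue latency | Scheduler |
| runqlen | Run queue length | Scheduler |
| offcputime | Off-CPU analysis | Scheduler |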
offcputime Off-CPU analysis Scheduler

Memory Management Tools

| Tool | Purpose | Layer |
|------|---------|-------|
| oomkill | Trace OOM killer | Virtual Memory |
| capable | Trace capability checks | System |

Detailed Tool Analysis

Application Monitoring Tools

opensnoop

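# Trace all file opens
opensnoop

# Trace specific process
opensnoop -p 1234

# Include stack traces (opensnoop has no stack option; use a bpftrace one-liner)
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n%s\n", comm, str(args->filename), ustack); }'

# Filter by process name (-n matches the command name, not the file path)
opensnoop -n nginx

# Filter by file name (post-filter the output; the path is the last column)
opensnoop | grep '\.txt$'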
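
Per-event output from opensnoop gets long quickly. As a rough sketch, the helper below summarizes which paths are opened most often. The function name count_opens_by_path is made up for this example. It assumes the first line is a header and that the path is the last whitespace-separated field, as in the default BCC output; paths containing spaces would be miscounted.

```shell
# Count how often each path appears in opensnoop-style output.
# Assumption: line 1 is the column header, and the path is the last field.
count_opens_by_path() {
    awk 'NR > 1 && NF > 0 { seen[$NF]++ }
         END { for (p in seen) print seen[p], p }' | sort -rn
}
```

For example, `opensnoop -d 10 | count_opens_by_path | head` would summarize ten seconds of tracing, assuming your opensnoop version supports the -d duration option.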

statsnoop

# Trace all stat() calls
statsnoop

# Show failed stats only
statsnoop -x

# Filter by process (statsnoop filters by PID, not by name)
statsnoop -p "$(pgrep -n nginx)"

# Include timestamps
statsnoop -t

bashreadline

# Trace all bash commands (each line already includes a timestamp and the shell PID)
bashreadline

# Trace specific shell PID (there is no PID option; filter on the PID column)
bashreadline | awk '$2 == 1234'

Network Analysis Tools

tcpconnect

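# Trace all TCP connections (the destination port is shown by default)
tcpconnect

# Filter by process ID (-p takes a PID; it does not toggle port display)
tcpconnect -p 1234

# Include timestamps
tcpconnect -t

# Filter by destination port
tcpconnect -P 80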

tcpretrans

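# Trace TCP retransmissions (the output includes the TCP state)
tcpretrans

# Include tail loss probe attempts
tcpretrans -l

# Show kernel stack traces (tcpretrans has no stack option; use bpftrace)
bpftrace -e 'kprobe:tcp_retransmit_skb { @[kstack] = count(); }'

# Filter by IP (post-filter the output)
tcpretrans | grep -F 192.168.1.1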

File System Analysis

vfscount

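# Count VFS operations (counts are already grouped by VFS function)
vfscount

# Count kernel stacks for one operation (vfscount has no stack option; use bpftrace)
bpftrace -e 'kprobe:vfs_read { @[kstack] = count(); }'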

writeback

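# Trace file writeback (a bpftrace tool; it takes no options)
writeback

# Each event line includes the device, so per-device views are a matter of
# filtering that column. Writeback is performed by kernel flusher threads,
# so the tool reports no per-process information.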

Block Device Analysis

biosnoop

# Trace block I/O (the output includes the PID and process name)
biosnoop

# Show queued time
biosnoop -Q

# Filter by device
biosnoop -d sda

biolatency

# Show block I/O latency as a histogram (microseconds by default)
biolatency

# Use millisecond units
biolatency -m

# Print a separate histogram per disk
biolatency -D

# Filter by device
biolatency -d sda
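
To make the measurement concrete, here is a minimal bpftrace sketch of the same idea: timestamp each block request when it is issued to the device, then add the elapsed time to a histogram when it completes. This is an illustration, not the source of biolatency itself. The file name biolat.bt and the map names are made up, and the script assumes the block:block_rq_issue and block:block_rq_complete tracepoints are available on your kernel.

```shell
# Write a small bpftrace script that histograms block I/O latency.
cat > biolat.bt << 'EOF'
#!/usr/bin/env bpftrace
// Record the issue time of each request, keyed by device and sector.
tracepoint:block:block_rq_issue
{
    @start[args->dev, args->sector] = nsecs;
}

// On completion, add the elapsed microseconds to a power-of-2 histogram.
tracepoint:block:block_rq_complete
/@start[args->dev, args->sector]/
{
    @usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000);
    delete(@start[args->dev, args->sector]);
}

// Drop leftover timestamps so only the histogram is printed at exit.
END
{
    clear(@start);
}
EOF
```

Run it as root with `bpftrace biolat.bt` and press Ctrl-C to print the histogram. Because it starts timing at issue, it measures device latency and excludes time spent queued in the OS.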

CPU and Scheduler Analysis

runqlat

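# Show run queue latency as a histogram (microseconds by default)
runqlat

# Use millisecond units
runqlat -m

# Filter by process ID (the BCC version has no CPU filter)
runqlat -p 1234

# Print a separate histogram per process
runqlat -P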

offcputime

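# Trace off-CPU time
offcputime

# Filter by process
offcputime -p 1234

# Set duration (seconds, given as a positional argument; -d means something else)
offcputime 10

# Show user stacks only (-u selects user threads only, which is different)
offcputime -U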
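
offcputime can also emit one line per stack in folded format (-f), with the total off-CPU time in microseconds as the last field. As a sketch, the hypothetical helper below keeps only the stacks with the largest totals. It assumes that folded layout, with frames joined by semicolons followed by a space and the count.

```shell
# Print the N folded stacks with the highest totals (default N = 10).
# Assumption: each input line is "frame;frame;... COUNT".
top_offcpu_stacks() {
    awk '{ print $NF "\t" $0 }' |   # prefix each line with its count
        sort -rn |                   # largest count first
        head -n "${1:-10}" |
        cut -f2-                     # strip the prefix again
}
```

A typical use would be `offcputime -f -p 1234 10 | top_offcpu_stacks 5`, which traces PID 1234 for ten seconds and shows the five most expensive stacks.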

Command Reference

General Options

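Options vary from tool to tool. The bpftrace (.bt) versions generally take none, and the BCC versions share only a few conventions:

-h          # Show help message and usage examples
-p PID      # Filter by process ID (where supported)
-t or -T    # Include timestamps (the letter differs between tools)

Other letters are reused with different meanings. For example, -d is a duration in opensnoop, a disk name in biosnoop, and a delimiter switch in offcputime. Stack traces are available only in tools built around them, such as offcputime.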

Advanced Usage

Custom Scripts

# Create custom bpftrace script
cat > custom.bt << 'EOF'
#!/usr/bin/env bpftrace
// Most programs open files through openat(), so trace it rather than open()
tracepoint:syscalls:sys_enter_openat
{
    printf("%s opened %s\n", comm, str(args->filename));
}
EOF

# Run custom script (requires root)
bpftrace custom.bt
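
The script above prints one line per event, which can flood the terminal on a busy host. A minimal sketch of an aggregating variant follows: it counts opens per process name in a map and prints the totals every five seconds. The file name opencount.bt, the map name @opens, and the five-second interval are arbitrary choices for illustration.

```shell
# Write a bpftrace script that aggregates in the kernel instead of printing per event.
cat > opencount.bt << 'EOF'
#!/usr/bin/env bpftrace
// Count openat() calls per process name.
tracepoint:syscalls:sys_enter_openat
{
    @opens[comm] = count();
}

// Every 5 seconds, print the counts and start over.
interval:s:5
{
    print(@opens);
    clear(@opens);
}
EOF
```

Run it as root with `bpftrace opencount.bt` and stop it with Ctrl-C. Only the summarized map crosses from kernel to user space, which is usually much cheaper than a printf per event.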

Performance Monitoring

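# Monitor system calls, printing a summary every second
syscount -i 1

# Monitor process creation (prints one line per second; takes no options)
pidpersec

# Track OOM kills (each event is printed with a timestamp; takes no options)
oomkill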

Best Practices

  1. Resource Usage
     - Be cautious with stack traces in production
     - Use sampling for high-frequency events
     - Monitor overhead with top/htop

  2. Filtering
     - Use specific filters to reduce overhead
     - Combine multiple conditions when possible
     - Consider using time-based filters

  3. Output Control
     - Use appropriate output formats
     - Consider logging to files for analysis
     - Use aggregation for high-volume data

  4. Troubleshooting
     - Start with broad tools
     - Narrow down to specific events
     - Use multiple tools for correlation

Performance Considerations

Overhead Management

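# Bound the run instead of tracing indefinitely: ten 1-second summaries
# (biolatency aggregates in the kernel and has no sampling option)
biolatency 1 10

# Filter in the kernel by PID rather than post-processing every event
opensnoop -p 1234

# Cap stack trace storage (the option belongs to stack-based tools such as offcputime)
offcputime --stack-storage-size 1024 -p 1234 10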

Production Usage

  1. Test tools in development first
  2. Use appropriate filtering
  3. Monitor system impact
  4. Set appropriate buffer sizes
  5. Use time-based execution limits
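
One way to apply a time-based execution limit to any tool is the coreutils timeout command. The wrapper below is a sketch with a made-up name, run_limited. It sends SIGINT rather than the default SIGTERM, on the assumption that the traced tool prints its summary on Ctrl-C, as bpftrace and the BCC tools generally do.

```shell
# Run a command for at most N seconds, then interrupt it as Ctrl-C would.
# Usage: run_limited SECONDS COMMAND [ARGS...]
run_limited() {
    seconds="$1"
    shift
    timeout -s INT "$seconds" "$@"
}
```

For example, `run_limited 30 biolatency` would stop after 30 seconds and still print the histogram. timeout exits with status 124 when the limit was reached.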

Common Use Cases

Performance Analysis

# Analyze disk I/O
biolatency -D
biosnoop -Q

# Network performance
tcpretrans -l
tcpconnect -t

# CPU scheduling
runqlat
offcputime -p 1234

Troubleshooting

# File system issues
opensnoop -T
vfscount

# Network problems
tcpdrop
tcpretrans

# Memory issues
oomkill

Security Monitoring

# Track capability checks
capable -v

# Monitor process creation
execsnoop -t

# Track file access
opensnoop -T