Chapter 10: Aliasing and Optimization
Pointer aliasing is one of the most significant barriers to compiler optimization. When multiple pointers potentially reference the same memory location, the compiler must make conservative assumptions that limit its ability to reorder operations, hoist calculations out of loops, or eliminate redundant memory accesses. This chapter explores the aliasing problem and how to overcome it.
Understanding Pointer Aliasing
Aliasing occurs when different pointers or references access the same memory location:
void update_values(int* a, int* b) {
*a = 10;
*b = 20; // If a and b alias, this modifies *a as well
int c = *a + 5; // Could be 15 or 25 depending on aliasing
}
Types of Aliasing
- Direct Aliasing: Multiple pointers explicitly point to the same memory location
- Parameter Aliasing: Function parameters point to overlapping memory regions
- Type-Based Aliasing: Different pointer types access the same memory
- Array Aliasing: Pointers to different elements in the same array
The Impact on Optimization
Loop Optimization Barriers
void saxpy(float* x, float* y, float a, int n) {
for (int i = 0; i < n; i++) {
y[i] = a * x[i] + y[i];
}
}
If x
and y
potentially overlap, the compiler cannot safely:
- Vectorize the loop (memory dependencies might exist)
- Reorder memory operations
- Hoist invariant calculations out of the loop
Parallelization Challenges
void process_array(int* input, int* output, int size) {
#pragma omp parallel for
for (int i = 0; i < size; i++) {
output[i] = process(input[i]);
}
}
Parallelization requires confidence that input
and output
don’t alias. Otherwise, parallel threads could modify data that other threads are reading, causing race conditions.
Complete Working Example: Aliasing Impact
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#define SIZE 10000000
#define ITERATIONS 10
// Version vulnerable to aliasing assumptions
void transform_standard(float* output, float* input, int size) {
for (int i = 0; i < size; i++) {
output[i] = input[i] * 2.0f + 1.0f;
}
}
// Version with restrict keyword (C99)
void transform_restrict(float* restrict output, float* restrict input, int size) {
for (int i = 0; i < size; i++) {
output[i] = input[i] * 2.0f + 1.0f;
}
}
// Version that manually avoids aliasing through copying
void transform_copy(float* output, float* input, int size) {
float* temp = malloc(size * sizeof(float));
memcpy(temp, input, size * sizeof(float));
for (int i = 0; i < size; i++) {
output[i] = temp[i] * 2.0f + 1.0f;
}
free(temp);
}
// Benchmark function
double benchmark_function(void (*func)(float*, float*, int), float* output, float* input, int size) {
clock_t start = clock();
for (int i = 0; i < ITERATIONS; i++) {
func(output, input, size);
}
clock_t end = clock();
return ((double)(end - start)) / CLOCKS_PER_SEC;
}
int main() {
float* data1 = malloc(SIZE * sizeof(float));
float* data2 = malloc(SIZE * sizeof(float));
// Initialize data
for (int i = 0; i < SIZE; i++) {
data1[i] = (float)rand() / RAND_MAX;
}
// Case 1: No aliasing
printf("Testing without aliasing:\n");
double standard_time = benchmark_function(transform_standard, data2, data1, SIZE);
double restrict_time = benchmark_function(transform_restrict, data2, data1, SIZE);
double copy_time = benchmark_function(transform_copy, data2, data1, SIZE);
printf("Standard version: %f seconds\n", standard_time);
printf("Restrict version: %f seconds\n", restrict_time);
printf("Copy version: %f seconds\n", copy_time);
// Case 2: With aliasing (in-place transformation)
printf("\nTesting with aliasing (in-place):\n");
standard_time = benchmark_function(transform_standard, data1, data1, SIZE);
restrict_time = benchmark_function(transform_restrict, data1, data1, SIZE);
copy_time = benchmark_function(transform_copy, data1, data1, SIZE);
printf("Standard version: %f seconds\n", standard_time);
printf("Restrict version: %f seconds (INVALID RESULT - violates restrict)\n", restrict_time);
printf("Copy version: %f seconds\n", copy_time);
free(data1);
free(data2);
return 0;
}
Strict Aliasing Rules
Most modern compilers implement “strict aliasing” rules, which define when pointers of different types are allowed to alias.
The Rules (C99/C++)
- Two pointers of the same type may alias
- A pointer to a type may alias with a pointer to a qualified (
const
,volatile
) version of the same type - A pointer to
char
orunsigned char
may alias with any other pointer type - A pointer to a structure type may alias with a pointer to any of its members
- A pointer to a union type may alias with a pointer to any of its members
Violating Strict Aliasing
// This violates strict aliasing rules
float f = 3.14f;
int* p = (int*)&f; // Accessing a float through an int pointer
printf("%d\n", *p); // Undefined behavior under strict aliasing
Compiler Flags and Strict Aliasing
# GCC/Clang: Disable strict aliasing optimizations
gcc -fno-strict-aliasing -O3 program.c
# Enable warnings about potential violations
gcc -Wstrict-aliasing -O3 program.c
The restrict
Keyword
The restrict
keyword, introduced in C99, provides a way to tell the compiler that pointers do not alias:
void vector_add(float* restrict a,
float* restrict b,
float* restrict result,
int size) {
for (int i = 0; i < size; i++) {
result[i] = a[i] + b[i];
}
}
With restrict
, the compiler can safely:
- Vectorize the loop
- Pipeline memory operations
- Reorder calculations
C++ and restrict
C++ doesn’t officially support restrict
, but most compilers provide equivalent extensions:
// GCC/Clang
void vector_add(__restrict__ float* a,
__restrict__ float* b,
__restrict__ float* result,
int size);
// MSVC
void vector_add(__restrict float* a,
__restrict float* b,
__restrict float* result,
int size);
Common Aliasing Pitfalls
1. Type Punning through Unions
// Common type punning pattern
union Converter {
float f;
int i;
};
float int_bits_to_float(int bits) {
union Converter c;
c.i = bits;
return c.f; // Accessing a union through different members
}
While unions provide a legal way to circumvent strict aliasing in C, they remain problematic in C++ before C++20.
2. Overlapping Arrays
void process_overlapping(int* a, int size, int offset) {
for (int i = 0; i < size; i++) {
a[i] = a[i + offset]; // Self-referential with potential overlap
}
}
When arrays overlap, the direction of the loop can matter:
// Safe for forward copying when dst < src
void memcpy_forward(char* dst, const char* src, size_t size) {
for (size_t i = 0; i < size; i++) {
dst[i] = src[i];
}
}
// Safe for backward copying when dst > src
void memcpy_backward(char* dst, const char* src, size_t size) {
for (size_t i = size; i > 0; i--) {
dst[i-1] = src[i-1];
}
}
3. Casting Between Incompatible Types
// Accessing a double array as a byte array is allowed
void zero_array(double* array, size_t size) {
unsigned char* bytes = (unsigned char*)array;
for (size_t i = 0; i < size * sizeof(double); i++) {
bytes[i] = 0;
}
}
// But accessing a double array as an int array violates strict aliasing
void bad_zero_array(double* array, size_t size) {
int* ints = (int*)array;
for (size_t i = 0; i < size * sizeof(double) / sizeof(int); i++) {
ints[i] = 0; // Undefined behavior
}
}
Compiler Optimization with Aliasing Information
Complete Working Example: Matrix Multiplication
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#define N 1000
// Standard matrix multiplication
void matrix_multiply_standard(double A[N][N], double B[N][N], double C[N][N]) {
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
double sum = 0.0;
for (int k = 0; k < N; k++) {
sum += A[i][k] * B[k][j];
}
C[i][j] = sum;
}
}
}
// Matrix multiplication with restrict
void matrix_multiply_restrict(double A[N][N], double B[N][N], double C[N][N]) {
double* restrict Ap = &A[0][0];
double* restrict Bp = &B[0][0];
double* restrict Cp = &C[0][0];
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
double sum = 0.0;
for (int k = 0; k < N; k++) {
sum += Ap[i*N + k] * Bp[k*N + j];
}
Cp[i*N + j] = sum;
}
}
}
// Matrix multiplication with tiling and restrict
void matrix_multiply_tiled_restrict(double A[N][N], double B[N][N], double C[N][N]) {
double* restrict Ap = &A[0][0];
double* restrict Bp = &B[0][0];
double* restrict Cp = &C[0][0];
#define TILE_SIZE 32
for (int i = 0; i < N; i += TILE_SIZE) {
for (int j = 0; j < N; j += TILE_SIZE) {
for (int k = 0; k < N; k += TILE_SIZE) {
// Process tile
for (int ii = i; ii < i + TILE_SIZE && ii < N; ii++) {
for (int jj = j; jj < j + TILE_SIZE && jj < N; jj++) {
double sum = Cp[ii*N + jj];
for (int kk = k; kk < k + TILE_SIZE && kk < N; kk++) {
sum += Ap[ii*N + kk] * Bp[kk*N + jj];
}
Cp[ii*N + jj] = sum;
}
}
}
}
}
}
int main() {
double (*A)[N] = malloc(N * N * sizeof(double));
double (*B)[N] = malloc(N * N * sizeof(double));
double (*C)[N] = malloc(N * N * sizeof(double));
// Initialize matrices
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
A[i][j] = (double)rand() / RAND_MAX;
B[i][j] = (double)rand() / RAND_MAX;
}
}
// Zero the result matrix
memset(C, 0, N * N * sizeof(double));
// Time standard multiplication
clock_t start = clock();
matrix_multiply_standard(A, B, C);
clock_t end = clock();
double standard_time = ((double)(end - start)) / CLOCKS_PER_SEC;
// Zero the result matrix
memset(C, 0, N * N * sizeof(double));
// Time multiplication with restrict
start = clock();
matrix_multiply_restrict(A, B, C);
end = clock();
double restrict_time = ((double)(end - start)) / CLOCKS_PER_SEC;
// Zero the result matrix
memset(C, 0, N * N * sizeof(double));
// Time multiplication with tiling and restrict
start = clock();
matrix_multiply_tiled_restrict(A, B, C);
end = clock();
double tiled_restrict_time = ((double)(end - start)) / CLOCKS_PER_SEC;
printf("Standard multiplication: %f seconds\n", standard_time);
printf("Restrict multiplication: %f seconds\n", restrict_time);
printf("Tiled restrict multiplication: %f seconds\n", tiled_restrict_time);
printf("Speedup from restrict: %.2fx\n", standard_time / restrict_time);
printf("Speedup from tiling+restrict: %.2fx\n", standard_time / tiled_restrict_time);
free(A);
free(B);
free(C);
return 0;
}
Practical Strategies for Dealing with Aliasing
1. Use the restrict
Keyword
void safe_memcpy(void* restrict dest, const void* restrict src, size_t n) {
char* d = dest;
const char* s = src;
for (size_t i = 0; i < n; i++) {
d[i] = s[i];
}
}
2. Local Copies to Avoid Aliasing
void process_data(int* input, int* output, int size) {
// Create local copies to avoid aliasing concerns
int local_input[256];
memcpy(local_input, input, size * sizeof(int));
for (int i = 0; i < size; i++) {
output[i] = process(local_input[i]);
}
}
3. Avoid Type Punning
// Instead of type punning with casts:
float f = 3.14f;
int bits = *(int*)&f; // Bad - strict aliasing violation
// Use memcpy for type punning:
float f = 3.14f;
int bits;
memcpy(&bits, &f, sizeof(float)); // Good - well-defined
4. Use Standard Library Functions
// Instead of manually copying:
for (int i = 0; i < size; i++) {
dst[i] = src[i];
}
// Use the standard library:
memcpy(dst, src, size * sizeof(int)); // The compiler knows memcpy doesn't allow aliasing
5. Leverage Compiler-Specific Attributes
// GCC/Clang: Inform the compiler about non-aliasing
__attribute__((malloc)) void* my_malloc(size_t size) {
return malloc(size);
}
// MSVC: Similar functionality
__declspec(restrict) void* my_malloc(size_t size) {
return malloc(size);
}
Real-World Example: BLAS-like Optimization
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#define N 10000000
// SAXPY implementation (y = ax + y) without alias handling
void saxpy_standard(float* y, float* x, float a, int n) {
for (int i = 0; i < n; i++) {
y[i] = a * x[i] + y[i];
}
}
// SAXPY implementation with restrict
void saxpy_restrict(float* restrict y, float* restrict x, float a, int n) {
for (int i = 0; i < n; i++) {
y[i] = a * x[i] + y[i];
}
}
// SAXPY implementation with vectorization hints
void saxpy_vectorized(float* restrict y, float* restrict x, float a, int n) {
#pragma GCC ivdep
for (int i = 0; i < n; i++) {
y[i] = a * x[i] + y[i];
}
}
int main() {
float* x = malloc(N * sizeof(float));
float* y = malloc(N * sizeof(float));
float* y_copy = malloc(N * sizeof(float));
// Initialize data
for (int i = 0; i < N; i++) {
x[i] = (float)rand() / RAND_MAX;
y[i] = (float)rand() / RAND_MAX;
}
// Keep a copy of y for verification
memcpy(y_copy, y, N * sizeof(float));
// Time standard SAXPY
clock_t start = clock();
saxpy_standard(y, x, 2.0f, N);
clock_t end = clock();
double standard_time = ((double)(end - start)) / CLOCKS_PER_SEC;
// Restore y for the next test
memcpy(y, y_copy, N * sizeof(float));
// Time SAXPY with restrict
start = clock();
saxpy_restrict(y, x, 2.0f, N);
end = clock();
double restrict_time = ((double)(end - start)) / CLOCKS_PER_SEC;
// Restore y for the next test
memcpy(y, y_copy, N * sizeof(float));
// Time SAXPY with vectorization hints
start = clock();
saxpy_vectorized(y, x, 2.0f, N);
end = clock();
double vectorized_time = ((double)(end - start)) / CLOCKS_PER_SEC;
printf("Standard SAXPY: %f seconds\n", standard_time);
printf("Restrict SAXPY: %f seconds\n", restrict_time);
printf("Vectorized SAXPY: %f seconds\n", vectorized_time);
printf("Speedup from restrict: %.2fx\n", standard_time / restrict_time);
printf("Speedup from vectorization: %.2fx\n", standard_time / vectorized_time);
// Test with aliasing case (x and y are the same)
printf("\nTesting with aliasing (x = y):\n");
// Time standard SAXPY with aliasing
memcpy(y, y_copy, N * sizeof(float));
start = clock();
saxpy_standard(y, y, 2.0f, N);
end = clock();
standard_time = ((double)(end - start)) / CLOCKS_PER_SEC;
// Time SAXPY with restrict (will produce incorrect results)
memcpy(y, y_copy, N * sizeof(float));
start = clock();
saxpy_restrict(y, y, 2.0f, N); // Violates restrict semantics!
end = clock();
restrict_time = ((double)(end - start)) / CLOCKS_PER_SEC;
printf("Standard SAXPY (aliased): %f seconds\n", standard_time);
printf("Restrict SAXPY (aliased, INCORRECT): %f seconds\n", restrict_time);
printf("WARNING: The restrict version with aliasing violates the contract\n");
printf(" and produces incorrect results!\n");
free(x);
free(y);
free(y_copy);
return 0;
}
Summary
Mastering aliasing is essential for high-performance code:
- Understand Aliasing Rules
- Learn and follow strict aliasing rules
- Be aware of language-specific aliasing semantics
- Know when aliasing can occur in your code
- Use Aliasing Guarantees
- Apply the
restrict
keyword appropriately - Leverage compiler-specific options and attributes
- Provide aliasing information where possible
- Apply the
- Implement Safe Practices
- Avoid type punning through pointer casts
- Use
memcpy()
for type conversions - Create local copies when aliasing might occur
- Prefer standard library functions with well-defined aliasing behavior
- Be Aware of Performance Implications
- Aliasing assumptions can significantly impact performance
- Consider aliasing when optimizing performance-critical code
- Watch for compiler warnings about aliasing issues
- Test and Measure
- Verify code correctness with and without aliasing
- Compare performance of different aliasing strategies
- Use compiler flags to control aliasing optimizations
By understanding and addressing aliasing concerns, you can unlock significant compiler optimizations and improve the performance of your code.