Let’s break this down in two parts:

  • What do we mean by “hardware support for 128-bit integers”?
  • How do modern CPUs (particularly x86-64) handle 128-bit values?

1. What do we mean by “hardware support for 128-bit integers”?

  • Native 128-bit integer operations: If a CPU has true 128-bit registers in its general-purpose register file, it would include single-instruction operations such as 128-bit addition or 128-bit subtraction. This is what we typically mean by “native” support.
  • Partial 128-bit hardware support: Many architectures offer a subset of 128-bit functionality. For instance, an instruction that multiplies two 64-bit values and yields a 128-bit result (stored in two 64-bit registers) is often considered partial hardware support for 128-bit integers.
  • SIMD registers: On several modern architectures (including x86-64), there are vector/SIMD registers of 128 bits or more (SSE, AVX, or AVX-512). While these registers can handle 128-bit or wider integer operations, they are typically optimized for parallel processing of smaller-element vectors rather than acting like a single 128-bit general-purpose integer register.

2. How do modern CPUs (particularly x86-64) handle 128-bit values?

  • 64-bit × 64-bit → 128-bit multiplication
    • x86-64 has single-instruction multiply variants (mul for unsigned, imul for signed) that produce a 128-bit result in RDX:RAX. This is convenient since it handles the entire 128-bit product in one go.
    • Other architectures may not offer this in a single instruction, necessitating multiple “partial” operations to get the full 128-bit result.
  • Other 128-bit integer operations (add, subtract, logical operations, etc.)
    • In x86-64’s general-purpose register file, most instructions top out at 64-bit operands. Doing a full 128-bit addition or subtraction requires combining instructions (e.g., one 64-bit add followed by an add-with-carry instruction).
    • If performance or code density is a concern, compilers will emit sequences of multiple instructions to emulate 128-bit operations.
  • SIMD/AVX registers for 128-bit or wider integers
    • Modern x86-64 processors have SSE (128-bit), AVX (256-bit), and AVX-512 (512-bit) registers. These can perform integer operations on concurrently packed data elements. However, they are typically used as vector registers rather than as single “general-purpose” 128-bit integer registers with a full set of integer instructions (like add, sub, mul, div) in the same sense as 64-bit GPRs.

I was thinking along these lines:

The x86-64 instruction set can do 64-bit × 64-bit to 128-bit using a single instruction. Specifically, you have the mul instruction for unsigned multiplication and the imul instruction for signed multiplication (each taking one operand from RAX and the other as an explicit register/value). In effect, you can say that to some degree there is support for 128-bit integer arithmetic because the hardware can produce a full 128-bit result in RDX:RAX from two 64-bit operands.

If an architecture does not provide such an instruction, you would need multiple steps to emulate 64-bit × 64-bit to 128-bit multiplication. This efficiency is one reason why 128-bit to 128-bit operations can be performed with a relatively small number of instructions on x86-64.

For example, if we look at a simple function in C/C++ that multiplies two 128-bit numbers:

__int128 mul(__int128 a, __int128 b) {
  return a * b;
}

GCC may generate assembly like this:

imulq   %rdx, %rsi
movq    %rdi, %rax
imulq   %rdi, %rcx
mulq    %rdx
addq    %rsi, %rcx
addq    %rcx, %rdx

Here, one of those lines (for instance mulq %rdx) is a 64-bit × 64-bit → 128-bit instruction. The others handle the partial products, storing them in temporary registers, and properly summing them into the final 128-bit value. You’ll notice two additional 64-bit multiplication instructions (imulq) that produce lower 64-bit results and two 64-bit additions (addq). Even so, the presence of a single 64-bit × 64-bit → 128-bit instruction streamlines the process considerably compared to a pure software approach that must work in smaller chunks with no hardware multiply support at all.

Final Answer

Modern x86-64 processors do not provide full native 128-bit general-purpose registers with a comprehensive set of integer operations (add, subtract, multiply, etc. as single instructions). However, they do offer partial hardware support in the form of a single-instruction 64×64 → 128 multiply (mul/imul), and they can handle 128-bit data in SSE/AVX registers, albeit as vectors. So while you can’t say they have complete 128-bit integer hardware support (like they do for 32 or 64 bits), the existence of 64×64 → 128 multiplication instructions and SIMD capabilities make 128-bit arithmetic more efficient than purely software-based emulation.