Let’s break this down in two parts:
mul
for unsigned, imul
for signed) that produce a 128-bit result in RDX:RAX. This is convenient since it handles the entire 128-bit product in one go.The x86-64 instruction set can do 64-bit × 64-bit to 128-bit using a single instruction. Specifically, you have the mul
instruction for unsigned multiplication and the imul
instruction for signed multiplication (each taking one operand from RAX
and the other as an explicit register/value). In effect, you can say that to some degree there is support for 128-bit integer arithmetic because the hardware can produce a full 128-bit result in RDX:RAX
from two 64-bit operands.
If an architecture does not provide such an instruction, you would need multiple steps to emulate 64-bit × 64-bit to 128-bit multiplication. This efficiency is one reason why 128-bit to 128-bit operations can be performed with a relatively small number of instructions on x86-64.
For example, if we look at a simple function in C/C++ that multiplies two 128-bit numbers:
__int128 mul(__int128 a, __int128 b) {
return a * b;
}
GCC may generate assembly like this:
imulq %rdx, %rsi
movq %rdi, %rax
imulq %rdi, %rcx
mulq %rdx
addq %rsi, %rcx
addq %rcx, %rdx
Here, one of those lines (for instance mulq %rdx
) is a 64-bit × 64-bit → 128-bit instruction. The others handle the partial products, storing them in temporary registers, and properly summing them into the final 128-bit value. You’ll notice two additional 64-bit multiplication instructions (imulq
) that produce lower 64-bit results and two 64-bit additions (addq
). Even so, the presence of a single 64-bit × 64-bit → 128-bit instruction streamlines the process considerably compared to a pure software approach that must work in smaller chunks with no hardware multiply support at all.
Final Answer
Modern x86-64 processors do not provide full native 128-bit general-purpose registers with a comprehensive set of integer operations (add, subtract, multiply, etc. as single instructions). However, they do offer partial hardware support in the form of a single-instruction 64×64 → 128 multiply (mul
/imul
), and they can handle 128-bit data in SSE/AVX registers, albeit as vectors. So while you can’t say they have complete 128-bit integer hardware support (like they do for 32 or 64 bits), the existence of 64×64 → 128 multiplication instructions and SIMD capabilities make 128-bit arithmetic more efficient than purely software-based emulation.