The Division Stage-Setter: Pre-computed Tables and Reversing Spatial Geometry
In our previous look at the Multiplicative Tier, we established that multiplication is fundamentally a spatial expansion. When multiplying two numbers, the resulting memory footprint respects an $(X+Y) – 1 + 1$ relationship. The baseline width expands outward from the overlapping identity places, with a final potential carry determining the absolute maximum width.
Division must respect this exact same spatial geometry—but in reverse. Instead of expanding outward, division requires aligning a fixed-width window (the divisor) with the most significant places of the dividend, and systematically sliding it down toward the identity place.
The Grade-School Analogy: Why Division is Hard for a CPU
When we perform long division on paper—for example, $7450 \div 25$—we do not repeatedly subtract 25 from 7450. We align the 25 under the 74 and ask, “How many times does this fit?” We immediately know the answer is roughly 2 or 3 because we have a memorized 1-9 multiplication table in our heads.
A CPU does not have a memorized multiplication table for arbitrary, multi-place base-$2^{32}$ numbers. If we force the computer to blindly guess and repeatedly subtract to find the ratio, division becomes incredibly slow and computationally expensive.
The Mercury Solution: The Pre-computed Table
To solve this, mercuryAbsDiv executes a brilliant algorithmic shortcut: it builds its own “memorized” multiplication table inside the scratch memory before it ever starts dividing.
Creating a full base-$2^{32}$ table would require 4.2 billion entries, which is physically impossible. Instead, Mercury slices the 32-bit places down into 4-bit “nibbles.” This reduces the required search space to just 16 entries ($0$ through $15$).
Here is the exact C code where Mercury learns its “times tables” by repeatedly adding the divisor to populate the base nibble level:
// Setup the base level (q=0) for the first nibble
mercuryLoadZero(stack, Precision, Table[0][0]);
mercuryLoadMercury(stack, Precision, divisor, Table[0][1]);
// Build 2 through 15 by repeatedly adding the divisor
for (int i = 2; i < 16; i++) {
mercuryAdd(stack, Precision, Table[0][i-1], divisor, Table[0][i]);
}
Expanding the Grid and Sliding the Window
Once Mercury knows how the divisor scales from $1$ to $15$, it leverages our spatial alignment rules to expand that knowledge across the entire 32-bit place.
Because a 32-bit place consists of 8 discrete 4-bit nibbles, Mercury simply takes that first 16-entry table and shifts it across the remaining 7 levels.
// Shift the base table across the remaining 7 nibble positions
for (int q = 1; q < 8; q++) {
mercuryLoadZero(stack, Precision, Table[q][0]);
for (int i = 1; i < 16; i++) {
// Shift each pre-multiplied value left by 4 bits (q * 4)
mercuryShift(stack, Precision, Table[0][i], q * 4, Table[q][i]);
}
}
The bit-shift math here is elegantly simple. The variable q represents the nibble index (from $1$ to $7$). Since each nibble is exactly 4 bits wide, shifting the base table by $q \cdot 4$ perfectly aligns the pre-multiplied values with the 4th, 8th, 12th, and eventually the 28th bit of the 32-bit word.
By generating this Table[8][16] grid, Mercury turns guessing into a bounded lookup: at each nibble position, there are only sixteen possible candidates to test. It looks at a nibble of the dividend, checks its custom table, grabs the largest pre-multiplied value that fits, and subtracts it. Because the table was built relative to the divisor, the engine can flawlessly slide these pre-calculated chunks across the dividend until it reaches the end.
A massive, highly complex arbitrary-precision division problem is elegantly reduced to a highly efficient lookup-and-subtract loop.
q * 4 is not arbitrary code. It is the direct mechanical translation of eight 4-bit nibbles inside one 32-bit place.
Ready to Strike We now have a fully populated Table[8][16] sitting safely in our scratch memory. By shifting the base nibble values across the 32-bit geometry, Mercury has effectively memorized its times tables for this specific divisor.
But having the knowledge is only half the battle.
In our next post, we will unleash the execution engine. We will look at the triple-nested loop that puts this table to work, dynamically sliding our pre-computed grid across the dividend to turn a massive arbitrary-precision division problem into a lightning-fast sequence of pure lookups and subtractions.