The Multiplicative Tier: Geometric Scaling and 64-bit Synergy in Mercury

In our last post, we explored the Additive Tier, establishing how Mercury handles direct linear steps. Today, we step up to the second level of the mathematical architecture: The Multiplicative Tier.

At this tier, we move away from simple counting and enter the realm of geometric scaling.

The Theory of Geometric Scaling

It is common to teach multiplication simply as “fast addition”—and while that is functionally true for small integers, it is the wrong mental model for arbitrary-precision engines. Multiplication applies one value as a scale to the entire magnitude of another.

Because multiplication is symmetrical (the order of factors does not change the product), its inverse operation is used to solve for either of the original variables:

$$C = A \cdot B \implies B = \frac{C}{A} \quad \text{and} \quad A = \frac{C}{B}$$

In our Sigma Language notation, we express this structural relationship linearly:

C = A * B; (Multiplicative combination)
B = C / A; (Solving for the right ratio)
A = C / B; (Solving for the left ratio)

Just as subtraction was the tool to find a missing additive difference, division is strictly the tool to find the missing scale. Before we can look at division, however, we need to look at how Mercury handles that massive geometric expansion in silicon.

32-bit Places in a 64-bit World

When Mercury hands a scaling operation off to mercuryAbsMul, the elegance of using a base-$2^{32}$ positional format shines through.

If we multiply two full 32-bit places together, the largest possible product is still less than 2^64, so a 64-bit register can hold the entire intermediate product. Because modern processors feature 64-bit Arithmetic Logic Units (ALUs), we can multiply two base-$2^{32}$ “places” together and capture the entire result natively, without overflowing the hardware register.

Here is the inner engine of mercuryAbsMul that performs this feat:

// Iterate through the places of variable 'b'
for (int bi = bs; bi <= br; bi++) {
    uint bx = b[2 + bi + bq]; // Load the current place of 'b'
    
    int ai;
    ulong reg = 0; // The 64-bit accumulator register

    // Multiply against the places of variable 'a'
    for (ai = as; ai <= ar; ai++) {
        int place = ai + bi - adj;

        if (place >= 0) {
            uint ax = a[2 + ai + aq]; // Load the current place of 'a'

            // Multiply the two 32-bit places, add the carry, and add the existing scratch value
            reg += (ulong) bx * (ulong) ax + scratch[place];

            // Store the bottom 32 bits as the answer for this place
            scratch[place] = (uint) reg;
            
            // Shift the register right to push the top 32 bits forward as the carry
            reg >>= 32;
        } else {
            reg = 0;
        }
    }
    
    // Process any remaining carry for this row
    int place = ai + bi - adj;
    if (place >= 0) {
        reg += scratch[place];
        scratch[place] = (uint) reg;
    }
}

The Accumulator and Scratch Staging

The magic happens inside that innermost for loop. We are not just multiplying bx and ax. We are accumulating three distinct values into our 64-bit reg:

The product of the two current 32-bit places.
The carry over from the previous place.
The value already sitting in the scratch array from earlier multiplication passes.

Once those are added together, the logic mirrors our Additive Tier perfectly. The bottom 32 bits are saved to the scratch array (scratch[place] = (uint) reg;), and the top 32 bits are shifted forward for the next loop (reg >>= 32;).

Once again, by staging all of this geometric expansion safely inside the scratch stack, Mercury ensures that the output variable (val) is never corrupted mid-calculation.

Why Two 32-bit Places Need 64 Bits

There is a deeper reason that Mercury uses a 64-bit register for multiplication. It is not only because the hardware happens to provide one. It is because multiplication combines magnitudes, and the size of the result is measured by adding the places of the factors.

In a positional number system, a digit does not stand alone. A digit at place i represents:

aᵢ × B^i

and a digit at place j represents:

bⱼ × B^j

When those two places are multiplied, their coefficients multiply, but their places add:

(aᵢ × B^i) × (bⱼ × B^j)
= (aᵢ × bⱼ) × B^(i + j)

That addition in the exponent is not a coincidence. The Multiplicative Tier is built on top of the Additive Tier. Multiplication scales magnitudes, but the placement of the resulting product is still governed by additive displacement.

For Mercury, the base is B = 2^32. Each place can hold any value from 0 to 2^32 - 1. The largest possible single-place product is therefore:

(2^32 - 1) × (2^32 - 1)
= 2^64 - 2^33 + 1

That value is smaller than 2^64, so it fits completely inside an unsigned 64-bit register. Mercury can multiply two full 32-bit places without losing a single bit of the product.

Once the product is staged in the 64-bit register, Mercury follows the same carry discipline introduced in the Additive Tier: the low 32 bits become the current output place, and the high 32 bits move forward as carry.

The Bridge to Division

If multiplication is scaling up by multiplying 32-bit places natively, division is the process of scaling down.

However, division is notoriously expensive for a CPU. Instead of guessing factors and repeatedly subtracting (which would take an eternity for arbitrary-precision numbers), Mercury uses a brilliant algorithmic shortcut to solve the inverse ratio: the nibble-sized precomputed table.

In the next post, we will look at how mercuryAbsDiv pre-computes the entire search space of the dividend, turning massive division problems into highly efficient lookup-and-subtract loops.

JWCEssentials on GitHub

JWCEssentials/C/Mercury/Mercury.c