Karim Belabas on Fri, 24 Aug 2012 15:57:44 +0200

 Re: How to do fast integer division

```* Dirk Laurie [2012-08-24 13:31]:
> I have an application that does lots of integer divisions with
> large operands but small quotients.  As a model for that,
> I handcoded the extended Euclidean algorithm as follows:
>
> {mbezout(a,b)= my(X); X=[1,0,a;0,1,b]; if(a<b, i=2;j=1, j=2;i=1);
>   while(X[j,3]>0, X[i,] -= truncate(X[i,3]/X[j,3])*X[j,];
>     i=j; j=3-i);
>   X[i,]
> }
>
> This code is pretty standard

Even implemented properly (see below), this would be quadratic, whereas
PARI's bezout is softly-linear. So the built-in function is bound to be
faster.

>, so I was astounded to see how slow it is.  Timing on 10000-digit numbers:
>
> ? b1=bezout(a1,a2);
> time = 16 ms.
>
> ? b3=mybezout(a1,a2);
> time = 38,622 ms.
>
> That's a factor of over 2000 slower, not to be explained
>
> Suspecting that "truncate(X[i,3]/X[j,3])" hides a multitude
> of sins (starting with reduction to lowest terms)

Indeed, reduction to lowest terms induces an extra gcd at each
iteration. For n-bit numbers the classical gcd algorithm runs in time
O(n^2), because of a "miracle" with a telescoping series ( the naive
cost analysis yields O(n) loops * O(n^2) divisions... ).

Your variant does not benefit from this and should run in time O(n^3);
it achieves slightly better than cubic time because our integer

> I changed it to "truncate(X[i,3]*1./X[j,3])" and the time dropped down
> to 200ms.  There is a tiny chance that this is not exact, so I did
> "divrem(X[i,3],X[j,3])[1]" instead and got 221ms.
>
> Is that the closest one can get to a fast integer quotient in Pari/GP?

X[i,3] \ X[j,3] would be better.

In principle, an even better (still quadratic) implementation would be

BEZOUT(A, B) =
{ my (a = A, b = B);
while(b,
my (q, r, v = divrem(a,b));
q = v[1];
a = b;
b = v[2];

r = u;
u = v;
v = u - q*v;
);
[u, (a - A*u) \ B]
}

It would save
- half the matrix updates in exchange for the final (a - A*u) \ B
- one further multiplication in each iteration; since a - q*b = r, which has
just been computed as part of the Euclidean division.

In practice, this is not much faster than the corrected version

mbezout(a,b)=
{ my(X); X=[1,0,a;0,1,b]; if(a<b, i=2;j=1, j=2;i=1);
while(X[j,3]>0, X[i,] -= (X[i,3] \ X[j,3]) *X[j,];
i=j; j=3-i);
X[i,]
}

Cheers,

K.B.

P.S.: In the 'testing' branch, one can simultaneously assign to more than one
variable using the

[v1, v2, ..., vn] = 'some_vector of length >= n'

construct; as in

[q, r] = divrem(a, b)

or

GCD(a, b) = while(b, [a, b] = [b, a % b]); a;

This alllows to simplify a little the BEZOUT implementation.

--
Karim Belabas, IMB (UMR 5251)  Tel: (+33) (0)5 40 00 26 17
Universite Bordeaux 1          Fax: (+33) (0)5 40 00 69 50
351, cours de la Liberation    http://www.math.u-bordeaux1.fr/~belabas/
F-33405 Talence (France)       http://pari.math.u-bordeaux1.fr/  [PARI/GP]
`

```