Karim Belabas on Fri, 24 Aug 2012 15:57:44 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: How to do fast integer division


* Dirk Laurie [2012-08-24 13:31]:
> I have an application that does lots of integer divisions with
> large operands but small quotients.  As a model for that,
> I handcoded the extended Euclidean algorithm as follows:
> 
> {mbezout(a,b)= my(X); X=[1,0,a;0,1,b]; if(a<b, i=2;j=1, j=2;i=1);
>   while(X[j,3]>0, X[i,] -= truncate(X[i,3]/X[j,3])*X[j,];
>     i=j; j=3-i);
>   X[i,]
> }
> 
> This code is pretty standard

Even implemented properly (see below), this would be quadratic, whereas
PARI's bezout is softly-linear. So the built-in function is bound to be
faster.

>, so I was astounded to see how slow it is.  Timing on 10000-digit numbers:
> 
> ? b1=bezout(a1,a2);
> time = 16 ms.
> 
> ? b3=mybezout(a1,a2);
> time = 38,622 ms.
> 
> That's a factor of over 2000 slower, not to be explained
> by overhead.
> 
> Suspecting that "truncate(X[i,3]/X[j,3])" hides a multitude
> of sins (starting with reduction to lowest terms)

Indeed, reduction to lowest terms induces an extra gcd at each
iteration. For n-bit numbers the classical gcd algorithm runs in time
O(n^2), because of a "miracle" with a telescoping series ( the naive
cost analysis yields O(n) loops * O(n^2) divisions... ).

Your variant does not benefit from this and should run in time O(n^3);
it achieves slightly better than cubic time because our integer
divisions are not quadratic :-)

> I changed it to "truncate(X[i,3]*1./X[j,3])" and the time dropped down
> to 200ms.  There is a tiny chance that this is not exact, so I did
> "divrem(X[i,3],X[j,3])[1]" instead and got 221ms.
> 
> Is that the closest one can get to a fast integer quotient in Pari/GP?

X[i,3] \ X[j,3] would be better.

In principle, an even better (still quadratic) implementation would be

  BEZOUT(A, B) =
  { my (a = A, b = B);
    while(b,
      my (q, r, v = divrem(a,b));
      q = v[1];
      a = b; 
      b = v[2];

      r = u; 
      u = v; 
      v = u - q*v;
    );
    [u, (a - A*u) \ B]
  }

It would save 
- half the matrix updates in exchange for the final (a - A*u) \ B
- one further multiplication in each iteration; since a - q*b = r, which has
  just been computed as part of the Euclidean division.

In practice, this is not much faster than the corrected version

 mbezout(a,b)=
 { my(X); X=[1,0,a;0,1,b]; if(a<b, i=2;j=1, j=2;i=1);
   while(X[j,3]>0, X[i,] -= (X[i,3] \ X[j,3]) *X[j,];
     i=j; j=3-i);
   X[i,]
 }

Cheers,

    K.B.

P.S.: In the 'testing' branch, one can simultaneously assign to more than one
variable using the

  [v1, v2, ..., vn] = 'some_vector of length >= n'

construct; as in 

  [q, r] = divrem(a, b)

or

  GCD(a, b) = while(b, [a, b] = [b, a % b]); a;

This alllows to simplify a little the BEZOUT implementation.

-- 
Karim Belabas, IMB (UMR 5251)  Tel: (+33) (0)5 40 00 26 17
Universite Bordeaux 1          Fax: (+33) (0)5 40 00 69 50
351, cours de la Liberation    http://www.math.u-bordeaux1.fr/~belabas/
F-33405 Talence (France)       http://pari.math.u-bordeaux1.fr/  [PARI/GP]
`