| Karim Belabas on Fri, 24 Aug 2012 15:57:44 +0200 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
| Re: How to do fast integer division |
* Dirk Laurie [2012-08-24 13:31]:
> I have an application that does lots of integer divisions with
> large operands but small quotients. As a model for that,
> I handcoded the extended Euclidean algorithm as follows:
>
> {mbezout(a,b)= my(X); X=[1,0,a;0,1,b]; if(a<b, i=2;j=1, j=2;i=1);
> while(X[j,3]>0, X[i,] -= truncate(X[i,3]/X[j,3])*X[j,];
> i=j; j=3-i);
> X[i,]
> }
>
> This code is pretty standard
Even implemented properly (see below), this would be quadratic, whereas
PARI's bezout is softly-linear. So the built-in function is bound to be
faster.
>, so I was astounded to see how slow it is. Timing on 10000-digit numbers:
>
> ? b1=bezout(a1,a2);
> time = 16 ms.
>
> ? b3=mybezout(a1,a2);
> time = 38,622 ms.
>
> That's a factor of over 2000 slower, not to be explained
> by overhead.
>
> Suspecting that "truncate(X[i,3]/X[j,3])" hides a multitude
> of sins (starting with reduction to lowest terms)
Indeed, reduction to lowest terms induces an extra gcd at each
iteration. For n-bit numbers the classical gcd algorithm runs in time
O(n^2), because of a "miracle" with a telescoping series ( the naive
cost analysis yields O(n) loops * O(n^2) divisions... ).
Your variant does not benefit from this and should run in time O(n^3);
it achieves slightly better than cubic time because our integer
divisions are not quadratic :-)
> I changed it to "truncate(X[i,3]*1./X[j,3])" and the time dropped down
> to 200ms. There is a tiny chance that this is not exact, so I did
> "divrem(X[i,3],X[j,3])[1]" instead and got 221ms.
>
> Is that the closest one can get to a fast integer quotient in Pari/GP?
X[i,3] \ X[j,3] would be better.
In principle, an even better (still quadratic) implementation would be
BEZOUT(A, B) =
{ my (a = A, b = B);
while(b,
my (q, r, v = divrem(a,b));
q = v[1];
a = b;
b = v[2];
r = u;
u = v;
v = u - q*v;
);
[u, (a - A*u) \ B]
}
It would save
- half the matrix updates in exchange for the final (a - A*u) \ B
- one further multiplication in each iteration; since a - q*b = r, which has
just been computed as part of the Euclidean division.
In practice, this is not much faster than the corrected version
mbezout(a,b)=
{ my(X); X=[1,0,a;0,1,b]; if(a<b, i=2;j=1, j=2;i=1);
while(X[j,3]>0, X[i,] -= (X[i,3] \ X[j,3]) *X[j,];
i=j; j=3-i);
X[i,]
}
Cheers,
K.B.
P.S.: In the 'testing' branch, one can simultaneously assign to more than one
variable using the
[v1, v2, ..., vn] = 'some_vector of length >= n'
construct; as in
[q, r] = divrem(a, b)
or
GCD(a, b) = while(b, [a, b] = [b, a % b]); a;
This alllows to simplify a little the BEZOUT implementation.
--
Karim Belabas, IMB (UMR 5251) Tel: (+33) (0)5 40 00 26 17
Universite Bordeaux 1 Fax: (+33) (0)5 40 00 69 50
351, cours de la Liberation http://www.math.u-bordeaux1.fr/~belabas/
F-33405 Talence (France) http://pari.math.u-bordeaux1.fr/ [PARI/GP]
`