Karim Belabas on Fri, 08 Jul 2022 11:26:03 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: Hilbert class polynomial roots mod q


* Andreas Enge [2022-07-07 21:34]:
> Hello Joachim,
> 
> Am Thu, Jul 07, 2022 at 08:16:44PM +0200 schrieb pari@jvdsn.com:
> > * PARI/GP polclass: out of memory after about 20 minutes (PARI stack size
> > set to 9100001280 (8678.438 Mbytes))
> > * Sutherland's classpoly without providing q: 149.9s
> > * Sutherland's classpoly with providing q: 129.2s
> > 
> > As you can see, Sutherland's implementation is somewhat faster if q is
> > provided.
> 
> indeed, somewhat faster, but not radically so. In any case, Sutherland's
> implementation is really fast, with or without q. And if I am not mistaken,
> it is not even parallelised.
> 
> My CM software needs 2700s (on one core of my laptop) for this polynomial
> of degree 38022 and with a precision of 57641 bits (which tends to be very
> close to the size of the largest coefficient, which has 52360 bits here). 
> The text file storing the polynomial in GP format takes about 500MB.
> 
> > Perhaps another question is why the PARI/GP implementation is so
> > inefficient compared to the original implementation it was adapted from.
> 
> It should definitely not run out of memory on this size of polynomial.
> I would call this a bug. Or at least an opportunity for improvement!

I just fixed that memory problem in the 'master' branch:

  polclass(-4506399751,1);

is still slow (23 minutes on my laptop) but requires less than 2GB,
in fact a little more than 1GB. The output size (internal binary format, not
text) is 231MB, so this is OK.

As far as speed is concerned, I see where the bottleneck is but am not
familiar with the code either. We're computing modulo 1681 (32-bit)
primes, which is enough to recognize 53792 bit coefficients, even closer
to the optimal 52360 compared to 'cm'.

The computation modulo each such prime p is "too slow", which in turn
boils down to computing roots mod p of small degree polynomials (say T)
and in fact just computing x^p mod T is the bottleneck (deg T ~ 5, p ~ 2^32);
not sure why we lose a factor 10 or so there, compared to Sutherland's code.

Cheers,

    K.B.
--
Karim Belabas, IMB (UMR 5251), Université de Bordeaux
Vice-président en charge du Numérique
T: (+33) 05 40 00 29 77; http://www.math.u-bordeaux.fr/~kbelabas/
`