Karim Belabas on Fri, 01 Aug 2008 02:43:59 +0200

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

[arndt@jjj.de: Re: bugfix for charpoly with finite fields]

Hi pari-dev,  [ sorry, looong mail ]

  I received an interesting test-case from Joerg Arndt (simplified version attached).

  "The used function gauss_poly2() calls only pari/gp internals, else it
   does arithmetic with binary polynomials no more."

Some timings:

  ver 2.3.4:  11.9 s. / 11.8 s [GMP]

  ver 2.4.2:   360 s. / 356 s. [ GMP ]

\\ Anticipating a little bit on the semi-happy ending (the solution is awkward)
  current svn: 12.5s /  11.9s [ GMP ]

A simpler example:
  a = x^1771 + x^82 + x^8 + x;
  b = x^1449 + x^1448 + x^1446 + x^1412 + x^1411 + x^1408 + x^1406 + x^1403 + x^1366 + x^1362 + x^1360 + x^1317 + x^537 + x^500 + x^496 + x^494 + x^460 + x^459 + x^456 + x^454 + x^451 + x^411 + x^410 + x^408;
  T = x^1861 + 1;
  z = Mod(Mod(1,2), Mod(1,2)*T);

  (z * a) * (z * b)

is about 50 times slower in 2.4.2 (and onwards) than in 2.3.*

It's easy to identify the cause, but it can't be reverted:

1) historically, the t_INTMOD and t_POLMOD types were introduced as a
(reasonably) convenient user interface to construct "arbitrary" base rings in an
otherwise typeless language. They were very slow (and still are), but
did the job. All the basic polynomial arithmetic and linear algebra was
written purely in terms of generic elementary operations and no explicit
notion of a "base ring". 

2) with time a number of "heuristic" improvements (read "incorrect" but
OK in most practical circumstances) were added to improve functions like
Euclidean division in a polynomial ring K[X]. In that case, we basically
ignore zeroes and use some kind of sparse representation for the
divisor. Nobody cared too much about "minor" problems like

(01:09) gp > (x^2+Mod(1,3)*x+Mod(1,3)) % (x + Mod(0,2))
%1 = Mod(1, 3)
(01:09) gp > (x^2 + 100) % (x + Mod(0,2))
%2 = 100
(01:09) gp > Mod(0,2)*x
%3 = 0  \\ this is the integer 0 (t_INT). Base ring is lost
(01:09) gp > Mod(0,2)/x
%4 = 0

Some zeroes do carry information, even "exact zeroes".

3) in parallel, a large number of dedicated functions were written in libpari to
re-implement polynomial arithmetic over specific simple base rings
(like Z/nZ or Fp[x]/(T(x))), in a much more efficient way than could be
done using generic operations only, i.e. the function has a predefined
notion of a common base ring containing all coefficients it will handle.
Much simpler to program and optimize.

Unsurprisingly, all of libpari is now written in terms of these
functions ( except the part implementing generic arithmetic, which are
hardly ever used in libpari itself ). Unfortunately, it is hard for 
GP users to access these functions: the simple example above would be written

  \\ a,b,T as above
  Flx_to_ZX(Flxq_mul(ZX_to_Flx(a,2), ZX_to_Flx(b,2), ZX_to_Flx(T,2), 2))

Timings do improve: 539ms --> 16ms in 2.4.2, vs 12ms for the original
code in 2.3.4 (stable remains a little more efficient because Flx_rem
does not use sparse representation but Newton/FFT-based arithmetic in
this example; conversions are negligible)

4) In version 2.4.2, PARI (both libpari and GP) became much stricter
with the zeroes it would actually ignore, restricting severely the old
improvements / hacks such as 2). Fixing a large number of bugs at the
same time of course:

\\ current svn
(01:29) gp >  (x^2+Mod(1,3)*x+Mod(1,3)) % (x + Mod(0,2))
%1 = Mod(0, 1)
(01:29) gp >  (x^2 + 100) % (x + Mod(0,2))
%2 = Mod(0, 2)
(01:29) gp > Mod(0,2)*x
%3 = Mod(0, 2)
(01:29) gp > Mod(0,2)/x
%4 = Mod(0, 2)/x

5) In our case, this slows down immensely Euclidean division in (Z/2Z)[X]
since all the zeroes are actually Mod(0,2), which we are no longer free
to ignore, and we no longer get a useful sparse representation.

6) Of course it would be trivial to solve 5) if our routines knew the base
ring from the start and could answer precisely the question "what is zero ?".

As an experiment, I modified a few generic functions to do just that
(gdiv, gmul, gsqr operating on t_POLMODs) and try to determine a (very) simple
base ring before starting. In most cases we fail immediately and go on.
When we hit the jackpot and recognize Z/nZ, we call specialized
functions [ and unfortunately have to convert between representations,
twice: to get rid of t_INTMODs, then re-introduce them ]

7) This may slow down a little the library: most polynomials we use are
in Z[X], and we have to scan them to the end to make sure no coefficient
belongs to some Z/nZ [ for nothing: this will never happen in libpari:
we don't use INTMODs... ]

In libpari proper, this is not the case: no critical function uses the
generic operations anymore, and certainly not with t_POLMOD arguments,
in any case.

8) This is not a general solution, in the sense that Euclidean division
in slightly more complicated base rings and sparse polynomials of huge degrees
will still be abysmally slower than 2.3.*.  [ As far as I can see,
division is the worse case by far: other operations are much less
sensitive. ]

-- I do not want to revert to the old happy-go-lucky behaviour (always
algebraically acceptable, but very often semantically incorrect), 

-- It would be very, very nice if polynomials (and matrices) carried
with them their coefficient ring, without our trying to guess in each
single function where exactly we are working (often giving up).
Knowing precisely "what is 0" would be useful in many contexts: our
current approach is too conservative and misses simplifications, as
above. But I do not want to force users to specify base rings in advance
or use complicated install-ed functions from libpari; so I'm stuck there.

Maybe adding a "domain" tag to each object whenever we can safely do so
would be doable. But can't be done without wrecking backward compatibility.
(Or introducing completely new types.)

-- Catering for more rings "by hand" will complicate the code, for
probably marginal improvements.

Any bright idea ?

Karim Belabas, IMB (UMR 5251)  Tel: (+33) (0)5 40 00 26 17
Universite Bordeaux 1          Fax: (+33) (0)5 40 00 69 50
351, cours de la Liberation    http://www.math.u-bordeaux.fr/~belabas/
F-33405 Talence (France)       http://pari.math.u-bordeaux.fr/  [PARI/GP]
gauss_poly2(n, t)=
{  /* return field polynomial for type-t Gaussian normal basis */
  /* All computations over GF(2) */
  /* NOTE: slower than the algorithm using complex numbers */
    local(p, M, r, F, t21, t2, Z);
\\    if ( 0==gauss_test(n,t),  return(0) );
    p = t*n + 1;
\\    print("  n=", n, "  t=", t, ":  p=", p);
    r = znprimroot(p)^n;  \\ element of order t mod p
\\    print("  :: r=", r, "  ord(r)=", znorder(r), " ==  t=", t);

\\    M = sum(k=0, p-1, 'x^k);  \\ The polynomial modulus
    M = 'x^p - 1;  \\ Use redundant modulus
    M *= Mod(1,2);  \\ ... over GF(2)

    if ( 1==t,  return( sum(k=0, p-1, 'x^k) ) );  \\ for type 1

\\    print("  :: M =", lift(M));
    F = Mod(1, M);
    t21 = Mod(2,p);  t2 = Mod(1,p);
    for (k=1, n,
\\        print("  :: ------- k=", k);

\\        for(j=0, t-1, print( lift(Mod('x^lift(t2*r^j), M) ) ) );

\\        Z = sum(j=0, t-1, 'x^lift(t2*r^j));  \\ faster (but unclean?)
        Z = Mod( sum(j=0, t-1, 'x^lift(t2*r^j)), M);  \\ faster
\\        Z = sum(j=0, t-1, Mod('x^lift(t2*r^j), M) );  \\ fast
\\        Z = sum(j=0, t-1, Mod('x, M)^lift(t2*r^j) );  \\ slower
\\        print("  :: Z =", lift(component(Z, 2)));

        F = ('x+Z)*F;

\\        print(" :: w=", sum(j=0, t-1, 'x^lift(t2*r^j) ) );
\\        print(" :: w%=", lift(Mod(sum(j=0, t-1, 'x^lift(t2*r^j) ), sum(k=0, p-1, 'x^k) ) ) );
\\        print("  :: F =", lift(component(F,2)) );

        t2 *= t21;
    \\ final reduction for redundant modulus:
\\    M = sum(k=0, p-1, 'x^k);  \\ The polynomial modulus
\\    F = lift( Mod( lift(F), M) );

    \\ final reduction for redundant modulus (simplified):
    F = lift(F);
    if ( 0==polcoeff(F,0), F=sum(k=0, n, (1-polcoeff(F,k))*'x^k) );

    return ( F );
} /* ----- */

gauss_poly2(620, 3);