Igor Schein on Fri, 04 Jun 2004 19:49:20 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: round4 performance


On Thu, Jun 03, 2004 at 07:21:45PM +0200, Karim Belabas wrote:
> * Igor Schein [2004-06-02 22:57]:
> > On Wed, Jun 02, 2004 at 08:50:29PM +0200, Karim Belabas wrote:
> >> * Igor Schein [2004-05-17 17:38]:
> >> > On Wed, May 05, 2004 at 08:35:40PM +0200, Xavier-François Roblot wrote:
> >>>> Well, I have modified update_alpha (after Karim pointed out a strange
> >>>> behavior in this function) and that kind of miraculously speed up
> >>>> dramatically that example!... As you will see, the computing time is now
> >>>> very reasonable and it runs with a small stack too (I hope the result is
> >>>> still correct though, I haven't checked yet). Igor, Karim and I still
> >>>> have some ideas for improvements for nilord but you need some new bad
> >>>> polynomials to test them. Please send me your worst examples!
> >>>
> >>> As of current CVS, I have one:
> >>>
> >>> x^64 + 144*x^62 + 9552*x^60 + 390432*x^58 + 11080200*x^56 + 232989696*x^54 +
> >>>  3780238752*x^52 + 48636265248*x^50 + 505878824736*x^48 + 4313989216800*x^46
> >>>  + 30476092609440*x^44 + 179725400591616*x^42 + 889696224175824*x^40 + 37113
> >>> 75959364288*x^38 + 13078302651873216*x^36 + 38977344315307584*x^34 + 9825210
> >>> 8786134728*x^32 + 209260046783039040*x^30 + 375757773758107200*x^28 + 566964
> >>> 010597622400*x^26 + 715492120542918048*x^24 + 750523839570713088*x^22 + 6491
> >>> 30912300207232*x^20 + 458125942466369664*x^18 + 260295367984115328*x^16 + 11
> >>> 6982277577092224*x^14 + 40621591866960000*x^12 + 10554853128818688*x^10 + 19
> >>> 60600165904448*x^8 + 242910928408320*x^6 + 17820025360128*x^4 + 592019290368
> >>> *x^2 + 1536953616
> >>>
> >>> It did behave decently on 2.2.7, but slowed down considerably after
> >>> all latest changes.
> >>
> >> It is back to decent speed in current CVS [ and (many) further changes behind
> >> the scenes... ].
> >>
> >> The implementation is still far from optimal since some non-modular
> >> computations remain [ two in particular at the end of testb2() / testc2()
> >> in the non-primary case are very expensive ], but I don't want to further
> >> complicate the code before extensive checks.
> >>
> >> Any regression ?
> >
> > Absolutely:
> >
> > ? nfdisc(x^64+2^16);
> >   *** nfdisc: bug in GP (Segmentation Fault), please report
> 
> I have corrected the SEGV above and went on fixing the above-mentioned 2
> inefficiencies.  The code is almost entirely modular now, and often quite a
> bit faster (~ a factor 2 for the big polynomial). Esp. in the large
> degrees you seem to favor :-).
> 
> The current code passes my test-suite [ make test-round4 + all polynomials
> submitted so far in this thread ].
> 
> Please re-check ! I'll try and cleanup everything after that last checkpoint.

One more thing I noticed:

? allocatemem()
  *** allocatemem: Warning: doubling stack size; new stack = 64000000 (61.035 Mbytes).
? nfdisc(x^64 + 16*x^56 - 912*x^48 + 10688*x^40 - 400288*x^32 + 6702848*x^24 + 6866688*x^16 + 295936*x^8 + 256)
54928660240916705829692137343476270101600957315461056543842727794765969610738510667156664877056

Why is 32MB of stack not enough?

Thanks

Igor