Igor Schein on Fri, 04 Jun 2004 19:49:20 +0200
Re: round4 performance
On Thu, Jun 03, 2004 at 07:21:45PM +0200, Karim Belabas wrote:
> * Igor Schein [2004-06-02 22:57]:
> > On Wed, Jun 02, 2004 at 08:50:29PM +0200, Karim Belabas wrote:
> >> * Igor Schein [2004-05-17 17:38]:
> >> > On Wed, May 05, 2004 at 08:35:40PM +0200, Xavier-François Roblot wrote:
> >>>> Well, I have modified update_alpha (after Karim pointed out a strange
> >>>> behavior in this function) and that kind of miraculously speed up
> >>>> dramatically that example!... As you will see, the computing time is now
> >>>> very reasonable and it runs with a small stack too (I hope the result is
> >>>> still correct though, I haven't checked yet). Igor, Karim and I still
> >>>> have some ideas for improvements for nilord but you need some new bad
> >>>> polynomials to test them. Please send me your worst examples!
> >>>
> >>> As of current CVS, I have one:
> >>>
> >>> x^64 + 144*x^62 + 9552*x^60 + 390432*x^58 + 11080200*x^56
> >>> + 232989696*x^54 + 3780238752*x^52 + 48636265248*x^50
> >>> + 505878824736*x^48 + 4313989216800*x^46 + 30476092609440*x^44
> >>> + 179725400591616*x^42 + 889696224175824*x^40
> >>> + 3711375959364288*x^38 + 13078302651873216*x^36
> >>> + 38977344315307584*x^34 + 98252108786134728*x^32
> >>> + 209260046783039040*x^30 + 375757773758107200*x^28
> >>> + 566964010597622400*x^26 + 715492120542918048*x^24
> >>> + 750523839570713088*x^22 + 649130912300207232*x^20
> >>> + 458125942466369664*x^18 + 260295367984115328*x^16
> >>> + 116982277577092224*x^14 + 40621591866960000*x^12
> >>> + 10554853128818688*x^10 + 1960600165904448*x^8
> >>> + 242910928408320*x^6 + 17820025360128*x^4 + 592019290368*x^2
> >>> + 1536953616
> >>>
> >>> It did behave decently on 2.2.7, but slowed down considerably after
> >>> all latest changes.
> >>
> >> It is back to decent speed in current CVS [ and (many) further changes behind
> >> the scenes... ].
> >>
> >> The implementation is still far from optimal since some non-modular
> >> computations remain [ two in particular at the end of testb2() / testc2()
> >> in the non-primary case are very expensive ], but I don't want to further
> >> complicate the code before extensive checks.
> >>
> >> Any regression ?
> >
> > Absolutely:
> >
> > ? nfdisc(x^64+2^16);
> >   *** nfdisc: bug in GP (Segmentation Fault), please report
>
> I have corrected the SEGV above and went on fixing the above-mentioned 2
> inefficiencies. The code is almost entirely modular now, and often quite a
> bit faster (~ a factor 2 for the big polynomial). Esp. in the large
> degrees you seem to favor :-).
>
> The current code passes my test-suite [ make test-round4 + all polynomials
> submitted so far in this thread ].
>
> Please re-check ! I'll try and cleanup everything after that last checkpoint.

One more thing I noticed:

? allocatemem()
  *** allocatemem: Warning: doubling stack size; new stack = 64000000 (61.035 Mbytes).
? nfdisc(x^64 + 16*x^56 - 912*x^48 + 10688*x^40 - 400288*x^32 + 6702848*x^24 + 6866688*x^16 + 295936*x^8 + 256)
54928660240916705829692137343476270101600957315461056543842727794765969610738510667156664877056

Why is 32MB of stack not enough?

Thanks
Igor