Ilya Zakharevich on Sun, 2 Sep 2001 11:45:29 -0400 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
[PATCH 2.1.1] 15x speedup of prime tables |
The following patch makes calculations of primes up to 100m (or 435000000) 15x times quickier on UltraSparcs with 256K of cache. I do not have x86 with 256K level=2 cache to check the effects on x86 - but they should be quite similar. This was suggested by a discussion on s.m.research. Results: up to 100m in 1.7s, up to 435m in 7.9s (same as before with >4M caches, and better that before with 4M cache). Enjoy, Ilya --- ./src/basemath/arith2.c~ Mon Jan 15 14:21:55 2001 +++ ./src/basemath/arith2.c Sun Sep 2 11:33:29 2001 @@ -100,7 +100,7 @@ initprimes1(long size, long *lenp, long return (byteptr) gprealloc(p,r-p,size + 2); } -#define PRIME_ARENA (512 * 1024) +#define PRIME_ARENA (200 * 1024) /* No slowdown even with 256K level-2 cache */ /* Here's the workhorse. This is recursive, although normally the first recursive call will bottom out and invoke initprimes1() at once. @@ -108,7 +108,7 @@ initprimes1(long size, long *lenp, long byteptr initprimes0(ulong maxnum, long *lenp, long *lastp) { - long k, size, alloced, asize, psize, rootnum, curlow, last, remains, need; + long k, size, alloced, asize, psize, rootnum, curlow, last, remains; byteptr q,r,s,fin, p, p1, fin1, plast, curdiff; #if 0 @@ -134,12 +134,25 @@ initprimes0(ulong maxnum, long *lenp, lo fin1 = p1 + psize - 1; remains = (maxnum - rootnum) >> 1; /* number of odd numbers to check */ - need = 100 * rootnum; /* Make % overhead negligeable. */ - if (need < PRIME_ARENA) need = PRIME_ARENA; - if (avma - bot < need>>1) { /* need to do our own allocation */ - alloced = 1; asize = need; + /* ARENA_IN_ROOTS below 12: some slowdown starts to be noticable + * when things fit into the cache. + * XXX The choice of 10 gives a slowdown of 1-2% on UltraSparcII, + * but makes calculations even for (the current) maximum of 436273009 + * fit into 256K cache (still common for some architectures). + * + * One may change it when small caches become uncommon, but the win + * is not going to be very noticable... + */ +#ifndef ARENA_IN_ROOTS +# define ARENA_IN_ROOTS 10 +#endif /* ARENA_IN_ROOTS */ + + asize = ARENA_IN_ROOTS * rootnum; /* Make % overhead negligeable. */ + if (asize < PRIME_ARENA) asize = PRIME_ARENA; + if (avma - bot < asize>>1) { /* need to do our own allocation */ + alloced = 1; } else { /* scratch area is free part of PARI stack */ - alloced = 0; asize = avma - bot; + alloced = 0; } if (asize > remains) asize = remains + 1;/* + room for a sentinel byte */ if (alloced)