Peter-Lawrence . Montgomery on Wed, 4 Apr 2001 23:02:28 +0200 (MET DST) |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Re: parallelization |
> From: "Mills, D. Dr. MATH" <ad3943@exmail.usma.army.mil> > I'm using PARI on a supercomputer, and am curious to know any programming > "tricks" or "tips" to aid me in efficiently parallelizing my programs. If it > helps, I am performing a search algorithm on sets of polynomials over finite > fields in order to see which ones satisfy certain properties, and the search > is being performed on sets of coefficients (a1,a2,a3,...,ak) via a for loop. > I am also running programs which factor sets of numbers, none of which are > too large (no larger than 10^12 at the moment), but since I'm looking at > sets of such numbers the runtime is of course slow, and I'm hoping to speed > things up. As long as your numbers are around 10^12, the trial division algorithm needs a table of primes up to about 10^6. There are fewer than 80000 primes below 10^6. You can afford a table with these primes and any convenient information about them. If you have 64-bit integer multiply, then the data structure might have p (32-bit integer) inv64 (2-adic inverse, 1/p mod 2^64) qbound (FLOOR((2^64 - 1)/p) (one four-byte field, two 8-byte fields, per odd prime) uint64 q = n*inv64; /* mod 2^64 -- exact quotient if p divides n */ if (q <= qbound) then p divides n else p does not divide n end if (multiply, compare, branch, plus memory references and loop overhead). The proof notes that the computed q satisfies p*q == n (mod 2^64). The equality p*q = n will hold precisely when p*q <= 2^64 - 1. This is equivalent to q <= (2^64 - 1)/p. Since the main loop needs only inv64 and qbound, not p, you might store the two 8-byte entities in one array and p in another, to improve cache utilization. As long as n remains around 10^12, you might test (q < n) instead of (q <= qbound). This will reduce memory references. The test (q < n) will pass only 10^12 times in 2^64, about one in 10^7. You can afford the overhead to check for false hits. If you lack 64-bit integer multiplication, you might approximate the quotient by floating point arithmetic (storing the reciprocals of the primes), and round this to an integer (avoiding integer overflow on 32-bit machines when the quotient > 2^31), and check whether your approximate quotient is close to that integer. This is several more instructions. SQUFOF is good for factoring integers below 10^18, after doing trial division up to the cube root (so only two factors remain). It needs arithmetic only on integers up to 2*sqrt(n) (whereas Pollard Rho has intermediate results near n^2). In the rare case where SQUFOF fails, resort to trial division. Good luck!