Karim Belabas on Thu, 06 Jun 2019 23:17:27 +0200 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Re: Memory usage for zeta() |
* Andrew Walker [2019-06-06 09:03]: > Using one of the latest 64 bit binaries for Windows: > E:\PrimeZeta>gp64-2-12-0.exe > GP/PARI CALCULATOR Version 2.12.0 (alpha) > amd64 running mingw (x86-64/GMP-6.1.2 kernel) 64-bit version > compiled: Jun 3 2019, gcc version 6.3.0 20170516 (GCC) > threading engine: single > (readline not compiled in, extended help enabled) > > Copyright (C) 2000-2019 The PARI Group > > I've found that the memory usage for zeta() is a lot bigger than the previous version I've been using, > gp64-2-11-1.exe (also 64 bit) from 21 Nov last year. > Using the default precision, try zeta(0.5+100000.0*I) then at increasing values of the imaginarypart, looking at memory usage in the task manager. > > For the latest version zeta(0.5+300000*.0*I) the usage shown gets to over > 294 Mb whereas forthe earlier one it is just over 20 Mb Is this due to the > recent changes related to Bernoulli numbers? Yes. Computing zeta() at this height with Euler-Maclaurin formula requires computing *a lot* of Bernoulli numbers, B_2, ... B_{2N}. In 2.11 those were computed (and stored) as floating point numbers; in 2.12, there are computed (and stored) as exact rational numbers, for about the same cost. Of course, since the size of Bernoulli numbers B_{2n} increase linearly with n, the new cache size is quadratic in N, whereas it was linear in 2.11. On the other hand, it now contains much more information, of course. 2.11: ===== (22:58) gp > \g3 debug = 3 (22:58) gp > zeta(1/2+100000*I); lim, nn: [7155, 16078] Time sum from 1 to N-1: 28 caching Bernoulli numbers 2*14 to 2*7155 Time Bernoulli: 7696 Time Bernoulli sum: 7704 time = 7,732 ms. (22:58) gp > zeta(1/2+100000*I); \\ second time is fast ! lim, nn: [7155, 16078] Time sum from 1 to N-1: 32 Time Bernoulli sum: 8 \\ no cache computation here time = 40 ms. 2.12: ===== (22:59) gp > \g3 debug = 3 (22:59) gp > zeta(1/2+100000*I); lim, nn: [7155, 16078] Time sum from 1 to N-1: 32 caching Bernoulli numbers 2*14 to 2*7155 Time Bernoulli: 7676 Time Bernoulli sum: 7684 time = 7,716 ms. (22:59) gp > zeta(1/2+100000*I); lim, nn: [7155, 16078] Time sum from 1 to N-1: 28 Time Bernoulli sum: 12 time = 40 ms. Cheers, K.B. P.S. See attached a preliminary implementation of the Zetafast algorithm, written by Henri Cohen and myself, which doesn't need any Bernoulli number and scales much better than Euler-Maclaurin as Im(s) increases. Zetafast(D, s) computes L((D/.), s) for any (smallish) fundamental discriminant D and any complex s in the critical strip in time O_D(sqrt(|Im(s)|)) for fixed accuracy, so comparable to Rimann-Siegel [but much simpler]. Whereas zeta(s) sums O(|Im(s)|) terms... The implementation strives to be efficient outside of the critical strip so it is quite a bit more complicated than necessary if 0 <= Re(s) <= 1. (23:01) gp > \r Zetafast (23:01) gp > Zetafast(1, 1/2+100000*I) \\ = zeta(1/2 + 100000*I) time = 44 ms. %1 = 1.0730320148577531321140762694920985307 + 5.7808485443635039842610405578322343743*I (23:01) gp > Zetafast(1, 1/2+10000000*I) time = 424 ms. %2 = 11.458040610577092545002268589041325797 - 8.6434372268360217238182148996339272448*I (23:01) gp > Zetafast(1, 1/2+1000000000*I) time = 4,224 ms. %3 = -2.7617480298380609423768128776672451381 - 1.6775122409894598392132050998908713339*I (23:02) gp > Zetafast(1, 1/2+100000000000*I) time = 43,160 ms. %4 = 0.64366392558011857271943565209178429193 + 0.11686159146168534184488292306042677997*I -- Karim Belabas, IMB (UMR 5251) Tel: (+33) (0)5 40 00 26 17 Universite de Bordeaux Fax: (+33) (0)5 40 00 21 23 351, cours de la Liberation http://www.math.u-bordeaux.fr/~kbelabas/ F-33405 Talence (France) http://pari.math.u-bordeaux.fr/ [PARI/GP] `
/* Compute L(chi_D, s) by the Zetafast algorithm. */ Vecsumchi(V, D) = sum(m = 1, #V, my(k = kronecker(D, m)); if (k, k * V[m])); Zetafastmakedatainit(q, sig, T, bit) = { my(v, N, M, r1, r2, logN, cost, corr, ME, MQ); localbitprec(32); my(la = 3.5, u = la - 1 - log(la), B = bit * log(2)); T = max(T, 1); r1 = (x -> my(y = x - sig); (sig-1) * log(2*Pi) - (x+1/2) * log(x) + (y+1/2)/2 * log(y^2 + T^2) - x * log(2*Pi/q) + sig + B + T * (Pi/2 - atan(T/y))); logN = (x -> my(a = 1 / (x - sig)); (log(10*a / la) + r1(x) * a) / (x * a + 1)); r2 = (x -> u * x + (1/2+sig) * log(x) + (1+sig) * log(la) - B + (sig-1) * logN(x)); ME = (x -> exp((r1(x) - x * logN(x)) / (x - sig))); MQ = (x -> la * x * exp(logN(x))); corr = (x -> 1 + (sig - 1) * (ME(x) + q / (2*Pi*B) * exp(-logN(x)))); cost = (x -> x * (MQ(x) + 10 * corr(x)^2 * ME(x))); return ([logN, r2, ME, cost]); } Zetafastmakedata(q, sig, T, bit) = { my(v, L, N, M, r1, logN, r2, ME, cost); localbitprec(32); my(la = 3.5, u = la - 1 - log(la), B = bit * log(2)); [logN, r2, ME, cost] = Zetafastmakedatainit(q, sig, T, bit); sig += 0.1; if (r2(sig) < 0, my(b = max(sig, 2*B / u)); v = ceil(solve(x = sig, b, r2(x))) , my(v1 = ceil(sig), mi1 = cost(v1)); while (1, v2 = v1 + 5; mi2 = cost(v2); if (mi2 < mi1, v1 = v2; mi1 = mi2, v = v1; break()) ) ); N = bitprecision(exp(logN(v)), bit + 64); M = ceil(ME(v)); return ([la, v, N, M]); } Exp(z) = { my(B = (getlocalbitprec() + 20) * log(2)); if (real(z) > -B, exp(z), 0.); } ZetafastE1(D, s, mu, data) = { my(bit = getlocalbitprec(), [la, v, N, M] = data); my(sig = real(s), q = abs(D), in, C, S, VE, Vin, nu); if (sig > 1, bit += (sig-1) * (M + q / (2*Pi*N)) / log(2)); localbitprec(bit); N = bitprecision(N, bit); nu = q / (2 * Pi * N); in = mu * I * nu; if (mu == -1, my(eps, Tv = imag(s) / v); M *= exp(-(Pi/2) * Tv); eps = Pi / 2 - nu / M; if (eps > 0, M *= exp(-eps * Tv)); M = ceil(M) ); VE = vector (M, m, (m + in)^s); Vin = powers(-in, v - 1); S = Vecsumchi(dirpowers(M, s), D); C = 1; for (w = 0, v - 1, S -= C * Vin[w + 1] * Vecsumchi(VE, D); if (w == v - 1, break); C *= (s - w) / (w + 1); VE = vector(M, m, VE[m] / (m + in)); ); my(G = sqrt(q) * (2*Pi)^s * q^(-s - 1)); if (D < 0, G *= -mu * I); /* G contains chi(-mu)(2 pi)^{s-1}q^{-s}g(chi) */ return (if (mu == -1, Exp(log(G * S) + lngamma(-s) + I * Pi/2 * s); , G * S * gamma(-s) * exp(-I * Pi/2 * s))); } ZetafastQ(D, s, data) = { my([la, v, N] = data, M = ceil(la*v*N), gav = (v - 1)!); my(VS, VE, P, S); VS = dirpowers(M, -s); VE = powers(exp(-1 / N), M); P = Polrev(vector(v, w, N^(v - w) * (gav / (w - 1)!))); S = sum(n = 1, M, my(k = kronecker(D, n)); if (k, k * VS[n] * VE[n+1] * subst(P, 'x, n))); return (S / (N^(v - 1) * gav)); } ZetafastG(s, data) = { my([la, v, N] = data); Exp(lngamma(s + v) + s * log(N) - log(s * (v - 1)!)); } Zetafast(D = 1, s) = { my(bit = getlocalbitprec(), la, v, N, data, S); my(sig = real(s), T = imag(s), flconj = 0); if (sig < 0, error("please use functional equation")); if (T == 0 && s == round(s) && s >= 0, return (lfun(D, s))); if (T < 0, flconj = 1; T = -T); [la, v, N, M] = data = Zetafastmakedata(abs(D), sig, T, bit); if (flconj, s = conj(s)); localbitprec(bit + 32); S = ZetafastQ(D, s, data) - if (D == 1, ZetafastG(1-s, data)) + ZetafastE1(D, s-1, 1, data) + ZetafastE1(D, s-1, -1, data); if (flconj, S = conj(S)); return (bitprecision(S, bit)); }