Karim Belabas on Thu, 06 Jun 2019 23:17:27 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: Memory usage for zeta()


* Andrew Walker [2019-06-06 09:03]:
> Using one of the latest 64 bit binaries for Windows:
> E:\PrimeZeta>gp64-2-12-0.exe
>                    GP/PARI CALCULATOR Version 2.12.0 (alpha)
>           amd64 running mingw (x86-64/GMP-6.1.2 kernel) 64-bit version
>             compiled: Jun  3 2019, gcc version 6.3.0 20170516 (GCC)
>                             threading engine: single
>                (readline not compiled in, extended help enabled)
> 
>                      Copyright (C) 2000-2019 The PARI Group
> 
> I've found that the memory usage for zeta() is a lot bigger than the previous version I've been using,
> gp64-2-11-1.exe (also 64 bit) from 21 Nov last year.
> Using the default precision, try zeta(0.5+100000.0*I) then at increasing values of the imaginarypart, looking at memory usage in the task manager. 
> 
> For the latest version zeta(0.5+300000*.0*I) the usage shown gets to over
> 294 Mb whereas forthe earlier one it is just over 20 Mb Is this due to the
> recent changes related to Bernoulli numbers?

Yes. Computing zeta() at this height with Euler-Maclaurin formula requires
computing *a lot* of Bernoulli numbers, B_2, ... B_{2N}. In 2.11 those were
computed (and stored) as floating point numbers; in 2.12, there are computed
(and stored) as exact rational numbers, for about the same cost. Of course,
since the size of Bernoulli numbers B_{2n} increase linearly with n, the new
cache size is quadratic in N, whereas it was linear in 2.11.

On the other hand, it now contains much more information, of course.

2.11:
=====
(22:58) gp > \g3
   debug = 3
(22:58) gp > zeta(1/2+100000*I);
lim, nn: [7155, 16078]
Time sum from 1 to N-1: 28
caching Bernoulli numbers 2*14 to 2*7155
Time Bernoulli: 7696
Time Bernoulli sum: 7704
time = 7,732 ms.
(22:58) gp > zeta(1/2+100000*I); \\ second time is fast !
lim, nn: [7155, 16078]
Time sum from 1 to N-1: 32
Time Bernoulli sum: 8  \\ no cache computation here
time = 40 ms.

2.12:
=====
(22:59) gp > \g3
   debug = 3
(22:59) gp > zeta(1/2+100000*I);
lim, nn: [7155, 16078]
Time sum from 1 to N-1: 32
caching Bernoulli numbers 2*14 to 2*7155
Time Bernoulli: 7676
Time Bernoulli sum: 7684
time = 7,716 ms.
(22:59) gp > zeta(1/2+100000*I);
lim, nn: [7155, 16078]
Time sum from 1 to N-1: 28
Time Bernoulli sum: 12
time = 40 ms.

Cheers,

    K.B.

P.S. See attached a preliminary implementation of the Zetafast algorithm,
written by Henri Cohen and myself, which doesn't need any Bernoulli number
and scales much better than Euler-Maclaurin as Im(s) increases.

Zetafast(D, s) computes L((D/.), s) for any (smallish) fundamental
discriminant D and any complex s in the critical strip in time
O_D(sqrt(|Im(s)|)) for fixed accuracy, so comparable to Rimann-Siegel [but
much simpler]. Whereas zeta(s) sums O(|Im(s)|) terms...

The implementation strives to be efficient outside of the critical strip so
it is quite a bit more complicated than necessary if 0 <= Re(s) <= 1.

(23:01) gp > \r Zetafast
(23:01) gp > Zetafast(1, 1/2+100000*I) \\ = zeta(1/2 + 100000*I)
time = 44 ms.
%1 = 1.0730320148577531321140762694920985307 + 5.7808485443635039842610405578322343743*I
(23:01) gp > Zetafast(1, 1/2+10000000*I)
time = 424 ms.
%2 = 11.458040610577092545002268589041325797 - 8.6434372268360217238182148996339272448*I
(23:01) gp > Zetafast(1, 1/2+1000000000*I)
time = 4,224 ms.
%3 = -2.7617480298380609423768128776672451381 - 1.6775122409894598392132050998908713339*I
(23:02) gp > Zetafast(1, 1/2+100000000000*I)
time = 43,160 ms.
%4 = 0.64366392558011857271943565209178429193 + 0.11686159146168534184488292306042677997*I

--
Karim Belabas, IMB (UMR 5251)  Tel: (+33) (0)5 40 00 26 17
Universite de Bordeaux         Fax: (+33) (0)5 40 00 21 23
351, cours de la Liberation    http://www.math.u-bordeaux.fr/~kbelabas/
F-33405 Talence (France)       http://pari.math.u-bordeaux.fr/  [PARI/GP]
`
/* Compute L(chi_D, s) by the Zetafast algorithm. */

Vecsumchi(V, D) =
  sum(m = 1, #V, my(k = kronecker(D, m)); if (k, k * V[m]));

Zetafastmakedatainit(q, sig, T, bit) =
{ my(v, N, M, r1, r2, logN, cost, corr, ME, MQ);

  localbitprec(32);
  my(la = 3.5, u = la - 1 - log(la), B = bit * log(2));
  T = max(T, 1);
  r1 = (x -> my(y = x - sig);
             (sig-1) * log(2*Pi) - (x+1/2) * log(x)
             + (y+1/2)/2 * log(y^2 + T^2) - x * log(2*Pi/q)
             + sig + B + T * (Pi/2 - atan(T/y)));
  logN = (x -> my(a = 1 / (x - sig));
               (log(10*a / la) + r1(x) * a) / (x * a + 1));
  r2 = (x -> u * x + (1/2+sig) * log(x) + (1+sig) * log(la)
             - B + (sig-1) * logN(x));
  ME = (x -> exp((r1(x) - x * logN(x)) / (x - sig)));
  MQ = (x -> la * x * exp(logN(x)));
  corr = (x -> 1 + (sig - 1) * (ME(x)
                 + q / (2*Pi*B) * exp(-logN(x))));
  cost = (x -> x * (MQ(x) + 10 * corr(x)^2 * ME(x)));
  return ([logN, r2, ME, cost]);
}

Zetafastmakedata(q, sig, T, bit) =
{ my(v, L, N, M, r1, logN, r2, ME, cost);

  localbitprec(32);
  my(la = 3.5, u = la - 1 - log(la), B = bit * log(2));

  [logN, r2, ME, cost] = Zetafastmakedatainit(q, sig, T, bit);
  sig += 0.1;
  if (r2(sig) < 0,
     my(b = max(sig, 2*B / u));
     v = ceil(solve(x = sig, b, r2(x)))
  ,
     my(v1 = ceil(sig), mi1 = cost(v1));
     while (1,
       v2 = v1 + 5; mi2 = cost(v2);
       if (mi2 < mi1, v1 = v2; mi1 = mi2, v = v1; break())
     )
  );
  N = bitprecision(exp(logN(v)), bit + 64);
  M = ceil(ME(v));
  return ([la, v, N, M]);
}

Exp(z) =
{ my(B = (getlocalbitprec() + 20) * log(2));

  if (real(z) > -B, exp(z), 0.);
}

ZetafastE1(D, s, mu, data) =
{ my(bit = getlocalbitprec(), [la, v, N, M] = data);
  my(sig = real(s), q = abs(D), in, C, S, VE, Vin, nu);

  if (sig > 1, bit += (sig-1) * (M + q / (2*Pi*N)) / log(2));
  localbitprec(bit);
  N = bitprecision(N, bit);
  nu = q / (2 * Pi * N); in = mu * I * nu;
  if (mu == -1,
    my(eps, Tv = imag(s) / v);
    M *= exp(-(Pi/2) * Tv); eps = Pi / 2 - nu / M;
    if (eps > 0, M *= exp(-eps * Tv));
    M = ceil(M)
  );
  VE = vector (M, m, (m + in)^s);
  Vin = powers(-in, v - 1);
  S = Vecsumchi(dirpowers(M, s), D);
  C = 1;
  for (w = 0, v - 1,
    S -= C * Vin[w + 1] * Vecsumchi(VE, D);
    if (w == v - 1, break);
    C *=  (s - w) / (w + 1);
    VE = vector(M, m, VE[m] / (m + in));
  );
  my(G = sqrt(q) * (2*Pi)^s * q^(-s - 1));
  if (D < 0, G *= -mu * I);
/* G contains chi(-mu)(2 pi)^{s-1}q^{-s}g(chi) */
  return (if (mu == -1,
    Exp(log(G * S) + lngamma(-s) + I * Pi/2 * s);
  ,
    G * S * gamma(-s) * exp(-I * Pi/2 * s)));
}

ZetafastQ(D, s, data) =
{ my([la, v, N] = data, M = ceil(la*v*N), gav = (v - 1)!);
  my(VS, VE, P, S);

  VS = dirpowers(M, -s);
  VE = powers(exp(-1 / N), M);
  P = Polrev(vector(v, w, N^(v - w) * (gav / (w - 1)!)));
  S = sum(n = 1, M,
          my(k = kronecker(D, n));
          if (k, k * VS[n] * VE[n+1] * subst(P, 'x, n)));
  return (S / (N^(v - 1) * gav));
}

ZetafastG(s, data) =
{ my([la, v, N] = data);
  Exp(lngamma(s + v) + s * log(N) - log(s * (v - 1)!));
}

Zetafast(D = 1, s) =
{ my(bit = getlocalbitprec(), la, v, N, data, S);
  my(sig = real(s), T = imag(s), flconj = 0);

  if (sig < 0, error("please use functional equation"));
  if (T == 0 && s == round(s) && s >= 0, return (lfun(D, s)));
  if (T < 0, flconj = 1; T = -T);
  [la, v, N, M] = data = Zetafastmakedata(abs(D), sig, T, bit);
  if (flconj, s = conj(s));
  localbitprec(bit + 32);
  S = ZetafastQ(D, s, data) - if (D == 1, ZetafastG(1-s, data))
      + ZetafastE1(D, s-1,  1, data)
      + ZetafastE1(D, s-1, -1, data);
  if (flconj, S = conj(S));
  return (bitprecision(S, bit));
}