Karim Belabas on Wed, 05 Feb 2014 12:20:24 +0100


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: handling several substacks ?


* Bill Allombert [2014-02-05 11:49]:
> On Wed, Feb 05, 2014 at 10:52:13AM +0100, Pascal Molin wrote:
> > I have to deal with the following kind of loops:
> >   {
> >     /* main loop */
> >     pari_sp av1 = avma, lim1 = stack_lim(avma, 1);
> >     long n;
> >     for (n = 1; n <= N; ++n)
> >     {
> >         long i, j, k;
> >         i = random_Fl(L)+1;
> >         j = random_Fl(L)+1;
> >           for (k = 1; k <= K; ++k)
> >           {
> >             gel(gel(a, i), k) = gmul( gel(gel(a, j), k), gel(gel(m, i), k));
> >             if (avma > lim1)
> >               a = gerepilecopy(av1, a);
[...]
> Try the new C file in attachment.
> 
> ? gploop()
> time = 2,280 ms.
> %4 = 0
> ? cloop()
> time = 1,028 ms.
> %5 = 0

To compare with Bill's implementation of the recommanded approach,
here's the complete version using clones (cloop2).

(12:00) gp > cloop(50000,1000,8)
time = 15,912 ms.

(12:00) gp > cloop2(50000,1000,8)
time = 23,688 ms.

(12:00) gp > gploop(50000,1000,8)
time = 35,580 ms.

So the recommanded way is a clear winner in ordinary situations. As expected,
clones are less efficient (here we use them in the simplest
possible way, and do little useful work, so their overhead is large).

But thay have a much better worse case behaviour: in "desperation mode,
when too little memory was available to start with, low_stack will be
triggered frequently; in the worse case, each scalar operation triggers
a gerepilecopy of the full array...

Cheers,

    K.B.

P.S. The initial fillup of arrays a and m are very memory inefficient.
To help avoid desperation mode, it's better to cleanup garbage here as well.

Either gerepiles (better) or using lower lever functinos

  gel(p1, k) = expIr(randomr(prec));

(loses 'prec' words instead of 3*prec in the original code); or

  pari_sp av = avma;
  gel(p1, k) = gerepileuptoleaf(av, expIr(randomr(prec)));

(loses nothing, memory-optimal).

--
Karim Belabas, IMB (UMR 5251)  Tel: (+33) (0)5 40 00 26 17
Universite Bordeaux 1          Fax: (+33) (0)5 40 00 69 50
351, cours de la Liberation    http://www.math.u-bordeaux1.fr/~kbelabas/
F-33405 Talence (France)       http://pari.math.u-bordeaux1.fr/  [PARI/GP]
`
/*-*- compile-command: "/usr/local/bin/gcc-4.7 -c -o stack.o -O3 -Wall -fno-strict-aliasing -fomit-frame-pointer -fPIC -I"/Users/pascal/devel/install/include" stack.gp.c && /usr/local/bin/gcc-4.7 -o stack.gp.so -bundle -flat_namespace -undefined suppress -O3 -Wall -fno-strict-aliasing -fomit-frame-pointer -fPIC stack.o "; -*-*/
#include <pari/pari.h>
/*
GP;install("cloop2","D50000,L,D1000,L,D8,L,p","cloop2","./stack2.so");
*/
GEN
cloop2(long N, long K, long L, long prec)
{
  pari_sp ltop = avma;
  GEN a, m;
  {
    /* init */
    long l;
    a = cgetg(L+1, t_VEC);
    m = cgetg(L+1, t_VEC);
    for (l = 1; l <= L; ++l)
    {
      long k;
      GEN p1, p2;
      p1 = cgetg(K+1, t_VEC);
      p2 = cgetg(K+1, t_VEC);
      for (k = 1; k <= K; ++k)
      {
          gel(p1, k) = gexp(gmul(gen_I(), genrand(real_1(prec))), prec);
          gel(p2, k) = gexp(gmul(gen_I(), genrand(real_1(prec))), prec);
      }
      gel(a, l) = p1;
      gel(m, l) = p2;
    }
  }
  {
    /* main loop */
    long n;
    for (n = 1; n <= N; ++n)
    {
      long i, j, k;
      i = random_Fl(L)+1;
      j = random_Fl(L)+1;
      for (k = 1; k <= K; ++k)
      {
        pari_sp av = avma;
        GEN old = gel(gel(a, i), k);
        gel(gel(a, i), k) = gclone(gmul( gel(gel(a, j), k), gel(gel(m, i), k)));
        if (isclone(old)) gunclone(old);
        avma = av;
      }
    }
  }
  /* use A = a[i] here */
  avma = ltop;
  return gen_0;
}