Karim Belabas on Wed, 05 Feb 2014 11:43:02 +0100


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: handling several substacks ?


* Pascal Molin [2014-02-05 10:52]:
> I have to deal with the following kind of loops:
> 
> gploop(N=8000,K=400,L=8) = {
>   my(m,a);
>   a = vector(L,l,vector(K,k,exp(I*random(1.))));
>   m = vector(L,l,vector(K,k,exp(I*random(1.))));
>   for(n=1,N, /* complex outer loop, stack clean */
>       my(i,j);
>       i=random(L)+1;j=random(L)+1;
>       for(k=1,K, /* trivial inner loop, stack demanding */
>         a[i][k] = a[j][k] * m[i][k];
>         );
>       /* use A = a[i] here */
>     );
> }
> 
> gp handles them correctly, but I do not know how to translate this in
> libpari
> using decent stack management. A gerepilecopy call on the vector a when
> the stack gets empty is not affordable (this is impressive, see the c file
> attached, unless I have made a mistake).
> And gaffect also makes copies.

1) The simplest general solution is to use clones (what gp does), with
inner loop of the form:

  for (k = 1; k <= K; k++)
  {
    pari_sp av = avma;
    GEN old = mael(a,i,k);
    mael(a,i,k) = gclone( gmul(mael(a,j,k), mael(m,i,k)) );
    avma = av; if (isclone(old)) gunclone(old);
  }

You waste almost no memory, and the gclone overhead should be minimal
compared to gmul().

2) IF your data is of a very specific type (mostly leaves: t_INT, t_REAL,
t_VECSMALL), then ad hoc constructions based on variants of gaffect will
be a little more efficient. I *strongly* advise against using them,
though. See ifactor.c:alloc_scratch() to understand why; this very
carefully crafted code (written by Gerhard Niklasch around 1997)

- was a nightmare to debug when memory corruptions eventually turned out,
  ten years later.

- doesn't always use the right low-level algorithms (e.g. Barett or
  Montgomery reduction, for a start, but there are about 5 or 6
  "standard" improvements that should be incorporated) and it's very
  hard to adapt it now. It will be easier to rewrite it from scratch.


A much simpler example (in a much simpler situation) could be closer to
what you're trying to achieve: DedekZeta.c:l340-368.


> I think I would like to allocate L+1 stack slices and make sure that
> each component of some a[i] points only inside one of them, manually
> changing the stack pointer before each k loop to a free slice.

There used to be a mechanism for this (was called switch_stack()), which
had serious drawbacks. It disappeared some time ago because we were no
longer using it (because of the drawbacks), and the proposed better
alternative has only been partially implemented. (Not committed.)

It requires changing the way we access bot / top in libpari
(avma remains unaffected).

I'd stick to 1) for the time being.


Cheers,

    K.B.
--
Karim Belabas, IMB (UMR 5251)  Tel: (+33) (0)5 40 00 26 17
Universite Bordeaux 1          Fax: (+33) (0)5 40 00 69 50
351, cours de la Liberation    http://www.math.u-bordeaux1.fr/~kbelabas/
F-33405 Talence (France)       http://pari.math.u-bordeaux1.fr/  [PARI/GP]
`