Gerhard Niklasch on Fri, 9 Apr 1999 14:40:04 +0200 (MET DST)

 Re: ECM code in PARI

```In response to:
> Date: Thu, 08 Apr 1999 19:30:24 -0400

> 1. Computing (n)P
>
> The code uses Peter Montgomery's PRAC algorithm to create a Lucas chain
> for the purpose of computing (n)P. However, this algorithm was designed
> to handle calculations based on a recursion of the form
>
>   x[m+n] = f(x[m], x[n], x[m-n])
>
> on the assumption that all three parameters are required to calculate f.

And the implementation heavily exploits the fact that we can get by
with just the first two parameters in the present context -- there
are rather fewer subcases, several subcases need fewer operations, etc.

> Since in general an addition chain is slightly shorter
> than a Lucas chain, it would make sense to switch to an addition chain
> approach.

Yep, but do we want to go to great lengths to either store precomputed
addition chains for thousands of multipliers, or re-establish them on
the fly?  PRAC already does rather better than the old binary shift
algorithm, and as far as I can see there are at most a few % to be
gained (and only for the B1 phase).

> Also, since it is as easy to subtract two points as it is to add them,
> it would make sense, when computing (n)P, if n = 3 mod 4, to calculate
> (n)P = (n+1)P - P. This ensures that there will be at least twice as
> many doubling steps as adding steps, so that the length of the addition
> chain will not exceed 1.5*lg(n). This is certainly better than could be
> obtained in general from an ordinary addition chain or a Lucas chain.

Can't comment on this without further thought. :)  If memory serves,
PRAC chains should still be within a constant (and not exorbitant)
factor of that length.  There are certainly cases where this doesn't
help at all:  (4)P - P requires more operations than (2)P + P.

> Note that there is no advantage in this case to using ellmult to
> multiply a point by a prime. For addition chains, as opposed to Lucas
> chains, factorization is not all that helpful. It would make more sense
> to multiply by the product of several small primes.

But we're multiplying by something which _arises_ as a product of powers
of primes - we're not factoring the multipliers.

(And I've tried several ways e.g. of multiplying by 9, but all turned out
to be slower than multiplying by 3, twice in succession.)

One advantage of PRAC is that at each instant you know in advance whether
a denominator can vanish -- an elladd() can _never_ degenerate into an
unforeseen elldouble().

If you can come up with something (a) practical and (b) faster, by
all means let's see it! :)

> 2. Avoiding invmod

I don't think this is much of an issue, since the current code calls
invmod() very rarely indeed.  When N is at all large, we're working on
64 curves simultaneously, and one invmod() will handle up to 128 (!)
individual additions.  During the B2 phase, only the giant steps involve
calls to invmod().  The running time is entirely dominated by multipli-
cations mod N.

Cheers, Gerhard
```