Bill Daly on Thu, 08 Apr 1999 19:30:24 -0400

 ECM code in PARI

```I've been looking at the implementation of ECM factorization in
ifactor1.c, and I have a couple of comments.

1. Computing (n)P

The code uses Peter Montgomery's PRAC algorithm to create a Lucas chain
for the purpose of computing (n)P. However, this algorithm was designed
to handle calculations based on a recursion of the form

x[m+n] = f(x[m], x[n], x[m-n])

on the assumption that all three parameters are required to calculate f.
For an elliptic curve, the calculation of (m+n)P requires only the
coordinates of (m)P and (n)P, (m-n)P not being needed. Therefore, a
Lucas chain is not necessary for this purpose, and an ordinary addition
chain will do. Since in general an addition chain is slightly shorter
than a Lucas chain, it would make sense to switch to an addition chain
approach.

Also, since it is as easy to subtract two points as it is to add them,
it would make sense, when computing (n)P, if n = 3 mod 4, to calculate
(n)P = (n+1)P - P. This ensures that there will be at least twice as
many doubling steps as adding steps, so that the length of the addition
chain will not exceed 1.5*lg(n). This is certainly better than could be
obtained in general from an ordinary addition chain or a Lucas chain.

Note that there is no advantage in this case to using ellmult to
multiply a point by a prime. For addition chains, as opposed to Lucas
chains, factorization is not all that helpful. It would make more sense
to multiply by the product of several small primes.

2. Avoiding invmod

In principle, it is possible to carry out the entire algorithm without
using invmod at all until the very end. Essentially, this can be done by
clearing the denominators in the formulas for doubling and adding
points, with the denominators kept in a separate array. If a point is
represented by a 3-vector [x,y,d], corresponding to the usual 2-vector
[x/d^2,y/d^3], it is possible to write versions of elladd() and
elldouble() which compute [x2,y2,d2] = (2)[x1,y1,d1] and [x3,y3,d3] =
[x1,y1,d1]+[x2,y2,d2] without requiring division mod N, i.e. without
calling invmod.

Unfortunately, this doesn't work well for adding two points, since the
new code would require 4*sqri + 13*mulii calls, while the current
implementation uses 1*sqri + 5*mulii calls. However, the new code for
doubling a point would require only 6*sqri + 3*mulii calls instead of
the current 2*sqri + 5*mulii calls, and assuming that a sqri is about
twice as fast as a mulii, the code should about break even, except that
the invmod can be avoided.

Here is a PARI/GP implementation of elldouble which does this. It takes
as input a 3-vector u = [ux,uy,ud], and returns a 3-vector v =
[vx,vy,vd] such that [vx/vd^2,vy/vd^3] is equal to (2)[ux/ud^2,uy/ud^3].
I have assumed a curve of the form y^2 = x^3 + A*x + B, but of course
one can always assume that A = 1, as is the case in ifactor1.c. I did
not count A*t2 as a mulii for this reason.

{elldouble(u/*,...local variables*/) =
ux = u[1]; uy = u[2]; ud = u[3];
t1 = ud*ud;
t2 = t1*t1;
t3 = 3*ux*ux + A*t2;
t4 = t3*t3;
t5 = 2*uy*uy;
t6 = 2*ux*t5;
vx = t4 - 2*t6;
vy = -(2*t5*t5 + t3*(t4 - 3*t6));
vd = 2*uy*ud;
[vx,vy,vd];
}

This is also going to require an extra array to hold the denominators.

It is also possible to define an elladd along the same lines. The
following PARI/GP implementation will show why this is not a good idea.
The input is two 3-vectors [ux,uy,ud] and [vx,vy,vd], and the output is
a 3-vector [wx,wy,wd].

ux = u[1]; uy = u[2]; ud = u[3];
vx = v[1]; vy = v[2]; vd = v[3];
r1 = ud*ud;
s1 = vd*vd;
r2 = ux*s1;
s2 = vx*r1;
t2 = r2 - s2;
r3 = uy*s1*vd;
s3 = vy*r1*ud;
t3 = r3 - s3;
t4 = t2*t2;
t5 = t3*t3;
t6 = r2 + s2;
wx = -t4*t6 + t5;
wy = t4*(r3*(t6+s2)-s3*(t6+r2)) - t3*t5;
wd = ud*vd*t2;
[wx,wy,wd];
}

To use the elldouble logic but not the elladd logic, it will be
necessary to do an invmod any time that an elldouble step is followed by
an elladd step. Essentially, this should convert [x,y,d] to
[x/d^2,y/d^3,1], which is equivalent. If the proposed change to the
addition chain logic is implemented, this should eliminate at least 2/3
of the invmod calls.

Note that the 3-vector [x,y,d] effectively defines a point [x,y] on the
curve y^2 = x^3 + (A*d^4)*x + (B*d^6), which is of course equivalent to
the original curve.

Regards,

Bill
**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom