Ruud H.G. van Tol on Sat, 26 Nov 2022 10:21:21 +0100



Re: 3^k performance




On 2022-11-26 09:29, Karim Belabas wrote:
* Ruud H.G. van Tol [2022-11-26 08:47]:

A022330_1(n)=1+n+sum(k=1,n,logint(3^k,2))

A022330_2(n)=my(t=1);1+n+sum(k=1,n,logint(t*=3,2))

? A022330_1(10^5)
cpu time = 8,322 ms, real time = 8,352 ms.
%15 = 7924941755

? A022330_2(10^5)
cpu time = 215 ms, real time = 217 ms.
%16 = 7924941755

So about a 40x difference.

Is that worthwhile to look into?

This is expected:

(09:02) gp > for(k=1,10^5,3^k)
time = 8,409 ms.
(09:02) gp > my(t=1);for(k=1,10^5,t*=3)
time = 175 ms.

Assume x is a small integer (here x = 3).

Computing x^k as x * t where t = x^(k-1) is already known involves a
single multiplication by x, which is done in time linear in the size of t,
i.e. about s = k * log2(x) bits.
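To see the linear scaling directly, one can time a single multiplication
by 3 on operands of two different sizes (a sketch, not a benchmark; the
variable names are mine):

t1 = 3^(10^5 >> 1); t2 = t1^2;   \\ t2 has about twice as many bits as t1
for(i=1, 10^4, t1*3)             \\ takes some time T
for(i=1, 10^4, t2*3)             \\ expect roughly 2*T: cost is linear in the size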

Computing x^k from scratch involves essentially log2(k) consecutive
squarings of an increasingly large integer [ and also a few
multiplications by x whose cost is negligible ]. The final squaring
alone, namely (x^r)^2 where r = k >> 1, has a cost which is superlinear
in the size of x^r, i.e. about s / 2. Concretely:

(09:18) gp > t = 3^(10^5 >> 1);
(09:18) gp > for(i=1,10^4,t^2)
time = 1,805 ms.
(09:19) gp > T = t^2;
(09:19) gp > for(i=1,10^4,T^2)
time = 4,726 ms.

So doubling the size of t multiplies the squaring time by about 2.62 in
this range, which would correspond to a complexity M(s) ~ s^1.39
(where 1.39 ~ log2(2.62)) for the cost of a multiplication of size s.
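That exponent can be recovered directly from the two timings above (a
rough estimate rather than a careful benchmark):

log(4726/1805) / log(2)   \\ ~ 1.39, hence M(s) ~ s^1.39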

Now compare s and (s / 2)^1.39 for s = log2(3^10^5), the actual size
you're interested in:

(09:20) gp > s = log(3^10^5)/log(2);
(09:20) gp > (s/2)^1.39 / s
%3 = 40.698247890865989130716669156266722278

So a 40x difference is quite expected!

How to best approach that?

Never compute from scratch when previous results can be reused :-)
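For completeness, the same idea written out for a general small base x,
keeping the running power in t (just a sketch; the function name and the
extra parameter are mine, not from the thread):

A022330_gen(n, x=3) = my(t=1); 1 + n + sum(k=1, n, logint(t *= x, 2))

Each term then reuses the previous power instead of recomputing x^k from scratch.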

Sure, and that is kind of what I imagined was going on.
(I'm going through the source now.)

In the Perl compiler, we have a peephole-optimiser phase
that runs right after the main compiler phase
(the one that builds the AST), patching the tree where feasible.
Example: https://metacpan.org/pod/Devel::GoFaster

For the case at hand, I have no idea yet
whether it is feasible to optimise this
in the pre-compilation phase,
inside the compilation phase,
or right after it
(or indeed whether it is best left to the user).

Even an optional (!) warning
along the lines of "sub-optimal code, stupid"
would already help. :)
(A linter could already do that.)

-- Ruud