Ruud H.G. van Tol on Sat, 26 Nov 2022 10:21:21 +0100
Re: 3^k performance
On 2022-11-26 09:29, Karim Belabas wrote:
> * Ruud H.G. van Tol [2022-11-26 08:47]:
>> A022330_1(n)=1+n+sum(k=1,n,logint(3^k,2))
>> A022330_2(n)=my(t=1);1+n+sum(k=1,n,logint(t*=3,2))
>>
>> ? A022330_1(10^5)
>> cpu time = 8,322 ms, real time = 8,352 ms.
>> %15 = 7924941755
>> ? A022330_2(10^5)
>> cpu time = 215 ms, real time = 217 ms.
>> %16 = 7924941755
>>
>> So about a 40x difference. Is that worthwhile to look into?
>
> This is expected:
>
> (09:02) gp > for(k=1,10^5,3^k)
> time = 8,409 ms.
> (09:02) gp > my(t=1);for(k=1,10^5,t*=3)
> time = 175 ms.
>
> Assume x is a small integer (here x = 3).
>
> Computing x^k as x * t where t = x^(k-1) is already known involves a
> single multiplication by x, which is done in linear time wrt the size
> of t, i.e. about s = k * log2(x).
>
> Computing x^k from scratch involves essentially log2(k) consecutive
> squarings of an increasingly large integer [ and also a few
> multiplications by x whose cost is negligible ]. The final squaring
> only, namely (x^r)^2 where r = k >> 1, has a cost which is superlinear
> in the size of x^r, i.e. about s / 2.
>
> Concretely:
>
> (09:18) gp > t = 3^(10^5 >> 1);
> (09:18) gp > for(i=1,10^4,t^2)
> time = 1,805 ms.
> (09:19) gp > T = t^2;
> (09:19) gp > for(i=1,10^4,T^2)
> time = 4,726 ms.
>
> So doubling the size of t multiplies the squaring time by about 2.62
> in this range, which would correspond to a complexity M(s) ~ s^1.39
> (where 1.39 ~ log2(2.62)) for the cost of a multiplication of size s.
>
> Now compare s and (s / 2)^1.39 for s = log2(3^10^5), the actual size
> you're interested in:
>
> (09:20) gp > s = log(3^10^5)/log(2);
> (09:20) gp > (s/2)^1.39 / s
> %3 = 40.698247890865989130716669156266722278
>
> So a 40x difference is quite expected!
>
>> How to best approach that?
>
> Never compute from scratch when previous results can be reused :-)
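A minimal GP sketch of that advice in reusable form (the helper name sumpow and the variant A022330_3 are mine, not from the thread): pass the per-term function in, and keep the running power inside the helper.

  \\ Apply f to the running powers x, x^2, ..., x^n,
  \\ spending a single multiplication by x per step.
  sumpow(x, n, f) = my(t = 1); sum(k = 1, n, f(t *= x))

  \\ Same value as A022330_1 / A022330_2 above, at A022330_2's speed:
  A022330_3(n) = 1 + n + sumpow(3, n, v -> logint(v, 2))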
Sure, and that is kind of what I imagined was going on. (I am going through the source now.)

In the Perl compiler, we have a peephole-optimiser phase that runs right after the main compiler phase that builds the AST, to patch the tree where feasible. Example: https://metacpan.org/pod/Devel::GoFaster

For the case at hand, I have no idea yet whether it is feasible to optimise this in the pre-compilation phase, inside the compilation phase, or right after it (or indeed, to leave it to the user). Even an optional (!) warning about "sub-optimal code, stupid" would already help. :) (A linter could already do that.)

-- Ruud
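PS: if it is left to the user, another GP-level option (a sketch; it assumes a GP recent enough to have the powers() built-in, and the name A022330_4 is mine) is to precompute all powers in one pass and index into the vector:

  \\ powers(3, n) returns [1, 3, 3^2, ..., 3^n], each entry obtained
  \\ from the previous one by a single multiplication by 3.
  A022330_4(n) = my(P = powers(3, n)); 1 + n + sum(k = 1, n, logint(P[k+1], 2))

This trades memory for speed: it keeps every power alive at once (roughly a gigabyte of digits at n = 10^5), so the running-product form remains the more economical fix.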