| hermann on Sat, 15 Jul 2023 00:03:06 +0200 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
| Re: Why is "lift(Mod(qnr, n)^(n\4))" 16% slower than C libgmp "powm(r, qnr, n/4, n)" ? |
On 2023-07-14 19:13, Bill Allombert wrote:
The short answer is that the GNU MP library does not provide a function mpn_powm that PARI could use. mpz_powm use a lot of internal mpn functions for fast modularreduction which are very efficient but not public.Now, I could add a wrapper for mpz_powm for large entries but 16% slower is notthat bad and we need fast modular reduction in more general setting.Could you compare p=(2^95369 + 1)/3; Mod(2,p)^((p-1)/5) with mpz_powm and PARI ?
Sure.Performance problem was related to the base, exponent and modulus I used. For your base, exponent and modulus gp outperforms C++ with mpz_powm() by 27% !
That is very good to know.
OS is Ubuntu 22.04 Desktop.
gp: 19.6s (5.440 GHz)
g++: 26.9s (5.320 GHz)
hermann@7600x:~$ cat 95369.gp
p=(2^95369 + 1)/3;
x=lift(Mod(2, p)^((p-1)\5));
##
print(x);
hermann@7600x:~$ perf stat -e cycles,task-clock gp < 95369.gp
Reading GPRC: /etc/gprc
GPRC Done.
GP/PARI CALCULATOR Version 2.15.3 (released)
amd64 running linux (x86-64/GMP-6.2.1 kernel) 64-bit version
compiled: Jun 30 2023, gcc version 11.3.0 (Ubuntu
11.3.0-1ubuntu1~22.04.1)
threading engine: single
(readline not compiled in, extended help enabled)
Copyright (C) 2000-2022 The PARI Group
PARI/GP is free software, covered by the GNU General Public License, and
comes
WITHOUT ANY WARRANTY WHATSOEVER. Type ? for help, \q to quit. Type ?18 for how to get moral (and possibly technical) support. parisizemax = 2000003072, primelimit = 500000 *** last result computed in 19,604 ms. 1 Goodbye! Performance counter stats for 'gp': 107,245,113,513 cycles # 5.440 GHz19,715.98 msec task-clock # 1.000 CPUs utilized
19.717168440 seconds time elapsed
19.605913000 seconds user
0.110405000 seconds sys
hermann@7600x:~$
hermann@7600x:~$ g++ 95369.cc -lgmp -lgmpxx -O3 -Wall -pedantic -Wextra
-o 95369
hermann@7600x:~$ perf stat -e cycles,task-clock ./95369 28709-digit prime p (95368 bits) 26.8999s 1 done Performance counter stats for './95369': 143,114,482,703 cycles # 5.320 GHz26,901.18 msec task-clock # 1.000 CPUs utilized
26.902467744 seconds time elapsed
26.901533000 seconds user
0.000000000 seconds sys
hermann@7600x:~$ cat 95369.cc
// g++ 95369.cc -lgmp -lgmpxx -O3 -Wall -pedantic -Wextra -o 95369
// (cpplinted and cppchecked)
//
#include <time.h>
#include <math.h>
#include <gmpxx.h>
#include <assert.h>
#include <iostream>
int main() {
mpz_class a, b, c, p;
char *buf;
int bits;
float dt;
buf = new char[32000000];
assert(buf);
mpz_ui_pow_ui(a.get_mpz_t(), 2, 95369);
a += 1;
a /= mpz_class("3");
p = a;
c = (p-1) / 5;
std::cerr
<< strlen(mpz_get_str(buf, 10, p.get_mpz_t())) << "-digit prime
p ("
<< (bits = strlen(mpz_get_str(buf, 2, p.get_mpz_t()))) << "
bits)\n";
b = 2;
clock_t start = clock();
mpz_powm(a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t(),
p.get_mpz_t());
dt = static_cast<float>(clock() - start) / CLOCKS_PER_SEC;
std::cerr << dt << "s\n";
std::cout << a << "\n";
std::cerr << "done\n";
delete buf;
return 0;
}
hermann@7600x:~$
Regards,
Hermann.