Bill Allombert on Fri, 01 Nov 2024 18:22:30 +0100



Re: Why 70% / 51% slowdown of PARI/GP for pthread over single for 7950X / 7600X AMD CPUs?


On Fri, Nov 01, 2024 at 05:49:05PM +0100, hermann@stamm-wilbrandt.de wrote:
> It took a while to determine pthread versus single to cause slower
> computation times on faster AMD 7950X CPU over slower AMD 7600X CPU.
> 
> With same /etc/gprc, both on Ubuntu 22.04, GP 2.16.1 alpha:
> 
>         pthread   single    pthread/single
> 7950X   0:34:51   0:20:33   1.70
> 7600X   0:36:15   0:24:04   1.51
> 
> The h:mm:ss runtimes are for 729 million evaluations of
> Cayley-Menger determinant for computing volume of
> tetrahedron given 6 edge distances:
> 
> ? T=128;L=30;forvec (X=[[1,L],[1,L],[1,L],[1,L],[1,L],[1,L]], M=[0,1,1,1,1;1,0,X[1]^2,X[2]^2,X[3]^2;1,X[1]^2,0,X[4]^2,X[5]^2;1,X[2]^2,X[4]^2,0,X[6]^2;1,X[3]^2,X[5]^2,X[6]^2,0];if(matdet(M)==2*T,print(X)));
> [2, 2, 2, 2, 2, 2]
> cpu time = 34min, 50,094 ms, real time = 34min, 50,196 ms.
> ?
> 
> 1) What are explanations for the massive slowdowns for pthread?

This is a known issue, caused by the use of thread-local variables.

There are some ways to reduce the slowdown:
- use gp-sta instead of gp-dyn
- use the compiler flag -flto=auto

Alternatively, you can avoid pthread and use MPI, which is more annoying to use
but does not require thread-local variables.

Using this benchmark:
T=128;L=12;forvec (X=[[1,L],[1,L],[1,L],[1,L],[1,L],[1,L]], M=[0,1,1,1,1;1,0,X[1]^2,X[2]^2,X[3]^2;1,X[1]^2,0,X[4]^2,X[5]^2;1,X[2]^2,X[4]^2,0,X[6]^2;1,X[3]^2,X[5]^2,X[6]^2,0];if(matdet(M)==2*T,print(X)));
##
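
For readers who want to check what the loop body tests, here is a minimal
sketch in Python (not GP) that builds the same 5x5 Cayley-Menger matrix and
evaluates its determinant exactly. The helper names det and cayley_menger are
made up for this illustration; the matrix layout is copied from the benchmark
line. For a tetrahedron the determinant equals 288*V^2, so the test
matdet(M)==2*T with T=128 selects edge lengths with 288*V^2 = 256.

```python
def det(m):
    """Exact integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        if a == 0:
            continue  # zero entries contribute nothing
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def cayley_menger(x):
    """Same matrix as M in the GP benchmark, for six edge lengths x."""
    a, b, c, d, e, f = (v * v for v in x)
    return [
        [0, 1, 1, 1, 1],
        [1, 0, a, b, c],
        [1, a, 0, d, e],
        [1, b, d, 0, f],
        [1, c, e, f, 0],
    ]


T = 128
# Regular tetrahedron with all edges 2, the solution printed for L=30.
print(det(cayley_menger([2, 2, 2, 2, 2, 2])) == 2 * T)
```

The benchmark with L=12 evaluates this determinant 12^6 = 2,985,984 times,
against 30^6 = 729,000,000 times for L=30.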

I get:

% Olinux-x86_64/gp-dyn -q < ben
  ***   last result computed in 13,326 ms.
% Olinux-x86_64/gp-sta -q < ben
  ***   last result computed in 13,144 ms.
% Olinux-x86_64-pthread/gp-dyn -q < ben
  ***   last result: cpu time 22,510 ms, real time 22,511 ms.
% Olinux-x86_64-pthread/gp-sta -q < ben
  ***   last result: cpu time 13,675 ms, real time 13,677 ms.

Olinux-x86_64-pthread/gp-sta built with -flto:
  ***   last result: cpu time 13,145 ms, real time 13,147 ms.
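
To make the comparison easier to read, the timings above can be expressed
relative to the non-pthread gp-dyn run. This is only a small arithmetic sketch
in Python; the numbers are the cpu times copied from the runs above, and the
choice of gp-dyn as the baseline is an assumption made for illustration.

```python
# Timings in milliseconds, copied from the benchmark runs above
# (cpu time is used for the pthread builds).
timings_ms = {
    "gp-dyn": 13326,
    "gp-sta": 13144,
    "pthread gp-dyn": 22510,
    "pthread gp-sta": 13675,
    "pthread gp-sta + LTO": 13145,
}

# Baseline chosen for illustration: the single-threaded dynamic build.
baseline = timings_ms["gp-dyn"]
ratios = {name: ms / baseline for name, ms in timings_ms.items()}

for name, ratio in ratios.items():
    print(f"{name:22s} {ratio:.2f}")
```

On these numbers the pthread gp-dyn build takes about 1.69 times as long as
the baseline, pthread gp-sta about 1.03 times, and pthread gp-sta with LTO
about 0.99 times, i.e. roughly the same as the non-pthread builds.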

Cheers,
Bill.