Bill Allombert on Tue, 10 Jul 2018 23:59:43 +0200



Re: SIGSEGV on isprime


On Tue, Jul 10, 2018 at 06:35:40PM +0200, Ján Jančár wrote:
> Hi all,
> 
> While running pari on some grid computing machines I keep encountering
> this mysterious error, which only happens when working on certain
> machines and not on others.
> 
> To reproduce, I compiled the following:
> 
> on:
>     Linux 3.16.0-5-amd64 #1 SMP Debian
>     3.16.51-3+deb8u1+zs2 (2018-01-12) x86_64 GNU/Linux
> 
> where it ran just fine.
> However, when run (and compiled) on:
>     Linux 4.9.0-6-amd64 #1 SMP Debian
>     4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
> 
> It SIGSEGVs in isprime():
> 
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x00002aaaab1f771c in red_montgomery (T=0x2aaae879e6a0, N=0x2aaae879efe0, inv=2796584439883844767) at ../src/kernel/gmp/mp.c:1013
> > 1013	    while (Td < (GEN)av) { t = subllx(*++Td, *++Nd); *Td = t; }
> > (gdb) bt
> > #0  0x00002aaaab1f771c in red_montgomery (T=0x2aaae879e6a0, N=0x2aaae879efe0, inv=2796584439883844767) at ../src/kernel/gmp/mp.c:1013
> > #1  0x00002aaaab33a0ac in _sqr_montred (E=0x2aaae879ef38, x=0x2aaae879e6d0) at ../src/basemath/arith1.c:3311
> > #2  0x00002aaaab33a138 in _mul2_montred (E=0x2aaae879ef38, x=0x2aaae879e6d0) at ../src/basemath/arith1.c:3326
> > #3  0x00002aaaab39d641 in gen_pow_fold_i (x=0x2aaae879eef0, N=0x2aaae879efa0, E=0x2aaae879ef38, sqr=0x2aaaab33a068 <_sqr_montred>, msqr=0x2aaaab33a10d <_mul2_montred>)
> >     at ../src/basemath/bb_group.c:254
> > #4  0x00002aaaab33abf3 in Fp_pow (A=0x2aaaabaa3998 <readonly_constants+56>, K=0x2aaae879efa0, N=0x2aaae879efe0) at ../src/basemath/arith1.c:3508
> > #5  0x00002aaaab5ca498 in bad_for_base (S=0x7fffffffe210, a=0x2aaaabaa3998 <readonly_constants+56>) at ../src/basemath/prime.c:95
> > #6  0x00002aaaab5cbaf2 in BPSW_psp (N=0x2aaae879efe0) at ../src/basemath/prime.c:570
> > #7  0x00002aaaab5cca58 in isprime (x=0x2aaae879efe0) at ../src/basemath/prime.c:846
> > #8  0x0000000000400788 in main ()
> > (gdb) info locals
> > __value = 8722076062158158581
> > __arg1 = 144115188075855878
> > __arg2 = 10180160215563133591
> > __temp = 18446744073709551615
> > av = 46913531930272
> > Te = 0x2aaae867e688
> > Td = 0x2aaae867e6a0
> > Ne = 0x2aaae867efe8
> > Nd = 0x2aaae867f000
> > scratch = 0x2aaae867e680
> > i = 2
> > j = 2
> > m = 9020403664262637533
> > t = 8722076062158158581
> > d = 4
> > k = 2
> > carry = 1
> > hiremainder = 4978068440930021014
> > overflow = 1
> > (gdb) info args
> > T = 0x2aaae867e6a0
> > N = 0x2aaae867efe0
> > inv = 2796584439883844767
> > (gdb) quit
> 
> I compiled pari 2.9.5 / 2.10.1 / current git master, with
> ./Configure --enable-tls -g
> and the error happens in all of the versions.
> 
> Any ideas on what might be causing this? ldd of the binary on both
> machines shows the same libraries are used, so it is very mysterious to
> me that it works on one and not on the other.

Why are you using --enable-tls? Does it make a difference?
Are you using the same compiler? The same processor?
This code has not changed between 2.9.5 and 2.10.1. However, it is rather
messy, so it may not be compiled correctly.
You can also try
./Configure --kernel=none 
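
To answer the compiler and processor questions, something like the
following could be run on both machines and the two outputs compared.
This is only a sketch: the function name machine_report is made up for
illustration, the compiler is assumed to be reachable as $CC (defaulting
to cc), and the CPU lookup assumes a Linux /proc/cpuinfo.

```shell
#!/bin/sh
# Print kernel, compiler and CPU details so the output from two machines
# can be compared line by line.
# Assumptions: the compiler is reachable as $CC (default: cc) and
# /proc/cpuinfo exists (Linux). machine_report is a hypothetical helper name.
CC=${CC:-cc}

machine_report() {
    echo "kernel:   $(uname -sr)"
    if command -v "$CC" >/dev/null 2>&1; then
        # The first line of --version normally names the compiler and its version.
        echo "compiler: $("$CC" --version 2>/dev/null | head -n 1)"
    else
        echo "compiler: $CC not found"
    fi
    if [ -r /proc/cpuinfo ]; then
        # Keep only the first "model name" entry; the field may be absent on some CPUs.
        echo "cpu:      $(grep -m 1 'model name' /proc/cpuinfo | cut -d: -f2-)"
    else
        echo "cpu:      unknown"
    fi
}

machine_report
```

Saving the output to a file on each machine and running diff on the two
files shows at a glance whether the compiler or the processor differs.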

Cheers,
Bill