Bill Allombert on Sun, 08 Sep 2024 15:24:58 +0200
Re: [PATCH 2.16-2-beta]: about 20x speedup of forprime() in 10^12 .. 2^64
- To: pari-dev@pari.math.u-bordeaux.fr
- Subject: Re: [PATCH 2.16-2-beta]: about 20x speedup of forprime() in 10^12 .. 2^64
- From: Bill Allombert <Bill.Allombert@math.u-bordeaux.fr>
- Date: Sun, 8 Sep 2024 15:24:53 +0200
- Delivery-date: Sun, 08 Sep 2024 15:24:58 +0200
- In-reply-to: <ZsH0fuX68GJ4oDyO@debian.attlocal.net>
- Mail-followup-to: pari-dev@pari.math.u-bordeaux.fr
- References: <ZsH0fuX68GJ4oDyO@debian.attlocal.net>
On Sun, Aug 18, 2024 at 06:17:50AM -0700, Ilya Zakharevich wrote:
> On a contemporary low-midlevel CPU (13th Gen Intel(R) Core(TM) i5-13500T), with
>
> timeit(s,f) = my(t=gettime(), o=f()); if(#o, o=Str("\t",o)); printf("%s%7.3f%s\n",s,gettime()/1000,o);
> my(c, L=2*10^9); for(k=8,19, c=0; timeit(Strprintf("10^%d\t",k), (()->forprime(p=10^k,10^k+L,c++);c)))
>
> SUMMARY: with this patch, the overhead of forprime in the range
> 2^32..2^64 is a constant overhead of “1.5 increments (as in c++)” per
> a found prime, plus the overhead of “√(N/10^19) increments” for
> sieving near N.
Could you explain what rem_half does?
If I understand correctly,
the change you are suggesting is in optimal_chunk, from
ulong chunk = 0x80000UL;
to
ulong chunk = 0x800000UL;
Is that correct?
Cheers,
Bill