Philippe Elbaz-Vincent on Sun, 21 Sep 2003 14:24:43 +0100 (WEST)
Re: performance comparison
Hi Igor,

What do you mean by Athlon at 2.4 GHz? Are you overclocking it, or do you mean an Athlon XP or MP 2400+? I guess the latter, in which case the clock is 2 GHz and the cache is only 256 KB.

In my experience with PARI/GP (programs using integer operations, few floating-point operations, and a lot of matrix computations), the x86 architecture is "way faster" than the Alpha architecture, whatever tuning you do with cc or gcc (even recent releases). I should point out that, strictly speaking, this is very vague; let us say that, clock for clock, I have had no application running faster on an Alpha 21264, for instance, than on an Athlon. But I have never used nfgaloisconj (nor the nf* functions in general).

Among the x86 architectures, the Athlon is usually the winner in terms of performance per clock. Unless you have a huge L2 cache (8 MB or, better, 16 MB), I doubt you will get more than a 10%-15% improvement in general. As suggested by Karim, you should be able to test the 'cache effect' by using polynomials of small degree.

btw: as far as I understand the P4 architecture, it has been optimized for floating-point operations (multimedia content usually uses floating-point operations more than integer operations, and the P4 is optimized for multimedia, as advertised by Intel. This is also consistent with benchmarks using mp3/ogg or DivX encoding, where P4s usually beat Athlons of equivalent 'performance rating' by a clear margin).

In general, a P4 at 3 GHz is not behind an Athlon XP 2400+ (unless it is a 3200+ that you are speaking of). In fact, with PARI/GP they should be in the same performance range, with the P4 even leading (this is based on an extrapolation from a comparison between an Athlon MP 2200+ and a P4 at 3 GHz). Thus in your benchmarks there could also be a scheduling problem on the P4 (gcc is most likely guilty here, as it lacks efficient scheduling for the P4). Remember that the P4 is also optimized for pipelining, and is thus very sensitive to scheduling.

Just out of curiosity, what were the flags for cc and gcc?

Just my 2 cents.

Cheers,
Philippe.
I3M, UMR CNRS 5149, CC51, Université Montpellier II.
E-mail: pev@math.univ-montp2.fr | Phone: +33 (0)467143958
http://www.math.univ-montp2.fr/~pev | Fax: +33 (0)467143558

On Sun, 21 Sep 2003, Igor Schein wrote:

> Hi,
>
> I ran nfgaloisconj(degree-92 pol) on 3 different platforms:
>
> Tru64 on 1GHz ev68 compiled with cc        99s
> Linux on 2.4GHz athlon compiled with gcc  220s
> Linux on 3.0GHz P4 compiled with gcc      302s
>
> I used --with-gmp for compilation of latest CVS sources and the
> initial stack was 256m.
>
> I can see 2 things here, that P4 does lose in comparison to Athlon,
> and that Alpha is a clear winner. Am I correct in assuming that this
> is essentially a benchmark of floating point performance, where Alpha
> is way superior, or does Alpha's large cache play the role in the
> equation? Or maybe gcc-vs-cc proposition is the factor here?
>
> In fact, I suspect that Pari's benches are dominated by integer
> arithmetic, making P4 the fastest platform, while most number field
> operations are floating point intensive. Since number fields are
> bread-and-butter of Pari...
>
> Just curious.
>
> Thanks
>
> Igor
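
The "clock for clock" comparison discussed in this thread can be made concrete by multiplying each quoted timing by the nominal clock speed. The sketch below is only illustrative: the timings and clocks are the ones quoted above, the function names are made up for this example, and the result ignores cache size, compiler, and instructions per cycle, so it is a rough normalization rather than a measurement.

```python
# Timings (seconds) and nominal clocks (GHz) as quoted in the thread.
# The 2.4 GHz Athlon figure is the one reported; the reply suggests the
# real clock may be 2.0 GHz, which would change that row.
runs = {
    "Alpha ev68 (cc)": (99, 1.0),
    "Athlon (gcc)": (220, 2.4),
    "P4 (gcc)": (302, 3.0),
}


def gigacycles(seconds, ghz):
    """Approximate clock cycles consumed, in units of 10^9."""
    return seconds * ghz


def report(runs):
    """Map each platform to (gigacycles, ratio to the best platform)."""
    best = min(gigacycles(s, f) for s, f in runs.values())
    result = {}
    for name, (s, f) in runs.items():
        g = gigacycles(s, f)
        result[name] = (g, g / best)
    return result


if __name__ == "__main__":
    for name, (g, rel) in report(runs).items():
        print(f"{name:18s} {g:7.1f} Gcycles  x{rel:.2f}")
```

With the Athlon taken at 2.0 GHz instead, as the reply suggests, its figure drops from about 528 to 440 gigacycles. Either way, on these numbers the Alpha spends several times fewer cycles on this particular computation.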