Philippe Elbaz-Vincent on Sun, 21 Sep 2003 14:24:43 +0100 (WEST)



Re: performance comparison


Hi Igor,

what do you mean by Athlon at 2.4GHz? (Are you overclocking it, or do you
mean an Athlon XP or MP 2400+? I guess the latter, in which case the clock
is 2GHz and the cache is only 256KB.)

In my experience with PARI/GP (programs using mostly integer operations,
few floating-point operations, and a lot of matrix computations), the x86
arch is "way faster" than the Alpha arch, whatever tuning you do with cc
or gcc (even recent releases).
I should point out that, strictly speaking, this is very vague; let's say
that, clock for clock, I had no application running faster on an Alpha
21264, for instance, than on an Athlon. But I have never used nfgaloisconj
(nor the nf* functions in general).
Among the x86 archs, the Athlon is usually the winner in terms of
performance/clock.
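
To make "clock for clock" concrete for the timings quoted below, here is a minimal Python sketch that converts each run to elapsed clock cycles (seconds times GHz) and compares it with the Alpha run. The three timings and clock rates are the ones quoted below; the 2.0GHz Athlon entry is only the guess made above (Athlon XP/MP 2400+), and the names `runs`, `gigacycles` and `normalized` are made up for this illustration. This is a rough normalization, not a proper benchmark.

```python
# Timings (seconds) and clock rates (GHz) as quoted in the message below.
# "athlon_2.0" is an assumption: the same 220s run, if the CPU is really
# an Athlon XP/MP 2400+ clocked at 2.0GHz.
runs = {
    "alpha_ev68": (99, 1.0),
    "athlon_2.4": (220, 2.4),
    "athlon_2.0": (220, 2.0),
    "p4_3.0": (302, 3.0),
}


def gigacycles(seconds, ghz):
    """Elapsed clock cycles in units of 10^9 (seconds * GHz)."""
    return seconds * ghz


def normalized(runs, baseline):
    """Cycle count of each run divided by the baseline run's cycle count."""
    base = gigacycles(*runs[baseline])
    return {name: gigacycles(*run) / base for name, run in runs.items()}


if __name__ == "__main__":
    for name, ratio in normalized(runs, "alpha_ev68").items():
        print(f"{name:12s} {ratio:5.2f}x the Alpha's cycle count")
```

A ratio above 1 means the run needed more clock cycles than the Alpha run did for the same computation, whatever the cause (compiler, cache, or the CPU itself).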

Unless you have a huge L2 cache (8MB or, better, 16MB),
I doubt you will get more than a 10%-15% improvement, in general.
As suggested by Karim, you should be able to test the 'cache effect'
by using polynomials of small degree.

btw: as far as I understand the P4 arch, it has been optimized for
floating-point operations (multimedia content usually uses floating-point
operations more than integer ones, and the P4 is optimized for multimedia,
as advertised by Intel. This is also consistent with the benchmarks using
mp3/ogg encoding or divx encoding, where P4s usually beat Athlons of
equivalent 'performance rating' by a clear margin). In general, a P4 at
3GHz is not behind an Athlon XP 2400+ (unless it is a 3200+ that you are
speaking of). In fact, with PARI/GP, they should be in the same range of
performance, with the P4 even leading the way (this is based on an
extrapolation from a comparison between an Athlon MP 2200+ and a P4 at
3GHz). Thus, in your benchmarks there could also be a scheduling problem
on the P4 (gcc is most likely guilty here, as it lacks efficient
scheduling for the P4). Remember that the P4 is also optimized for
pipelining, and is thus very sensitive to scheduling.

just out of curiosity, what were the flags for cc and gcc?

just my 2cents.

Cheers, Philippe.

I3M, UMR CNRS 5149, CC51, Université Montpellier II.
E-mail: pev@math.univ-montp2.fr     | Phone: +33 (0)467143958
http://www.math.univ-montp2.fr/~pev | Fax: +33 (0)467143558


On Sun, 21 Sep 2003, Igor Schein wrote:

> Hi,
>
> I ran nfgaloisconj(degree-92 pol) on 3 different platforms:
>
> Tru64 on 1GHz ev68 compiled with cc         99s
> Linux on 2.4GHz athlon compiled with gcc   220s
> Linux on 3.0GHz P4 compiled with gcc       302s
>
> I used --with-gmp for compilation of latest CVS sources and the
> initial stack was 256m.
>
> I can see 2 things here, that P4 does lose in comparison to Athlon,
> and that Alpha is a clear winner.  Am I correct in assuming that this
> is essentially a benchmark of floating point performance, where Alpha
> is way superior, or does Alpha's large cache play the role in the
> equation?   Or maybe gcc-vs-cc proposition is the factor here?
>
> In fact, I suspect that Pari's benches are dominated by integer
> arithmetic, making P4 the fastest platform, while most number field
> operations are floating point intensive.  Since number fields are
> bread-and-butter of Pari...
>
> Just curious.
>
> Thanks
>
> Igor
>