Lorenz Minder on Thu, 03 Sep 2009 08:45:29 +0200



Re: TRY/CATCH is unnecessarily slow on OS X


Hi,

BA wrote:
> On Sat, Aug 15, 2009 at 01:45:18AM +0200, Lorenz Minder wrote:
> > [ ./except p is ~10x slower than ./except n on OS X ]
> 
> On my amd64-linux laptop I get:
> %time ./expect n
> ./expect n  0,71s user 0,00s system 99% cpu 0,712 total
> %time ./expect p
> ./expect p  1,23s user 0,00s system 100% cpu 1,226 total
> 
> This is still slow but more bearable.

I find this surprising indeed.  I did a couple of runs with 64bit
compiles on OS X and NetBSD, and I can't observe anything that bad when
_setjmp / _longjmp is used.  (I get a ~20% slowdown or so relative to no
error checking.  It's a bit more with Linux on G4, but still only ~30%.)

The numbers are later in the mail.

I don't have access to 64bit Linux currently, any chance you could
figure out what's going on here?  Linux apparently has a
BSD-compatibility mode for setjmp(), I hope that's not activated for
some reason.  It might be advisable to also test the _setjmp() patch on
Linux, just to make sure.

I also looked at the glibc _setjmp() (and hence setjmp()) implementation
on AMD64 and it looked very reasonable (just saving a couple of registers).

> > Why is it so slow?  The problem is that the Mac OS X version of 
> > setjmp/longjmp saves and restores the signal mask, which appears to take 
> > an insane amount of time.  Assuming that this is not necessary (if it 
> > is, I think you have a bug on SysV), it is better to use a variant of 
> > setjmp/longjmp that does not do this. Use _setjmp/_longjmp or possibly 
> > sigsetjmp(,0)/siglongjmp() instead.
> 
> I have to agree that all systems should behave the same.
> However my understanding of POSIX.1 is that setjmp/longjmp 
> must not save and restore the signal mask, and non POSIX systems
> are not required to carry sigsetjmp/siglongjmp.

I went and checked the POSIX spec this afternoon.  These are the main
points:

1) It is unspecified whether setjmp() saves the signal mask.  (They
   don't say anything about that in the setjmp manpage itself, but
   elsewhere they explicitly say it's unspecified.)

2) _setjmp() must not store the signal mask.

3) sigsetjmp(,0) does not have to store the signal mask, but it can.
   sigsetjmp(,1) must store it.

4) They say new software should not use _setjmp, but rather sigsetjmp.

Given that sigsetjmp(,0) may store the signal mask, the advice in 4)
seems dodgy.
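
To make the distinction in points 2) and 3) concrete, here is a minimal sketch of the variants. It is not taken from PARI or from the patches; the function names are made up for illustration, and only the POSIX calls themselves are real.

```cpp
#include <setjmp.h>   // POSIX: _setjmp/_longjmp, sigsetjmp/siglongjmp
#include <signal.h>   // sigprocmask and the sigset_t helpers

// Point 2: _setjmp/_longjmp never touch the signal mask.
bool caught_with_underscore_setjmp()
{
    jmp_buf env;
    if (_setjmp(env) != 0)
        return true;            // reached a second time, via _longjmp
    _longjmp(env, 1);           // "throw": does not return
    return false;               // not reached
}

// Point 3: with savemask == 0 the mask need not be saved (but may be).
bool caught_with_sigsetjmp0()
{
    sigjmp_buf env;
    if (sigsetjmp(env, 0) != 0)
        return true;
    siglongjmp(env, 1);
    return false;               // not reached
}

// Point 3, second half: with savemask != 0 the mask in effect at the
// sigsetjmp call must be restored by siglongjmp.
bool mask_restored_with_sigsetjmp1()
{
    sigset_t usr1, current;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &usr1, nullptr);    // start with SIGUSR1 unblocked

    sigjmp_buf env;
    if (sigsetjmp(env, 1) == 0) {
        sigprocmask(SIG_BLOCK, &usr1, nullptr);  // change the mask after saving
        siglongjmp(env, 1);                      // jump back; mask is restored
    }
    sigprocmask(SIG_BLOCK, nullptr, &current);   // null set: query only
    return sigismember(&current, SIGUSR1) == 0;  // true if unblocked again
}
```

The extra system call needed to save and restore the mask in the last variant is the kind of work that the first two variants are allowed (or required) to skip.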

> On my laptop, TRY/CATCH incur an 80% penalty on the running time. We
> should do more timings.

Yes, indeed.  See also the bottom of this mail.

BA:
> On Mon, Aug 24, 2009 at 01:03:13AM -0700, Lorenz Minder wrote:
> err_catch is only supposed to be used through the TRY/CATCH macros so this is
> not an issue. What is more problematic is changing the type of DATA->env:
> User code might do setjmp(GP_DATA->env) and be broken by the change.

Ok, thanks for clarifying.

> So maybe we could use _setjmp/_longjmp instead of sigsetjmp/siglongjmp.

We can't use _longjmp() on a jmp_buf that has been created with setjmp()
or vice versa, so I don't think this will work.  Looking at the code
again, it seems that the setjmp(GP_DATA->env)/longjmp(GP_DATA->env)
business is separate from the TRY/CATCH/err_catch() infrastructure, so
one option would be only to change TRY/CATCH/err_catch() to use
_setjmp/_longjmp and leave the GP_DATA->env stuff as is.  It would be
wise, though, to make sure that "pari_setjmp(GP_DATA->env)" would cause
a compile-time error, so as to avoid erroneous use of pari_setjmp().  I
think that can be arranged.

This assumes that setjmp(GP_DATA->env) is only called infrequently, and
is hence not performance-critical. I believe this holds at least for how
it's used in GP.
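
One way the compile-time check could be arranged is to give the TRY/CATCH buffer its own struct type. This is only a sketch: pari_jmp_buf is a hypothetical name, and the pari_setjmp/pari_longjmp definitions shown here are assumptions for illustration, not the ones in the patch.

```cpp
#include <setjmp.h>

// Hypothetical wrapper: a distinct type for buffers that are only ever
// used with _setjmp/_longjmp, so they cannot be mixed with plain jmp_buf.
struct pari_jmp_buf { jmp_buf buf; };

// Assumed definitions. Both take the wrapper by name and reach into .buf,
// so passing a plain jmp_buf (an array type) does not compile.
#define pari_setjmp(env)        _setjmp((env).buf)
#define pari_longjmp(env, val)  _longjmp((env).buf, (val))

// Returns 0 if no error was raised, 1 if the error path was taken.
int run_guarded(int fail_code)
{
    pari_jmp_buf env;
    if (pari_setjmp(env) != 0)
        return 1;                      // arrived here via pari_longjmp
    if (fail_code != 0)
        pari_longjmp(env, fail_code);  // raise: does not return
    return 0;
}
```

With these definitions, pari_setjmp(GP_DATA->env) on a plain jmp_buf fails to compile because an array has no .buf member, and setjmp() on a pari_jmp_buf fails because the argument types do not match.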

What do you think?

> (It is also slightly confusing to use the sig* variant in order *not* to save
> signals).

Yes.  Then again, I dislike the name _setjmp just as much.  So I don't
really care.  I took sigsetjmp(,0) because that was recommended for new
software.  Reading POSIX again today makes me no longer sure this is a
valid point.  So I can go back to _setjmp, sure.

> > The other possible problem is that error handling is supposed to be
> > reworked in any case if I understand correctly.  Unless I'm misreading
> > the plan, this would likely need a new API anyways, so it might be a
> > better plan to change the API just once.  If that is the case, what
> > would be the best thing for me to do to ensure that the new API does not
> > have the same performance problems on BSD-style platforms as the current
> > one does?
> 
> We will change error handling, but this will still use *setjmp/*longjmp
> anyway.

I was actually going to propose a small change that makes it possible
for users to avoid anything setjmp()/longjmp() based, provided they can
come up with a viable alternative.  The idea is as follows.

1) Add another argument to err_catch() that takes a function pointer to
an error handler which is called when an error is to be "thrown".  If
the supplied function pointer is NULL, just do the *longjmp thing in the
error case.

2) Make err_catch() a public function.

3) (optional) By default, compile libpari with "-fexceptions" (if GCC is
used).

The idea is then that a caller can install an error handler that uses
some means other than longjmp() to return.  For example, a C++
programmer might simply write a handler that throws an exception, and in
his code he would then catch this with an ordinary try/catch block.

Why would he want to do that?  Because the C++ try/catch construct has
much more robust semantics than CATCH/TRY.  For example, you can use any
flow control statement such as "break", "continue" or "return" to leave
a try-block without ill-effect.  Also, the stack unwinder calls
destructors for objects that are unwound.  And as a nice additional
plus, C++ exceptions are typically much faster than anything setjmp() or
_setjmp() based.  (See the numbers at the end of the mail.)

I would find this a very valuable extension to libpari's error handling
mechanisms.

This extension would also make it possible for users to base the
exception-handling e.g. on a libunwind-based version of setjmp(), which
would reportedly also be faster.
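
The handler idea can be sketched with a toy stand-in for the library. Nothing below is libpari's API: toy_raise, toy_divide, g_handler and the other names are hypothetical, and the "library" part is compiled as C++ here, whereas a real C library would need -fexceptions as in point 3) for the exception to pass through its frames.

```cpp
#include <setjmp.h>
#include <stdexcept>

// --- stand-in for the library side (hypothetical names) ---
typedef void (*toy_err_handler)(int errnum);

static toy_err_handler g_handler = nullptr;  // installed handler, or NULL
static jmp_buf g_env;                        // used only when g_handler is NULL

// Called by library code when an error is to be "thrown".
[[noreturn]] void toy_raise(int errnum)
{
    if (g_handler)
        g_handler(errnum);       // handler is expected not to return
    _longjmp(g_env, errnum);     // default: the *longjmp path
}

// A library routine that may fail.
int toy_divide(int a, int b)
{
    if (b == 0)
        toy_raise(1);
    return a / b;
}

// --- caller side: errors become C++ exceptions ---
struct toy_error : std::runtime_error {
    int errnum;
    explicit toy_error(int n) : std::runtime_error("toy error"), errnum(n) {}
};

void throwing_handler(int errnum) { throw toy_error(errnum); }

int safe_divide(int a, int b)
{
    g_handler = throwing_handler;
    try {
        return toy_divide(a, b);  // leaving the try-block by "return" is fine
    } catch (const toy_error &e) {
        return -e.errnum;
    }
}

// --- caller side: no handler installed, so the longjmp path is used ---
int safe_divide_longjmp(int a, int b)
{
    g_handler = nullptr;
    if (_setjmp(g_env) != 0)
        return -1;                // arrived here via _longjmp in toy_raise
    return toy_divide(a, b);
}
```

In the first caller the no-error path does nothing beyond an assignment and an ordinary call, while the second caller pays for a _setjmp() on every call even when nothing goes wrong.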

The first attached patch (setjmp2.patch) implements this change (modulo
documentation) for testing purposes.  The new attached "except.cc" uses
it to have errors handled with C++ exceptions if called with 'x'.

The cumulative setjmp3.patch does the change setjmp->_setjmp for
TRY/CATCH.  (This is again just for testing, a real patch can be
implemented once we know exactly what is needed.)

> A note on the patch: I would prefer if pari_setjmp/pari_longjmp were
> real functions and not macros.

I can't do this for pari_setjmp(), because if pari_setjmp() then calls
setjmp() and returns, the context from which setjmp() was called will be
destroyed.  The longjmp() will then not do what we want.  I could do it
for pari_longjmp() however.  To what file would that go?
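
A small sketch of why the setjmp side has to stay a macro while the longjmp side can be a function. The wrap_* names are hypothetical and stand in for pari_setjmp/pari_longjmp; the broken variant is left in a comment because calling it would be undefined behaviour.

```cpp
#include <setjmp.h>

// BROKEN, shown for illustration only: _setjmp() records the frame of
// bad_setjmp(), which is gone as soon as bad_setjmp() returns. A later
// _longjmp() would jump into a dead stack frame.
//
//   int bad_setjmp(jmp_buf env) { return _setjmp(env); }

// OK: the macro expands in the caller, so the saved context is the
// caller's own frame, which is still live when the jump happens.
#define wrap_setjmp(env) _setjmp(env)

// OK as a real function: the jump leaves this frame and never returns,
// so nothing depends on it staying alive.
[[noreturn]] void wrap_longjmp(jmp_buf env, int val)
{
    _longjmp(env, val);
}

// Returns 1 once control has come back through wrap_longjmp().
int jump_and_catch()
{
    jmp_buf env;
    if (wrap_setjmp(env) != 0)
        return 1;
    wrap_longjmp(env, 1);   // never returns
}
```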

Below are a number of measurements.  Summary:

* Cost setjmp vs no error checking:
	Mac OS:	12x (32 bit)	4.5x (64 bit)
	NetBSD:			2.5x (64 bit)
	Linux:	+32% (G4)

* Cost _setjmp vs no error checking:
	Mac OS:	+21% (32 bit)	+21% (64 bit)
	NetBSD:			+17% (64 bit)
	Linux:	+34% (G4)

* C++ EH seems to cost somewhere between nothing and 8%.  (But the
 library size goes up by ~10%.)

I find the >30% I get with Linux/G4 relatively expensive, but not as bad
as the 80% you report on AMD64, and much, much better than the cost for
BSD-style systems with setjmp().
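
For reference, the figures in the summary are just the ratio of the "except p" time to the "except n" time for each configuration. The sketch below recomputes a few of them from the timings listed at the end of this mail; the helper names are made up.

```cpp
#include <cstdio>

// Slowdown of the error-checking run ("except p") relative to the
// unchecked run ("except n"), as a plain ratio: 1.21 means +21%.
double slowdown(double t_checked, double t_unchecked)
{
    return t_checked / t_unchecked;
}

// Prints the ratio for some of the configurations measured in this mail.
void print_summary()
{
    struct Row { const char *label; double n, p; };
    const Row rows[] = {
        {"Mac OS X 32 bit, setjmp",  0.686, 8.301},
        {"Mac OS X 32 bit, _setjmp", 0.705, 0.852},
        {"NetBSD 64 bit, setjmp",    0.371, 0.995},
        {"NetBSD 64 bit, _setjmp",   0.371, 0.435},
        {"Linux G4, setjmp",         2.575, 3.419},
        {"Linux G4, _setjmp",        2.582, 3.467},
    };
    for (const Row &r : rows)
        std::printf("%-26s %.2fx\n", r.label, slowdown(r.p, r.n));
}
```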

Best,
--Lorenz
	

Mac OS X, i386, 2.4 GHz 
=======================
	
	(times are averages)

	w/ setjmp(), 32 bit, -fexceptions:
	  [vtag: pari_devel_cxxexc]

	except n:	0.686
	except x:	0.740
	except p:	8.301

	w/ _setjmp(), 32 bit, -fno-exceptions:
	  [vtag: pari_devel_fast_sj]

	except n:	0.705
	(except x:	0.718  <- for completeness; incorrect exc handling.)
	except p:	0.852

	Mac OS X, w/ _setjmp(), 64 bit, no GMP, w/ -fexceptions:
	  [vtag: pari_devel_cxxexc_64]

	except n:	0.395
	except x:	0.400
	except p:	0.478

Mac OS X, i386/amd64, 2.4 GHz
=============================

	(single runs)

	64 bit compile, no gmp, PARI w/ setjmp:

	except n: 	0m0.417s
	except p:	0m1.949s

	64 bit compile, no gmp, PARI w/ _setjmp:

	except n: 	0m0.412s
	except p: 	0m0.486s


NetBSD, amd64, 2.4 GHz
======================

	(times are averages)

	64 bit, w/ setjmp, -fno-exceptions, linked against GMP.

	./except n:	0.371
	(./except x:	0.371 <- for completeness, EH not working.)
	./except p:	0.995

	64 bit, w/ setjmp, -fexceptions, linked against GMP.
	[ size of libpari.a : 7.5M ]

	./except n:	0.374
	./except x:	0.371
	./except p:	0.980

	64 bit, w/ _setjmp, -fno-exceptions, linked against GMP.

	./except n:	0.371
	(./except x:	0.371 <- for completeness, EH not working.)
	./except p:	0.435

	64 bit, w/ _setjmp, -fexceptions, linked against GMP.
	[ size of libpari.a : 7.5M ]

	./except n:	0.373
	./except x:	0.371
	./except p:	0.437


Linux, G4, 1 GHz
================

	(times are averages)

	32 bit, w/ setjmp, -fexceptions, GMP.

	except n:	2.575
	except x:	2.572
	except p:	3.419

	32 bit, w/ _setjmp, -fexceptions, GMP.
	[ size of libpari.so = 4.4MB ]

	except n:	2.582
	except x:	2.572
	except p:	3.467

	32 bit, w/ _setjmp, -fno-exceptions, GMP.
	[ size of libpari.so = 4.1MB ]

	except n:	2.549
	(except x:	2.550 <- for completeness, EH not working.)
	except p:	3.418


Attachment: except.cc
Description: Binary data

Attachment: setjmp2.patch
Description: Binary data

Attachment: setjmp3.patch
Description: Binary data