| Lorenz Minder on Thu, 03 Sep 2009 08:45:29 +0200 | 
| Re: TRY/CATCH is unnecessarily slow on OS X | 
Hi,

BA wrote:
> On Sat, Aug 15, 2009 at 01:45:18AM +0200, Lorenz Minder wrote:
> > [ ./except p is ~10x slower than ./except n on OS X ]
>
> On my amd64-linux laptop I get:
> %time ./expect n
> ./expect n 0,71s user 0,00s system 99% cpu 0,712 total
> %time ./expect p
> ./expect p 1,23s user 0,00s system 100% cpu 1,226 total
>
> This is still slow but more bearable.

I find this surprising indeed. I did a couple of runs with 64bit
compiles on OS X and NetBSD, and I can't observe anything that bad when
_setjmp / _longjmp is used. (I get a ~20% slowdown or so relative to no
error checking. It's a bit more with Linux on G4, but still only ~30%.)
The numbers are later in the mail.

I don't have access to 64bit Linux currently, any chance you could
figure out what's going on here? Linux apparently has a
BSD-compatibility mode for setjmp(), I hope that's not activated for
some reason. It might be advisable to also test the _setjmp() patch on
Linux, just to make sure. I also looked at glibc _setjmp() (and hence
setjmp()) implementation on AMD64 and it looked very reasonable (just
saving a couple of registers).

> > Why is it so slow? The problem is that the Mac OS X version of
> > setjmp/longjmp saves and restores the signal mask, which appears to take
> > an insane amount of time. Assuming that this is not necessary (if it
> > is, I think you have a bug on SysV), it is better to use a variant of
> > setjmp/longjmp that does not do this. Use _setjmp/_longjmp or possibly
> > sigsetjmp(,0)/siglongjmp() instead.
>
> I have to agree that all systems should behave the same.
> However my understanding of POSIX.1 is that setjmp/longjmp
> must not save and restore the signal mask, and non POSIX systems
> are not required to carry sigsetjmp/siglongjmp.

I went and checked the POSIX spec this afternoon. These are the main
points:

1) It is unspecified whether setjmp() saves the signal mask.
   (They don't say anything about that in the setjmp manpage itself,
   but elsewhere they explicitly say it's unspecified.)

2) _setjmp() must not store the signal mask.

3) sigsetjmp(,0) does not have to store the signal mask, but it can.
   sigsetjmp(,1) must store it.

4) They say new software should not use _setjmp, but rather sigsetjmp.

I think that given that sigsetjmp(,0) may store the signal mask, the
advice 4) seems dodgy.

> On my laptop, TRY/CATCH incurr a 80% penalty on the running time. We
> should do more timings.

Yes, indeed. See also the bottom of this mail.

BA:
> On Mon, Aug 24, 2009 at 01:03:13AM -0700, Lorenz Minder wrote:
> err_catch is only supposed to be used through the TRY/CATCH macros so this is
> not an issue. What is more problematic is changing the type of DATA->env:
> User code might do setjmp(GP_DATA->env) and be broken by the change.

Ok, thanks for clarifying.

> So maybe we could use _setjmp/_getjmp instead of sigsetjmp/siglongjmp.

We can't use _longjmp() on a jmp_buf that has been created with
setjmp() or vice versa, so I don't think this will work.

Looking at the code again, it seems that the
setjmp(GP_DATA->env)/longjmp(GP_DATA->env) business is separate from
the TRY/CATCH/err_catch() infrastructure, so one option would be only
to change TRY/CATCH/err_catch() to use _setjmp/_longjmp and leave the
GP_DATA->env stuff as is. It would be wise, though, to make sure that
"pari_setjmp(GP_DATA->env)" would cause a compile-time error, so as to
avoid erroneous use of pari_setjmp(). I think that can be arranged.

This assumes that setjmp(GP_DATA->env) is only called infrequently, and
is hence not performance-critical. I believe this holds at least for
how it's used in GP.

What do you think?

> (It is also slightly confusing to use the sig* variant in order *not* to save
> signals).

Yes. Then again, I dislike the name _setjmp just as much. So I don't
really care. I took sigsetjmp(,0) because that was recommended for new
software.
Reading POSIX again today makes me no longer sure this is a valid
point. So I can go back to _setjmp, sure.

> > The other possible problem is that error handling is supposed to be
> > reworked in any case if I understand correctly. Unless I'm misreading
> > the plan, this would likely need a new API anyways, so it might be a
> > better plan just to change the API just once. If that is the case, what
> > would be the best thing for me to do to ensure that the new API does not
> > have the same performance problems on BSD-style platforms as the current
> > one does?
>
> We will change error handling, but this will still use *setjmp/*longjmp
> anyway.

I was actually going to propose a small change that makes it possible
for users to avoid anything setjmp()/longjmp() based, provided they can
come up with a viable alternative. The idea is as follows.

1) Add another argument to err_catch() that takes a function pointer to
   an error handler which is called when an error is to be "thrown".
   If the supplied function pointer is NULL, just do the *longjmp thing
   in the error case.

2) Make err_catch() a public function.

3) (optional) By default, compile libpari with "-fexceptions" (if GCC
   is used).

The idea is then that a caller can install an error handler that uses
some means other than longjmp() to return. For example, a C++
programmer might simply write a handler that throws an exception, and
in his code he would then catch this with an ordinary try/catch block.

Why would he want to do that? Because the C++ try/catch construct has
much more robust semantics than CATCH/TRY. For example, you can use any
flow control statement such as "break", "continue" or "return" to leave
a try-block without ill-effect. Also, the stack unwinder calls
destructors for objects that are unwound. And as a nice additional
plus, C++ exceptions are typically much faster than anything setjmp()
or _setjmp() based. (See the numbers at the end of the mail.)
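
A minimal sketch of the handler idea above, assuming a simplified
stand-in for the library's catch mechanism. None of the names here
(CatchCell, raise_error, throwing_handler, and so on) are libpari's
API; they are hypothetical and only illustrate how a NULL handler falls
back to longjmp() while a non-NULL handler can leave by throwing a C++
exception instead.

```cpp
#include <csetjmp>
#include <cstdlib>

// Hypothetical types and names, for illustration only (not libpari's API).
typedef void (*err_handler_t)(int errnum);   // a handler must not return

struct CatchCell {
    std::jmp_buf env;        // used only when handler is null
    err_handler_t handler;   // optional user-supplied "throw" routine
};

static CatchCell *g_cell = nullptr;   // innermost active catch point
static int g_last_errnum = 0;         // error number of the last raised error

// Stand-in for the library's error path.
[[noreturn]] void raise_error(int errnum) {
    if (g_cell == nullptr) std::abort();   // no catch point installed
    g_last_errnum = errnum;
    if (g_cell->handler != nullptr) {
        g_cell->handler(errnum);           // e.g. throws a C++ exception
        std::abort();                      // reached only if the handler returned
    }
    std::longjmp(g_cell->env, 1);          // default: longjmp-based return
}

// A user-defined exception type and a handler that throws it.
struct LibError { int errnum; };
void throwing_handler(int errnum) { throw LibError{errnum}; }

// Stand-in for a library routine that fails.
void failing_routine() { raise_error(7); }

// Caller that installs a throwing handler and uses ordinary try/catch.
int catch_with_exception() {
    CatchCell cell;
    cell.handler = throwing_handler;
    CatchCell *saved = g_cell;
    g_cell = &cell;
    int caught = 0;
    try {
        failing_routine();
    } catch (const LibError &e) {
        caught = e.errnum;
    }
    g_cell = saved;
    return caught;
}

// Caller that passes no handler and gets the longjmp-based behaviour.
int catch_with_longjmp() {
    CatchCell cell;
    cell.handler = nullptr;
    CatchCell *saved = g_cell;
    g_cell = &cell;
    int result = 0;
    if (setjmp(cell.env) == 0) {
        failing_routine();
    } else {
        result = g_last_errnum;   // read from a global, not a clobbered local
    }
    g_cell = saved;
    return result;
}
```

In this sketch everything is C++, so the exception unwinds cleanly. For
an exception to pass through frames of a C library, that library would
have to be built with unwinding information, which is presumably the
reason for point 3) above.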
I would find this a very valuable extension to libpari's error handling
mechanisms. This extension would also make it possible for users to
base the exception-handling e.g. on a libunwind-based version of
setjmp(), which would reportedly also be faster.

The first attached patch (setjmp2.patch) implements this change (modulo
documentation) for testing purposes. The new attached "except.cc" uses
it to have errors handled with C++ exceptions if called with 'x'. The
cumulative setjmp3.patch does the change setjmp->_setjmp for TRY/CATCH.
(This is again just for testing, a real patch can be implemented once
we know exactly what is needed.)

> A note on the patch: I would prefer if pari_setjmp/pari_longjmp were
> real functions and not macros.

I can't do this for pari_setjmp(), because if pari_setjmp() then calls
setjmp() and returns, the context from which setjmp() was called will
be destroyed. The longjmp() will then not do what we want. I could do
it for pari_longjmp() however. To what file would that go?

Below are a number of measurements. Summary:

* Cost setjmp vs no error checking:
    Mac OS: 12x (32 bit)  4.5x (64 bit)
    NetBSD: 2.5x (64 bit)
    Linux:  +32% (G4)

* Cost _setjmp vs no error checking:
    Mac OS: +21% (32 bit)  +21% (64 bit)
    NetBSD: +17% (64 bit)
    Linux:  +34% (G4)

* C++ EH seems to cost somewhere between nothing and 8%. (But the
  library size goes up by ~10%.)

I find the >30% I get with Linux/G4 relatively expensive, but not as
bad as the 80% you report on AMD64, and much, much better than the cost
for BSD-style systems with setjmp().

Best,
--Lorenz

Mac OS X, i386, 2.4 GHz
=======================
(times are averages)

w/ setjmp(), 32 bit, -fexceptions: [vtag: pari_devel_cxxexc]
except n: 0.686
except x: 0.740
except p: 8.301

w/ _setjmp(), 32 bit, -fno-exceptions: [vtag: pari_devel_fast_sj]
except n: 0.705
(except x: 0.718 <- for completeness; incorrect exc handling.)
except p: 0.852

Mac OS X, w/ _setjmp(), 64 bit, no GMP, w/ -fexceptions: [vtag: pari_devel_cxxexc_64]
except n: 0.395
except x: 0.400
except p: 0.478

Mac OS X, i386/amd64, 2.4 GHz
=============================
(single runs)

64 bit compile, no gmp, PARI w/ setjmp:
except n: 0m0.417s
except p: 0m1.949s

64 bit compile, no gmp, PARI w/ _setjmp:
except n: 0m0.412s
except p: 0m0.486s

NetBSD, amd64, 2.4 GHz
======================
(times are averages)

64 bit, w/ setjmp, -fno-exceptions, linked against GMP.
./except n: 0.371
(./except x: 0.371 <- for completeness, EH not working.)
./except p: 0.995

64 bit, w/ setjmp, -fexceptions, linked against GMP.
[ size of libpari.a : 7.5M ]
./except n: 0.374
./except x: 0.371
./except p: 0.980

64 bit, w/ _setjmp, -fno-exceptions, linked against GMP.
./except n: 0.371
(./except x: 0.371 <- for completeness, EH not working.)
./except p: 0.435

64 bit, w/ _setjmp, -fexceptions, linked against GMP.
[ size of libpari.a : 7.5M ]
./except n: 0.373
./except x: 0.371
./except p: 0.437

Linux, G4, 1 GHz
================
(times are averages)

32 bit, w/ setjmp, -fexceptions, GMP.
except n: 2.575
except x: 2.572
except p: 3.419

32 bit, w/ _setjmp, -fexceptions, GMP.
[ size of libpari.so = 4.4MB ]
except n: 2.582
except x: 2.572
except p: 3.467

32 bit, w/ _setjmp, -fno-exceptions, GMP.
[ size of libpari.so = 4.1MB ]
except n: 2.549
(except x: 2.550 <- for completeness, EH not working.)
except p: 3.418
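
The per-call cost behind these timings can also be probed in isolation.
The following is a rough sketch, not the except.cc test program: it
only times how long it takes to arm a catch point repeatedly with
setjmp() and with sigsetjmp(,0), and never performs a jump. The helper
names are hypothetical, and sigsetjmp() is assumed to be available (a
POSIX system).

```cpp
#include <setjmp.h>   // setjmp, jmp_buf; POSIX: sigsetjmp, sigjmp_buf
#include <chrono>

// Hypothetical helpers, not taken from except.cc or libpari.

// Seconds spent arming a plain setjmp() catch point `iterations` times.
double time_setjmp(long iterations) {
    jmp_buf env;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        if (setjmp(env) != 0) return -1.0;   // not taken: no longjmp is issued
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Same loop with sigsetjmp(env, 0), which asks for the signal mask
// not to be saved (POSIX still allows an implementation to save it).
double time_sigsetjmp_nomask(long iterations) {
    sigjmp_buf env;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        if (sigsetjmp(env, 0) != 0) return -1.0;   // not taken: no siglongjmp
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing the two results for the same iteration count on a given
system should indicate whether setjmp() pays for saving the signal mask
there. No particular ratio is claimed here; the figures above come from
the full test program, not from this loop.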
Attachment:
except.cc
Description: Binary data
Attachment:
setjmp2.patch
Description: Binary data
Attachment:
setjmp3.patch
Description: Binary data