An Embedded Error Recovery and Debugging Mechanism for Scripting Language Extensions
1. PY-MA
An Embedded Error Recovery and Debugging Mechanism for
Scripting Language Extensions
David M. Beazley
Department of Computer Science
The University of Chicago
beazley@cs.uchicago.edu
June 29, 2001
3. Scripting Language Extensions
Scripting languages
• Tcl, Perl, Python, Ruby, Guile, PHP etc.
,
Scripting extensions
• Extend scripting interpreters with compiled functions and types (in C/C++)
• Provide access to popular libraries and existing software
• Performance optimization
• A variety of automated tools can help (e.g., SWIG)
Benefits
• Scripting languages make a lot of things easier
• User interface development
• Web programming
• Rapid prototyping, debugging, etc.
• Plus, scripting languages are fun to use
4. Example
Scripting Interpreter
Dynamic Loading
foo.so bar.so ... blah.so
Extension building
• Extensions packaged as shared libraries
• Dynamic loading used to import modules
• Contents of extension modules add new commands or types
• Everything lives in the same process
• Interpreter responsible for high-level control
Examples
• GUI built with Tcl/Tk
• Scientific/engineering software (e.g., MATLAB)
5. A More Complex Example
Apache
libphp3.so mod_perl.so ... mod_python.so
Embedded Python Interpreter
Compiled
Extensions MySQLdb.so blah.so
.py .py .py .py
.py scripts
• May have multiple interpreters and a mix of C code and scripts
• Also may include IPC, sockets, subprocesses, threads, etc.
• Bottom line: a much more complicated runtime environment
6. Scripting Language Errors
If you make a mistake in a script, you get an error
% python spam.py
Traceback (most recent call last):
File "spam.py", line 15, in ?
blah()
File "spam.py", line 12, in blah
bar()
File "spam.py", line 9, in bar
foo()
File "spam.py", line 6, in foo
spam(3)
File "spam.py", line 3, in spam
doh(n)
NameError: There is no variable named 'doh'
To fix:
• Just look at the error message and make a change
• Run a script-level debugger
7. Errors in Extension Code
Uh oh...
% python spam.py
Segmentation Fault (core dumped)
or
% python spam.py
Bus Error (core dumped)
or
% python spam.py
Assertion failed: n > 0, file debug.c, line 54
Abort (core dumped)
Obviously something “bad” has happened
• Even worse: May get no error message at all (mod_python)
• May not get a core file either
8. Common Failure Modes
Uninitialized Data
• Improper initialization of libraries (forgot to call an initialization function?)
• Calling functions in the wrong order?
Improper argument checking
• Passing of NULL pointers
• Improper conversion of scripting objects to C
Failed assertions
• Extensive use of assert().
Weird stuff
• Illegal instructions
• Bus error. Memory alignment problems
Math errors
• Floating point exception (divide by zero?).
To fix:
• Trial and error with print statements?
• Cursing?
9. GDB Traceback
(gdb) where #31 0xff165544 in Tcl_UplevelObjCmd (dummy=0x1, interp=0x20e658, objc=1,
#0 0xff1d9bf0 in __sigprocmask () from /usr/lib/libthread.so.1 objv=0x24ec80)
#1 0xff1ce628 in _resetsig () from /usr/lib/libthread.so.1 at ./../generic/tclProc.c:614
#2 0xff1cdd18 in _sigon () from /usr/lib/libthread.so.1 #32 0xff13e98c in TclExecuteByteCode (interp=0x20e658, codePtr=0x2a0b70)
#3 0xff1d0e8c in _thrp_kill () from /usr/lib/libthread.so.1 at ./../generic/tclExecute.c:845
#4 0xfee49b10 in raise () from /usr/lib/libc.so.1 #33 0xff122bf8 in Tcl_EvalObjEx (interp=0x20e658, objPtr=0x274d50, flags=0)
#5 0xfee3512c in abort () from /usr/lib/libc.so.1 at ./../generic/tclBasic.c:2723
#6 0xfee353d0 in _assert () from /usr/lib/libc.so.1 #34 0xff165afc in TclObjInterpProc (clientData=0x1, interp=0x20e658, objc=0,
#7 0xfeee13ec in abort_crash () from /u0/beazley/Projects/WAD/WAD/Test/./ objv=0xffbeebd8) at ./../generic/tclProc.c:1001
debugmodule.so #35 0xff15e18c in EvalObjv (interp=0x20e658, objc=2, objv=0xffbeebd8,
#8 0xfeee28ec in _wrap_abort_crash () command=0xffbef024 "n tkButtonUp .1907556n", length=25, flags=0)
from /u0/beazley/Projects/WAD/WAD/Test/./debugmodule.so at ./../generic/tclParse.c:932
#9 0x281c8 in call_builtin (func=0x1cc4f0, arg=0x1f9424, kw=0x0) at ceval.c:2650 #36 0xff15e7d0 in Tcl_EvalEx (interp=0x20e658,
#10 0x28094 in PyEval_CallObjectWithKeywords (func=0x1cc4f0, arg=0x1f9424, kw=0x0) script=0xffbef024 "n tkButtonUp .1907556n", numBytes=25, flags=-4264800)
at ceval.c:2618 at ./../generic/tclParse.c:1393
#11 0x26764 in eval_code2 (co=0x1d37e0, globals=0x0, locals=0x1d37cf, args=0x1cc4f0, #37 0xff15e9c0 in Tcl_Eval (interp=0x20e658,
argcount=1762552, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at string=0xffbef024 "n tkButtonUp .1907556n") at ./../generic/tclParse.c:1512
ceval.c:1951 #38 0xff1243d0 in Tcl_GlobalEval (interp=0x20e658,
#12 0x263a0 in eval_code2 (co=0x1d3858, globals=0x0, locals=0x1cc4f0, args=0x19b1a4, command=0xffbef024 "n tkButtonUp .1907556n") at ./../generic/tclBasic.c:4139
argcount=1883008, kws=0x1d7318, kwcount=0, defs=0x0, defcount=0, owner=0x0) #39 0xff221a40 in Tk_BindEvent (bindingTable=0xffbef024, eventPtr=0x29ffa0,
at ceval.c:1850 tkwin=0x2790a8, numObjects=2045728, objectPtr=0xffbef170) at ./../generic/
#13 0x263a0 in eval_code2 (co=0x1d3e50, globals=0x0, locals=0x19b1a4, args=0x1a7374, tkBind.c:1784
argcount=1883128, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at #40 0xff226450 in TkBindEventProc (winPtr=0x2790a8, eventPtr=0x29ffa0)
ceval.c:1850 at ./../generic/tkCmds.c:244
#14 0x285e0 in call_function (func=0x1a73a4, arg=0x18f114, kw=0x0) at ceval.c:2772 #41 0xff22c218 in Tk_HandleEvent (eventPtr=0x29ffa0) at ./../generic/tkEvent.c:737
#15 0x28080 in PyEval_CallObjectWithKeywords (func=0x1a73a4, arg=0x18f114, kw=0x0) #42 0xff22c61c in WindowEventProc (evPtr=0x29ff98, flags=-1) at ./../generic/
at ceval.c:2616 tkEvent.c:1072
#16 0x680b0 in builtin_apply (self=0x0, args=0x0) at bltinmodule.c:88 #43 0xff15bb54 in Tcl_ServiceEvent (flags=-1) at ./../generic/tclNotify.c:607
#17 0x281c8 in call_builtin (func=0x1910c8, arg=0x1f9b54, kw=0x0) at ceval.c:2650 #44 0xff15beec in Tcl_DoOneEvent (flags=-1) at ./../generic/tclNotify.c:846
#18 0x28094 in PyEval_CallObjectWithKeywords (func=0x1910c8, arg=0x1f9b54, kw=0x0) #45 0x99314 in EventHook () at ./_tkinter.c:2020
at ceval.c:2618 #46 0xbaf30 in rl_read_key () at input.c:374
#19 0x26764 in eval_code2 (co=0x1f3948, globals=0x0, locals=0x1f38f0, args=0x1910c8, #47 0xac920 in readline_internal_char () at readline.c:454
argcount=1733540, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x2436e4) #48 0xaca64 in readline_internal_charloop () at readline.c:507
at ceval.c:1951 #49 0xaca94 in readline_internal () at readline.c:521
#20 0x285e0 in call_function (func=0x24374c, arg=0x1a606c, kw=0x0) at ceval.c:2772 #50 0xac704 in readline (prompt=0x1cbd9c ">>> ") at readline.c:349
#21 0x28080 in PyEval_CallObjectWithKeywords (func=0x261414, arg=0x18f114, kw=0x0) #51 0x8249c in call_readline (prompt=0x1cbd9c ">>> ") at ./readline.c:462
at ceval.c:2616 #52 0x21ae0 in PyOS_Readline (prompt=0x1cbd9c ">>> ") at myreadline.c:118
#22 0x98064 in PythonCmd (clientData=0x1cc8e0, interp=0x20e658, argc=0, #53 0x205a0 in tok_nextc (tok=0x27abd0) at tokenizer.c:192
argv=0xffbee060) #54 0x20fb4 in PyTokenizer_Get (tok=0x27abd0, p_start=0xffbef8c4, p_end=0xffbef8c0)
at ./_tkinter.c:1274 at tokenizer.c:516
#23 0xff122064 in TclInvokeStringCommand (clientData=0x278538, interp=0x20e658, #55 0x20274 in parsetok (tok=0x27abd0, g=0x17026c, start=256, err_ret=0xffbef9b0)
objc=1, at parsetok.c:128
objv=0x24ec84) at ./../generic/tclBasic.c:1752 #56 0x20158 in PyParser_ParseFile (fp=0x18ebe8, filename=0xbf628 "<stdin>",
#24 0xff13e98c in TclExecuteByteCode (interp=0x20e658, codePtr=0x2a0cd0) g=0x17026c,
at ./../generic/tclExecute.c:845 start=256, ps1=0x1cbd9c ">>> ", ps2=0x25a7e4 "... ", err_ret=0xffbef9b0)
#25 0xff122bf8 in Tcl_EvalObjEx (interp=0x20e658, objPtr=0x2370c8, flags=0) at parsetok.c:75
at ./../generic/tclBasic.c:2723 #57 0x3a9c0 in PyRun_InteractiveOne (fp=0x18ebe8, filename=0xbf628 "<stdin>")
#26 0xff258220 in TkInvokeButton (butPtr=0x279188) at ./../generic/tkButton.c:1457 at pythonrun.c:514
#27 0xff257698 in ButtonWidgetObjCmd (clientData=0x279188, interp=0x20e658, objc=2, #58 0x3a8bc in PyRun_InteractiveLoop (fp=0x18ebe8, filename=0xbf628 "<stdin>")
objv=0x295e00) at ./../generic/tkButton.c:835 at pythonrun.c:478
#28 0xff15e18c in EvalObjv (interp=0x20e658, objc=2, objv=0x295e00, #59 0x3a7ac in PyRun_AnyFileEx (fp=0x18ebe8, filename=0xbf628 "<stdin>", closeit=0)
command=0xff182128 "", at pythonrun.c:453
length=0, flags=262144) at ./../generic/tclParse.c:932 #60 0x3a76c in PyRun_AnyFile (fp=0x18ebe8, filename=0xbf628 "<stdin>") at
#29 0xff15e2b8 in Tcl_EvalObjv (interp=0x20e658, objc=2, objv=0x295e00, pythonrun.c:444
flags=262144) #61 0x1ff20 in Py_Main (argc=3, argv=0xffbefc74) at main.c:297
at ./../generic/tclParse.c:1019 #62 0x1f90c in main (argc=3, argv=0xffbefc74) at python.c:10
#30 0xff122928 in Tcl_EvalObjEx (interp=0x20e658, objPtr=0x2370e0, flags=262144) (gdb)
at ./../generic/tclBasic.c:2565
10. GDB Traceback
(gdb) where #31 0xff165544 in Tcl_UplevelObjCmd (dummy=0x1, interp=0x20e658, objc=1,
#0 0xff1d9bf0 in __sigprocmask () from /usr/lib/libthread.so.1 objv=0x24ec80)
#1 0xff1ce628 in _resetsig () from /usr/lib/libthread.so.1 at ./../generic/tclProc.c:614
#2 0xff1cdd18 in _sigon () from /usr/lib/libthread.so.1 #32 0xff13e98c in TclExecuteByteCode (interp=0x20e658, codePtr=0x2a0b70)
#3 0xff1d0e8c in _thrp_kill () from /usr/lib/libthread.so.1 at ./../generic/tclExecute.c:845
#4 0xfee49b10 in raise () from /usr/lib/libc.so.1 #33 0xff122bf8 in Tcl_EvalObjEx (interp=0x20e658, objPtr=0x274d50, flags=0)
#5 0xfee3512c in abort () from /usr/lib/libc.so.1 at ./../generic/tclBasic.c:2723
#6 0xfee353d0 in _assert () from /usr/lib/libc.so.1 #34 0xff165afc in TclObjInterpProc (clientData=0x1, interp=0x20e658, objc=0,
#7 0xfeee13ec in abort_crash () from /u0/beazley/Projects/WAD/WAD/Test/./ objv=0xffbeebd8) at ./../generic/tclProc.c:1001
debugmodule.so #35 0xff15e18c in EvalObjv (interp=0x20e658, objc=2, objv=0xffbeebd8,
#8 0xfeee28ec in _wrap_abort_crash () command=0xffbef024 "n tkButtonUp .1907556n", length=25, flags=0)
from /u0/beazley/Projects/WAD/WAD/Test/./debugmodule.so at ./../generic/tclParse.c:932
#9 0x281c8 in call_builtin (func=0x1cc4f0, arg=0x1f9424, kw=0x0) at ceval.c:2650 #36 0xff15e7d0 in Tcl_EvalEx (interp=0x20e658,
#10 0x28094 in PyEval_CallObjectWithKeywords (func=0x1cc4f0, arg=0x1f9424, kw=0x0) script=0xffbef024 "n tkButtonUp .1907556n", numBytes=25, flags=-4264800)
at ceval.c:2618 at ./../generic/tclParse.c:1393
#11 0x26764 in eval_code2 (co=0x1d37e0, globals=0x0, locals=0x1d37cf, args=0x1cc4f0, #37 0xff15e9c0 in Tcl_Eval (interp=0x20e658,
argcount=1762552, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at string=0xffbef024 "n tkButtonUp .1907556n") at ./../generic/tclParse.c:1512
ceval.c:1951 #38 0xff1243d0 in Tcl_GlobalEval (interp=0x20e658,
(gdb) where
#12 0x263a0 in eval_code2 (co=0x1d3858, globals=0x0, locals=0x1cc4f0, args=0x19b1a4, command=0xffbef024 "n tkButtonUp .1907556n") at ./../generic/tclBasic.c:4139
#0 0xff1d9bf0 in __sigprocmask () from /usr/lib/libthread.so.1
argcount=1883008, kws=0x1d7318, kwcount=0, defs=0x0, defcount=0, owner=0x0)
at ceval.c:1850
#39 0xff221a40 in Tk_BindEvent (bindingTable=0xffbef024, eventPtr=0x29ffa0,
tkwin=0x2790a8, numObjects=2045728, objectPtr=0xffbef170) at ./../generic/
#1 0xff1ce628 in _resetsig () from /usr/lib/libthread.so.1
#13 0x263a0 in eval_code2 (co=0x1d3e50, globals=0x0, locals=0x19b1a4, args=0x1a7374,
argcount=1883128, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at
tkBind.c:1784
#40 0xff226450 in TkBindEventProc (winPtr=0x2790a8, eventPtr=0x29ffa0)
ceval.c:1850 #2 0xff1cdd18 in _sigon () from /usr/lib/libthread.so.1 at ./../generic/tkCmds.c:244
#14 0x285e0 in call_function (func=0x1a73a4, arg=0x18f114, kw=0x0) at ceval.c:2772 #41 0xff22c218 in Tk_HandleEvent (eventPtr=0x29ffa0) at ./../generic/tkEvent.c:737
#3 0xff1d0e8c in _thrp_kill () from /usr/lib/libthread.so.1
#15 0x28080 in PyEval_CallObjectWithKeywords (func=0x1a73a4, arg=0x18f114, kw=0x0) #42 0xff22c61c in WindowEventProc (evPtr=0x29ff98, flags=-1) at ./../generic/
at ceval.c:2616
#4 0xfee49b10 in raise () from /usr/lib/libc.so.1
#16 0x680b0 in builtin_apply (self=0x0, args=0x0) at bltinmodule.c:88
tkEvent.c:1072
#43 0xff15bb54 in Tcl_ServiceEvent (flags=-1) at ./../generic/tclNotify.c:607
#5 0xfee3512c in abort () from /usr/lib/libc.so.1
#17 0x281c8 in call_builtin (func=0x1910c8, arg=0x1f9b54, kw=0x0) at ceval.c:2650
#18 0x28094 in PyEval_CallObjectWithKeywords (func=0x1910c8, arg=0x1f9b54, kw=0x0)
#44 0xff15beec in Tcl_DoOneEvent (flags=-1) at ./../generic/tclNotify.c:846
#45 0x99314 in EventHook () at ./_tkinter.c:2020
at ceval.c:2618#6 0xfee353d0 in _assert () from /usr/lib/libc.so.1 #46 0xbaf30 in rl_read_key () at input.c:374
#19 0x26764 in eval_code2 (co=0x1f3948, globals=0x0, locals=0x1f38f0, args=0x1910c8, #47 0xac920 in readline_internal_char () at readline.c:454
#7 0xfeee13ec in abort_crash () from /u0/beazley/Projects/WAD/WAD/Test/
argcount=1733540, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x2436e4) #48 0xaca64 in readline_internal_charloop () at readline.c:507
at ceval.c:1951
./debugmodule.so
#20 0x285e0 in call_function (func=0x24374c, arg=0x1a606c, kw=0x0) at ceval.c:2772
#49 0xaca94 in readline_internal () at readline.c:521
#50 0xac704 in readline (prompt=0x1cbd9c ">>> ") at readline.c:349
#8 0xfeee28ec in _wrap_abort_crash ()
#21 0x28080 in PyEval_CallObjectWithKeywords (func=0x261414, arg=0x18f114, kw=0x0)
at ceval.c:2616
#51 0x8249c in call_readline (prompt=0x1cbd9c ">>> ") at ./readline.c:462
#52 0x21ae0 in PyOS_Readline (prompt=0x1cbd9c ">>> ") at myreadline.c:118
from /u0/beazley/Projects/WAD/WAD/Test/./debugmodule.so
#22 0x98064 in PythonCmd (clientData=0x1cc8e0, interp=0x20e658, argc=0, #53 0x205a0 in tok_nextc (tok=0x27abd0) at tokenizer.c:192
argv=0xffbee060) #54 0x20fb4 in PyTokenizer_Get (tok=0x27abd0, p_start=0xffbef8c4, p_end=0xffbef8c0)
#9 0x281c8 in call_builtin (func=0x1cc4f0, arg=0x1f9424, kw=0x0) at
at ./_tkinter.c:1274 at tokenizer.c:516
ceval.c:2650
#23 0xff122064 in TclInvokeStringCommand (clientData=0x278538, interp=0x20e658,
objc=1,
#55 0x20274 in parsetok (tok=0x27abd0, g=0x17026c, start=256, err_ret=0xffbef9b0)
at parsetok.c:128
#10 0x28094 in PyEval_CallObjectWithKeywords (func=0x1cc4f0,
objv=0x24ec84) at ./../generic/tclBasic.c:1752
#24 0xff13e98c in TclExecuteByteCode (interp=0x20e658, codePtr=0x2a0cd0)
#56 0x20158 in PyParser_ParseFile (fp=0x18ebe8, filename=0xbf628 "<stdin>",
g=0x17026c,
arg=0x1f9424, kw=0x0)
at ./../generic/tclExecute.c:845 start=256, ps1=0x1cbd9c ">>> ", ps2=0x25a7e4 "... ", err_ret=0xffbef9b0)
#25 0xff122bf8 in Tcl_EvalObjEx (interp=0x20e658, objPtr=0x2370c8, flags=0) at parsetok.c:75
at ceval.c:2618
at ./../generic/tclBasic.c:2723 #57 0x3a9c0 in PyRun_InteractiveOne (fp=0x18ebe8, filename=0xbf628 "<stdin>")
#26 0xff258220 in TkInvokeButton (butPtr=0x279188) at ./../generic/tkButton.c:1457 at pythonrun.c:514
#27 0xff257698 in ButtonWidgetObjCmd (clientData=0x279188, interp=0x20e658, objc=2, #58 0x3a8bc in PyRun_InteractiveLoop (fp=0x18ebe8, filename=0xbf628 "<stdin>")
objv=0x295e00) at ./../generic/tkButton.c:835 at pythonrun.c:478
#28 0xff15e18c in EvalObjv (interp=0x20e658, objc=2, objv=0x295e00, #59 0x3a7ac in PyRun_AnyFileEx (fp=0x18ebe8, filename=0xbf628 "<stdin>", closeit=0)
command=0xff182128 "", at pythonrun.c:453
length=0, flags=262144) at ./../generic/tclParse.c:932 #60 0x3a76c in PyRun_AnyFile (fp=0x18ebe8, filename=0xbf628 "<stdin>") at
#29 0xff15e2b8 in Tcl_EvalObjv (interp=0x20e658, objc=2, objv=0x295e00, pythonrun.c:444
flags=262144) #61 0x1ff20 in Py_Main (argc=3, argv=0xffbefc74) at main.c:297
at ./../generic/tclParse.c:1019 #62 0x1f90c in main (argc=3, argv=0xffbefc74) at python.c:10
#30 0xff122928 in Tcl_EvalObjEx (interp=0x20e658, objPtr=0x2370e0, flags=262144) (gdb)
at ./../generic/tclBasic.c:2565
11. The Debugging Problem
Problem
• Debugger doesn’t know anything about script code
• Mostly provides information about the implementation of the interpreter
• Can’t fully answer question of “how did I get here?”
Sometimes it is hard to reproduce a problem
• Run-time environment may be complex
• Problems may be due to timing or unusual event sequences
• Problem may only occur after a long period of time
• May not get any debugging information after a crash (no core file, no messages)
Other issues
• Requires users to run a separate debugger
• Requires users to have a C development environment
• Assumes users know how to use the debugger
• Makes it difficult for an extension developer to get useful feedback
Question: Can you do any better?
12. Embedded Error Recovery
Idea:
• Most interpreters have exception/error handling.
• Maybe fatal extension errors could be handled as exceptions
Interpreter Interpreter
DeathException
Extension Extension
Code Code
Death (core dumped)
Why this approach?
• Allows interpreter to unwind its internal call stack and provide details
• Might work better in a very complex execution environment
• Self contained
13. WAD
Wrapped Application Debugger
• A proof of concept implementation of embedded error recovery
• A self-contained scripting language extension module
• Requires no modifications or recompilation of any code
Demo
• Tcl/Tk
• Mixed language debugging in Python
• Apache + mod_python
14. WAD Demo
% python
>>> import debug
>>> debug.seg_crash()
Segmentation fault (core dumped)
%
% python
>>> import debug
>>> import libwadpy
WAD Enabled
>>> debug.seg_crash()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
SegFault: [ C stack trace ]
#2 0x000281c0 in call_builtin(func=0x1cbaf0,arg=0x18f114,kw=0x0) in 'ceval.c', line
2650
#1 0xfeee26b8 in _wrap_seg_crash(self=0x0,args=0x18f114) in 'pydebug.c', line 510
#0 0xfeee1258 in seg_crash(0x1,0xfeef2d48,0x19a9f8,0x0,0x7365675f,0x5f5f6469) in
'debug.c', line 18
/u0/beazley/Projects/WAD/WAD/Test/debug.c, line 18
int seg_crash() {
int *a = 0;
=> *a = 3;
return 1;
}
15. Big Picture
WAD
• WAD is a dynamically loadable extension module
• Converts catastrophic extension errors to scripting exceptions
Key features
• No modifications to interpreter or extension code
• No recompilation
• No relinking
• No separate debugger required (gdb, dbx, etc.)
• No C, C++, Fortran development environment needed
• No added performance penalty (no instrumentation, tracing, etc.)
The rest of this talk
• How it actually works
• Limitations
• Discussion
16. Calling Extension Code
Scripting languages provide a foreign function interface
Scripting
Interpreter
call_builtin()
NULL,
args (result,status) TCL_OK,
TCL_ERROR,
...
Wrappers
args result
C/C++ Extension
• A small number of exit points from the interpreter (e.g., call_builtin above)
• External functions are often wrappers that call the real function
• Success indicated by returning a special status code (e.g., NULL, TCL_OK, ...)
• Returning an error status generates an interpreter exception/error
17. Extension Errors
Fatal errors result in signals
Scripting
Interpreter
call_func()
args
Wrappers
args
C/C++ Extension signal core dumped
• SIGSEGV, SIGBUS, SIGABRT, SIGILL, SIGFPE, etc.
18. Catching Extension Errors
Install a special signal handler
Scripting
Interpreter
call_func()
args
error
return from
signal
Wrappers
WAD
1. Inspect stack
args
2. Collect info
C/C++ Extension signal 3. Raise exception
4. Return
• On signal, inspect process stack and collect debugging information
• Force a return to the interpreter with a suitable error status
• Interpreter doesn’t know anything "bad" happened (looks like a normal error)
19. WAD Implementation - Signals
signal get
unwind stack
context
symbols/debug
raise exception
signal
modify context
return
Signal handling
• Use sigaction() with special options for receiving process context
• Handler gets PC, SP and other CPU registers
,
• Modifications to context take effect on return from handler
• Commonly used to implement user threads
• We’re going to use it to back out of extension code
20. WAD Implementation - Unwinding
signal get 0x0001f7c4
unwind stack
context 0x0001f904
0x0001fedc
0x0003a7c8
symbols/debug 0x000281c0
...
0xfeee241c
0xfeee1178
raise exception 0x000bee2c
0xfee350e4
0xfee49b08
signal 0xff1d0e84
modify context 0xff1cdd10
return
0xff1d9bf0 error
Stack unwinding
• Read process virtual memory map from /proc
• Unwind entire call stack of the process using PC and SP from process context
• Use memory map to validate (in case of corrupted stack frames)
• Get list of PC values and raw stack frames
21. WAD Implementation - Symbols
signal signal _start
unwind stack
handler main
Py_Main
...
symbols/debug PyEval_CallObjectW
call_builtin
_wrap_spam
spam
raise exception __eprintf
abort
raise
signal _thrp_kill
modify context _sigon
return
__sigprocmask
Symbols and debugging information
• Collect debugging information from ELF symbol table and STABS
• Get list of function names, source files, line numbers, arguments, etc.
• Final result is a linked-list of annotated stack frames.
22. WAD Implementation - Exceptions
signal signal _start
unwind stack
handler main
Py_Main
... Interpreter
symbols/debug PyEval_CallObjectW
call_builtin
_wrap_spam
spam
raise exception __eprintf
abort
raise Extension
Code
signal _thrp_kill search
modify context _sigon
return
__sigprocmask
Raising an exception
• Search the call stack to find the first call to extension code (e.g., call_builtin)
• Note: this depends on the scripting language (more later)
• If match found, stack trace is passed to a language-specific handler function
• Handler raises an appropriate scripting language exception with stack trace info
23. WAD Implementation - Context
signal signal _start
unwind stack
handler main
Py_Main
...
symbols/debug PyEval_CallObjectW
call_builtin
_wrap_spam
spam
raise exception __eprintf
abort
raise
signal _thrp_kill
modify context _sigon
return
__sigprocmask
Context modification
• Bottom of call stack is chopped off
24. WAD Implementation - Context
signal signal _start
unwind stack
handler main
Py_Main
...
symbols/debug PyEval_CallObjectW
call_builtin
raise exception error
signal
modify context
return
Context modification
• Bottom of call stack is chopped off
• PC, SP and registers modified to emulate a function return with an error status
,
• Signal handler returns
• Details a little hairy (more shortly)
25. Different Scripting Languages
Most of WAD is scripting language neutral
• Signal handling, unwinding, generation of stack trace, context modification
Each scripting language handled as an extension
Python module
register syms = { symbol
{"call_builtin", NULL}, table
{"PyObject_Print", -1},
WAD {"PyObject_Repr",NULL},
Core ...
} symbol name return value
match void handle_error(stack *s) { error
handler
found ...
}
• Module registers names of interpreter functions that might call extension code
• Return value is the value an extension function must return to signal an error
• Handler only invoked if WAD matches symbol name on call stack
26. Context Modification
Return mechanism is similar to
• setjmp/longjmp in C
• C++ exception handling.
However
• The interpreter is not instrumented or modified in any way.
• There is no corresponding setjmp() call.
• There is no matching try { ... } clause in C++.
Problem : corrupted CPU registers
• Interpreter not designed with aborted procedure return in mind
• By chopping off call stack, we return to the interpreter with inconsistent state
• Callee-save registers not restored properly
However, we can mostly fix this...
27. Register Recovery
Solution: SPARC
• Each procedure gets a fresh set of CPU registers (i.e., a “window”)
• To restore state: simply roll back the register windows
Solution: i386
• Inspect machine instructions of function prologues
• Figure out where callee-save registers were saved on call-stack
• Restore values while walking up the stack.
blah:
55 pushl %ebp
89 e5 movl %esp,%ebp
83 ec 2c subl $0x2c,%esp Size of locals
57 pushl %edi
56 pushl %esi
Saved registers
53 pushl %ebx
• Only a heuristic. Might get it wrong, but it usually works.
28. Implementation Details
WAD implementation
• Mostly ANSI C, some assembly code, some C++
• ~1500 semicolons
• Most code related to introspection (debugging, symbol tables, etc...)
• Core is scripting language independent
• Execution is isolated (own stack and memory management).
• Does not rely upon third party libraries (e.g., libbfd).
Compatibility
• Sun Sparc Solaris
• i386 GNU/Linux (recent kernels).
• Python and Tcl
• Perl (was working, currently broken)
29. Limitations
Aborted execution
• May leak memory
• No destruction of objects in C++
• May interact poorly with C++ exceptions
• May result in unreleased system resources (files, sockets, etc.)
• May result in deadlock (if holding locks when error occurs)
Unrecoverable errors
• Errors that destroy or corrupt the interpreter
• Stack overflow (results in corrupted call-stack)
Compiler optimization
• False reporting of debugging data, source files, and lines
• Incorrect register recovery (-fomit-frame-pointer)
30. Limitations
Threads
• Mixing threads and signals is an extremely delicate topic
• WAD requires a fully functional signal implementation
• Some versions older versions of GNU/Linux do not seem to work correctly
Threads present other issues
• Signals of interest are handled by the faulting thread
• However, some interpreters place a mutex lock around the interpreter
Python Extension
acquire_lock()
... func() {
call_builtin() store_thread_state()
release_lock()
abort ... run ...
???
acquire_lock()
restore_thread_state()
success
... return
release_lock() }
• If signal occurs while lock released, we violate mutual exclusion on interpreter
• May deliver exception to wrong thread (probably bad)
31. Limitations
Interference with normal signal handling
• Scripting languages also provide support for signals
• Signals caught by WAD are unlikely to be used in a script
• One exception : Perl sigtrap module
• User can disable WAD by installing a new signal handler
Stripped symbol tables
• May make it difficult to find return location
• May provide only limited debugging information
• Can find some symbols in dynamic loading tables, but may not be enough
Things that just won’t work
• Breakpoints
• Single-step execution
• Restart
• WAD lives in the same process
32. Discussion
What happens if WAD breaks?
• Worst case: program core dumps
• Can still load core in debugger
• Will get a few extra stack frames (for WAD)
Does WAD prevent the use of a debugger?
• Traditional debugger can still be attached to process
• Debugger overrides signal handling (which disables WAD)
Repeated errors
• If interpreter is corrupted, may crash immediately after return
• This reinvokes WAD with new context information
• May get several different tracebacks, followed by a core dump.
Continued execution
• After an error occurs, is it safe to resume execution?
• Depends on the error
• Depends on the application
• Primary goal is to get error information, not to run forever
33. Discussion
Wouldn’t it be easier to modify the interpreter?
• Use of setjmp()/longjmp() would simplify implementation, avoid register problems
• However, number of external exits may be substantial (extension types)
• Many interpreters are re-entrant (extensions can call eval)
• Performance impact?
• Difficult to maintain unless patches made part of interpreter distribution
• Not nearly as cool
How hard is it to port WAD to other platforms?
• WAD language modules are platform-neutral
• Operate on data that is presented in a generic format
• WAD core would need to be modified
• /proc, process context, register recovery, ELF, STABS, etc.
• Note: Solaris and GNU/Linux versions are nearly identical however.
34. Related Work
Bits and pieces in many different areas
• Programming environments (Common Lisp, Java?)
• Modifications to gdb (common lisp, Python, others?)
• Perl sigtrap (prints a perl stack trace on fatal signal)
• Exception handling
• Asynchronous exception handling in programming languages (ML, Haskell, etc.)
• PyDebug (persistent breakpoints, interface to gdb from Python)
More references?
• Debugging and exception handling are huge topics
• Not aware of anything quite like WAD in scripting language world
• Would be interested in getting feedback
35. Future Directions
A lot of people are building scripting language extensions
• Debugging support can be greatly improved
• Hope to see more work in this area
WAD
• Improve the core
• Better interaction with threads and C++
• Support for more languages
• Support for other platforms
• Interaction with other debuggers and IDEs
Bottom line
• Still a lot of hard problems to work on
36. More Information
http://systems.cs.uchicago.edu/wad
• Freely available under LGPL
• This is work in progress
• Volunteers welcome
• I’m also looking for students