This document provides information on various debugging and profiling tools that can be used for Ruby including:
- lsof to list open files for a process
- strace to trace system calls and signals
- tcpdump to dump network traffic
- google perftools profiler for CPU profiling
- pprof to analyze profiling data
It also discusses how some of these tools have helped identify specific performance issues with Ruby like excessive calls to sigprocmask and memcpy calls slowing down EventMachine with threads.
7. strace -cp <pid>
-c
Count time, calls, and errors for each system call and report a
summary on program exit.
-p pid
Attach to the process with the process ID pid and begin tracing.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
50.39 0.000064 0 1197 592 read
34.65 0.000044 0 609 writev
14.96 0.000019 0 1226 epoll_ctl
0.00 0.000000 0 4 close
0.00 0.000000 0 1 select
0.00 0.000000 0 4 socket
0.00 0.000000 0 4 4 connect
0.00 0.000000 0 1057 epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00 0.000127 4134 596 total
8. strace -ttTp <pid> -o <file>
-t
Prefix each line of the trace with the time of day.
-tt
If given twice, the time printed will include the microseconds.
-T
Show the time spent in system calls. This records the time
difference between the beginning and the end of each system call.
-o filename
Write the trace output to the file filename rather than to stderr.
01:09:11.266949 epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>
01:09:11.300102 accept(10, {sa_family=AF_INET, sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>
01:09:11.300190 fcntl(22, F_GETFL) = 0x2 (flags O_RDWR) <0.000007>
01:09:11.300237 fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000008>
01:09:11.300277 setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000008>
01:09:11.300489 accept(10, 0x7fff5d9c07d0, [1226]) = -1 EAGAIN <0.000014>
01:09:11.300547 epoll_ctl(9, EPOLL_CTL_ADD, 22, {EPOLLIN, {u32=108750368, u64=108750368}}) = 0 <0.000009>
01:09:11.300593 epoll_wait(9, {{EPOLLIN, {u32=108750368, u64=108750368}}}, 4096, 50) = 1 <0.000007>
01:09:11.300633 read(22, "GET / HTTP/1.1r"..., 16384) = 772 <0.000012>
01:09:11.301727 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
01:09:11.302095 poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
01:09:11.302144 write(5, "1000000-0003SELECT * FROM `table`"..., 56) = 56 <0.000023>
01:09:11.302221 read(5, "25101,20x234m"..., 16384) = 284 <1.300897>
10. stracing ruby: sigprocmask
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.326334 0 3568567 rt_sigprocmask
0.00 0.000000 0 9 read
0.00 0.000000 0 10 open
0.00 0.000000 0 10 close
0.00 0.000000 0 9 fstat
0.00 0.000000 0 25 mmap
------ ----------- ----------- --------- --------- ----------------
100.00 0.326334 3568685 0 total
debian/redhat compile ruby with --enable-pthread
uses a native thread timer for SIGVTALRM
causes excessive calls to sigprocmask: 30% slowdown!
11. tcpdump
dump traffic on a network
tcpdump -i eth1 -s 1500 -nqA
tcp dst port 80
tcpdump -i eth0 -s 1500 -nqA
tcp dst port 3306
tcpdump -i eth1 -s 1500 -w <file>
tcp dst port 80
12. tcpdump -i <eth> -s <len> -nqA <expr>
tcpdump -i <eth> -w <file> <expr>
-i <eth>
Network interface.
-s <len>
Snarf len bytes of data from each packet.
-n
Don't convert addresses (host addresses, port numbers) to names.
-q
Quiet output. Print less protocol information.
-A
Print each packet (minus its link level header) in ASCII.
-w <file>
Write the raw packets to file rather than printing them out.
<expr>
libpcap expression, for example:
tcp src port 80
tcp dst port 3306
13. tcp dst port 80
19:52:20.216294 IP 24.203.197.27.40105 > 174.37.48.236.80: tcp 438
E...*.@.l.%&.....%0....POx..%s.oP.......GET /poll_images/cld99erh0/logo.png HTTP/1.1
Accept: */*
Referer: http://apps.facebook.com/realpolls/?_fb_q=1
tcp dst port 3306
19:51:06.501632 IP 10.8.85.66.50443 > 10.8.85.68.3306: tcp 98
E..."K@.@.Yy
.UB
.UD.....z....L............
GZ.y3b..[......W....SELECT * FROM `votes` WHERE (`poll_id` = 72621) LIMIT 1
tcpdump -w <file>
17. Profiling MRI
10% of production VM
time spent in
rb_str_sub_bang
String#sub!
called from Time.parse
return unless str.sub!(/A(d{1,2})/, '')
return unless str.sub!(/A( d|d{1,2})/, '')
return unless str.sub!(/A( d|d{1,2})/, '')
return unless str.sub!(/A(d{1,3})/, '')
return unless str.sub!(/A(d{1,2})/, '')
return unless str.sub!(/A(d{1,2})/, '')
switch to third_base gem
ThirdBase: Fast and Easy
Date/DateTime class for
Ruby
18. Profiling EM + threads
Total: 3763 samples
2764 73.5% catch_timer
989 26.3% memcpy
3 0.1% st_lookup
2 0.1% rb_thread_schedule
1 0.0% rb_eval
1 0.0% rb_newobj
1 0.0% rb_gc_force_recycle
known issue: EM+threads =
slow
memcpy??
thread context switches
copy the stack w/ memcpy
EM allocates huge buffer
on the stack
solution: move buffer to
the heap
22. ltrace -F <conf> -b -g -x <sym>
-b
Ignore signals.
-g
Ignore libraries linked at compile time.
-F <conf>
Read prototypes from config file.
-x <sym>
Trace calls to the function sym.
-s <num>
Show first num bytes of string args.
-F ltrace.conf
int mysql_real_query(addr,string,ulong);
void garbage_collect(void);
int memcached_set(addr,string,ulong,string,ulong);
24. gdb
the GNU debugger
gdb <program>
gdb <program> <pid>
Be sure to build with:
-ggdb
-O0
25. gdb walkthrough
% gdb ./test-it start gdb
(gdb) b average set breakpoint on function named average
Breakpoint 1 at 0x1f8e: file test-it.c, line 3.
(gdb) run run program
Starting program: /Users/joe/test-it
Reading symbols for shared libraries ++. done
Breakpoint 1, average (x=5, y=6) at test-it.c:3 hit breakpoint!
3 int sum = x + y;
(gdb) bt show backtrace
#0 average (x=5, y=6) at test-it.c:3
function stack
#1 0x00001fec in main () at test-it.c:12
(gdb) s
4 double avg = sum / 2.0; single step
(gdb) s
5 return avg;
(gdb) p avg
$1 = 5.5 print variables
(gdb) p sum
$2 = 11
26. Ruby VM stack traces
(gdb) where
#0 0x0002a55e in rb_call (klass=1386800, recv=5056455, mid=42, argc=1, argv=0xbfffe5c0,
scope=0, self=1403220) at eval.c:6125
#1 0x000226ef in rb_eval (self=1403220, n=0x1461e4) at eval.c:3493
#2 0x00026d01 in rb_yield_0 (val=5056455, self=1403220, klass=0, flags=0, avalue=0) at
eval.c:5083
#3 0x000270e8 in rb_yield (val=5056455) at eval.c:5168
#4 0x0005c30c in int_dotimes (num=1000000001) at numeric.c:2946
#5 0x00029be3 in call_cfunc (func=0x5c2a0 <int_dotimes>, recv=1000000001, len=0, argc=0,
argv=0x0) at eval.c:5759
#6 0x00028fd4 in rb_call0 (klass=1387580, recv=1000000001, id=5785, oid=5785, argc=0,
argv=0x0, body=0x152b24, flags=0) at eval.c:5911
#7 0x0002a7a7 in rb_call (klass=1387580, recv=1000000001, mid=5785, argc=0, argv=0x0,
scope=0, self=1403220) at eval.c:6158
#8 0x000226ef in rb_eval (self=1403220, n=0x146284) at eval.c:3493
#9 0x000213e3 in rb_eval (self=1403220, n=0x1461a8) at eval.c:3223
#10 0x0001ceea in eval_node (self=1403220, node=0x1461a8) at eval.c:1437
#11 0x0001d60f in ruby_exec_internal () at eval.c:1642
#12 0x0001d660 in ruby_exec () at eval.c:1662
#13 0x0001d68e in ruby_run () at eval.c:1672
#14 0x000023dc in main (argc=2, argv=0xbffff7c4, envp=0xbffff7d0) at main.c:48
rb_eval recursively executes ruby code in 1.8
27. Debugging Ruby Segfaults
test_segv.rb:4: [BUG] Segmentation fault
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9.7.0]
def test
#include "ruby.h" require 'segv'
4.times do
VALUE Dir.chdir '/tmp' do
segv() Hash.new{ segv }[0]
{ end
VALUE array[1]; end
array[1000000] = NULL; end
return Qnil;
} sleep 10
test()
void
Init_segv()
{
rb_define_method(rb_cObject, "segv", segv, 0);
}
28. 1. Attach to running process
$ ps aux | grep ruby
joe 23611 0.0 0.1 25424 7540 S Dec01 0:00 ruby test_segv.rb
$ sudo gdb ruby 23611
Attaching to program: ruby, process 23611
0x00007fa5113c0c93 in nanosleep () from /lib/libc.so.6
(gdb) c
Continuing.
Program received signal SIGBUS, Bus error.
segv () at segv.c:7
7 array[1000000] = NULL;
2. Use a coredump
Process.setrlimit Process::RLIMIT_CORE, 300*1024*1024
$ sudo mkdir /cores
$ sudo chmod 777 /cores
$ sudo sysctl kernel.core_pattern=/cores/%e.core.%s.%p.%t
$ sudo gdb ruby /cores/ruby.core.6.23611.1259781224
29. def test
require 'segv'
4.times do
Dir.chdir '/tmp' do
Hash.new{ segv }[0]
end
end (gdb) where
end #0 segv () at segv.c:7
#1 0x000000000041f2be in call_cfunc () at eval.c:5727
test() ...
#13 0x000000000043ba8c in rb_hash_default () at hash.c:521
...
#19 0x000000000043b92a in rb_hash_aref () at hash.c:429
...
#26 0x00000000004bb7bc in chdir_yield () at dir.c:728
#27 0x000000000041d8d7 in rb_ensure () at eval.c:5528
#28 0x00000000004bb93a in dir_s_chdir () at dir.c:816
...
#35 0x000000000041c444 in rb_yield () at eval.c:5142
#36 0x0000000000450690 in int_dotimes () at numeric.c:2834
...
#48 0x0000000000412a90 in ruby_run () at eval.c:1678
#49 0x000000000041014e in main () at main.c:48
32. require 'sinatra'
$ ab -c 1 -n 50 http://127.0.0.1:4567/compute
$ ab -c 1 -n 50 http://127.0.0.1:4567/sleep
get '/sleep' do
sleep 0.25 Sampling profiler:
'done'
end 232 samples total
get '/compute' do 83 samples were in /compute
proc{ |n|
a,b=0,1 118 samples had /compute on
n.times{ a,b = b,a+b } the stack but were in
b another function
}.call(10_000)
'done' /compute accounts for 50%
of process, but only 35% of
end time was in /compute itself
== Sinatra has ended his set (crowd applauds)
PROFILE: interrupts/evictions/bytes = 232/0/2152
Total: 232 samples
83 35.8% 35.8% 118 50.9% Sinatra::Application#GET /compute
56 24.1% 59.9% 56 24.1% garbage_collector
35 15.1% 75.0% 113 48.7% Integer#times
36. gdb.rb
gdb with MRI hooks
gdb.rb <pid>
http://github.com/tmm1/gdb.rb
37. def test
require 'segv'
4.times do
Dir.chdir '/tmp' do
Hash.new{ segv }[0]
end
end
end
(gdb) ruby threads
test()
0xa3e000 main curr thread THREAD_RUNNABLE WAIT_NONE
node_vcall segv in test_segv.rb:5
node_call test in test_segv.rb:5
node_call call in test_segv.rb:5
node_call default in test_segv.rb:5
node_call [] in test_segv.rb:5
node_call test in test_segv.rb:4
node_call chdir in test_segv.rb:4
node_call test in test_segv.rb:3
node_call times in test_segv.rb:3
node_vcall test in test_segv.rb:9
40. mongrel sleeper thread
0x16814c00 thread THREAD_STOPPED WAIT_TIME(0.47) 1522 bytes
node_fcall sleep in lib/mongrel/configurator.rb:285
node_fcall run in lib/mongrel/configurator.rb:285
node_fcall loop in lib/mongrel/configurator.rb:285
node_call run in lib/mongrel/configurator.rb:285
node_call initialize in lib/mongrel/configurator.rb:285
node_call new in lib/mongrel/configurator.rb:285
node_call run in bin/mongrel_rails:128
node_call run in lib/mongrel/command.rb:212
node_call run in bin/mongrel_rails:281
node_fcall (unknown) in bin/mongrel_rails:19
def run
@listeners.each {|name,s|
s.run
}
$mongrel_sleeper_thread = Thread.new { loop { sleep 1 } }
end
41. god memory leaks
(gdb) ruby objects arrays
elements instances
43 God::Process
94310 3
43 God::Watch
94311 3
43 God::Driver
94314 2
43 God::DriverEventQueue
94316 1
43 God::Conditions::MemoryUsage
43 God::Conditions::ProcessRunning
5369 arrays
43 God::Behaviors::CleanPidFile
2863364 member elements
45 Process::Status
86 God::Metric
many arrays with 327 God::System::SlashProcPoller
90k+ elements! 327 God::System::Process
406 God::DriverEvent
5 separate god leaks fixed by Eric
Lindvall with the help of gdb.rb!
42. ruby method cache
(gdb) ruby methodcache
Module#extend wipes the UnboundMethod#arity
Hash#[]=
entire method cache! Class#private
Array#freeze
(gdb) b rb_clear_cache Module#Integer
Breakpoint 1 at 0x41067b: file eval.c, line 351. Fixnum#<
(gdb) c Class#is_a?
Continuing. Fixnum#+
Class#protected
Breakpoint 1, rb_clear_cache () at eval.c:351 Class#>=
351 if (!ruby_running) return;
(gdb) ruby threads 2028 empty slots (99.02%)
0x1623000 main curr thread THREAD_RUNNABLE WAIT_NONE
node_call extend_object in sin.rb:23
node_call extend in sin.rb:23
node_call GET /other in lib/sinatra/base.rb:779
node_call GET /other in lib/sinatra/base.rb:779
node_call call in lib/sinatra/base.rb:779
node_fcall route in lib/sinatra/base.rb:474
44. 191691 total objects
Final heap size 191691 filled, 220961 free
Displaying top 20 most common line/class pairs
89513 __null__:__null__:__node__
41438 __null__:__null__:String
2348 lib/ruby/site_ruby/1.8/rubygems/specification.rb:557:Array
1508 lib/ruby/gems/1.8/specifications/gettext-1.9.gemspec:14:String
1021 lib/ruby/gems/1.8/specifications/heel-0.2.0.gemspec:14:String
951 lib/ruby/site_ruby/1.8/rubygems/version.rb:111:String
935 lib/ruby/site_ruby/1.8/rubygems/specification.rb:557:String
834 lib/ruby/site_ruby/1.8/rubygems/version.rb:146:Array
BleakHouse
installs a patched version of ruby: ruby-bleak-
house
unlike gdb.rb, see where objects were created
(file:line)
create multiple dumps over time with `kill -USR2
<pid>` and compare to find leaks
45. Coming soon:
bleak_house++
no patches or need to recompile
ruby
“assembly metaprogramming” to
setup a trampoline on rb_new_obj
http://timetobleed.com/rewrite-
your-ruby-vm-at-runtime-to-hot-
patch-useful-features/
memory profiler
coredump a production ruby process
load the core to generate profiles
of memory usage and leaks