0.5mln packets per second with Erlang
1. 0.5mln packets per second with Erlang
Revelations from a real-world project based on Erlang on Xen
ErLounge/SF
June 6, 2014
Maxim Kharchenko
CTO, Cloudozer LLP
mk@cloudozer.com
2. The road map
* Erlang on Xen intro
* LINCX project overview
* Speed-related notes
  – Arguments are registers
  – ETS tables are (mostly) ok
  – Do not overuse records
  – GC is key to speed
  – gen_server vs. barebone process
  – NIFs: more pain than gain
  – Fast counters
* Q&A
3. Erlang on Xen a.k.a. LING
* A new Erlang platform that runs without an OS
* Conceived in 2009
* Highly compatible with Erlang/OTP
* Built from scratch, not a "port"
* Optimized for low startup latency
* Open sourced in 2014 (github.com/cloudozer/ling)
* The public build service is available
Go to erlangonxen.org
6. LINCX: project overview
* Started in December 2013
* Initial scope = porting LINC-Switch to LING
* High degree of compatibility demonstrated for LING
* Extended scope = fix LINC-Switch fast path
* Beta version of LINCX open sourced on March 3, 2014
* LINCX runs 100x faster than the old code
LINC-Switch is an OpenFlow software switch implemented in Erlang
For more details go to http://FlowForwarding.org
7. Raw network interfaces in Erlang
* LING adds raw network interfaces:
    Port = net_vif:open("eth1", []),
    port_command(Port, <<1,2,3>>),
    receive
        {Port,{data,Frame}} ->
            ...
* A raw interface receives whole Ethernet frames
* LINCX uses standard gen_tcp:* for the control connection and net_vif:* for data ports
* Raw interfaces support the mailbox_limit option: packets are dropped if the mailbox of the receiving process overflows
    Port = net_vif:open("eth1", [{mailbox_limit,16384}]),
    ...
8. Testbed configuration
* Test traffic goes between vm1 and vm2
* LINCX runs as a separate Xen domain
* Virtual interfaces are bridged in Dom0
9. Processing delay and low-level NIC stats
* LING can measure a processing delay for a packet:
    ling:experimental(processing_delay, []).
    Processing delay statistics:
    Packets: 2000 Delay: 1.342us +- 0.143 (95%)
* LING can collect low-level stats for a network interface:
    ling:experimental(llstat, 1). %% stop/display
    Duration: 4868.6ms
    RX: interrupts: 69170 (0 kicks 0.0%) (freq 14207.4/s period 70.4us)
    RX: reqs per int: 0/0.0/0
    RX: tx buf freed per int: 0/8.5/234
    TX: outputs: 1479707 (112263 kicks 7.6) (freq 303928.8/s period 3.3us)
    TX: tx buf freed per int: 0/0.6/113
    TX: rates: 303.9kpps 3622.66Mbps avg pkt size 1489.9B
    TX: drops: 12392 (freq 2545.3/s period 392.9us)
    TX: drop rates: 2.5kpps 30.26Mbps avg pkt size 1486.0B
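The TX figures in the llstat output are related by simple arithmetic, which the following sketch reproduces. The module and function names are hypothetical and not part of LING; only the numbers come from the output above.

```erlang
-module(llstat_check).
-export([rate_pps/2, rate_mbps/2, period_us/1]).

%% Events per second from an event count and a duration in milliseconds.
rate_pps(Count, DurationMs) ->
    Count / (DurationMs / 1000).

%% Megabits per second from a packet rate and an average packet size in bytes.
rate_mbps(Pps, AvgPktBytes) ->
    Pps * AvgPktBytes * 8 / 1.0e6.

%% Average gap between two events, in microseconds.
period_us(Pps) ->
    1.0e6 / Pps.
```

With 1479707 outputs over 4868.6ms this gives roughly 303.9kpps, and at an average packet size of 1489.9B roughly 3.6Gbps, in line with the reported TX rates (small differences are due to rounding in the printed values).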
12. Arguments are registers
* Many arguments do not make a function any slower
* Do not reshuffle arguments:
    animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
        feed(Cat, Dog, Horse, Pig, Cow, State);
    animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
        pet(Cat, Dog, Horse, Pig, Cow, State);
    ...

    %% SLOW
    animal(Cat, Dog, Horse, Pig, Cow, State) ->
        feed(Goat, Cat, Dog, Horse, Pig, Cow, State);
    ...
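A compilable variant of the pattern above, as a minimal sketch. The function bodies and the trailing `goat` argument are made up for illustration. The assumption here is that when a callee needs one more value, appending it at the end keeps the existing arguments in their positions, whereas prepending it (as in the SLOW clause) shifts every one of them.

```erlang
-module(arg_order).
-export([animal/3]).

%% Each clause forwards its arguments in the same positions they arrived in.
animal(batman = Cat, Dog, State) ->
    feed(Cat, Dog, State);
animal(Cat, deli = Dog, State) ->
    pet(Cat, Dog, State);
animal(Cat, Dog, State) ->
    %% The extra value goes last, so Cat, Dog and State keep their positions.
    feed(Cat, Dog, State, goat).

feed(Cat, Dog, State) -> {fed, Cat, Dog, State}.
feed(Cat, Dog, State, Extra) -> {fed, Cat, Dog, State, Extra}.
pet(Cat, Dog, State) -> {petted, Cat, Dog, State}.
```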
13. ETS tables are (mostly) ok
* A lookup in a small ETS table costs about as much as 10 function activations
* Do not use ets:tab2list() inside tight loops
* Treat ETS as a database, not a pool of global variables
* 1-2 ETS lookups on the fast path are ok
* Beware that ets:lookup() and similar calls create a copy of the data on the heap of the caller, similarly to message passing
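A minimal sketch of "one lookup on the fast path": each packet triggers a single ets:lookup/2 and the result is reused from then on. The table name, keys and values are invented for the example.

```erlang
-module(ets_once).
-export([run/0]).

run() ->
    Tab = ets:new(ports, [set]),
    true = ets:insert(Tab, [{1, <<"eth1">>}, {2, <<"eth2">>}]),
    %% One lookup per port number; the matching tuple is copied
    %% to the heap of the calling process at this point.
    Result = [forward(Tab, PortNo) || PortNo <- [1, 2, 3]],
    true = ets:delete(Tab),
    Result.

forward(Tab, PortNo) ->
    case ets:lookup(Tab, PortNo) of
        [{PortNo, Name}] -> {ok, Name};   %% keep using Name, do not look up again
        []               -> drop
    end.
```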
14. Do not overuse records
* setelement() creates a copy of the tuple
* State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple
* Use tuples explicitly in the performance-critical sections to see the heap footprint of the code
    %% from 9p.erl
    mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
    mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
    mixer({rwrite,_,_}, _, initial) -> start_attaching;
    mixer({rerror,_,_}, _, initial) -> auth_failed;
    mixer({rlerror,_,_}, _, initial) -> auth_failed;
    mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) ->
        {attach_more,Fid,AName,qid_type(Qid)};
    mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
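To make the heap footprint visible, the sketch below builds the same value in two ways. The record definition and function names are hypothetical. How many intermediate tuples the chained updates really allocate depends on the compiler, so treat the comments as the worst case rather than a measurement.

```erlang
-module(rec_update).
-export([chained/3, explicit/3]).

-record(state, {foo = 0, bar = 0, baz = 0}).

%% Chained record updates: each step may build a fresh 4-element tuple.
chained(Foo, Bar, Baz) ->
    S0 = #state{},
    S1 = S0#state{foo = Foo},
    S2 = S1#state{bar = Bar},
    S2#state{baz = Baz}.

%% Explicit tuple: exactly one 4-element tuple is built, and the code says so.
explicit(Foo, Bar, Baz) ->
    {state, Foo, Bar, Baz}.
```

Both functions return the same term, because a record is a tuple tagged with the record name.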
15. Garbage collection is key to speed
* Heap is a list of chunks
* 'new heap' is close to its head, 'old heap' is close to its tail
* A GC run takes 10μs on average
* GC may run thousands of times per second
[Figure labels: HTOP, proc_t]
16. How to tackle GC-related issues
– (Priority 1) Call erlang:garbage_collect() at strategic points
– (Priority 2) For the fastest code avoid GC completely: restart the fast process regularly
    spawn(F, [{suppress_gc,true}]), %% LING-only
– (Priority 3) Use fullsweep_after option
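Priorities 1 and 3 can be tried on stock Erlang/OTP, as in the following sketch. The module name, message shapes and option values are illustrative assumptions, not recommendations from the talk; suppress_gc is LING-only and is not used here.

```erlang
-module(gc_points).
-export([start/0, loop/1]).

%% Spawn the worker with a larger initial heap (in words) and an explicit
%% fullsweep_after setting; both values are arbitrary examples.
start() ->
    spawn_opt(?MODULE, loop, [0],
              [{min_heap_size, 100000}, {fullsweep_after, 10}]).

loop(Count) ->
    receive
        {batch, Packets} ->
            N = length([handle(P) || P <- Packets]),
            %% Strategic point: the batch is done, its temporaries are garbage.
            erlang:garbage_collect(),
            loop(Count + N);
        {count, From} ->
            From ! {count, Count},
            loop(Count);
        stop ->
            ok
    end.

%% Stand-in for real per-packet work.
handle(Packet) -> byte_size(Packet).
```

Collecting between batches aims to move GC pauses out of the middle of packet processing; whether it helps has to be measured on the actual workload.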
17. gen_server vs barebone process
* Message passing using gen_server:call() is 2x slower than Pid ! Msg
* For speedy code prefer barebone processes to gen_servers
* Design Principles are about high availability, not high performance
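A barebone process that still offers a synchronous call can be sketched as follows. The module name and the put/get protocol are invented for the example. It leaves out what gen_server adds on top (timeouts, system messages, code upgrade), which is the trade-off the slide refers to.

```erlang
-module(bare_server).
-export([start/0, call/2]).

start() ->
    spawn(fun() -> loop(#{}) end).

%% Synchronous request over plain messages; the monitor reference doubles
%% as a unique tag and lets the caller notice a dead server.
call(Pid, Request) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {self(), Ref, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    end.

loop(State) ->
    receive
        {From, Ref, {put, K, V}} ->
            From ! {Ref, ok},
            loop(maps:put(K, V, State));
        {From, Ref, {get, K}} ->
            From ! {Ref, maps:get(K, State, undefined)},
            loop(State);
        stop ->
            ok
    end.
```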
18. NIFs: more pain than gain
* A new principle of Erlang development: do not use NIFs
* For a small performance boost, NIFs undermine key properties of Erlang: reliability and soft-realtime guarantees
* Most of the time Erlang code can be made as fast as C
* Most performance problems of Erlang are traceable to NIFs, or to external C libraries, which are similar
* Erlang on Xen does not have NIFs and we do not plan to add them
19. Fast counters
* 32-bit or 64-bit unsigned integer counters with overflow: trivial in C, not easy in Erlang
* FIXNUMs are signed 29-bit integers, BIGNUMs consume heap and are 10-100x slower
* Use two variables for a counter?
    foo(C1, 16#ffffff, ...) -> foo(C1+1, 0, ...);
    foo(C1, C2, ...) -> foo(C1, C2+1, ...);
    ...
* Erlang on Xen has a new experimental feature, fast counters:
    erlang:new_counter(Bits) -> Ref
    erlang:increment_counter(Ref, Incr)
    erlang:read_counter(Ref)
    erlang:release_counter(Ref)
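The two-variable idea can be written out as a runnable sketch on stock Erlang. The module and function names are hypothetical. The low part holds 24 bits, as on the slide, and carries into the high part when it wraps, so both halves stay small integers while counting.

```erlang
-module(split_counter).
-export([bump/3, value/2]).

-define(LOW_MAX, 16#ffffff).   %% largest value of the 24-bit low part

%% Increment the counter N times, keeping both halves in function arguments.
bump(0, Hi, Lo) -> {Hi, Lo};
bump(N, Hi, ?LOW_MAX) when N > 0 -> bump(N - 1, Hi + 1, 0);
bump(N, Hi, Lo) when N > 0 -> bump(N - 1, Hi, Lo + 1).

%% Combine the halves for reporting; the result may be large, so this
%% belongs outside the fast path.
value(Hi, Lo) -> (Hi bsl 24) bor Lo.
```

For example, three increments starting from a low part of 16#fffffe cross the wrap-around once and end with both halves equal to 1.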