0.5mln packets per second with Erlang
1. 0.5mln packets per second with Erlang
    Revelations from a real-world project based on Erlang on Xen
    ErLounge/SF, June 6, 2014
    Maxim Kharchenko, CTO, Cloudozer LLP (mk@cloudozer.com)
2. The road map
    - Erlang on Xen intro
    - LINCX project overview
    - Speed-related notes
        - Arguments are registers
        - ETS tables are (mostly) ok
        - Do not overuse records
        - GC is key to speed
        - gen_server vs. bare-bones process
        - NIFs: more pain than gain
        - Fast counters
    - Q&A
3. Erlang on Xen a.k.a. LING
    - A new Erlang platform that runs without an OS
    - Conceived in 2009
    - Highly compatible with Erlang/OTP
    - Built from scratch, not a "port"
    - Optimized for low startup latency
    - Open sourced in 2014 (github.com/cloudozer/ling)
    - A public build service is available: go to erlangonxen.org
4. Zerg demo: zerg.erlangonxen.org
5. The road map (same as slide 2)
6. LINCX: project overview
    - Started in December 2013
    - Initial scope: port LINC-Switch to LING
    - LING demonstrated a high degree of compatibility
    - Extended scope: fix the LINC-Switch fast path
    - Beta version of LINCX open sourced on March 3, 2014
    - LINCX runs 100x faster than the old code
    Note: LINC-Switch is an OpenFlow software switch implemented in Erlang. For more details go to http://FlowForwarding.org
7. Raw network interfaces in Erlang
    - LING adds raw network interfaces
    - A raw interface receives whole Ethernet frames
    - LINCX uses the standard gen_tcp:* functions for the control connection and net_vif:* for data ports
    - Raw interfaces support the mailbox_limit option: packets get dropped if the mailbox of the receiving process overflows

        Port = net_vif:open("eth1", []),
        port_command(Port, <<1,2,3>>),
        receive
            {Port, {data, Frame}} -> ...
        end

        Port = net_vif:open("eth1", [{mailbox_limit, 16384}]),
        ...
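LING's mailbox_limit is built into the raw interface, and stock Erlang/OTP has no net_vif. As a rough approximation of the same "drop when the receiver is behind" policy, a sender can check the receiver's queue length before sending. The sketch below is an assumption, not LING's implementation: the module and the helper try_send/3 are hypothetical, and the check is racy when several processes send to the same receiver.

```erlang
-module(mbox_limit).
-export([try_send/3, demo/2]).

%% Hypothetical helper: send Msg only if Pid's mailbox holds fewer than
%% Limit messages; otherwise drop it. Approximates LING's mailbox_limit
%% on stock OTP (racy with multiple senders).
try_send(Pid, Msg, Limit) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len >= Limit ->
            dropped;
        {message_queue_len, _} ->
            Pid ! Msg,
            ok;
        undefined ->
            dropped                      % receiver is no longer alive
    end.

%% Send N frames to a receiver that never reads them.
%% Returns {Delivered, Dropped}.
demo(N, Limit) ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    Results = [try_send(Pid, {frame, I}, Limit) || I <- lists:seq(1, N)],
    Pid ! stop,
    {length([R || R <- Results, R =:= ok]),
     length([R || R <- Results, R =:= dropped])}.
```

For example, mbox_limit:demo(5, 3) delivers three frames and drops two, because the receiver never drains its mailbox.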
8. Testbed configuration
    - Test traffic goes between vm1 and vm2
    - LINCX runs as a separate Xen domain
    - Virtual interfaces are bridged in Dom0
9. Processing delay and low-level NIC stats
    - LING can measure the processing delay for a packet:

        ling:experimental(processing_delay, []).
        Processing delay statistics:
        Packets: 2000 Delay: 1.342us +- 0.143 (95%)

    - LING can collect low-level stats for a network interface:

        ling:experimental(llstat, 1).  %% stop/display
        Duration: 4868.6ms
        RX: interrupts: 69170 (0 kicks 0.0%) (freq 14207.4/s period 70.4us)
        RX: reqs per int: 0/0.0/0
        RX: tx buf freed per int: 0/8.5/234
        TX: outputs: 1479707 (112263 kicks 7.6) (freq 303928.8/s period 3.3us)
        TX: tx buf freed per int: 0/0.6/113
        TX: rates: 303.9kpps 3622.66Mbps avg pkt size 1489.9B
        TX: drops: 12392 (freq 2545.3/s period 392.9us)
        TX: drop rates: 2.5kpps 30.26Mbps avg pkt size 1486.0B
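ling:experimental/2 is LING-only. On stock Erlang/OTP, a comparable per-packet figure can be approximated by timing a handler with erlang:monotonic_time/1. The slide does not say how LING computes its "+-" value; the sketch below assumes it is the half-width of a normal-approximation 95% confidence interval of the mean. The module name and both functions are hypothetical.

```erlang
-module(delay_stats).
-export([measure/2, summary/1]).

%% Time Handler on each packet; returns per-packet delays in microseconds.
%% The two monotonic_time/1 calls add a small constant overhead.
measure(Handler, Packets) ->
    [begin
         T0 = erlang:monotonic_time(nanosecond),
         _ = Handler(P),
         (erlang:monotonic_time(nanosecond) - T0) / 1000
     end || P <- Packets].

%% Mean and half-width of a 95% confidence interval for the mean
%% (normal approximation, sample standard deviation).
summary(Samples) when length(Samples) >= 2 ->
    N = length(Samples),
    Mean = lists:sum(Samples) / N,
    Var = lists:sum([(X - Mean) * (X - Mean) || X <- Samples]) / (N - 1),
    {Mean, 1.96 * math:sqrt(Var) / math:sqrt(N)}.
```

A typical use would be delay_stats:summary(delay_stats:measure(fun my_handler/1, Frames)), where my_handler/1 and Frames stand in for the real fast path and test traffic.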
10. IXIA confirms 460kpps peak rate
    - 1GbE hardware NICs, 128-byte packets
    - IXIA packet generator/analyzer
11. The road map (same as slide 2)
12. Arguments are registers
    - Many arguments do not make a function any slower
    - Do not reshuffle arguments:

        animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
            feed(Cat, Dog, Horse, Pig, Cow, State);
        animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
            pet(Cat, Dog, Horse, Pig, Cow, State);
        ...

        %% SLOW: prepending an argument moves every other argument
        %% into a different register
        animal(Cat, Dog, Horse, Pig, Cow, State) ->
            feed(goat, Cat, Dog, Horse, Pig, Cow, State);
        ...
13. ETS tables are (mostly) ok
    - A small ETS table lookup costs about as much as 10 function calls
    - Do not use ets:tab2list() inside tight loops
    - Treat ETS as a database, not as a pool of global variables
    - 1-2 ETS lookups on the fast path are ok
    - Beware that ets:lookup() etc. create a copy of the data on the heap of the caller, similarly to message passing
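A fast-path ETS access in the spirit of this slide does a single keyed lookup per packet instead of scanning the table with ets:tab2list/1. The sketch below assumes a hypothetical flow table keyed by destination MAC address (as an integer) that maps to an output port; it is not LINCX's actual table layout.

```erlang
-module(ets_fastpath).
-export([new_table/0, classify/2, demo/0]).

%% Hypothetical flow table: destination MAC (integer) -> output port.
new_table() ->
    Tab = ets:new(flows, [set, protected, {read_concurrency, true}]),
    true = ets:insert(Tab, [{16#000000000001, 1}, {16#000000000002, 2}]),
    Tab.

%% One ets:lookup/2 per packet. The matching tuple is copied onto the
%% caller's heap, so keep stored values small.
classify(Tab, DstMac) ->
    case ets:lookup(Tab, DstMac) of
        [{_, Port}] -> {output, Port};
        []          -> flood
    end.

demo() ->
    Tab = new_table(),
    Result = [classify(Tab, Mac) || Mac <- [1, 2, 3]],
    true = ets:delete(Tab),
    Result.
```

Here demo() returns [{output,1},{output,2},flood]: the third MAC has no entry, so the packet would be flooded.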
14. Do not overuse records
    - setelement() creates a copy of the tuple
    - State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple
    - Use tuples explicitly in performance-critical sections to make the heap footprint of the code visible

        %% from 9p.erl
        mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
        mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
        mixer({rwrite,_,_}, _, initial) -> start_attaching;
        mixer({rerror,_,_}, _, initial) -> auth_failed;
        mixer({rlerror,_,_}, _, initial) -> auth_failed;
        mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) ->
            {attach_more,Fid,AName,qid_type(Qid)};
        mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
        ...
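A record is a tagged tuple, so the same update can be written either way. The sketch below (hypothetical module and field names) shows a record update next to the explicit-tuple form the slide recommends. In the tuple version, the single new tuple being built is visible right in the source, while the record syntax hides how the update is carried out.

```erlang
-module(record_tuples).
-export([update_record/1, update_tuple/1, demo/0]).

-record(state, {foo, bar, baz}).

%% Record syntax: concise, but the allocation cost is not visible.
update_record(S = #state{foo = Foo, bar = Bar, baz = Baz}) ->
    S#state{foo = Foo + 1, bar = Bar + 1, baz = Baz + 1}.

%% Same update with an explicit tuple: exactly one new 4-tuple is built.
update_tuple({state, Foo, Bar, Baz}) ->
    {state, Foo + 1, Bar + 1, Baz + 1}.

%% Both forms produce the same term.
demo() ->
    S0 = #state{foo = 1, bar = 2, baz = 3},
    {update_record(S0), update_tuple(S0)}.
```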
15. Garbage collection is key to speed
    - The heap is a list of chunks
    - The 'new heap' is close to its head, the 'old heap' to its tail
    - A GC run takes 10μs on average
    - GC may run thousands of times per second
    [Diagram: heap chunk layout, labeled HTOP and proc_t]
16. How to tackle GC-related issues
    - (Priority 1) Call erlang:garbage_collect() at strategic points
    - (Priority 2) For the fastest code, avoid GC completely:
        - restart the fast process regularly
        - spawn(F, [{suppress_gc,true}]),  %% LING-only
    - (Priority 3) Use the fullsweep_after option
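Priorities 1 and 3 are available on stock Erlang/OTP (suppress_gc is LING-only). The sketch below assumes a hypothetical worker that processes packets in batches: it forces a collection between batches, when little live data remains, and is started with spawn_opt/2 and the standard fullsweep_after option. The value 10 is purely illustrative.

```erlang
-module(gc_tuning).
-export([start_worker/1, demo/0]).

%% Hypothetical fast-path worker: handle a batch, reply, then force a GC
%% at a "strategic point" between batches.
worker(Parent) ->
    receive
        {batch, Packets} ->
            Sum = lists:sum([byte_size(P) || P <- Packets]),
            Parent ! {done, self(), Sum},
            true = erlang:garbage_collect(),
            worker(Parent);
        stop ->
            ok
    end.

%% fullsweep_after is a standard spawn_opt/2 option on stock OTP.
start_worker(FullsweepAfter) ->
    Parent = self(),
    spawn_opt(fun() -> worker(Parent) end,
              [{fullsweep_after, FullsweepAfter}]).

%% Returns {TotalBytes, FullsweepAfterSetting} for one batch.
demo() ->
    Pid = start_worker(10),
    Pid ! {batch, [<<1,2,3>>, <<4,5>>]},
    Sum = receive {done, Pid, S} -> S end,
    {garbage_collection, Info} = erlang:process_info(Pid, garbage_collection),
    Pid ! stop,
    {Sum, proplists:get_value(fullsweep_after, Info)}.
```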
17. gen_server vs. bare-bones process
    - Message passing using gen_server:call() is 2x slower than Pid ! Msg
    - For speedy code, prefer bare-bones processes to gen_servers
    - The OTP Design Principles are about high availability, not high performance
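A bare-bones process can still offer a synchronous call, built only from Pid ! Msg and a selective receive on a fresh reference. The sketch below uses a hypothetical {add, A, B} request. Unlike gen_server:call/2 it sets no monitor, so if the server dies the caller only notices at the timeout, and it has no sys/debug or code_change support. Those are the trade-offs for the lower overhead.

```erlang
-module(bare_server).
-export([start/0, call/2, stop/1, demo/0]).

start() ->
    spawn(fun loop/0).

%% Server loop: a plain receive, no OTP behaviour.
%% Unknown requests are left in the mailbox (the caller times out).
loop() ->
    receive
        {req, From, Ref, {add, A, B}} ->
            From ! {reply, Ref, A + B},
            loop();
        stop ->
            ok
    end.

%% Synchronous request: tag with a unique ref, wait for the matching reply.
call(Pid, Req) ->
    Ref = make_ref(),
    Pid ! {req, self(), Ref, Req},
    receive
        {reply, Ref, Result} -> Result
    after 5000 ->
        erlang:error(timeout)
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

demo() ->
    Pid = start(),
    Result = call(Pid, {add, 2, 3}),
    ok = stop(Pid),
    Result.
```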
18. NIFs: more pain than gain
    - A new principle of Erlang development: do not use NIFs
    - For a small performance boost, NIFs undermine key properties of Erlang: reliability and soft-realtime guarantees
    - Most of the time, Erlang code can be made as fast as C
    - Most Erlang performance problems are traceable to NIFs or to external C libraries, which behave similarly
    - Erlang on Xen does not have NIFs, and we do not plan to add them
19. Fast counters
    - 32-bit or 64-bit unsigned integer counters with overflow: trivial in C, not easy in Erlang
    - FIXNUMs are signed 29-bit integers; BIGNUMs consume heap and are 10-100x slower
    - Use two variables for a counter?

        foo(C1, 16#ffffff, ...) -> foo(C1+1, 0, ...);
        foo(C1, C2, ...) -> foo(C1, C2+1, ...);
        ...

    - Erlang on Xen has a new experimental feature, fast counters:

        erlang:new_counter(Bits) -> Ref
        erlang:increment_counter(Ref, Incr)
        erlang:read_counter(Ref)
        erlang:release_counter(Ref)
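The two-variable idea can be made concrete on stock Erlang/OTP. In the sketch below (hypothetical module and function names), the counter lives in two function arguments, following the slide's "arguments are registers" advice. The low word wraps at 16#ffffff and carries into the high word, so the low word always stays a small integer. A C-style 32-bit wraparound can also be had with band. On a 64-bit emulator those 32-bit values are still small integers, but on a 32-bit emulator the larger ones become bignums.

```erlang
-module(split_counter).
-export([count/3, value/2, wrap32_incr/2]).

-define(LOW_MAX, 16#ffffff).

%% Increment the counter (C1, C2) N times; the low word C2 wraps at
%% 16#ffffff and carries into the high word C1. Returns {C1, C2}.
count(0, C1, C2)       -> {C1, C2};
count(N, C1, ?LOW_MAX) -> count(N - 1, C1 + 1, 0);
count(N, C1, C2)       -> count(N - 1, C1, C2 + 1).

%% Combined value of the two-word counter.
value(C1, C2) -> C1 * (?LOW_MAX + 1) + C2.

%% 32-bit unsigned counter with C-like overflow.
wrap32_incr(C, Incr) -> (C + Incr) band 16#ffffffff.
```

For example, count(3, 0, 16#fffffe) returns {1, 1}: the second increment hits the wrap point and carries.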
20. Questions?