0.5 mln packets per second with Erlang 
Nov 22, 2014 
Maxim Kharchenko 
CTO/Cloudozer LLP
The road map 
• Erlang on Xen intro 
• LINCX project overview 
• Speed-related notes 
– Arguments are registers 
– ETS tables are (mostly) ok 
– Do not overuse records 
– GC is key to speed 
– gen_server vs. barebone process 
– NIFS: more pain than gain 
– Fast counters 
– Static compiler? 
• Q&A
Erlang on Xen a.k.a. LING 
• A new Erlang platform that runs without OS 
• Conceived in 2009 
• Highly-compatible with Erlang/OTP 
• Built from scratch, not a “port” 
• Optimized for low startup latency 
• Open sourced in 2014 (github.com/cloudozer/ling) 
• Local and remote builds 
Go to erlangonxen.org
Zerg demo: zerg.erlangonxen.org
LINCX: project overview 
• Started in December, 2013 
• Initial scope = porting LINC-Switch to LING 
• High degree of compatibility demonstrated for LING 
• Extended scope = fix LINC-Switch fast path 
• Beta version of LINCX open sourced on March 3, 2014 
• LINCX runs 100x faster than the old code 
LINCX repository: 
github.com/FlowForwarding/lincx
Raw network interfaces in Erlang 
• LING adds raw network interfaces: 
Port = net_vif:open("eth1", []), 
port_command(Port, <<1,2,3>>), 
receive 
    {Port,{data,Frame}} -> 
        ... 
• A raw interface delivers whole Ethernet frames 
• LINCX uses standard gen_tcp for the control connection and net_vif for the data ports 
• Raw interfaces support a mailbox_limit option; packets are dropped once the mailbox of the receiving process overflows: 
Port = net_vif:open("eth1", [{mailbox_limit,16384}]), 
...
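A minimal sketch of a receive loop for one data port, assuming the {Port,{data,Frame}} message shape shown above. The module name raw_rx, the Handler fun and the stop message are hypothetical, and the header parsing assumes untagged Ethernet II frames. net_vif itself exists only on LING, so the loop takes the already opened port as an argument.

```erlang
-module(raw_rx).
-export([loop/2, parse_eth/1]).

%% Split an Ethernet II frame into its header fields and payload.
%% Assumes no VLAN tag: 6 bytes destination, 6 bytes source, 2 bytes type.
parse_eth(<<Dst:6/binary, Src:6/binary, EtherType:16, Payload/binary>>) ->
    {ok, Dst, Src, EtherType, Payload};
parse_eth(_Short) ->
    {error, truncated}.

%% Receive loop for one data port. Port is whatever net_vif:open/2 returned
%% (LING-only); Handler is a hypothetical fun/1 applied to each parsed frame.
loop(Port, Handler) ->
    receive
        {Port, {data, Frame}} ->
            Handler(parse_eth(Frame)),
            loop(Port, Handler);
        stop ->
            ok
    end.
```

Because Port is already bound, the receive pattern only matches frames from that interface; messages from other ports stay in the mailbox.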
Testbed configuration 
• Test traffic goes between vm1 and vm2 
• LINCX runs as a separate Xen domain 
• Virtual interfaces are bridged in Dom0
IXIA confirms 460kpps peak rate 
• 1GbE hw NICs/128 byte packets 
• IXIA packet generator/analyzer
Processing delay and low-level stats 
• LING can measure a processing delay for a packet: 
1> ling:experimental(processing_delay, []). 
Processing delay statistics: 
Packets: 2000 Delay: 1.342us +- 0.143 (95%) 
• LING can collect low-level stats for a network interface: 
1> ling:experimental(llstat, 1). %% stop/display 
Duration: 4868.6ms 
RX: interrupts: 69170 (0 kicks 0.0%) (freq 14207.4/s period 70.4us) 
RX: reqs per int: 0/0.0/0 
RX: tx buf freed per int: 0/8.5/234 
TX: outputs: 1479707 (112263 kicks 7.6) (freq 303928.8/s period 3.3us) 
TX: tx buf freed per int: 0/0.6/113 
TX: rates: 303.9kpps 3622.66Mbps avg pkt size 1489.9B 
TX: drops: 12392 (freq 2545.3/s period 392.9us) 
TX: drop rates: 2.5kpps 30.26Mbps avg pkt size 1486.0B
Arguments are registers 
animal(batman = Cat, Dog, Horse, Pig, Cow, State) -> 
    feed(Cat, Dog, Horse, Pig, Cow, State); 
animal(Cat, deli = Dog, Horse, Pig, Cow, State) -> 
    pet(Cat, Dog, Horse, Pig, Cow, State); 
... 
• Many arguments do not make a function any slower 
• But do not reshuffle arguments: 
%% SLOW 
animal(batman = Cat, Dog, Horse, Pig, Cow, State) -> 
    feed(Goat, Cat, Dog, Horse, Pig, Cow, State); 
...
ETS tables are (mostly) ok 
• A lookup in a small ETS table costs roughly as much as 10 function calls 
• Do not use ets:tab2list() inside tight loops 
• Treat ETS as a database, not as a pool of global variables 
• 1-2 ETS lookups on the fast path are ok 
• Beware that ets:lookup() and similar calls copy the data onto the heap of the caller, much like message passing
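One way to keep to "1-2 lookups on the fast path" is to read the table once per packet and pass the result down as ordinary arguments. The sketch below assumes a hypothetical port_config table holding {PortNo, Name, Mtu} tuples; none of these names come from LINCX.

```erlang
-module(ets_fastpath).
-export([demo/0]).

%% Look the port configuration up once and hand the fields down as plain
%% arguments instead of re-reading ETS in every helper function.
demo() ->
    Tab = ets:new(port_config, [set, {read_concurrency, true}]),
    true = ets:insert(Tab, {1, <<"eth1">>, 1500}),
    [{1, Name, Mtu}] = ets:lookup(Tab, 1),   % the single lookup: one copy onto our heap
    Result = forward(Name, Mtu, <<0:512>>),  % a 64-byte dummy frame
    true = ets:delete(Tab),
    Result.

%% Hypothetical fast-path helper: works only on its arguments.
forward(Name, Mtu, Frame) when byte_size(Frame) =< Mtu ->
    {send, Name, byte_size(Frame)};
forward(Name, _Mtu, _Frame) ->
    {drop, Name}.
```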
Do not overuse records 
• setelement() creates a copy of the tuple 
• State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple 
• Use tuples explicitly in performance-critical sections to control the heap footprint of the code: 
%% from 9p.erl 
mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid}; 
mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid}; 
mixer({rwrite,_,_}, _, initial) -> start_attaching; 
mixer({rerror,_,_}, _, initial) -> auth_failed; 
mixer({rlerror,_,_}, _, initial) -> auth_failed; 
mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) -> 
    {attach_more,Fid,AName,qid_type(Qid)}; 
mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
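The same idea applied to process state, as a rough sketch: the state is a plain 3-tuple that is taken apart in the function head and rebuilt once per clause, so the allocation is visible in the code. The module name, the {Rx, Tx, Drops} layout and the event shapes are made up for illustration.

```erlang
-module(tuple_state).
-export([new/0, bump/2]).

%% State is a plain tuple {Rx, Tx, Drops} instead of a #state{} record.
new() -> {0, 0, 0}.

%% Each clause builds the result tuple exactly once, even when
%% several fields change at the same time.
bump({Rx, Tx, Drops}, {rx, N})       -> {Rx + N, Tx, Drops};
bump({Rx, Tx, Drops}, {tx, N})       -> {Rx, Tx + N, Drops};
bump({Rx, Tx, Drops}, {rx_tx, R, T}) -> {Rx + R, Tx + T, Drops};
bump({Rx, Tx, Drops}, drop)          -> {Rx, Tx, Drops + 1}.
```

The price is readability: positions replace field names, which is why this is worth doing only in performance-critical sections.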
Garbage collection is key to speed 
• The heap is a list of chunks 
• The 'new heap' is close to the head of the list, the 'old heap' close to its tail 
• A GC run takes 10μs on average 
• GC may run thousands of times per second 
[Diagram: chunk list of a process heap, with labels proc_t and HTOP]
How to tackle GC-related issues 
• (Priority 1) Call erlang:garbage_collect() at strategic points 
• (Priority 2) For the fastest code, avoid GC completely and restart the fast process regularly: 
spawn(F, [{suppress_gc,true}]), %% LING-only 
• (Priority 3) Use fullsweep_after option
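Priorities 1 and 3 can be tried on stock Erlang/OTP. The sketch below is an assumption about what a "strategic point" could look like: a hypothetical worker collects garbage right after finishing a batch, when little data is live, and is spawned with fullsweep_after set to 0. The module name, the message shapes and the value 0 are illustrative only; suppress_gc is LING-only and is not used here.

```erlang
-module(gc_hints).
-export([start/1, worker/1]).

%% Spawn the worker with the fullsweep_after option (Priority 3).
start(Parent) ->
    spawn_opt(?MODULE, worker, [Parent], [{fullsweep_after, 0}]).

worker(Parent) ->
    receive
        {batch, Items} ->
            Sum = lists:sum([X * X || X <- Items]),
            Parent ! {done, Sum},
            %% Strategic point (Priority 1): the batch is finished, so the
            %% intermediate list is garbage and almost nothing is live.
            true = erlang:garbage_collect(),
            worker(Parent);
        stop ->
            ok
    end.
```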
gen_server vs barebone process 
• Message passing using gen_server:call() is 2x slower than Pid ! Msg 
• For speedy code prefer barebone processes to gen_servers 
• Design Principles are about high availability, not high performance
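For comparison, a barebone process might look like the sketch below: casts are a plain Pid ! Msg, and only the occasional synchronous read pays for a monitor and a reply. The module name, the message shapes and the 5000 ms timeout are assumptions, not code from LINCX.

```erlang
-module(bare_counter).
-export([start/0, init/0, call/2]).

start() -> spawn(?MODULE, init, []).

init() -> loop(0).

%% The whole "server": one receive, state kept in the argument.
loop(N) ->
    receive
        {incr, By} ->                 % fire and forget, sent with Pid ! {incr, By}
            loop(N + By);
        {From, Ref, read} ->          % synchronous request
            From ! {Ref, N},
            loop(N);
        stop ->
            ok
    end.

%% Hand-rolled synchronous call: monitor, send, wait for the tagged reply.
call(Pid, Req) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {self(), Ref, Req},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    after 5000 ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

What is given up is everything gen_server provides around the loop: supervision integration, sys debugging, and code upgrade hooks.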
NIFs: more pain than gain 
• A new principle of Erlang development: do not use NIFs 
• For a small performance boost, NIFs undermine key properties of 
Erlang: reliability and soft-realtime guarantees 
• Most of the time Erlang code can be made as fast as C 
• Most performance problems of Erlang are traceable to NIFs or to external C libraries, which cause similar problems 
• Erlang on Xen does not have NIFs and we do not plan to add them
Fast counters 
• 32-bit or 64-bit unsigned integer counters with overflow are trivial in C but not easy in Erlang 
• FIXNUMs are signed 29-bit integers; BIGNUMs consume heap and are 10-100x slower 
• Use two variables for a counter? 
foo(C1, 16#ffffff, ...) -> foo(C1+1, 0, ...); 
foo(C1, C2, ...) -> foo(C1, C2+1, ...); 
... 
• LING has a new experimental feature, fast counters: 
erlang:new_counter(Bits) -> Ref 
erlang:increment_counter(Ref, Incr) 
erlang:read_counter(Ref) 
erlang:release_counter(Ref)
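The two-variable idea can be spelled out on stock Erlang as follows. This is only a sketch: the module name and the 24-bit split are assumptions taken from the 16#ffffff constant above, and the high part is not wrapped, so it does not emulate 32-bit or 64-bit overflow.

```erlang
-module(split_counter).
-export([count/1, read/2]).

-define(LOW_MAX, 16#ffffff).   % the low part holds 24 bits

%% Count N events, keeping the counter in two small-integer arguments.
count(N) -> count(N, 0, 0).

count(0, Hi, Lo) -> {Hi, Lo};
count(N, Hi, ?LOW_MAX) -> count(N - 1, Hi + 1, 0);   % carry into the high part
count(N, Hi, Lo) -> count(N - 1, Hi, Lo + 1).

%% Combine both parts only when the value is actually needed;
%% the result may be a bignum, the running counter never is.
read(Hi, Lo) -> (Hi bsl 24) bor Lo.
```

Both parts live in function arguments, so incrementing allocates nothing on the heap; only read/2 may produce a large integer.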
Future: static compiler for Erlang 
• Scalars and algebraic types 
• Structural types only – no nominal types 
• Target: compiler efficiency, not static type checking 
• A middle ground between: 
– “Type is a first class citizen” (Haskell) 
– “A single type is good enough” (Python, Erlang)
Future: static compiler for Erlang - 2 
• Challenges: 
– Pattern matching compilation 
– Type inference for recursive types 
y = {(unit | y), x, (unit | y)} 
y = nil | {x, y} 
• Work started in 2013 
• Currently the compiler is at the proof-of-concept stage
Questions 
e-mail: maxim.kharchenko@gmail.com
