• Find out about the sFlow instrumentation built into commodity data center network and server infrastructure.
• Understand how sFlow fits into the broader ecosystem of NetFlow, IPFIX, SNMP and DevOps monitoring technologies.
• Case studies demonstrate how sFlow telemetry combined with automation can lower costs, increase performance, and improve security of cloud infrastructure and applications.
Network visibility and control using industry standard sFlow telemetry
Peter Phaal
InMon Corp.
March, 2016
Twitter: @sFlow
Blog: blog.sflow.com
San Francisco Network Visibility Meetup
6. Controllability and Observability
The basic concept is simple. A stable feedback control system requires:
1. ability to influence all important system states (controllable)
2. ability to monitor all important system states (observable)
7. Controllability and Observability driving example
States: location, speed, direction, ...
Observability: It’s hard to stay on the road if you can’t see the road, or keep to the speed limit without a speedometer.
Controllability: It’s hard to stay on the road or maintain speed if your brakes, engine or steering fail.
[Photo: Tule fog in California Central Valley]
8. Effect of delay on stability
Components of loop delay (timeline): disturbance → measurement delay → planning delay → configuration delay → response delay → effect. The total time from disturbance to effect is the loop delay.
Example: DDoS launched → identify target, attacker → black hole, mark, re-route? → switch CLI commands → route propagation → traffic dropped
e.g. Slow reaction time causes a tired / drunk / distracted driver to weave; with very slow reaction time, they leave the road.
9. What is sFlow?
“In God we trust. All others bring data.”
W. Edwards Deming
11. Open source agents for hosts, hypervisors and applications
The Host sFlow project (http://sflow.net) is the center of an ecosystem
of related open source projects embedding sFlow in popular
operating systems and applications
Host agent extends network visibility into public / private cloud
13. Scalable push protocol
Simple
- standard structures: densely packed blocks of counters
- extensible (tag, length, value)
- XDR encoded (RFC 1832, since obsoleted by RFC 4506): big endian, quad-aligned, binary; simple to encode / decode
- unicast UDP transport
Minimal configuration
- collector address
- polling interval
Cloud friendly
- flat, two-tier architecture: many embedded agents → central “smart” collector
- sFlow agents start sending metrics automatically on startup, so they are discovered automatically
- eliminates the complexity of maintaining polling daemons (and associated configurations)
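To make "XDR encoded" and "extensible (tag, length, value)" concrete, here is a minimal Python sketch of the XDR primitives involved: an unsigned integer is 4 bytes, big endian, and variable-length opaque data is a length word followed by the bytes, zero-padded to a 4-byte boundary. The `tlv` helper pairs a tag word with an opaque value, roughly the way sFlow records carry a format tag and an opaque body, and `iter_records` shows why the layout is extensible: a decoder can skip tags it does not recognize using only the length word. The function names are illustrative, not taken from any sFlow library, and this is not a full sFlow datagram encoder.

```python
import struct


def xdr_uint(value: int) -> bytes:
    """XDR unsigned int: 4 bytes, big endian."""
    return struct.pack(">I", value)


def xdr_opaque(data: bytes) -> bytes:
    """XDR variable-length opaque: length word, data, zero padding to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


def tlv(tag: int, value: bytes) -> bytes:
    """Tag-length-value record built from XDR primitives (hypothetical helper)."""
    return xdr_uint(tag) + xdr_opaque(value)


def iter_records(buf: bytes):
    """Yield (tag, value) pairs; unknown tags can be skipped via the length word."""
    offset = 0
    while offset < len(buf):
        tag, length = struct.unpack_from(">II", buf, offset)
        offset += 8
        yield tag, buf[offset:offset + length]
        offset += length + (4 - length % 4) % 4  # skip value plus padding
```

For example, `list(iter_records(tlv(1, b"abc") + tlv(2, b"wxyz")))` walks both records even though the first value needs one byte of padding.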
14. Counters aren’t enough
• Counters tell you there is a problem, but not why.
• Counters summarize performance by dropping high cardinality attributes:
- IP addresses
- URLs
- Memcache keys
• You need to be able to efficiently disaggregate counters by attributes in order to understand the root cause of performance problems.
• How do you get this data when there are millions of transactions per second?
[Chart: Why the spike in traffic? (100 Gbit/s link carrying 14,000,000 packets/second)]
15. sFlow also exports random samples
• Random sampling is lightweight
• The critical path costs roughly the same as maintaining one counter:
if(--skip == 0) sample();
• Sampling is easy to distribute among modules, threads and processes without any synchronization
• Minimal resources are required to capture attributes of sampled transactions
• Easily identify top keys, connections, clients, servers, URLs, etc.
• Unbiased results with known accuracy
[Chart: Break out traffic by client, server and port (graph based on samples from a 100 Gbit/s link carrying 14,000,000 packets/second)]
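The `if(--skip == 0) sample();` idiom can be fleshed out into a small, self-contained sketch. It assumes the skip count is redrawn uniformly from 1 to 2N-1 after each sample (one simple way to get a mean gap of N; real agents may use a different random distribution), scales sample counts by N to estimate totals, and uses the commonly quoted rule of thumb that, for c samples, the relative error at 95% confidence is about 196·√(1/c) percent. Class and function names are hypothetical.

```python
import math
import random
from collections import Counter


class SkipSampler:
    """1-in-N random sampling driven by a countdown 'skip' counter (sketch)."""

    def __init__(self, rate, seed=None):
        self.rate = rate
        self.rng = random.Random(seed)
        self.skip = self._next_skip()
        self.samples = []

    def _next_skip(self):
        # Assumption: uniform on 1..2N-1, so the mean gap between samples is N.
        return self.rng.randint(1, 2 * self.rate - 1)

    def observe(self, item):
        # Critical path: one decrement and one comparison per item.
        self.skip -= 1
        if self.skip == 0:
            self.samples.append(item)
            self.skip = self._next_skip()


def estimate_totals(samples, rate):
    """Scale per-key sample counts by the sampling rate to estimate true counts."""
    return {key: count * rate for key, count in Counter(samples).items()}


def percent_error_95(num_samples):
    """Rule-of-thumb 95% confidence bound on relative error, in percent."""
    return 196 * math.sqrt(1 / num_samples)
```

With 10,000 samples the bound is about 1.96%, which is why a few seconds of samples from a busy link are enough to rank top talkers.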
16. Integrated data model
NETWORK: sampled packet headers + forwarding state; I/F counters
- packet header: source and destination MAC addresses, source and destination TCP/UDP sockets
HOST: CPU, memory, I/O, adapter MACs
APPLICATION: sampled transactions, transaction counters, TCP/UDP socket
Independent agents; the sFlow analyzer joins the data for an integrated view
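A minimal sketch of the join the analyzer performs, assuming the host agents' adapter MACs have been collected into a MAC → host table and the application agents' sockets into a socket → application table. `PacketSample`, its fields and `integrated_view` are hypothetical simplifications for illustration, not sFlow structure names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketSample:
    """Hypothetical, simplified view of a sampled packet header."""
    src_mac: str
    dst_mac: str
    src_socket: tuple  # (ip, port)
    dst_socket: tuple  # (ip, port)


def integrated_view(sample, host_by_mac, app_by_socket):
    """Join a network-layer sample to host data (via MAC) and app data (via socket)."""
    return {
        "src_host": host_by_mac.get(sample.src_mac),
        "dst_host": host_by_mac.get(sample.dst_mac),
        "src_app": app_by_socket.get(sample.src_socket),
        "dst_app": app_by_socket.get(sample.dst_socket),
    }
```

Because each agent reports independently, the tables can be refreshed as counters arrive; the join only needs the shared keys (MAC addresses and sockets) that appear in every layer.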
18. Picking the right tools
“This is the Unix philosophy: Write programs that do one
thing and do it well. Write programs to work together.”
Doug McIlroy
19. Traffic analytics with sFlow
Move flow cache from ASIC to external software
Scale-out alternative to SNMP polling
NetFlow / IPFIX / …: flow cache embedded on switch
- switch: packets → sample → decode → hash → flow cache → flush → send flow records
sFlow: multiple switches export sFlow to a centralized software flow cache
- each switch: packets → sample → send; i/f counters → poll → send
- centralized software flow cache: decode → hash → flow cache → flush → send flow records (JSON/REST, NetFlow, IPFIX, …)
• Reduce ASIC cost / complexity
• Fast response (data not sitting on switch)
• Centralized, network-wide visibility
• Increase flexibility → software defined analytics
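The decode → hash → flow cache → flush stages of the centralized software flow cache can be sketched in a few lines. This assumes samples arrive already decoded into dicts of header fields (the `ipsource` / `ipdestination` names follow the sFlow-RT keys used elsewhere in this deck; `length` is a hypothetical field for the packet size) and that each sample is scaled by the sampling rate to estimate totals. It is an illustration, not sFlow-RT's implementation.

```python
from collections import defaultdict


class SoftwareFlowCache:
    """Centralized flow cache fed by decoded packet samples (sketch)."""

    def __init__(self, keys):
        self.keys = keys  # header fields that define a flow
        self.entries = defaultdict(lambda: {"frames": 0, "bytes": 0})

    def add_sample(self, header, sampling_rate):
        # "hash" stage: build the flow key from the decoded header fields
        key = tuple(header[k] for k in self.keys)
        entry = self.entries[key]
        # Each sample stands for roughly `sampling_rate` packets.
        entry["frames"] += sampling_rate
        entry["bytes"] += header["length"] * sampling_rate

    def flush(self):
        """Emit flow records and clear the cache."""
        records = [dict(zip(self.keys, key), **totals)
                   for key, totals in self.entries.items()]
        self.entries.clear()
        return records
```

Because the keys are just a parameter, changing what a "flow" means is a software change at the collector rather than a reconfiguration of every switch's cache.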
20. sFlow-RT.com analytics engine
• Low latency flow analytics for real-time control applications
• Disaggregates flow cache from database. Choose external
database(s) for history (InfluxDB, Logstash, etc.)
• Programmable analytics pipeline through open APIs
21. RESTful API for defining flows
http://blog.sflow.com/2013/08/restflow.html
curl -H "Content-Type:application/json" -X PUT --data \
'{"keys":"ipsource,ipdestination,tcpsourceport,tcpdestinationport",
"value":"bytes", "ipfixCollectors":["10.0.0.1"]}' \
http://127.0.0.1:8008/flow/tcp/json
curl -H "Content-Type:application/json" -X PUT --data \
'{"keys":"ipdestination,icmpunreachableport", "value":"frames"}' \
http://127.0.0.1:8008/flow/unreachableport/json
• Instantly enables network-wide monitoring of flows
• All switches, all ports, including hosts and virtual switches
• Contrast with the task of re-configuring Flexible NetFlow/IPFIX caches on
every switch in a multi-vendor network. How many simultaneous flow
definitions are allowed? What key / value combinations are allowed?
curl -H "Content-Type:application/json" -X PUT --data \
'{"value":"frames"}' \
http://127.0.0.1:8008/flow/frames/json
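The same flow definitions can be issued from a program rather than from curl. This sketch only builds the HTTP PUT request with Python's standard `urllib`, using the `/flow/{name}/json` path, default port 8008 and JSON fields shown in the curl examples above; the `flow_request` helper is hypothetical, and sending the request assumes an sFlow-RT instance is listening at that address.

```python
import json
import urllib.request


def flow_request(name, value, keys=None, host="127.0.0.1", port=8008, **options):
    """Build (without sending) a PUT request that defines a flow in sFlow-RT."""
    body = {"value": value, **options}
    if keys:
        body["keys"] = ",".join(keys)
    return urllib.request.Request(
        f"http://{host}:{port}/flow/{name}/json",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


tcp_flows = flow_request(
    "tcp", "bytes",
    keys=["ipsource", "ipdestination", "tcpsourceport", "tcpdestinationport"],
    ipfixCollectors=["10.0.0.1"],
)
# urllib.request.urlopen(tcp_flows) would send it to a running sFlow-RT instance.
```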
22. Rapid detection of large flows
[Chart: InMon sFlow-RT (sFlow) vs. SolarWinds Real-Time NetFlow Analyzer (NetFlow); labels: Open vSwitch, active timeout]
• sFlow does not use a flow cache, so real-time charts more accurately reflect the traffic trend
• NetFlow spikes are caused by the flow cache active timeout for long-running connections
The flow cache active timeout delays large flow detection and limits the value of the signal for real-time control applications
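The spikes in the NetFlow chart can be reproduced with simple arithmetic. This sketch models one long-running flow at a constant rate and assumes a collector that charts each flow record at the moment it is exported, so all the bytes from an active-timeout interval land in a single time bin. The sampled view is idealized (sampling noise ignored). Function names and parameters are hypothetical.

```python
def flow_cache_view(bytes_per_sec, duration_s, active_timeout_s):
    """Per-second bytes a collector charts from a flow-cache exporter (sketch)."""
    series = [0] * duration_s
    # One record per active timeout, carrying the whole interval's bytes.
    # A trailing partial interval would only be exported when the flow ends (not modeled).
    for t in range(active_timeout_s, duration_s + 1, active_timeout_s):
        series[t - 1] = bytes_per_sec * active_timeout_s
    return series


def sampled_view(bytes_per_sec, duration_s):
    """Expected per-second estimate from packet samples (sampling noise ignored)."""
    return [bytes_per_sec] * duration_s
```

Both views account for the same total bytes, but the flow-cache view shows nothing for the first `active_timeout_s - 1` seconds and then a spike, which is the detection delay the slide describes.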
23. Counters and packet samples
http://blog.sflow.com/2013/02/measurement-delay-counters-vs-packet.html
• Packet samples give a fast signal that operates at scale
• Counters are maintained in hardware and provide precise traffic totals.
• Counters capture rare events, like packet discards, that can severely
impact performance.
• Counters report important link state information, like link speed, LAG
group membership etc.
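Turning periodically exported counters into rates is a differencing step at the collector. This sketch assumes two successive polls of the same counter, at most one wrap between them, and a counter width passed as a parameter (counters may be 32-bit or 64-bit); `counter_rate` is a hypothetical helper name.

```python
def counter_rate(prev, curr, interval_s, bits=64):
    """Per-second rate from two counter polls, allowing for one wrap."""
    delta = (curr - prev) % (1 << bits)  # modular difference handles a single wrap
    return delta / interval_s
```

Applied to a discard counter, even a small non-zero rate stands out, which is how counters reveal rare events that packet samples are likely to miss.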
25. Data models and transports
• sFlow
- Model: standard measurements published by sFlow.org; dataplane focus: based on IEEE, IETF, APIs (MIB-2, LAG-MIB, libvirt, JMX, …)
- Encoding: XDR (RFC 4506)
- Transport: UDP
- Mode: push
• SNMP
- Model: standard MIBs defined by IETF
- Encoding: ASN.1 (IETF)
- Transport: UDP
- Mode: pull
• NetFlow version 5
- Model: standard tcp / udp / icmp flow record defined by Cisco
- Encoding: NetFlow (Cisco)
- Transport: UDP
- Mode: push
• OpenConfig Telemetry
- Model: telemetry defined as part of YANG configuration models by OpenConfig.org; control plane focus: BGP, MPLS, VLAN, etc.
- Encoding: protobufs, JSON, NetConf
- Transport: UDP, HTTP
- Mode: push
• IPFIX
- Encoding: IPFIX (IETF)
- Transport: SCTP, TCP, UDP
- Mode: push
• syslog
- Encoding: Syslog (RFC 5424)
- Transport: UDP
- Mode: push
It is easy to combine multiple data sources if you disaggregate the tool chain,
e.g. separate agents from collectors and feed data from all sources into InfluxDB / Logstash
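One way to feed several sources into InfluxDB is to normalize each record into InfluxDB's line protocol (`measurement,tags fields timestamp`). This is a minimal sketch: it sorts tags, follows the line-protocol type rules (integers get an `i` suffix, strings are double-quoted with `"` and `\` escaped), and assumes measurement names, tag keys, tag values and field keys contain no spaces, commas or `=` that would need escaping. The measurement and tag names in the test are hypothetical.

```python
def _field_value(value):
    """Render a field value using InfluxDB line-protocol type rules."""
    if isinstance(value, bool):          # check bool before int (bool is an int subclass)
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'                   # strings are double-quoted


def line_protocol(measurement, tags, fields, timestamp_ns):
    """One line-protocol point; assumes names and tag values need no escaping."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={_field_value(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"
```

sFlow counters, SNMP polls and syslog messages can then share one database, with tags such as agent address serving as the join keys.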
27. Network visibility for DevOps tools
• Streaming filtering and summarization reduces data volume
and increases scalability of backend tools
• Streaming flow analytics to generate application metrics
28. Feedback control of cloud infrastructure
“You can’t control what you can’t measure”
Tom DeMarco
29. Cloud depends on network
• Server costs (both capex and power consumption) far exceed networking costs in the data center.
• Network congestion causes servers to wait, resulting in poor utilization of cloud infrastructure.
• Optimize network to increase data center efficiency
http://perspectives.mvdirona.com/2010/09/overall-data-center-costs/
“Typically the resource that is most scarce is the network.”
Amin Vahdat, Google, ONS2015 Keynote
http://blog.sflow.com/2015/06/optimizing-software-defined-data-center.html
35. ECMP monitoring challenge
• large number of links: 12 x 10G links
• all links need to be monitored continuously: 180G total bandwidth
• real-time detection of congested links
• real-time detection of elephant flows
http://blog.sflow.com/2015/03/ecmp-visibility-with-cumulus-linux.html
36. Fabric level performance metrics
• Fabric View application runs on
sFlow-RT
• Downloadable from sFlow-RT.com,
includes a captured data set from a 4-node
10G ECMP fabric
• Elephant collisions on spine links
occur frequently
• Collisions halve throughput
• Collisions cause packet discards
http://blog.sflow.com/2015/10/fabric-view.html
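Elephant collisions follow directly from how ECMP spreads flows: each flow is hashed onto one uplink, so two large flows can land on the same link and share its capacity (two elephants on one 10G link get roughly half each). This sketch uses `zlib.crc32` over the 5-tuple as a stand-in hash (real switches use vendor-specific hash functions) and flags links carrying more than one flow above a hypothetical elephant threshold. The flow rates would come from the sampled-flow analytics described above.

```python
import zlib


def ecmp_link(flow, n_links):
    """Pick an uplink by hashing the flow tuple (illustrative stand-in hash)."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_links


def elephant_collisions(flow_bps, n_links, elephant_bps, link_of=ecmp_link):
    """Return {link: [flows]} for links carrying two or more elephant flows."""
    by_link = {}
    for flow, bps in flow_bps.items():
        if bps >= elephant_bps:
            by_link.setdefault(link_of(flow, n_links), []).append(flow)
    return {link: flows for link, flows in by_link.items() if len(flows) > 1}
```

The `link_of` parameter lets a collector plug in the actual forwarding state when it is known, instead of the stand-in hash.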
47. Open Networking Summit SDN Idol winning solution
Real-time SDN Analytics for DDoS mitigation
48. DDoS Mitigation Market Opportunity
DDoS Attack Megatrends [Reference 1]
• High bandwidth, volumetric infrastructure layer (Layer 3 & 4) attacks increased
approximately 30 percent
• DDoS attack volume also increased month-to-month in 2013, with 10 out of
12 months showing higher attack volume compared to 2012
• Average DDoS attack sizes continued to increase – many over 100 Gbps, the
largest peaking at 179 Gbps
DDoS Mitigation Market Growth
• $870M market by 2017, 18.2% CAGR – Source: IDC: Worldwide DDoS Prevention
Products and Services 2013-2017 Forecast
• $1049M market by 2017, 25% CAGR – Source: Infonetics: Global DDoS Prevention
Appliances 2012-2017 Forecast
Reference 1: Top DDoS Attack Trends http://www.itbriefcase.net/top-ddos-attack-trends-for-2013
49. DDoS Mitigation Use Case (1): ISP/IX Market Segment
• ISP/IX is uniquely positioned to protect customers from DDoS flood attacks
• New revenue from DDoS mitigation service + differentiates ISP/IX service
[Diagram: Attacker and User traffic arrives via ISP 1, ISP 2, … ISP N at the ISP / IX, which connects to the customer network containing the DDoS target host]
- ISP / IX: filter attack traffic in real-time; prevent attack from overwhelming customer access link
- Customer network: an attack on a single host can take out the entire customer data center. The customer cannot mitigate a flood attack without upstream help.
50. DDoS Mitigation Service
Real-time sFlow visibility, Hybrid OpenFlow Control capability of Brocade switches/routers
Customer portal: Web UI + RESTful programmatic API
• real-time TopN analytics
• programmable filtering of traffic
• set thresholds + automatic blocking
Components (connected over HTTPS / REST API): customer portal, DDoS Mitigation Application, InMon sFlow-RT, OpenFlow Controller, and switches between the Internet and the customer network
1. Flood attack overloads customer port
2. Attack maps to large flows [Ref. 2]. sFlow-RT detects the attack and characterizes it (srcip, dstip, protocol, ports, etc.)
3. Mitigation application takes the signature, applies customer policy, selects the optimal control and pushes OpenFlow rule(s) to switch(es)
4. Controller pushes OpenFlow rule(s) to switch(es)
5. OpenFlow rule(s) applied to switch forwarding path to drop / mark traffic and protect link
OpenFlow 1.3 match fields: line rate filtering using Brocade switches
Reference 2: IETF I2RS Working Group Draft - https://ietf.org/doc/draft-krishnan-i2rs-large-flow-use-case/
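Step 3 (signature → policy → control) can be sketched as a pure function. This assumes the analytics layer supplies attack signatures as dicts with hypothetical `dstip`, `protocol`, optional `srcport`, and `bps` fields, that customer policy reduces to a single rate threshold, and that rules are represented as plain dicts using OpenFlow 1.3 OXM field names (`eth_type`, `ipv4_dst`, `ip_proto`, `udp_src`, `tcp_src`). The dict layout and the `priority` value are illustrative, not any controller's REST API; an empty action list means matching packets are dropped.

```python
IP_PROTO = {"icmp": 1, "tcp": 6, "udp": 17}


def drop_rules(signatures, threshold_bps):
    """Turn attack signatures above a threshold into OpenFlow 1.3-style drop rules."""
    rules = []
    for sig in signatures:
        if sig["bps"] < threshold_bps:   # customer policy: ignore small flows
            continue
        proto = sig["protocol"]
        match = {
            "eth_type": 0x0800,          # IPv4: prerequisite for ipv4_dst / ip_proto
            "ipv4_dst": sig["dstip"],
            "ip_proto": IP_PROTO[proto],
        }
        if proto in ("tcp", "udp") and "srcport" in sig:
            match[f"{proto}_src"] = sig["srcport"]  # e.g. reflection attacks
        # An empty action list drops matching packets.
        rules.append({"priority": 1000, "match": match, "actions": []})
    return rules
```

Matching on the destination plus the attack's protocol and source port, rather than on attacker addresses, keeps the rule count small when the attack comes from many sources.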
57. Comments
• sFlow instrumentation is widely available in switches
http://sflow.org/products/network.php
• Host sFlow (sFlow.net) agent extends visibility into
servers (works with libpcap, iptables, Open vSwitch to
efficiently sample packets in host data plane)
• Common data model ensures strong interoperability
across sFlow data sources
• Streaming counter and packet telemetry across network,
compute and application tiers makes data center
observable
• Observability makes it possible to apply feedback controls
59. Host sFlow monitoring of Linux datapath
Efficient monitoring of high traffic production workloads
Technology: description (reference)
• Adapter, bridge, macvlan, ipvlan: Berkeley Packet Filter (BPF) sampling function
http://blog.sflow.com/2016/02/linux-bridge-macvlan-ipvlan-adapters.html
• Open vSwitch: kernel datapath has sFlow support
http://openvswitch.org/support/config-cookbooks/sflow/
• Linux firewall: iptables statistic module random function with ulog
http://blog.sflow.com/2010/12/ulog.html
• Top of rack switch: ASIC provides wirespeed monitoring of attached servers
http://blog.sflow.com/2010/04/hybrid-server-monitoring.html
60. Open vSwitch Fall Conference
New OVS instrumentation features aimed at
real-time monitoring of virtual networks
69. Hybrid OpenFlow ECMP testbed
http://blog.sflow.com/2015/01/hybrid-openflow-ecmp-testbed.html
http://mininet.org/
• Simulated ECMP
network for developing
visibility and control
applications
• sFlow support in Open
vSwitch
• OpenFlow for control
70. The sFlow Standard: Scalable, Unified Monitoring of
Networks, Systems and Applications
2012 Velocity Conference