2. Dr Neil Davies
Co-founder, Predictable Network Solutions Ltd
Peter Thompson
CTO, Predictable Network Solutions Ltd
Martin Geddes
Founder, Martin Geddes Consulting Ltd
PREDICTABLE
NETWORK
SOLUTIONS
3. The only ex ante network performance
engineering company in the world.
Consultancy on the future of
telecoms and the Internet.
4. Context for this presentation
We are all in the business of “information translocation”
The timely movement of information from
one computational process to another
The value lies in delivering application outcomes
That people will pay for
You are reading this because you are interested in
delivering successful outcomes
And understanding the causes of failure, so they can be mitigated
You may be working in a culture of deflecting
the attribution of blame
We’d like to help you turn away from the path to the Dark Side
5. What affects the timeliness?
• The timeliness of application outcomes is dependent on the
end-to-end loss and delay characteristics of the
translocation
• We call this end-to-end property ∆Q
– ∆Q applies in each direction – not just the round trip
– These characteristics need to be suitably bounded
• ∆Q depends on the offered load
– “Bandwidth” is an aspect of the relationship between offered
load and ∆Q
This presentation is about measuring ∆Q
– and the benefits that approach brings
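One way to make ∆Q concrete for a single direction is to treat loss and delay together as one distribution: the fraction of sent packets that arrive within t seconds. Lost packets never arrive, so the curve levels off below 1 however long you wait. The sketch below assumes a hypothetical list of one-way delays in seconds, with None marking a lost packet. The function name `delta_q_cdf` and the sample values are illustrative only.

```python
from typing import Optional, Sequence

def delta_q_cdf(delays: Sequence[Optional[float]], t: float) -> float:
    """Fraction of sent packets delivered within t seconds, in one direction.

    Lost packets (None) never count as delivered, so the result can never
    exceed 1 - loss_rate, however large t becomes.
    """
    if not delays:
        raise ValueError("need at least one packet")
    delivered = sum(1 for d in delays if d is not None and d <= t)
    return delivered / len(delays)

# Hypothetical one-way delays (seconds) for ten packets; None = lost.
one_way = [0.010, 0.012, 0.011, None, 0.030, 0.010, 0.013, 0.090, 0.011, 0.012]
for t in (0.011, 0.020, 0.100, 10.0):
    print(f"P(delivered within {t} s) = {delta_q_cdf(one_way, t):.1f}")
```

Computing this separately for each direction, rather than for the round trip, follows the slide's point that ∆Q applies in each direction.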
6. Good measurement is NOT about averages
• The average number of legs of a Swedish person is 1.9
– Now find me one!
• Measuring average throughput on a 1Gb link over 10
mins is like measuring the traffic on the M5 motorway
over two years
– No indicator of my likely travel experience
• Need to know the instantaneous properties
– The ∆Q the “next packet” is going to get
• It is all about the probability distribution of quality
attenuation
– This is what determines timeliness of application outcomes
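The point about averages can be seen with two hypothetical delay samples that share the same mean but have very different tails. This sketch uses a simple nearest-rank percentile; the sample values are invented for illustration.

```python
import math
from typing import Sequence

def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)

def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Two hypothetical delay samples in milliseconds, 100 packets each.
steady = [10] * 100            # every packet takes 10 ms
bursty = [5] * 90 + [55] * 10  # mostly fast, with a 10% tail at 55 ms

for name, sample in (("steady", steady), ("bursty", bursty)):
    print(name, "mean:", mean(sample),
          "p50:", percentile(sample, 50), "p99:", percentile(sample, 99))
```

Both samples report a 10 ms mean, yet one packet in ten in the bursty sample waits 55 ms. That tail is what determines whether an application outcome is timely, and the mean hides it completely.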
7. One-point measures
• This is the typical information captured by equipment
today
– Counters (e.g. packets passed, packet sizes, packets
dropped)
– Sampled over a period
• Does not capture ∆Q
– Not end-to-end
• Multiple one-point measures don’t help
• Creates an equipment-centric view
– Focuses on the equipment, not the service to the
customers
– Leads to focus on capacity, and ignores schedulability
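A one-point counter sampled over a period can report a comfortable utilisation while packets inside that period queue badly. The sketch below uses invented per-millisecond arrival counts and a hypothetical link capacity; it is a toy discrete-time queue, not a model of any particular equipment.

```python
from typing import List

CAPACITY_PER_MS = 100  # hypothetical link capacity, packets per 1 ms tick

# Hypothetical per-millisecond arrival counts for one 1-second sampling period:
# light load, with a 100 ms burst in the middle.
arrivals: List[int] = [20] * 500 + [300] * 100 + [20] * 400

# What a one-point counter reports: total packets over the whole period.
utilisation = sum(arrivals) / (CAPACITY_PER_MS * len(arrivals))

# What the packets experience: a simple FIFO backlog, tick by tick.
backlog = 0
peak_backlog = 0
for count in arrivals:
    backlog = max(0, backlog + count - CAPACITY_PER_MS)
    peak_backlog = max(peak_backlog, backlog)

worst_wait_ms = peak_backlog / CAPACITY_PER_MS  # time to drain the peak backlog
print(f"period utilisation: {utilisation:.0%}")
print(f"peak backlog: {peak_backlog} packets (~{worst_wait_ms:.0f} ms of queueing)")
```

The counter says the link was under half loaded, while packets arriving at the end of the burst would wait around 200 ms. This is the capacity-versus-schedulability gap the slide describes.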
8. Multipoint measures
• Measure a value between different points
– Not just counting things
• Observe the same “information translocation” at various
points
– Measuring the dynamics of the flow
• Isolates issues, in both space and time
– Excellent diagnostic power
• Leads to a focus on schedulability and trading
– Which in turn focuses on the outcomes for the
customer
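A minimal sketch of how multipoint observation isolates an issue in space. It assumes the same packets are timestamped at several points along one direction of the path and that those points share a synchronised clock. Differencing consecutive timestamps gives per-segment delays, and the segment whose delay varies most is where variability is being introduced. The point names, timestamps and helper names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical observation points along one direction of the path, in order.
POINTS = ["P1", "P2", "P3", "P4"]

# Hypothetical timestamps (seconds) at which each packet was seen at each point.
# Assumes all observation points share a synchronised clock.
seen: Dict[int, Dict[str, float]] = {
    1: {"P1": 0.000, "P2": 0.002, "P3": 0.004, "P4": 0.006},
    2: {"P1": 0.010, "P2": 0.012, "P3": 0.044, "P4": 0.046},
    3: {"P1": 0.020, "P2": 0.022, "P3": 0.061, "P4": 0.063},
}

def segment_delays(obs: Dict[str, float]) -> Dict[str, float]:
    """Delay of one packet across each pair of consecutive observation points."""
    return {
        f"{a}->{b}": obs[b] - obs[a]
        for a, b in zip(POINTS, POINTS[1:])
        if a in obs and b in obs  # skip segments where the packet was not seen at both ends
    }

# Collect every packet's delay, grouped by segment.
per_segment: Dict[str, List[float]] = {}
for obs in seen.values():
    for seg, delay in segment_delays(obs).items():
        per_segment.setdefault(seg, []).append(delay)

# The segment whose delay spreads the most is where variability is being added.
spread = {seg: max(ds) - min(ds) for seg, ds in per_segment.items()}
worst = max(spread, key=spread.get)
for seg, s in spread.items():
    print(f"{seg}: delay spread {s * 1000:.0f} ms")
print("variability introduced in", worst)
```

Note that this locates where the effect appears, not what causes it; the cause still has to be investigated separately.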
9. Different measurement approaches
• Single point, average: offered load and utilisation (mean values only)
– Limited predictive power
• Single point, instantaneous: arrival patterns
– Temporal predictive power, localised assurance (compliance with arrival pattern policy)
• Multiple point, average: delay and loss (mean and variance)
– Spatial predictive power
• Multiple point, instantaneous: arrival patterns PLUS delay and loss
– Temporal and spatial predictive power
– Assurance of both arrival and service (demand and supply) – represents all that can be known about a system (by observation)
15. How to read the information
• Different views tell different stories
• We’ll see some of those stories in the
following slides
• The focus on V is because that is where the
issues of schedulability manifest themselves
16. Key to following charts
• Two-point measures (by time)
• GSV view (by packet)
• V (by time)
• V (by packet size)
• V cumulative distribution function (main)
• V cumulative distribution function (tail)
17. E to A direction (user experience)
[Figure: Return Transit (run dd0a2310-d235-495b-8d2f-a4dc…), E->A. Panels: Observed Delay against Experiment Run Time (delay (s) against run time (s), 0 to 250 s); Observed Delay Variability (V) against Experiment Run Time (delay (s))]
Note the delay spike approximately 60 seconds into the test run.
How can we begin to analyse this performance issue?
18. E -> A (by packet)
This ‘spike’ doesn’t appear to be related to a particular packet size
(note: the ‘striations’ in the S value are an artefact of 3GPP scheduling)
19. E -> A (Dynamic response)
Removing G and S influences clearly highlights
the magnitude of the contention issue
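A minimal sketch of removing the G and S influences from one-way delay samples. The definitions here are assumptions made for illustration. G is approximated by the smallest delay seen for any packet. S(size) is the extra minimum delay seen at each packet size. V is whatever remains. S is kept as a per-size lookup rather than a fitted straight line, because S is a function of packet size that need not be linear. With sparse data, a real analysis would need to bin or smooth sizes. The function name and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def decompose_gsv(
    samples: List[Tuple[int, float]],
) -> Tuple[float, Dict[int, float], List[float]]:
    """Split (packet_size_bytes, delay_s) samples into G, S and per-packet V.

    Assumptions for this sketch:
      * G is approximated by the smallest delay seen for any packet;
      * S(size) is the minimum delay seen at that size, minus G;
      * V is each packet's delay minus G and S(size).
    """
    min_by_size: Dict[int, float] = defaultdict(lambda: float("inf"))
    for size, delay in samples:
        min_by_size[size] = min(min_by_size[size], delay)

    g = min(min_by_size.values())
    s = {size: d - g for size, d in min_by_size.items()}
    v = [delay - g - s[size] for size, delay in samples]
    return g, s, v

# Hypothetical samples: (packet size in bytes, one-way delay in seconds).
samples = [
    (100, 0.020), (100, 0.021), (100, 0.080),
    (1500, 0.024), (1500, 0.025), (1500, 0.024),
]
g, s, v = decompose_gsv(samples)
print(f"G = {g * 1000:.0f} ms")
for size in sorted(s):
    print(f"S({size} B) = {s[size] * 1000:.0f} ms")
print(f"largest V = {max(v) * 1000:.0f} ms (sample {v.index(max(v))})")
```

In this made-up data the largest V belongs to a small packet. Once G and S are removed, it stands out as contention rather than a size-related effect.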
21. Spatial Isolation (2)
It is occurring
between C and B
NOTE: this is the effect that we are measuring
– NOT the cause (which in this case was not the access
network but elsewhere)
Armed with this information, we can begin to analyse root
causes (e.g. what is over-driving this link?)
24. Summary
• Multipoint distribution based measurement gives
access to all the information available through
observation
– “observation” is key – independent of equipment
– Captures the influence of technology etc
• G, S and V give you a way of extracting both
temporal and spatial details
• Becomes extremely powerful when combined
with analysis
– E.g. when you have a model of what V should be, or of what G
and S should be, given the network layout
25. Upcoming workshops:
Sustainable Public Service Networks
London, 19th September 2013
Fundamentals of Network Performance
London, 20th September 2013
www.sustainablebroadband.com
There is a branch of mathematics called Large Deviation Theory that does have something to say about the predictive power of averages. What it says is not very comforting: as a predictor of underlying hazards and risks, an average is pretty bad. Capturing the distribution gives the ability to assure that arrival patterns are within specification (see QTAs later).

Multipoint measurement gives some level of spatial identification, but suffers from the same issues as A, in that it remains a bad predictor of the hazards and risks.

This is the measurement nirvana: it turns out that multipoint instantaneous observation makes available all the information that is possible by observation. In a well-designed system this is, by the principle of observational bisimilarity, the ultimate evidence of correct operation, including its performance aspects.

Although there are people who have prided themselves on capturing (average) data over smaller and smaller timescales, the real issue is the number of events that occur in those timeslots. This is where the M5 analogy comes in (the M5 is a major holiday motorway in the UK): the number of possible “events” (i.e. packets) in 10 minutes on a 1Gb Ethernet link is broadly equivalent to the number of events (cars on a three-lane motorway) in two years. For “averaging” to make sense you would need to be generating averages over 20ms to 250ms intervals, and no one can afford that.
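To put a number on the “events” side of the M5 analogy, the sketch below counts the maximum number of frames a fully loaded 1Gb Ethernet link can carry in 10 minutes. It assumes standard Ethernet on-wire overheads and back-to-back frames of a single size. Real traffic mixes sizes and rarely saturates the link, so these are upper bounds. The helper name `max_frames` is hypothetical.

```python
# On-wire overhead per Ethernet frame, in bytes: 7-byte preamble, 1-byte start
# frame delimiter, and a 12-byte minimum inter-frame gap.
WIRE_OVERHEAD_BYTES = 8 + 12

def max_frames(link_bps: int, seconds: int, frame_bytes: int) -> int:
    """Upper bound on back-to-back frames of one size that a link can carry."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps * seconds // bits_per_frame

ONE_GIG = 1_000_000_000
TEN_MINUTES = 600

for size in (64, 1518):  # minimum and maximum standard frame sizes, including FCS
    print(f"{size}-byte frames in 10 min at 1 Gb/s: "
          f"{max_frames(ONE_GIG, TEN_MINUTES, size):,}")
```

Depending on frame size, that is between roughly 49 million and 893 million individual events, each with its own delay. A 10-minute average collapses all of them into a single number.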
Note that S is not a number; it is a function from packet size to delay. S is not necessarily a simple line: it may have a more complex structure depending on media quantisation (e.g. ATM cells, WiFi) and bearer allocation choices (e.g. 3GPP).