A Melange of Methods for Manipulating Monitored Data
Converging on Consistency
Neil Gunther @DrQz
en.wikipedia.org/wiki/Neil_J._Gunther
Performance Dynamics
Monitorama PDX
May 6, 2014
c 2014 Performance Dynamics A Melange of Methods for Manipulating Monitored Data May 6, 2014 1 / 52
Introductions
I didn’t do Monitorama Berlin
I didn’t get the memo about plane crashes
Sorry... Deal with it
SFO runway 28L, 11:28 a.m., July 6, 2013
Asiana Airlines Flight 214 landing arse-backwards (sans tail)
“Asiana pilots appear to be overly reliant on instrument-guided landings and lack the
training to touch down manually.” —SFO Commissioner Eleanor Johns
A Message from Your Sponsors
Don’t be too reliant on your instruments (strip charts, colored dials, shiny things)
Consistency
1 It’s not about pretty pictures
2 It’s not about whiz bang tools
3 It’s not about fancy math
4 Data are usually trying to tell you something
5 Your interpretation has to be consistent with other data
6 Your interpretation has to be consistent with other information
This talk is about
Converging on consistency by example
Topics
1 The Greatest Scatter Plot
2 Irregular Time Series
3 The Power of Power Laws
Zipf’s Law of Words
Database Query Times
Eleventh Hour Spikes
The Greatest Scatter Plot
Goggle up! Science ahead...
Some Monitored Data
[Figure: two time-series panels, Metric1 versus Time and Metric2 versus Time]
Two time series, two metrics: Metric 1 and Metric 2
Scatter Plot
[Figure: scatter plot of Metric 2 versus Metric 1]
Are Metric 1 and Metric 2 related in any way?
Linear Regression
[Figure: scatter plot of Metric 2 versus Metric 1 with least-squares line]
LSQ fit: Metric2 = 423.94 × Metric1 and $R^2 = 0.82$
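A least-squares fit of this form can be sketched in a few lines. The version below assumes a no-intercept model (Metric2 = slope × Metric1, as the fitted equation suggests) and uses the uncentered R² that is conventional for such fits. The sample values are hypothetical and are not the data plotted on the slide.

```python
def fit_through_origin(x, y):
    """Least-squares slope b for the no-intercept model y = b * x."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


def r_squared_origin(x, y, slope):
    """Uncentered R^2, the usual convention for a no-intercept fit."""
    ss_res = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum(yi * yi for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical samples, NOT the data plotted on the slide.
metric1 = [0.5, 1.0, 1.5, 2.0]
metric2 = [200.0, 450.0, 600.0, 850.0]

slope = fit_through_origin(metric1, metric2)
r2 = r_squared_origin(metric1, metric2, slope)
print(f"Metric2 = {slope:.2f} * Metric1, R^2 = {r2:.3f}")
```

The numbers alone say nothing about whether a straight line through the origin is the right model, which is the point of the questions that follow.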
This is Not the End
This is just the beginning
Need to reach consistency
1 Is the linear fit still a reasonable choice?
2 What is the meaning of the slope?
3 Willing to extrapolate this model into the future?
The most important scatter plot in history (1929)
[Figure: clipping from a PNAS article about Hubble's 1929 paper on the expanding universe (Hubble, E. P. (1929) Proc. Natl. Acad. Sci. USA 15), with his velocity-distance graph reproduced as Fig. 1]
Fig. 1. Velocity–distance relation among extra-galactic nebulae. Radial velocities, corrected for solar
motion (but labeled in the wrong units), are plotted against distances estimated from involved stars and
mean luminosities of nebulae in a cluster. The black discs and full line represent the solution for solar
motion by using the nebulae individually; the circles and broken line represent the solution combining the
nebulae into groups; the cross represents the mean velocity corresponding to the mean distance of 22
nebulae whose distances could not be estimated individually. [Reproduced with permission from ref. 1
(Copyright 1929, The Huntington Library, Art Collections and Botanical Gardens).]
Metric 1 (x-axis) = distance to the observed star (r)
Metric 2 (y-axis) = recessional velocity of the star (v)
$10^6$ parsecs ≡ 1 Mpc = 3.3 million light years
Astronomer Edwin Hubble 1929
1 Is the linear fit still a reasonable choice?
Edwin Hubble suspected v ∼ r
Supports Big Bang hypothesis
2 What does the slope mean?
Slope: $\dfrac{v}{r} = \dfrac{r}{t} \times \dfrac{1}{r} = \dfrac{1}{t} \equiv H_0$ (Hubble’s constant)
Inverse Hubble constant has units of time: $t_H = 1/H_0$
$t_H$ is the expansion time = Age of Universe!
3 Small problem
Hubble calculated: $t_H \approx 2$ billion years
Age of Earth: $t_E \approx 3$–5 billion years (Oops!)
Not consistent. Whaddya gonna do?
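The unit conversion behind $t_H = 1/H_0$ can be checked directly. This is a minimal sketch that takes the slope of the least-squares fit shown earlier (423.94) as a Hubble constant in km/s per Mpc. The conversion constants are standard values, and the function name is made up for illustration.

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16  # seconds in 1e9 Julian years


def hubble_time_gyr(h0_km_s_mpc):
    """Return t_H = 1/H0 in billions of years for H0 given in km/s/Mpc."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC  # cancel the length units: 1/s
    return 1.0 / h0_per_second / SECONDS_PER_GYR


t_h = hubble_time_gyr(423.94)  # slope of the fit, read as km/s per Mpc
print(f"t_H = {t_h:.2f} Gy")
```

With that slope the expansion time comes out at a little over 2 billion years, the same order as the figure Hubble calculated and well short of the age of the Earth.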
[Figure: Hubble's 1929 Corrected Data; recessional velocity (km/s) versus galactic distance (Mpc)]
Hubble even corrected for so-called peculiar velocity (black dots)
Slope moved the wrong way
Pay Day 2003
Fig. 3. The Hubble diagram for type Ia supernovae. From the compilation of well observed type Ia
supernovae by Jha (29). The scatter about the line corresponds to statistical distance errors of <10% per
object. The small red region in the lower left marks the span of Hubble’s original Hubble diagram from
Hubble’s (linear) Law: $v = H_0 r$ out to 2.3 billion light years
Consistency
1 Hubble took some static for his 1929 paper
2 Couldn’t reach consistency and had to gamble
3 Best measurements (telescopes) at the time
4 Telescopes and measurements improved
5 Converged toward consistency over next decades
6 $t_H = 2.36$ Gy (1929) → $t_H = 13.89$ Gy (2003)
Data was wrong but his interpretation (model) was correct
Guerrilla Mantra 1.16:
Treating data as something divine is a sin
Irregular Time Series
Aggregating Time Series
1 Regular sample intervals:
Samples on tick of a metronome
Computer performance metrics
Weather data
2 Irregular sample intervals:
Missing data (e.g., stock exchanges)
Unequal sampling due to:
Events
Subscriptions (e.g., every 10,000 sign-ups)
Occasional (e.g., personal weight)
Back to Monitorama Boston 2013
Aggregation always assumes the arithmetic mean (AM)
Aggregation of irregular time series came up in @mleinart’s talk
NJG: “Should aggregate rate data using the harmonic mean (HM)”
But harmonic mean is not clear for time series
Cost me a month after Monitorama Boston to figure it out
See my blog post and detailed slides of April 9, 2013
Harmonic Averaging of Monitored Rate Data
Which is why Monitorama is cool
Equal Intervals
[Figure: blue and red rectangles of equal width with the AM level marked; Metric versus Time]
Heights: $h_{blue} = 2$ and $h_{red} = 1$
Arithmetic Mean of Heights
[Figure: blue and red rectangles of equal width with the AM level marked; Metric versus Time]
$AM = \frac{1}{2} h_{blue} + \frac{1}{2} h_{red} = \frac{1}{2}(2 + 1) = 1.5$
Unequal Intervals (Area = 6)
[Figure: blue and red rectangles of unequal width; Metric versus Time]
Heights: $h_{blue} = 3$ and $h_{red} = 1$
AM Leaves a Gap (Area = 6)
[Figure: the unequal-width rectangles with the AM level marked, leaving a gap; Metric versus Time]
$AM = \frac{1}{2} h_{blue} + \frac{1}{2} h_{red} = \frac{1}{2}[3 + 1] = 2.0$
Stretch the Rectangle (Area = 6, Width = 4)
[Figure: the unequal-width rectangles with AM and HM levels marked; Metric versus Time]
HM = 1.5, so height × width = 1.5 × 4 = 6
Lowers the Height
[Figure: the unequal-width rectangles with AM and HM levels marked; Metric versus Time]
Theorem
$HM \le AM$
Harmonic mean is never larger than the arithmetic mean of the same positive samples (equal only when all samples are equal)
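The rectangle argument can be replayed numerically. In this sketch the heights come from the slides, and the widths of 1 and 3 are the only pair consistent with a total area of 6 over a total width of 4. The function names are illustrative.

```python
def arithmetic_mean(rates):
    return sum(rates) / len(rates)


def harmonic_mean(rates):
    """Harmonic mean of strictly positive rates."""
    return len(rates) / sum(1.0 / r for r in rates)


heights = [3.0, 1.0]  # h_blue and h_red
widths = [1.0, 3.0]   # implied by Area = 6 and Width = 4

total_area = sum(h * w for h, w in zip(heights, widths))
am = arithmetic_mean(heights)
hm = harmonic_mean(heights)

# Width of a single rectangle with the same area at each candidate height.
width_at_am = total_area / am  # stops short of 4: this is the gap
width_at_hm = total_area / hm  # spans the full time window
print(am, hm, width_at_am, width_at_hm)
```

At the AM height the equal-area rectangle is only 3 units wide, while at the HM height it covers all 4 units.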
Monitored Subscription Rates
Samples only occur when subscription count reaches 10,000.
Sampling intervals are unevenly spaced in time over 33 days.
[Figure: subscription Rate versus Time (days 0 to 35) with AM and HM levels marked]
AM and HM are (different) averaged subscription rates.
Only HM gives the correct total time window of 33 days.
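A small check of the time-window claim, under stated assumptions: each sample marks another 10,000 subscriptions, and the gaps between samples below are hypothetical values chosen only so that they sum to 33 days.

```python
BATCH = 10_000  # subscriptions per sample, as stated on the slide

# Hypothetical gaps (days) between samples; only their 33-day sum is from the slide.
gaps_days = [3.0, 8.0, 5.0, 11.0, 6.0]

rates = [BATCH / gap for gap in gaps_days]  # subscriptions per day
am_rate = sum(rates) / len(rates)
hm_rate = len(rates) / sum(1.0 / r for r in rates)

total_subscriptions = BATCH * len(gaps_days)
window_from_am = total_subscriptions / am_rate  # comes out too short
window_from_hm = total_subscriptions / hm_rate  # recovers the 33 days
print(f"AM window: {window_from_am:.1f} days, HM window: {window_from_hm:.1f} days")
```

Because every sample carries the same count, dividing the total count by the HM rate returns the sum of the gaps exactly, which the AM rate cannot do unless all gaps are equal.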
Consistency
Use HM to aggregate monitored data when the following criteria apply:
R — Rate metric (on y-axis)
A — Async time intervals (on x-axis)
T — Threshold is low vs. high
E — Event data
Example metrics:
Cache-hit rate
Video bit-rate
Call rate
Please send in your examples
The Power of Power Laws
Example 1: Zipf’s Law
Ranked data is 1000 most common wordforms in UK English based on 29 works of
literature by 18 authors (i.e., 4.6 million words)
Wordform: English word
Abs: absolute frequency (total number of occurrences)
Data format
> td <- read.table("~/../Power Laws/zipf1000.txt",header=TRUE)
> head(td)
Rank Wordform Abs r mod
1 1 the 225300 29 223066.9
2 2 and 157486 29 156214.4
3 3 to 134478 29 134044.8
4 4 of 126523 29 125510.2
5 5 a 100200 29 99871.2
6 6 I 91584 29 86645.5
Linear Axes
[Figure: Ranked 1000 UK English Words on linear axes; frequency of occurrence (F) versus ranked words (W)]
Log-Log Axes
[Figure: Ranked 1000 UK English Words on log-log axes; frequency of occurrence (F) versus ranked words (W)]
Regression Fit
[Figure: Ranked 1000 UK English Words on log-log axes with regression line; frequency of occurrence (F) versus ranked words (W)]
Consistency
Log axes are word frequency (y) and ranked word order (x):
$\log(y) = -1.13 \log(x)$
$y = x^{-1.13}$
$y = \dfrac{1}{x^{1.13}}$
Here, “power” refers to x to the power −1.13 (exponent)
Power laws differ from standard statistical distributions
Power laws carry most of the information in their tail
Fatter tail corresponds to stronger correlations than usual
Power laws imply persistent correlations that have to be explained
Zipf’s law correlations arise from grammatical rules
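Estimating a power-law exponent amounts to a straight-line fit on log-log axes. The sketch below uses synthetic ranked data that follows an exact power law with a made-up prefactor. It is not the word-count file from the slides, and `loglog_fit` is an illustrative name.

```python
import math


def loglog_fit(x, y):
    """Ordinary least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic ranked data obeying an exact power law (prefactor is hypothetical).
ranks = list(range(1, 1001))
freqs = [2.0e5 * r ** -1.13 for r in ranks]

exponent, intercept = loglog_fit(ranks, freqs)
print(f"exponent = {exponent:.2f}, prefactor = {math.exp(intercept):.0f}")
```

On real ranked data the points scatter around the line, and the fitted slope is the exponent quoted above.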
Example 2: Database Query Times
[Figure: orad$Elapstime versus Index]
Like Zipf’s law, data must be ranked by frequency of occurrence
Visualize Ranked Data
[Figure: Ranked SQL Times; otr versus Index]
Impossible to tell functional form of this curve
Try Double-Log Visualization
[Figure: Log-Log SQL Times; otr versus Index]
Clearly not power law overall
But first 100 queries do appear to be power law
Three Data Windows
[Figure: three panels. Log-Log of SQL-A Times (etA versus Index), Log-Lin of SQL-B Times (etB versus Index), Log-Lin of SQL-C Times (etC versus Index)]
This suggests breaking data across 3 regions:
(A) log-log axes
(B) log-linear axes
(C) log-linear axes
Regression Analysis
[Figure: the three windows with regression lines. Log-Log SQL A-Times, Log-Lin SQL B-Times, Log-Lin SQL C-Times]
(A) $y_A \sim x^{-0.4632}$ power law decay
(B) $y_B \sim e^{-0.0074x}$ exponential decay
(C) $y_C \sim e^{-0.0028x}$ exponential decay
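One way to decide between the two decay forms for a given window is to see which axis transform makes the data straighter. The sketch below compares the R² of a straight-line fit on log-log axes with that on log-linear axes. The windows are synthetic, built from the exponents quoted above with made-up prefactors, and `classify_decay` is an illustrative helper rather than the author's procedure.

```python
import math


def r_squared(u, v):
    """Squared Pearson correlation, i.e. R^2 of a straight-line fit of v on u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv * suv / (suu * svv)


def classify_decay(x, y):
    """Label a window by whichever axis transform gives the straighter line."""
    log_y = [math.log(v) for v in y]
    r2_power = r_squared([math.log(v) for v in x], log_y)  # log-log axes
    r2_exp = r_squared(list(x), log_y)                     # log-linear axes
    label = "power law" if r2_power > r2_exp else "exponential"
    return label, r2_power, r2_exp


# Synthetic windows; the prefactors 500 and 80 are hypothetical.
index = list(range(1, 101))
window_a = [500.0 * i ** -0.4632 for i in index]
window_b = [80.0 * math.exp(-0.0074 * i) for i in index]

label_a, _, _ = classify_decay(index, window_a)
label_b, _, _ = classify_decay(index, window_b)
print(label_a, label_b)
```

With noisy measurements the two R² values can be close, so the label is a hint to be checked against the plots rather than a verdict.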
But this is still not enough
Consistency
[Figure: Log-Log SQL A-Times; etA versus Index]
Power law slope γ = 0.46
About half the Zipfian slope γ = 1.0
Correlations stronger than Zipf
Hypothesis
1 Shorter query times (window A) may involve dictionary lookups or other structured data.
Structure provides correlations.
2 Longer queries in window B are unstructured (ad hoc?) and randomized. Weak
correlations produce exponential decay.
3 Ditto for window C.
Example 3: Eleventh Hour Spikes
All Australian businesses were required to register with the Australian Tax Office (ATO)
for an Australian Business Number (ABN) to claim an income tax refund. The ABN
was introduced in Y2K.
Time series data from ABN registrations database.
Period covers March 27 to September 19, 2000
Deadline traffic spike on 31 May, 2000
Similar to rush to meet Obamacare deadline of March 31, 2014.
More details in my CMG Australia 2006 paper.
Complete Time Series
[Figure: ORA connections (0 to 1 × 10^6) versus date, tick labels from 11 3 2000 to 29 8 2000]
Question: Could the “11th hour” spike have been predicted?
Answer: Yes, but quite involved.
How: Using a power law. What else!?
Semi-Log Plot
[Figure: semi-log plot of ORA connections (1 × 10^4 to 2 × 10^6) versus date, tick labels from 11 3 2000 to 21 5 2000]
y-axis is the number of Oracle RDBMS connections (log scale)
Peak growth preceding spike looks almost linear on semi-log plot
Time range: 0–38 days
Statistical Regression on Peaks
[Figure: semi-log plot of ORA connections with regression on the peaks, tick labels from 11 3 2000 to 21 4 2000]
Linear growth on semi-log axes implies exponential function $y = A e^{Bt}$
Fit parameters
Origin: $A = 1.14128 \times 10^5$
Curvature: $B = 0.0175$
Doubling period: $\ln(2)/B \approx 40$ days (about 6 weeks, taking t in days)
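The doubling period follows directly from the fitted growth constant. A minimal sketch, assuming t is measured in days (the slides give the time range and the later power-law fit in days):

```python
import math

A = 1.14128e5  # fitted origin from the slide
B = 0.0175     # fitted growth constant; "per day" is an assumption


def exp_growth(t_days):
    """Exponential trend y = A * exp(B * t)."""
    return A * math.exp(B * t_days)


doubling_days = math.log(2) / B
print(f"doubling period = {doubling_days:.1f} days")
```

If the time unit were weeks rather than days, the same arithmetic would give the period in weeks, so the unit of B is worth confirming against the raw data.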
Trend on Linear Axes
[Figure: ORA connections on linear axes with exponential trend and crosshairs, tick labels from 11 3 2000 to 10 7 2000]
Exponential forecast looks valid, up to the crosshairs
Significantly underestimates onset of the “11th hour” peak
And rapid drop off after the peak
Faster than exponential suggests power law
Power Law Fit
[Figure: ORA connections on linear axes with exponential growth and power law curves overlaid, tick labels from 11 3 2000 to 10 7 2000]
Log axes are connections (y) and time in days (x):
$\log(y) = -0.6421 \log(|x - x_c|)$
$y = \dfrac{1}{|x - x_c|^{0.6421}}$
where peak occurs at $x_c = 61$ days
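The shape of this spike model is easy to explore numerically. The sketch below evaluates the power-law profile on either side of the critical day. The `scale` prefactor is hypothetical, since the slide states the law without one.

```python
X_C = 61.0      # day of the peak, from the slide
GAMMA = 0.6421  # fitted exponent, from the slide


def spike_shape(x_days, scale=1.0):
    """Power-law profile scale / |x - x_c|**GAMMA; undefined at x == X_C."""
    distance = abs(x_days - X_C)
    if distance == 0:
        raise ValueError("the power law diverges at the critical day x_c")
    return scale / distance ** GAMMA


# Relative growth over the ten days from day 50 to day 60.
ratio = spike_shape(60.0) / spike_shape(50.0)
print(f"growth from day 50 to day 60: x{ratio:.2f}")
```

Over the same ten days an exponential with B = 0.0175 per day grows by only about 19%, which is why the exponential trend underestimates the onset of the peak. The profile is also symmetric about the critical day, matching the rapid drop-off afterwards.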
Consistency
Log-log plots are an easy way to test for power law distributions
May have mixed regions of power law and other distributions
Can even predict critical spikes
Power laws signal presence of strong correlations
Explaining those correlations may be more difficult
Zipf’s law took 40 years
Remember
Aim for consistency
Learn to talk to God (She’s listening)
Performance Dynamics Company
Castro Valley, California
www.perfdynamics.com
perfdynamics.blogspot.com
twitter.com/DrQz
Facebook
Training classes (May 19, 2014)
njgunther@perfdynamics.com
OFF: +1-510-537-5758
c 2014 Performance Dynamics A Melange of Methods for Manipulating Monitored Data May 6, 2014 52 / 52

More Related Content

Similar to Edwin Hubble's 1929 Scatter Plot Supports Expanding Universe Theory

Ib 09 River Holford
Ib 09 River HolfordIb 09 River Holford
Ib 09 River Holfordtudorgeog
 
Using Semantic Technology to Drive Agile Analytics - SLIDES
Using Semantic Technology to Drive Agile Analytics - SLIDESUsing Semantic Technology to Drive Agile Analytics - SLIDES
Using Semantic Technology to Drive Agile Analytics - SLIDESDATAVERSITY
 
FACTOR analysis (July 2014 updated)
FACTOR analysis (July 2014 updated)FACTOR analysis (July 2014 updated)
FACTOR analysis (July 2014 updated)Michael Ling
 
Fault diagnosis using genetic algorithms and principal curves
Fault diagnosis using genetic algorithms and principal curvesFault diagnosis using genetic algorithms and principal curves
Fault diagnosis using genetic algorithms and principal curveseSAT Journals
 
Time series forecasting of solid waste generation in arusha city tanzania
Time series forecasting of solid waste generation in arusha city   tanzaniaTime series forecasting of solid waste generation in arusha city   tanzania
Time series forecasting of solid waste generation in arusha city tanzaniaAlexander Decker
 
A Two Stage Estimator of Instrumental Variable Quantile Regression for Panel ...
A Two Stage Estimator of Instrumental Variable Quantile Regression for Panel ...A Two Stage Estimator of Instrumental Variable Quantile Regression for Panel ...
A Two Stage Estimator of Instrumental Variable Quantile Regression for Panel ...ijtsrd
 
Synchronization and inverse synchronization of some different dimensional dis...
Synchronization and inverse synchronization of some different dimensional dis...Synchronization and inverse synchronization of some different dimensional dis...
Synchronization and inverse synchronization of some different dimensional dis...ijccmsjournal
 
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...Synchronization and Inverse Synchronization of Some Different Dimensional Dis...
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...ijccmsjournal
 
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...Synchronization and Inverse Synchronization of Some Different Dimensional Dis...
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...ijccmsjournal
 
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...Synchronization and Inverse Synchronization of Some Different Dimensional Dis...
Synchronization and Inverse Synchronization of Some Different Dimensional Dis...ijccmsjournal
 
Lecture 6 guidelines_and_assignment
Lecture 6 guidelines_and_assignmentLecture 6 guidelines_and_assignment
Lecture 6 guidelines_and_assignmentDaria Bogdanova
 
A Movement Recognition Method using LBP
A Movement Recognition Method using LBPA Movement Recognition Method using LBP
A Movement Recognition Method using LBPZihui Li
 
Fault diagnosis using genetic algorithms and
Fault diagnosis using genetic algorithms andFault diagnosis using genetic algorithms and
Fault diagnosis using genetic algorithms andeSAT Publishing House
 
FREL uncertainties estimates
FREL uncertainties estimatesFREL uncertainties estimates
FREL uncertainties estimatesCIFOR-ICRAF
 
IRJET- Rainfall Forecasting using Regression Techniques
IRJET- Rainfall Forecasting using Regression TechniquesIRJET- Rainfall Forecasting using Regression Techniques
IRJET- Rainfall Forecasting using Regression TechniquesIRJET Journal
 
Statistical Methods
Statistical MethodsStatistical Methods
Statistical Methodsguest2137aa
 
Statistical Methods
Statistical MethodsStatistical Methods
Statistical Methodsguest9fa52
 




Edwin Hubble's 1929 Scatter Plot Supports Expanding Universe Theory

  • 1. A Melange of Methods for Manipulating Monitored Data: Converging on Consistency. Neil Gunther (@DrQz), en.wikipedia.org/wiki/Neil_J._Gunther, Performance Dynamics. Monitorama PDX, May 6, 2014. © 2014 Performance Dynamics.
  • 2. Introductions
  • 9. I didn’t do Monitorama Berlin. I didn’t get the memo about plane crashes. Sorry... Deal with it. SFO runway 28L, 11:28 a.m., July 6, 2013: Asiana Airlines Flight 214 landing arse-backwards (sans tail).
  • 10. “Asiana pilots appear to be overly reliant on instrument-guided landings and lack the training to touch down manually.” (SFO Commissioner Eleanor Johns)
  • 11. A Message from Your Sponsors: Don’t be too reliant on your instruments (strip charts, colored dials, shiny things).
  • 18. Consistency: (1) It’s not about pretty pictures. (2) It’s not about whiz-bang tools. (3) It’s not about fancy math. (4) Data are usually trying to tell you something. (5) Your interpretation has to be consistent with other data. (6) Your interpretation has to be consistent with other information. This talk is about converging on consistency by example.
  • 19. Topics: (1) The Greatest Scatter Plot. (2) Irregular Time Series. (3) The Power of Power Laws: Zipf’s Law of Words, Database Query Times, Eleventh Hour Spikes.
  • 20. The Greatest Scatter Plot
  • 21. The Greatest Scatter Plot: Goggle up! Science ahead...
  • 22. The Greatest Scatter Plot: Some Monitored Data. [Figure: two time-series panels, Metric 1 vs. Time and Metric 2 vs. Time.] Two time series, two metrics: Metric 1 and Metric 2.
  • 23. The Greatest Scatter Plot: Scatter Plot. [Figure: Metric 2 vs. Metric 1.] Are Metric 1 and Metric 2 related in any way?
  • 24. The Greatest Scatter Plot: Linear Regression. [Figure: Metric 2 vs. Metric 1 with fitted line.] LSQ fit: Metric 2 = 423.94 × Metric 1, with $R^2 = 0.82$.
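A minimal sketch of a fit like the one on slide 24, written in R since that is the language the deck itself uses. It assumes the fit was forced through the origin, because the slide reports a slope with no intercept. The data are synthetic stand-ins, not the values plotted on the slides, and the variable names are hypothetical.

```r
# Synthetic stand-in for the two monitored metrics (NOT the data on the slides).
set.seed(42)
metric1 <- seq(0.1, 2.0, length.out = 24)
metric2 <- 400 * metric1 + rnorm(length(metric1), sd = 100)

# Least-squares fit constrained through the origin: metric2 = slope * metric1.
fit <- lm(metric2 ~ 0 + metric1)
slope <- unname(coef(fit)["metric1"])

# For a no-intercept model, R reports R^2 relative to zero rather than the mean.
r2 <- summary(fit)$r.squared

cat(sprintf("slope = %.2f, R^2 = %.2f\n", slope, r2))
```

Dropping the `0 +` term from the formula would also estimate an intercept, which is one way to probe whether the through-origin model is a reasonable choice.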
  • 25. The Greatest Scatter Plot: This is Not the End. This is just the beginning. Need to reach consistency: (1) Is the linear fit still a reasonable choice? (2) What is the meaning of the slope? (3) Willing to extrapolate this model into the future?
  • 26. The Greatest Scatter Plot: The most important scatter plot in history (1929). [Figure: cropped clipping of a PNAS article discussing Hubble, E. P. (1929) Proc. Natl. Acad. Sci. USA 15, with its Fig. 1.] Fig. 1. Velocity–distance relation among extra-galactic nebulae. Radial velocities, corrected for solar motion (but labeled in the wrong units), are plotted against distances estimated from involved stars and mean luminosities of nebulae in a cluster. The black discs and full line represent the solution for solar motion by using the nebulae individually; the circles and broken line represent the solution combining the nebulae into groups; the cross represents the mean velocity corresponding to the mean distance of 22 nebulae whose distances could not be estimated individually. [Reproduced with permission from ref. 1 (Copyright 1929, The Huntington Library, Art Collections and Botanical Gardens).] Metric 1 (x-axis) = distance to the observed star (r). Metric 2 (y-axis) = recessional velocity of the star (v). $10^6$ parsecs ≡ 1 Mpc = 3.3 million light years.
  • 30. The Greatest Scatter Plot: Astronomer Edwin Hubble 1929. (1) Is the linear fit still a reasonable choice? Edwin Hubble suspected $v \sim r$. Supports Big Bang hypothesis. (2) What does the slope mean? Slope: $\frac{v}{r} = \frac{r}{t} \times \frac{1}{r} = \frac{1}{t} \equiv H_0$ (Hubble’s constant). Inverse Hubble constant has units of time: $t_H = 1/H_0$. $t_H$ is the expansion time = Age of Universe! (3) Small problem: Hubble calculated $t_H \approx 2$ billion years. Age of Earth: $t_E \approx 3$–5 billion years (Oops!). Not consistent. Whaddya gonna do?
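As a rough check of the Hubble time, assume the slope of 423.94 from the LSQ fit on slide 24 is in km/s per Mpc, and take the standard conversions 1 Mpc ≈ 3.086 × 10^19 km and 1 yr ≈ 3.156 × 10^7 s. Neither the unit assumption nor the conversion constants appear on the slides.

```latex
\begin{align*}
t_H = \frac{1}{H_0}
  &\approx \frac{3.086 \times 10^{19}\ \text{km/Mpc}}{423.94\ \text{km/s/Mpc}}
   \approx 7.28 \times 10^{16}\ \text{s} \\
  &\approx \frac{7.28 \times 10^{16}\ \text{s}}{3.156 \times 10^{7}\ \text{s/yr}}
   \approx 2.3 \times 10^{9}\ \text{yr}
\end{align*}
```

Under those assumptions the expansion time comes out at roughly 2.3 billion years, which is younger than the 3–5 billion year age of the Earth quoted on the slide.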
  • 31. The Greatest Scatter Plot. [Figure: Hubble’s 1929 Corrected Data, recessional velocity (km/s) vs. galactic distance (Mpc).] Hubble even corrected for so-called peculiar velocity (black dots).
  • 32. The Greatest Scatter Plot. [Figure: Hubble’s 1929 Corrected Data, recessional velocity (km/s) vs. galactic distance (Mpc).] Slope moved the wrong way.
  • 33. The Greatest Scatter Plot: Pay Day 2003. Fig. 3. The Hubble diagram for type Ia supernovae. From the compilation of well observed type Ia supernovae by Jha (29). The scatter about the line corresponds to statistical distance errors of <10% per object. The small red region in the lower left marks the span of Hubble’s original Hubble diagram. Hubble’s (linear) Law: $v = H_0 r$ out to 2.3 billion light years.
  • 35. The Greatest Scatter Plot: Consistency. (1) Hubble took some static for his 1929 paper. (2) Couldn’t reach consistency and had to gamble. (3) Best measurements (telescopes) at the time. (4) Telescopes and measurements improved. (5) Converged toward consistency over next decades. (6) $t_H = 2.36$ Gy (1929) → $t_H = 13.89$ Gy (2003). Data was wrong but his interpretation (model) was correct. Guerrilla Mantra 1.16: Treating data as something divine is a sin.
  • 36. Topics: (1) The Greatest Scatter Plot. (2) Irregular Time Series. (3) The Power of Power Laws: Zipf’s Law of Words, Database Query Times, Eleventh Hour Spikes.
  • 37. Irregular Time Series
  • 38. Irregular Time Series: Aggregating Time Series. (1) Regular sample intervals: samples on the tick of a metronome; computer performance metrics; weather data. (2) Irregular sample intervals: missing data (e.g., stock exchanges); unequal sampling due to events, subscriptions (e.g., every 10,000 sign-ups), or occasional measurement (e.g., personal weight).
  • 40. Irregular Time Series: Back to Monitorama Boston 2013. Aggregation always assumes the arithmetic mean (AM). Aggregation of irregular time series came up in @mleinart’s talk. NJG: “Should aggregate rate data using the harmonic mean (HM)”. But harmonic mean is not clear for time series. Cost me a month after Monitorama Boston to figure it out. See my blog post and detailed slides of April 9, 2013: Harmonic Averaging of Monitored Rate Data. Which is why Monitorama is cool.
  • 41. Irregular Time Series: Equal Intervals. [Figure: two bars of equal width, Metric vs. Time, with the AM level marked.] Heights: $h_{blue} = 2$ and $h_{red} = 1$.
  • 42. Irregular Time Series: Arithmetic Mean of Heights. [Figure: two bars of equal width, Metric vs. Time, with the AM level marked.] $\text{AM} = \frac{1}{2} h_{blue} + \frac{1}{2} h_{red} = \frac{1}{2}(2 + 1) = 1.5$.
  • 43. Irregular Time Series: Unequal Intervals (Area = 6). [Figure: two bars of unequal width, Metric vs. Time.] Heights: $h_{blue} = 3$ and $h_{red} = 1$.
  • 44. Irregular Time Series: AM Leaves a Gap (Area = 6). [Figure: two bars of unequal width, Metric vs. Time, with the AM level and a gap marked.] $\text{AM} = \frac{1}{2} h_{blue} + \frac{1}{2} h_{red} = \frac{1}{2}[3 + 1] = 2.0$.
  • 45. Irregular Time Series: Stretch the Rectangle (Area = 6, Width = 4). [Figure: two bars of unequal width, Metric vs. Time, with the AM and HM levels marked.] $\text{HM} = 1.5$, so height × width $= 1.5 \times 4 = 6$.
  • 46. Irregular Time Series: Lowers the Height. [Figure: two bars of unequal width, Metric vs. Time, with the AM and HM levels marked.] Theorem: HM ≤ AM. For positive samples, the harmonic mean is never larger than the arithmetic mean of the same samples, and it is strictly smaller unless all the samples are equal.
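A worked version of the rectangle example may help. The bar widths of 1 (blue) and 3 (red) are not stated on the slides; they are what heights of 3 and 1 imply given a total area of 6 and a total width of 4.

```latex
\begin{align*}
\text{AM} &= \tfrac{1}{2}(3 + 1) = 2, & \text{AM} \times 4 &= 8 \neq 6 \\
\text{HM} &= \frac{2}{\tfrac{1}{3} + \tfrac{1}{1}} = \frac{3}{2}, & \text{HM} \times 4 &= 6 \\
\frac{\text{area}}{\text{width}} &= \frac{3 \cdot 1 + 1 \cdot 3}{1 + 3} = \frac{3}{2}
\end{align*}
```

The harmonic mean coincides with area divided by width here because both bars have the same area (3). More generally, if every sample covers the same area $A$, then $h_i = A / w_i$ and $\frac{nA}{\sum_i w_i} = \frac{n}{\sum_i 1/h_i} = \text{HM}$.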
  • 47. Irregular Time Series: Monitored Subscription Rates. Samples only occur when subscription count reaches 10,000. Sampling intervals are unevenly spaced in time over 33 days. [Figure: Rate vs. Time (days), with the AM and HM levels marked.] AM and HM are (different) averaged subscription rates. Only HM gives the correct total time window of 33 days.
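The subscription example can be sketched in R as follows. The interval lengths are made up for illustration (chosen only so that they sum to 33 days) and the variable names are hypothetical; the actual sample times behind the slide are not given.

```r
# Hypothetical gaps (in days) between successive 10,000-subscription events.
# These values are invented for illustration; they sum to 33 days.
interval_days <- c(2, 5, 1, 8, 3, 6, 4, 4)
batch <- 10000

# Rate observed at each event: subscriptions per day over the preceding gap.
rate <- batch / interval_days

am <- mean(rate)                      # arithmetic mean of the rates
hm <- length(rate) / sum(1 / rate)    # harmonic mean of the rates

# Time window implied by each average: total subscriptions / average rate.
total <- batch * length(rate)
window_am <- total / am
window_hm <- total / hm

cat(sprintf("AM = %.1f per day implies a window of %.1f days\n", am, window_am))
cat(sprintf("HM = %.1f per day implies a window of %.1f days\n", hm, window_hm))
```

Because every sample represents the same count, the harmonic mean of the rates is total count divided by total time, so `window_hm` recovers the 33 days while `window_am` comes out shorter.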
  • 48. Irregular Time Series: Consistency. Use HM to aggregate monitored data when the following criteria apply: R: Rate metric (on y-axis); A: Async time intervals (on x-axis); T: Threshold is low vs. high; E: Event data. Example metrics: cache-hit rate, video bit-rate, call rate. Please send in your examples.
  • 49. Topics: (1) The Greatest Scatter Plot. (2) Irregular Time Series. (3) The Power of Power Laws: Zipf’s Law of Words, Database Query Times, Eleventh Hour Spikes.
  • 50. The Power of Power Laws
  • 51. The Power of Power Laws: Zipf’s Law of Words. Example 1: Zipf’s Law. Ranked data is the 1000 most common wordforms in UK English, based on 29 works of literature by 18 authors (i.e., 4.6 million words). Wordform: English word. Abs: absolute frequency (total number of occurrences). Data format:
      > td <- read.table("~/../Power Laws/zipf1000.txt", header=TRUE)
      > head(td)
        Rank Wordform    Abs  r      mod
      1    1      the 225300 29 223066.9
      2    2      and 157486 29 156214.4
      3    3       to 134478 29 134044.8
      4    4       of 126523 29 125510.2
      5    5        a 100200 29  99871.2
      6    6        I  91584 29  86645.5
  • 52. The Power of Power Laws: Zipf’s Law of Words: Linear Axes. [Figure: Ranked 1000 UK English Words, frequency of occurrence (F) vs. ranked words (W), linear axes; labeled words include the, their, us, love, voice, true, state, eye, stand, worth, service, neck, land, art.]
  • 53. The Power of Power Laws: Zipf’s Law of Words: Log-Log Axes. [Figure: Ranked 1000 UK English Words, frequency of occurrence (F) vs. ranked words (W), log-log axes; labeled words include the, it, at, would, much, us, love, lay, eye, dare.]
  • 54. The Power of Power Laws: Zipf’s Law of Words: Regression Fit. [Figure: Ranked 1000 UK English Words, frequency of occurrence (F) vs. ranked words (W), log-log axes with regression line; labeled words include the, it, at, would, much, us, love, lay, eye, dare.]
  • 56. The Power of Power Laws: Zipf’s Law of Words. Consistency. The log axes are word frequency (y) and ranked word order (x): $\log(y) = -1.13 \log(x) + \text{const}$, so $y \sim x^{-1.13} = 1/x^{1.13}$. Here, “power” refers to x raised to the power −1.13 (the exponent). Power laws differ from standard statistical distributions. Power laws carry most of the information in their tail. A fatter tail corresponds to stronger correlations than usual. Power laws imply persistent correlations that have to be explained. Zipf’s law correlations arise from grammatical rules.
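The regression behind a fitted exponent like −1.13 can be sketched as an ordinary least-squares line through the points (log x, log y). This is a minimal illustration only: the data below are synthetic (an exact power law with a hypothetical prefactor), not the zipf1000.txt file, and `fit_power_law` is a name invented here rather than anything from the talk.

```python
import math

def fit_power_law(ranks, freqs):
    """Least-squares line through (log x, log y); returns (slope, intercept)."""
    lx = [math.log(x) for x in ranks]
    ly = [math.log(y) for y in freqs]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Synthetic stand-in for the ranked word data (an assumption, not the real file):
# an exact power law with hypothetical prefactor 225300 and exponent -1.13.
ranks = list(range(1, 1001))
freqs = [225300.0 * r ** -1.13 for r in ranks]

slope, intercept = fit_power_law(ranks, freqs)
print(f"exponent = {slope:.2f}, prefactor = {math.exp(intercept):.0f}")
```

The slope is the power-law exponent and exp(intercept) is the prefactor; real ranked data will scatter around the line rather than sit on it.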
  • 57. The Power of Power Laws: Database Query Times. Example 2: Database Query Times. [Figure: orad$Elapstime plotted against Index, linear axes.] As with Zipf’s law, the data must be ranked by frequency of occurrence.
  • 58. The Power of Power Laws: Database Query Times. Visualize Ranked Data. [Figure: “Ranked SQL Times”; otr plotted against Index, linear axes.] It is impossible to tell the functional form of this curve.
  • 59. The Power of Power Laws: Database Query Times. Try Double-Log Visualization. [Figure: “Log-Log SQL Times”; otr plotted against Index, log-log axes.] Clearly not a power law overall, but the first 100 queries do appear to follow a power law.
  • 60. The Power of Power Laws: Database Query Times. Three Data Windows. [Figure, three panels: (A) “Log-Log of SQL-A Times”, etA against Index, log-log axes; (B) “Log-Lin of SQL-B Times”, etB against Index, log-linear axes; (C) “Log-Lin of SQL-C Times”, etC against Index, log-linear axes.] This suggests breaking the data across 3 regions.
  • 61. The Power of Power Laws: Database Query Times. Regression Analysis. [Figure, three panels: “Log-Log SQL A-Times” (etA against Index), “Log-Lin SQL B-Times” (etB against Index), “Log-Lin SQL C-Times” (etC against Index).] (A) $y_A \sim x^{-0.4632}$: power law decay. (B) $y_B \sim e^{-0.0074x}$: exponential decay. (C) $y_C \sim e^{-0.0028x}$: exponential decay. But this is still not enough.
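One way to make the choice of axes per window less of an eyeball judgment is to compare how straight each window looks on log-log versus log-linear axes. The sketch below does that with the R² of a straight-line fit. It is an illustration under stated assumptions: the windows are synthetic (exponents from the slide, prefactors made up), the input is assumed to be already ranked, and `classify_window` is a hypothetical helper, not the procedure used in the talk.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for a straight-line least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

def classify_window(index, times):
    """Return 'power law' or 'exponential', whichever axes give the straighter line.

    Assumes `times` is already ranked and that index and times are all positive.
    """
    log_y = [math.log(t) for t in times]
    log_x = [math.log(i) for i in index]
    r2_loglog = r_squared(log_x, log_y)  # power law: straight on log-log axes
    r2_loglin = r_squared(index, log_y)  # exponential: straight on log-linear axes
    return "power law" if r2_loglog > r2_loglin else "exponential"

# Synthetic windows: exponents from the slide, prefactors 500 and 80 are hypothetical.
idx_a = list(range(1, 101))
win_a = [500.0 * i ** -0.4632 for i in idx_a]
idx_b = list(range(1, 151))
win_b = [80.0 * math.exp(-0.0074 * i) for i in idx_b]

label_a = classify_window(idx_a, win_a)
label_b = classify_window(idx_b, win_b)
print(label_a, label_b)
```

On noisy measurements the two R² values can be close, so the plots remain the first check and a score like this is only a tie-breaker.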
  • 62. The Power of Power Laws: Database Query Times. Consistency. [Figure: “Log-Log SQL A-Times”; etA plotted against Index, log-log axes.] Power law slope: γ = 0.46. That is about half the Zipfian slope, γ = 1.0. Correlations are stronger than Zipf. Hypothesis: (1) Shorter query times (window A) may involve dictionary lookups or other structured data. Structure provides correlations. (2) Longer queries in window B are unstructured (ad hoc?) and randomized. Weak correlations produce exponential decay. (3) Ditto for window C.
  • 63. The Power of Power Laws: Eleventh Hour Spikes. Example 3: Eleventh Hour Spikes. All Australian businesses were required to register with the Australian Tax Office (ATO) for an Australian Business Number (ABN) to claim an income tax refund. The ABN was introduced in Y2K. The time series data come from the ABN registrations database. The period covers March 27 to September 19, 2000. The deadline traffic spike was on May 31, 2000. This is similar to the rush to meet the Obamacare deadline of March 31, 2014. More details are in my CMG Australia 2006 paper.
  • 65. The Power of Power Laws: Eleventh Hour Spikes. Complete Time Series. [Figure: ORA connections over time, 11 March to 29 August 2000; y-axis linear from 0 to $1 \times 10^6$.] Question: Could the “11th hour” spike have been predicted? Answer: Yes, but it is quite involved. How: Using a power law. What else!?
  • 66. The Power of Power Laws: Eleventh Hour Spikes. Semi-Log Plot. [Figure: ORA connections over time, 11 March to 21 May 2000; y-axis logarithmic from $1 \times 10^4$ to $2 \times 10^6$.] The y-axis is the number of Oracle RDBMS connections (log scale). Peak growth preceding the spike looks almost linear on a semi-log plot. Time range: 0–38 days.
  • 67. The Power of Power Laws: Eleventh Hour Spikes. Statistical Regression on Peaks. [Figure: ORA connection peaks with regression line, semi-log axes, 11 March to 21 April 2000.] Linear growth on semi-log axes implies an exponential function $y = A e^{Bt}$. Fit parameters: Origin: $A = 1.14128 \times 10^5$. Curvature: $B = 0.0175$. Doubling period: $\ln(2)/B \approx 40$ days, i.e. about 6 weeks with t in days.
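The doubling period follows directly from the exponential model. As a worked check, assuming t is measured in days (the unit used for the time range in this example) and writing $T_2$ for the doubling period (a symbol introduced here for illustration):

```latex
\begin{align*}
A e^{B(t + T_2)} &= 2 A e^{Bt} \\
e^{B T_2} &= 2 \\
T_2 &= \frac{\ln 2}{B} = \frac{0.6931}{0.0175} \approx 39.6 \text{ days}
\end{align*}
```

Note that the prefactor A cancels, so the doubling period depends only on the fitted rate B.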
  • 68. The Power of Power Laws: Eleventh Hour Spikes. Trend on Linear Axes. [Figure: ORA connections with the exponential trend, linear axes, 11 March to 10 July 2000.] The exponential forecast looks valid up to the crosshairs. It significantly underestimates the onset of the “11th hour” peak, as well as the rapid drop-off after the peak. Faster-than-exponential growth suggests a power law.
  • 69. The Power of Power Laws: Eleventh Hour Spikes. Power Law Fit. [Figure: exponential growth and power law curves overlaid on ORA connections, linear axes, 11 March to 10 July 2000.] The log axes are connections (y) and time in days (x): $\log(y) = -0.6421 \log(|x - x_c|) + \text{const}$, so $y \sim 1/|x - x_c|^{0.6421}$, where the peak occurs at $x_c = 61$ days.
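To give a feel for how a power law with a critical time could be used to anticipate a spike, here is one possible sketch: scan candidate values of $x_c$ and keep the one that makes log(y) against log(x_c − x) most nearly a straight line. This is an assumption-laden illustration, not the method from the talk or the CMG paper; the data are synthetic (exponent and peak day from the slide, prefactor hypothetical), and `best_critical_time` is a name invented here.

```python
import math

def line_r2(xs, ys):
    """R^2 of a straight-line least-squares fit of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

def best_critical_time(days, counts, candidates):
    """Pick the candidate x_c that makes log(y) vs log(x_c - x) most linear."""
    last_day = max(days)
    log_y = [math.log(c) for c in counts]
    best = None
    for xc in candidates:
        if xc <= last_day:
            continue  # x_c must lie beyond the last observation so x_c - x > 0
        log_dx = [math.log(xc - d) for d in days]
        r2 = line_r2(log_dx, log_y)
        if best is None or r2 > best[1]:
            best = (xc, r2)
    if best is None:
        raise ValueError("no candidate lies beyond the observed days")
    return best[0]

# Synthetic pre-peak data: exponent 0.6421 and peak at day 61 are from the slide;
# the prefactor 1.0e6 is hypothetical.
days = list(range(0, 51))
counts = [1.0e6 / (61 - d) ** 0.6421 for d in days]

xc_hat = best_critical_time(days, counts, range(52, 81))
print(xc_hat)
```

With real traffic the R² curve over candidates tends to be flat near its maximum, so an estimate like this is better read as a range of plausible peak days than a single date.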
  • 72. The Power of Power Laws: Eleventh Hour Spikes. Consistency. Log-log plots are an easy way to test for power law distributions. Data may have mixed regions of power law and other distributions. Power laws can even predict critical spikes. Power laws signal the presence of strong correlations. Explaining those correlations may be more difficult: Zipf’s law took 40 years. Remember: Aim for consistency. Learn to talk to God (She’s listening).
  • 73. Performance Dynamics Company, Castro Valley, California. www.perfdynamics.com; perfdynamics.blogspot.com; twitter.com/DrQz; Facebook. Training classes (May 19, 2014). njgunther@perfdynamics.com. OFF: +1-510-537-5758