Harmonic Mean for Monitored Rate Data
- 1. Harmonic Mean Aggregation
Copyright © 2013 Performance Dynamics
Aggregating Monitored Rate Data Using the Harmonic Mean
Progressive notes developed in response to remarks that arose
during the Monitorama Conference, Boston MA, March 28-29, 2013
Neil J. Gunther
Performance Dynamics Company
N.J. Gunther
Last updated November 24, 2013
- 2. Contents
1 Monitoring as Motivation (slide 3)
2 Meaning of the Means (slide 8)
3 Visual Explanation (slide 13)
4 Checking HM Correctness (slide 20)
5 Application to Time Series (slide 24)
6 Weighted Harmonic Mean (slide 33)
7 Accommodating Zero Rates (slide 40)
8 Conclusions (slide 51)
- 4.
During the presentations at Monitorama, we saw any number of
monitored metrics displayed as a time series, like Fig. 1.
Figure 1: Typical time series display of a collected metric (axes: Metric vs. Time)
Eventually, we need to aggregate these data.
- 5. Aggregation
Aggregation refers to averaging the monitored data on the boundary of
some time period, T . Such boundaries might occur daily, weekly,
monthly, etc.
A more important question (that is often overlooked) is, what do we
mean by averaging?
The usual assumption is that aggregation means taking the statistical
mean or, what is the same thing, taking the arithmetic average of all the
metric values occurring in each period T .
This may or may not be a valid assumption, depending on 2 things:
1. The type of metric being monitored
2. Whether the metric is sampled or an event
Remark 1. The distinction b/w sampled metrics and event metrics was
never delineated in any Monitorama presentations. More on this later.
- 6. Types of Metrics
There are only 3 types of metrics (see my Keynote):
1. Time: the fundamental performance metric. Dimension [T]
Example measurement units: ns, weeks.
2. Counts: integer or decimal number. Dimensionless [φ]
Example measurement units: subscriptions, RSS.
3. Rate: inverse time. Dimension [1/T] or [T⁻¹]
Example measurement units: Gbps, MIPS.
Definition 1. The throughput (X) is a rate metric type. It's the number of work units completed (C) per unit time (T):
$$X = \frac{C}{T} \tag{1}$$
Example 1. A web server handling C = 30,000 httpGets every minute has an average throughput of X = 30000/60 = 500 Gets per second.
- 7. Graphite Workshop
During the Graphite workshop, aggregating monitored rate data was
mentioned. This caused me to interject the cautionary comment:
The correct way to average rates (inverse-time metrics) is to apply
the harmonic mean, not the arithmetic mean.
At least that’s what the classic computer performance books tell you.
See, e.g., Allen (Academic Press 1990) and Jain (Wiley 1991).
I wasn’t emphatic about it b/c the examples in those textbooks do not
refer to time series. Good thing b/c the usual form of the harmonic mean
doesn’t work for time series!
That’s what I’m going to address here. Goggle up; science ahead.
- 9. Meaning of the Means – AM
Definition 2 (Arithmetic Mean). The sum of the numbers (iid rvs) divided by the number of numbers:
$$\mathrm{AM} = \frac{X_1 + X_2 + \ldots + X_N}{N} = \frac{1}{N} \sum_{k=1}^{N} X_k \tag{2}$$
Example 2 (Arithmetic mean of the first 100 integers).
$$\mathrm{AM} = \frac{1 + 2 + \ldots + 100}{100} = \frac{50 \times 101}{100} = 50.50$$
In R, the arithmetic mean is calculated simply as:
> mean(1:100)
[1] 50.5
- 10. Meaning of the Means – HM
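Definition 3 (Harmonic Mean). The inverse of the arithmetic mean of the inverses (iid rvs):
$$\mathrm{HM} = \frac{1}{\frac{1}{N}\left(\frac{1}{X_1} + \frac{1}{X_2} + \ldots + \frac{1}{X_N}\right)} = \left(\frac{1}{N} \sum_{k=1}^{N} \frac{1}{X_k}\right)^{-1} \tag{3}$$
Example 3 (Harmonic mean of the first 100 integers).
$$\mathrm{HM} = \frac{100}{1 + \frac{1}{2} + \ldots + \frac{1}{100}} = 19.28$$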
Since the harmonic mean is not defined in the base R pkg, we write:
> 100/sum(1/1:100) # matches Example 3
[1] 19.27756
or
> 1/mean(1/1:100) # matches eqn.(3)
[1] 19.27756
- 11. The Ad Nauseam Example
But how do we know when to apply the harmonic mean?
The example used to illustrate the application of HM ad nauseam is a
vehicle covering the same distance at different speeds.
Example 4 (Variable speed trip). Suppose a car travels 100 miles from
city A to city B at 100 mph. But, on the return journey the weather is
bad, so the car is forced to travel at the slower speed of 50 mph. What is
the average speed for the round trip?
The total round-trip time is 3 hrs b/c it takes 1 hr to go from A to B and 2 hrs to return at half the speed.
If we assume the arithmetic mean of the speeds, the average speed is AM = ½(100 + 50) = 75 mph. But covering 200 miles at an average speed of 75 mph would take 2 hrs 40 mins, not 3 hrs. Oops!
- 12.
If, however, we apply the harmonic mean:
$$\mathrm{HM} = \frac{1}{\frac{1}{2}\left(\frac{1}{100} + \frac{1}{50}\right)}$$
we get an average speed of 66⅔ mph. And covering 200 miles at an average speed of 66⅔ mph does take 3 hrs.
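The round-trip arithmetic in Example 4 can be replayed in R as a quick sanity check. This is only a sketch; the variable names (`speeds`, `leg_dist`, `time_am`, `time_hm`) are ours and do not appear in the slides.

```r
# Example 4: the same distance covered at two different speeds
speeds   <- c(100, 50)   # mph on the outbound and return legs
leg_dist <- 100          # miles per leg

am <- mean(speeds)           # arithmetic mean of the speeds
hm <- 1 / mean(1 / speeds)   # harmonic mean, as in eqn. (3)

total_dist <- leg_dist * length(speeds)
true_time  <- sum(leg_dist / speeds)   # hours actually spent driving

# Travel time implied by each candidate "average speed"
time_am <- total_dist / am   # shorter than the real trip
time_hm <- total_dist / hm   # matches the real trip
```

Only `time_hm` reproduces `true_time`, which is the point of the example.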
Remark 2. Notice that HM < AM. For positive values, HM ≤ AM is always true, with equality only when all the values are equal.
In my Graphite workshop mini-talk, I gave the example of database reads
and writes as corresponding to the two different IOPS rates or speeds
executing the same number of IOs, analogous to the same distance.
Proposition 1. The harmonic mean applies when the same amount of
work is done at different rates.
Another common example would be where you want to average the
different throughput rates of the same benchmark measured on different
speed processor systems.
But benchmarking is not monitoring.
- 14. Visual Explanation
Figure 2: Invariant areas (axes: Metric vs. Time)
The blue and red areas are equal: 3h × 1w = 3w × 1h = 3 squares each.
The areas represent the same count metric (C): distance, IOs, etc.
- 15. The AM Doesn't Work
Figure 3: Yellow area corresponds to height AM = 2 (axes: Metric vs. Time; annotations: "AM", "gap?")
Since the yellow area of 6 squares, corresponding to a height AM = 2 [AM = ½(3 + 1)], is only 3 squares wide, there is a gap 1 square wide.
- 16. Correcting the AM Area
Figure 4: Squashing the yellow area into the green area (axes: Metric vs. Time; annotations: "AM", "HM")
The green area of 6 squares, corresponding to a height HM = 1.5
[HM = 2 × 3/(3 + 1)], now has the correct width (total time).
- 17. Covering All the Columns
Figure 5: Harmonic column height (HM) of width 4 units (axes: Metric vs. Time; annotations: "AM", "HM")
The original blue and red areas correspond to histogram columns of
different widths. The green HM column has the correct total width.
- 18. Does AM Ever Work?
Yes. The AM is applicable when columns have uniform width.
Figure 6: AM works for uniform column widths (two panels; axes: Metric vs. Time; annotation: "AM")
This is the most common case, which is why statisticians use the AM as the statistical mean and why the HM is not in the base R package.
- 19. Time Bin Widths
The count per unit time constitutes a rate metric (X = C/T).
Proposition 2. The harmonic mean (HM) applies to histograms with
columns having the same areas (counts) but different widths. In the
case of monitored data, these different widths constitute different time
bins. This case is most likely to occur with asynchronous event data.
Proposition 3. Since the event counts (C) occur in time (T) on the
x-axis, the y-axis must be a rate metric, e.g. throughput X = C/T:
events per unit time.
Proposition 4. The arithmetic mean (AM) applies to histograms with
columns having the same widths but different areas (counts). That
turns out to be the most common case b/c the monitored data are
sampled on equal periodic boundaries, like the ticks of a metronome.
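Propositions 2 and 4 can be illustrated with two toy histograms. The sketch below uses made-up counts and bin widths (the first pair mirrors the 3-square columns of Figure 2) and compares each mean with the overall rate, total count over total time. All names are ours.

```r
hmean <- function(vals) 1 / mean(1 / vals)

# Case 1 (event data): equal counts per bin, unequal bin widths
counts_ev  <- c(3, 3)                  # same area per column
widths_ev  <- c(1, 3)                  # different time-bin widths
rates_ev   <- counts_ev / widths_ev    # column heights
overall_ev <- sum(counts_ev) / sum(widths_ev)

# Case 2 (sampled data): equal bin widths, unequal counts
counts_sm  <- c(2, 4, 6)
widths_sm  <- c(1, 1, 1)
rates_sm   <- counts_sm / widths_sm
overall_sm <- sum(counts_sm) / sum(widths_sm)

hm_ev <- hmean(rates_ev)   # agrees with overall_ev
am_ev <- mean(rates_ev)    # overshoots overall_ev
am_sm <- mean(rates_sm)    # agrees with overall_sm
hm_sm <- hmean(rates_sm)   # undershoots overall_sm
```

In the first case only the HM recovers the overall rate; in the second only the AM does.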
- 21. Checking the Correctness of HM
Recalling eqn. (3) for N periods:
$$\mathrm{HM} = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \ldots + \frac{1}{X_N}} = \frac{NC}{\frac{C}{X_1} + \frac{C}{X_2} + \ldots + \frac{C}{X_N}} \tag{4}$$
We've simply multiplied the numerator and each term in the denominator by the constant count C, as is appropriate for HM.
Substituting the definition of throughput from eqn. (1), i.e., T_k = C/X_k, produces:
$$\mathrm{HM} = \frac{NC}{T_1 + T_2 + \ldots + T_N} \tag{5}$$
which agrees with the notion
$$\text{Average (harmonic) rate} = \frac{\text{Total counts}}{\text{Total time}} \tag{6}$$
- 22.
Remark 3. The same counts per period (C), completed at different rates (X_k) in the denominator of eqn. (4), are responsible for producing the nonuniform time intervals (T_k) in the denominator of HM in eqn. (5).
Theorem 1 (When is HM = AM?). If T_k intervals are the same, as they are with sampled data, the counts per sample will be different, i.e., will have different rates per sample, and HM reduces to AM.
Proof 1. Under these conditions, eqn. (5) for the HM becomes
$$\frac{C_1 + C_2 + \ldots + C_N}{T + T + \ldots + T} = \frac{C_1 + C_2 + \ldots + C_N}{NT} = \frac{1}{N}\left(\frac{C_1}{T} + \frac{C_2}{T} + \ldots + \frac{C_N}{T}\right) = \frac{X_1 + X_2 + \ldots + X_N}{N}$$
But this is precisely the definition of AM given by eqn. (2).
- 23. Checking the Examples
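We can use eqn. (6) to check that HM is the right type of average.
1. Example 4 with N = 2 speeds (X1 = 100 mph, X2 = 50 mph) over the same distance (C = 100 miles):
$$\mathrm{HM} = \frac{1}{\frac{1}{2}\left(\frac{1}{100} + \frac{1}{50}\right)} = 66\tfrac{2}{3}\ \text{mph}$$
$$\frac{\text{Total counts}}{\text{Total time}} = \frac{200\ \text{miles}}{3\ \text{hrs}} = 66.67\ \text{mph}$$
2. Visual HM example with different column widths:
$$\mathrm{HM} = \frac{1}{\frac{1}{2}\left(\frac{1}{3} + \frac{1}{1}\right)} = \frac{3}{2}\ \text{units high}$$
$$\frac{\text{Total counts}}{\text{Total time}} = \frac{6\ \text{squares}}{4\ \text{units}} = 1.5\ \text{units high}$$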
- 25. Monitored Subscription Rates
Figure 7: Real data: subscription rates over 33 days (axes: Rate vs. Time)
Days   0      9.24932   18.663    27.4192   30.2493   33.0007
Rate   0.00   1081.16   1062.28   1142.05   3533.40   3634.56
- 26. Irregular Time Boundaries
Figure 8: Since the time-series data are not sampled but triggered on 10,000 subscriptions, the data points do not fall on regular time boundaries. (axes: Rate vs. Time)
- 27. Rates as Column Heights
Figure 9: Irregular time intervals are more easily discerned in a columnated format. We want to aggregate these data into a single datum. (axes: Rate vs. Time)
- 28.
The numerical subscription rates (X_k) are:
X1 = 0.00, X2 = 1081.16, X3 = 1062.28, X4 = 1142.05, X5 = 3533.40, X6 = 3634.56
Using R, the AM and HM are:
> hmean <- function(vals) { 1/mean(1/vals) }
> rates <- c(0.00, 1081.16, 1062.28, 1142.05, 3533.40, 3634.56)
> mean(rates) # AM
[1] 1742.242
> hmean(rates) # HM
[1] 0
The AM evaluates but the HM fails. Why? From eqn. (3) the HM is
$$\mathrm{HM} = \frac{1}{\frac{1}{6}\left(\frac{1}{0.0} + \frac{1}{1081.16} + \frac{1}{1062.28} + \frac{1}{1142.05} + \frac{1}{3533.40} + \frac{1}{3634.56}\right)} \tag{7}$$
But the first term in the denominator is infinite and dominates all the other values. The final inversion "1/∞" produces HM = 0.
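R follows IEEE floating-point arithmetic here, so the failure is silent: no error or warning is raised. The short sketch below traces the intermediate values for a two-element vector of our own choosing.

```r
hmean <- function(vals) 1 / mean(1 / vals)

x <- c(0, 1081.16)     # one zero rate plus one ordinary rate

inv_x    <- 1 / x        # first element is Inf, no warning
mean_inv <- mean(inv_x)  # Inf dominates the mean
hm       <- 1 / mean_inv # 1/Inf evaluates to 0
```

Because nothing is signalled, an aggregation job built on `hmean()` would quietly report zero for any window that contains a single zero rate.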
- 29. Let's Try That Again
We don’t need the first data-point. Treat it as the origin of the time
period associated with the X2 data point. To drop it in R, we write:
> rates[-1]
[1] 1081.16 1062.28 1142.05 3533.40 3634.56
> hmean(rates[-1])
[1] 1515.118
which is non-zero and less than AM. That’s encouraging.
Alternatively, we can evaluate HM explicitly as
> length(rates[-1])/sum(1/rates[-1])
[1] 1515.118
Note that the numerator is now 5 rather than 6
> length(rates[-1])
[1] 5
due to dropping the first value.
- 30. Check the HM Value
The measured rates were triggered on a count of 10,000 per period.
The total count is therefore C = 5 × 10,000 subscriptions.
The total time period is T = 33.0007 days.[a]
From eqn. (6) the time-averaged harmonic rate is:
$$X_{HM} = \frac{C}{T} = \frac{50{,}000}{33.0007} = 1515.12$$
which agrees with hmean(rates[-1]) on the previous slide.
Alternatively, only the HM gives the correct total time window
$$T = \frac{C}{X_{HM}} = \frac{50{,}000}{1515.12} = 33.0007$$
in agreement with the concept shown in Figure 5.
[a] Don't pay too much attention to the decimal digits. I'm only displaying them for consistency and readability.
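The same check can be run starting from the event timestamps rather than the rates. The sketch below assumes the Days row listed with Figure 7 begins at day 0 and that every event marks exactly 10,000 subscriptions; the variable names are ours.

```r
# Event timestamps in days, taken from the Days values listed with Figure 7
days <- c(0, 9.24932, 18.663, 27.4192, 30.2493, 33.0007)
trigger_count <- 10000            # subscriptions per event (assumed exact)

widths <- diff(days)              # irregular time-bin widths T_k
rates  <- trigger_count / widths  # per-bin rates X_k = C / T_k

hm      <- 1 / mean(1 / rates)                             # eqn. (3)
overall <- trigger_count * length(widths) / sum(widths)    # eqn. (6)
```

Here `hm` and `overall` agree to floating-point precision, and the recomputed `rates` match the tabulated ones up to rounding.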
- 31. AM and HM for Subscription Data
Figure 10: The AM and HM represent the average subscription rate and therefore correspond to different positions on the y-axis. But, only the HM gives the correct total time window of 33 days. (axes: Rate vs. Time; annotations: "AM", "HM")
- 32. The Aggregated HM Value
Figure 11: The HM is the big blue dot that correctly replaces these subscription-rate data for this time bin (33 days) when they are aggregated. (axes: Rate vs. Time)
- 34. Weighted Harmonic Mean
Recalling Example 4, we consider the following generalization of the HM.
Definition 4 (Weighted Harmonic Mean).
$$\mathrm{WHM} = \frac{1}{\frac{1}{W}\left(\frac{w_1}{X_1} + \frac{w_2}{X_2} + \ldots + \frac{w_N}{X_N}\right)} \tag{8}$$
where the total weight $W = \sum_k w_k$.
Example 5 (Variable speed over different distances). A car travels 50 miles at 40 mph, 60 miles at 50 mph and 40 miles at 60 mph. What is the average speed of the trip?
The distance weights are: w1 = 50, w2 = 60, w3 = 40. Substituting into eqn. (8) yields:
$$\mathrm{WHM} = \frac{50 + 60 + 40}{\frac{50}{40} + \frac{60}{50} + \frac{40}{60}} = 48.13\ \text{mph}$$
- 35. Significance of the WHM
Check the preceding calculation in R:
> wts <- c(50, 60, 40)
> rates <- c(40, 50, 60)
> sum(wts)/(sum(wts/rates))
[1] 48.12834
The counts per period were constant in both Example 4 (C_k = 100 miles) and the example in Section 5 (C_k = 10,000 subscribers).
Proposition 5. The WHM allows us to calculate HM when counts per period are distributed arbitrarily within the aggregation time window.
Eqn. (8) can be rewritten with weights as percentages:
$$\mathrm{WHM} = \frac{1}{\frac{w_1\%}{X_1} + \frac{w_2\%}{X_2} + \ldots + \frac{w_N\%}{X_N}} \tag{9}$$
where $w_k\% = w_k / W$.
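A quick numerical check that eqns. (8) and (9) are the same quantity, using the numbers from Example 5. This is a sketch only; `whm8`, `whm9` and `frac` are names we introduce for illustration.

```r
wts   <- c(50, 60, 40)   # miles covered on each leg (Example 5)
rates <- c(40, 50, 60)   # mph on each leg

# Eqn. (8): raw weights, normalised by the total weight W
whm8 <- sum(wts) / sum(wts / rates)

# Eqn. (9): weights expressed as fractions of W first
frac <- wts / sum(wts)
whm9 <- 1 / sum(frac / rates)
```

Both forms give the 48.13 mph found in Example 5.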
- 36. Determining the Percentage Weights
The percentage weights can be obtained directly from monitored data using the following steps:
1. Each rate data point r_k has an associated time increment ∆t_k
2. The product w_k = r_k × ∆t_k is the raw weight (area) for data point k
3. The total weight is W = Σ_k w_k (total area)
4. The percentage weight is w_k% = w_k / W (fraction of total area)
In R, we can write the above calculation as a function with 2 args:
wtspc <- function(rates, tdeltas) {
  weights <- rates * tdeltas   # step 2: raw weights (areas)
  totalwt <- sum(weights)      # step 3: total weight W
  return(weights / totalwt)    # step 4: fractions of the total area
}
- 37. Application of WHM to Time Series
Figure 12: Monitored rates for application "GAM" (axes: Rate vs. Time)
Aggregation window size is 60 samples with T = 558.83 units
- 38.
> gamrates
[1] 18.68 10.77 16.60 19.69 1.95 22.53 4.99 2.50 7.91 5.21 9.73 5.67
[13] 6.51 6.80 22.19 4.35 3.90 3.16 9.98 5.25 48.49 5.26 1.95 8.49
[25] 8.30 6.16 11.93 63.95 21.63 11.37 5.31 5.48 3.49 4.96 3.88 8.86
[37] 3.35 3.69 6.18 17.51 21.79 8.99 11.83 8.26 4.54 2.71 4.02 6.94
[49] 14.93 6.38 4.21 3.25 31.02 17.10 20.49 3.85 10.66 4.58 5.08 3.70
> gamdeltas
[1] 3.03 4.95 3.59 2.88 30.12 2.66 11.98 21.35 6.47 11.30 5.32 8.94
[13] 8.95 7.42 2.70 12.48 14.06 15.99 5.98 10.68 1.16 10.48 29.67 6.55
[25] 6.40 9.17 4.23 0.85 2.57 4.87 9.67 10.14 16.40 11.39 13.24 6.05
[37] 16.44 16.08 9.41 3.25 2.32 5.67 4.60 7.12 12.96 20.98 12.67 7.48
[49] 3.60 8.18 12.65 16.33 1.89 2.95 2.58 14.48 5.19 12.70 9.87 15.77
Using eqn. (9) and our R function wtspc() we find:
> (whm.gam <- 1 / sum(wtspc(gamrates, gamdeltas) / gamrates))
[1] 5.913534
Check WHM value produces the correct total time T = 558.83 units:
> sum(gamdeltas)
[1] 558.827
> sum(gamdeltas*gamrates) / whm.gam
[1] 558.827
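The total-time check works for any data, not just the GAM series. With weights w_k = r_k × ∆t_k, eqn. (9) collapses algebraically to total count over total time. The sketch below demonstrates this on three made-up data points; the values and the names `whm` and `direct` are hypothetical.

```r
wtspc <- function(rates, tdeltas) {
  weights <- rates * tdeltas
  weights / sum(weights)
}

# Hypothetical toy data: three rate measurements and their time increments
rates   <- c(20, 5, 10)
tdeltas <- c(1, 4, 3)

whm    <- 1 / sum(wtspc(rates, tdeltas) / rates)   # eqn. (9)
direct <- sum(rates * tdeltas) / sum(tdeltas)      # total count / total time

# Total time recovered from the WHM, as in the GAM check
t_check <- sum(rates * tdeltas) / whm
```

Under these assumptions `whm` and `direct` coincide, and `t_check` returns `sum(tdeltas)`.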
- 39. WHM Aggregation Result
Figure 13: WHM aggregation of monitored "GAM" rates in Fig. 12 (axes: Rate vs. Time)
- 41. Handling Zeros in the Time Series
WHM in Sect. 6 worked b/c those data did not contain any zero rate values.
However, with HM of eqn. (7) we already saw that
$$\frac{1}{X} \to \infty \quad \text{as } X \to 0$$
Since that single value dominates all the other nonzero terms in the denominator of HM, the final inversion produces an overall zero value:
$$\mathrm{HM} = \frac{1}{1/X} \to 0 \quad \text{as } X \to 0$$
The same is true for WHM in eqn. (9).
This dooms the algorithmic use of WHM for general time series.
Since monitored rate metrics can be expected to include zero values in any
aggregation period, we need a way to accommodate them.
- 42.
Example 6 (Toy sample rates with zero values).
X1 = 0, X2 = 100, X3 = 100, X4 = 0, X5 = 100
The standard harmonic mean (3) produces the result HM = 0.
> zr <- c(0,100,100,0,100)
> hmean(zr)
[1] 0
Some possible remedies:
Ignore zero values: Pretend the zeros don't exist and there are only 3 (positive) data values.
$$\mathrm{HM}_{3,3} = \frac{3/3}{\frac{1/3}{100} + \frac{1/3}{100} + \frac{1/3}{100}} = 100 \tag{10}$$
Drop zero values: Retain the 3 positive values out of 5, each with a weight of 1/5.
$$\mathrm{HM}_{3,5} = \frac{3/5}{\frac{1/5}{100} + \frac{1/5}{100} + \frac{1/5}{100}} = 100 \tag{11}$$
Surprise! Ignoring == Dropping
- 43. But Wait! It Gets Worse
HM3,3 and HM3,5 are both identical to the arithmetic mean of the positive values!
We can check this in R:
> zr[zr != 0] # drop zeros
[1] 100 100 100
> zpos <- zr[zr != 0]
> hmean(zpos) # HM
[1] 100
> mean(zpos) # AM
[1] 100
Proposition 6. Naively including zero rates produces HM = 0. FAIL
Proposition 7. Naively dropping zero rates produces the AM here, b/c the surviving rates are all equal. FAIL
- 44. A More Careful Approach
We want to find an algorithm that produces 0 < HM < 100 for
Example 6 by accounting for all 5 data points, but not overbiasing due
to the presence of zero values.
Conjecture 1. The zeros in X1, X4 have weights 1/5 each. Ignore those terms in the harmonic sum but redistribute their weights across the weights of the remaining non-zero terms X2, X3, X5.
Each term in the harmonic sum has a weight of 1/5. The 2 zero terms have a total weight of 2/5. Adding a third of that total zero-term weight to each of the positive-term weights produces a new weight:
$$\frac{1}{3}\left(\frac{2}{5}\right) + \frac{1}{5}$$
- 45.
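Now, eqn. (11) becomes
$$\frac{3/5}{\frac{(2/5)/3 + 1/5}{100} + \frac{(2/5)/3 + 1/5}{100} + \frac{(2/5)/3 + 1/5}{100}} \tag{12}$$
In addition, each weight simplifies further as
$$\frac{1}{3}\left(\frac{2}{5}\right) + \frac{1}{5} = \frac{1}{3}\left(\frac{2}{5}\right) + \frac{1}{5}\left(\frac{3}{3}\right) = \frac{2}{3}\left(\frac{1}{5}\right) + \frac{3}{3}\left(\frac{1}{5}\right) = \frac{1}{3}$$
Hence, (12) reduces to
$$\frac{3/5}{\frac{1/3}{100} + \frac{1/3}{100} + \frac{1/3}{100}} = 60 \tag{13}$$
which is less than the AM of the three positive values (100), but not zero, and thus meets our requirement.
Eqn. (13) for the zero-renormalized harmonic mean has the form
$$\mathrm{ZRHM}_{5,2} = \frac{3}{5}\,\mathrm{HM}_{3,3} \tag{14}$$
where HM3,3 is the same as eqn. (10).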
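The weight redistribution of Conjecture 1 can be traced step by step in R for the toy data of Example 6. This is a sketch of the manual calculation only, not the general algorithm; all variable names are ours.

```r
zr <- c(0, 100, 100, 0, 100)   # Example 6

n      <- length(zr)
is_pos <- zr != 0
n_pos  <- sum(is_pos)

base_wt <- 1 / n                        # every term starts with weight 1/5
zero_wt <- (n - n_pos) * base_wt        # 2/5 carried by the two zeros
new_wt  <- base_wt + zero_wt / n_pos    # (2/5)/3 + 1/5, i.e. 1/3

# Eqn. (12): numerator 3/5, redistributed weights in the harmonic sum
zrhm_manual <- (n_pos / n) / sum(new_wt / zr[is_pos])

# Eqn. (14): the same value as (3/5) times the HM of the positive values
zrhm_factored <- (n_pos / n) * (1 / mean(1 / zr[is_pos]))
```

Both expressions evaluate to 60, matching eqn. (13).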
- 46. The ZRHM Theorem
Since the 2nd factor in the RHS of eqn. (14) is the usual HM, it could
also be extended to include weighted terms (w%) for irregular counts per
time interval as defined by the WHM. See eqn. (9) in Section 6.
We can now write a general formula for calculating the harmonic mean
of arbitrary rate data.
Theorem 2 (Zero Renormalized Harmonic Mean).
$$\mathrm{ZRHM} = \frac{N_Z}{N_W} \left(\frac{1}{N_Z} \sum_{k=1}^{N_Z} \frac{w\%}{X_k}\right)^{-1} \tag{15}$$
where N_W is the total number of data points in the aggregation window, N_0 is the number of zeros and N_Z = N_W − N_0. (cf. eqn. (3))
Proof 2. See preceding discussion.
- 47. The ZRHM Algorithm
The following R function implements eqn. (15) of Thm 2 with uniform
weights.
zrhm <- function(tsrates) {
  ndatas  <- length(tsrates)                 # N_W: all data points
  nzeros  <- length(which(tsrates == 0))     # N_0: number of zeros
  pozdata <- tsrates[which(tsrates != 0)]    # non-zero rates only
  nozwt   <- (ndatas - nzeros) / ndatas      # N_Z / N_W
  nozhm   <- 1 / mean(1 / pozdata)           # HM of the non-zero rates
  return(nozwt * nozhm)
}
It takes an arbitrary time series, tsrates, of monitored rate data as its
argument (including zero values) and returns the ZRHM.
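One edge case is worth noting: if every value in the window is zero, `pozdata` is empty and the function above returns `NaN`. A defensive variant might look like the sketch below. The name `zrhm_safe`, the input checks, and the choice to return 0 for an all-zero window are our assumptions, not part of the original slides.

```r
# Hypothetical defensive variant of zrhm() with uniform weights
zrhm_safe <- function(tsrates) {
  stopifnot(is.numeric(tsrates), length(tsrates) > 0)
  if (anyNA(tsrates) || any(tsrates < 0)) {
    stop("rates must be non-missing and non-negative")
  }
  pozdata <- tsrates[tsrates > 0]
  # Assumption: a window containing only zeros aggregates to a rate of 0
  if (length(pozdata) == 0) return(0)
  nozwt <- length(pozdata) / length(tsrates)   # N_Z / N_W
  nozwt / mean(1 / pozdata)                    # weight times HM of positives
}

toy      <- zrhm_safe(c(0, 100, 100, 0, 100))  # Example 6
allzero  <- zrhm_safe(c(0, 0, 0))              # guarded case
nozeros  <- zrhm_safe(c(2, 4))                 # no zeros: plain HM
```

With no zeros in the window the weight factor is 1, so the result is the ordinary harmonic mean.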
- 48. Test Cases
Toy rate data: From Example 6
> zr
[1] 0 100 100 0 100
> zrhm(zr)
[1] 60
which agrees with the manually calculated result.
Subscription data: From Section 5
> sub.rates
[1] 0.00 1081.16 1062.28 1142.05 3533.40 3634.56
> hmean(sub.rates)
[1] 0
> hmean(sub.rates[-1])
[1] 1515.118
> zrhm(sub.rates)
[1] 1262.599
The result, HM₋₁ = 1515.118, is obtained by not including the zero value at the origin. When that value is included, ZRHM < HM₋₁, as expected, but ZRHM > 0, unlike HM = 0.
- 49. Arbitrary Time Series
Fig. 14 shows a time series of 1000 rate values ranging b/w 0 and 100.
It contains 7 zero values whose locations in time are not known a priori.
Figure 14: AM = 50.93, HM = 0, ZRHM = 22.03 (axes: Rate vs. Time)
- 50. ZRHM Summary
• ZRHM is especially useful if a threshold is defined as a lower bound,
e.g., cache hit-rate, video bit-rate, b/c ZRHM is biased toward
smaller rather than larger values.
• A string of contiguous zero values can be treated as a boundary
b/w smaller aggregation windows. Take the 1st zero as defining the
end of one aggregation window and the last zero as the beginning of
the next aggregation window.
• No longer need to confirm the total time T from subareas.
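The windowing idea in the second bullet could be sketched as follows. The helper `window_hmeans` is hypothetical, and it assumes that any run of one or more zeros acts as a boundary; the slides only mention strings of contiguous zeros, so treat that as our simplification.

```r
hmean <- function(vals) 1 / mean(1 / vals)

# Hypothetical helper: split a rate series at runs of zeros and
# return the harmonic mean of each non-zero window.
window_hmeans <- function(tsrates) {
  runs   <- rle(tsrates != 0)                            # zero / non-zero runs
  run_id <- rep(seq_along(runs$lengths), runs$lengths)   # run label per point
  keep   <- tsrates != 0
  # Group the non-zero values by the run they belong to, then average each
  vapply(split(tsrates[keep], run_id[keep]), hmean, numeric(1))
}

rates   <- c(10, 40, 0, 0, 0, 20, 20, 5)   # made-up series with one zero run
windows <- window_hmeans(rates)
```

For this made-up series there are two windows, one before and one after the run of zeros, each aggregated with the plain HM.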
- 51. 8 Conclusions
- 52. What We Have Learned
• We compared AM vs HM averaging for monitored rate data.
Conventional wisdom says HM is the correct way to average rate
metrics. [See Example 4] But, for monitored data...
• HM assumes counts in each time bin are equal but bins have
different widths. This is the case for async (intermittent) event data
triggered on a common count criterion, e.g., every 1000 subscriptions.
• Otherwise, if time bins have the same width, as with data collected on
the same sample interval, HM reduces to AM. [See Thm 1]
• HM fails if any rate measurement is zero. [See slide 41] Compensate
by using ZRHM. [See Thm 2]
• Since HM ≤ AM, ZRHM is useful for detecting when a monitored rate
falls to a lower bound.
- 53. When Should I Use the Harmonic Mean?
You should use the HM, or more accurately ZRHM, to aggregate
monitored data when all of the following criteria apply:
R — Rate metric
A — Async time intervals
T — Too low data values are of interest
E — Event data, not sampled data
Example metrics:
• Cache-hit rate
• Video bit-rate
• Call center service