2. What you’ll learn
To create and interpret the following graphs:
Dotplot
Stem and leaf
  Regular Stem and Leaf
  Split Stem and Leaf
  Back-to-Back Stem and Leaf
Histogram
Time Plot
Ogive
3. To learn how to display and describe quantitative data we will be using some baseball statistics. The following table shows the number of home runs in a single season for three well-known baseball players: Hank Aaron, Barry Bonds, and Babe Ruth.
Hank Aaron    Barry Bonds    Babe Ruth
13  32        16  40         54  46
27  44        25  37         59  41
26  39        24  34         35  34
44  29        19  49         41  22
30  44        33  73         46
39  38        25             25
40  47        34             47
34  34        46             60
45  40        37             54
44  20        33             46
24            42             49
4. Dotplot
Label the horizontal axis with the name of the variable and title the graph
Scale the axis based on the values of the variable
Mark a dot (we’ll use x’s) above the number on the axis corresponding to each data value
[Dot plot: Ruth, Number of Home Runs in a Single Season, axis scaled from 20 to 60]
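The three dotplot steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the helper name `dotplot_rows` is made up for this example, and it assumes whole-number data so that each integer gets one column on the axis.

```python
from collections import Counter

# Babe Ruth's single-season home run totals, taken from the table above.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def dotplot_rows(values):
    """Return text rows of stacked x's (top row first), one column per integer."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    tallest = max(counts.values())
    return [
        "".join("x" if counts[v] >= level else " " for v in axis).rstrip()
        for level in range(tallest, 0, -1)
    ]

rows = dotplot_rows(ruth)
lo, hi = min(ruth), max(ruth)
width = hi - lo + 1
print("\n".join(rows))                                # the stacked x's
print("-" * width)                                    # the horizontal axis
print(f"{lo}{str(hi).rjust(width - len(str(lo)))}")   # axis end labels
print("Number of Home Runs in a Single Season (Ruth)")
```

Each x sits above its value on the axis, and repeated values stack upward, which is what makes clusters and gaps visible.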
5. Describing a Distribution
We describe a distribution (the values the variable takes on and how often it takes these values) using the acronym SOCS
Shape: We describe the shape of a distribution in one of two ways:
Symmetric/Approx. Symmetric
[Dot plots: Symmetric; Uniform]
6. Skewed
Right  Left
Notice that the direction of the “skew” is the same direction as the “tail”
[Dot plots: Left Skewed; Right Skewed, each with its “tail” marked]
7. Outliers: These are observations that we would consider “unusual”. Pieces of data that don’t “fit” the overall pattern of the data.
Babe Ruth had two seasons that appear to be somewhat different from the rest of his career. These may be “outliers”.
(We’ll learn a numerical way to determine if observations are truly “unusual” later)
The season in which Barry Bonds hit 73 home runs does not appear to fit the overall pattern. This piece of data may be an outlier.
[Dot plot: Bonds, Number of Home Runs in a Single Season, with an “Unusual observation???” marker]
[Dot plot: Ruth, Number of Home Runs in a Single Season, with an “Unusual observation???” marker]
8. Center: A single value that describes the entire distribution. A “typical” value that gives a concise summary of the whole batch of numbers.
A typical season for Babe Ruth appears to be approximately 46 home runs
[Dot plot: Ruth, Number of Home Runs in a Single Season]
*We’ll learn about three different numerical measures of center in the next section
9. Spread: Since we know that not everyone is typical, we also need to talk about the variation of a distribution. We need to discuss whether the values of the distribution are tightly clustered around the center, making it easy to predict, or whether they vary a great deal from the center, making prediction more difficult.
[Dot plot: Ruth, Number of Home Runs in a Single Season]
Babe Ruth’s number of home runs in a single season varies from a low of 22 to a high of 60.
*We’ll learn about three different numerical measures of spread in the next section.
10. Distribution Description using SOCS
The distribution of Babe Ruth’s number of home runs in a single season is approximately symmetric [1] with two possible unusual observations at 22 and 25 home runs [2]. He typically hits about 46 [3] home runs in a season. Over his career, the number of home runs has varied from a low of 22 to a high of 60 [4].
1-Shape 2-Outliers
3-Center 4-Spread
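The center and spread figures in a SOCS description can be checked directly against the data. The sketch below assumes the median as the “typical” value and the minimum and maximum as the spread. That choice is an assumption made for illustration, since the slides leave the numerical measures for the next section.

```python
from statistics import median

# Babe Ruth's single-season home run totals, taken from the data table.
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

center = median(ruth)            # middle value of the sorted data
low, high = min(ruth), max(ruth)

print(f"Center: about {center} home runs in a typical season")
print(f"Spread: from a low of {low} to a high of {high}")
```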
11. Stem and Leaf Plot
Creating a stem and leaf plot
Order the data points from least to greatest
Separate each observation into a stem (all but the rightmost digit) and a leaf (the final digit). Ex. 123 -> 12 (stem) : 3 (leaf)
In a T-chart, write the stems vertically in increasing order on the left side of the chart.
On the right side of the chart write each leaf to the right of its stem, spacing the leaves equally
Include a key and title for the graph
Hank Aaron
Number of Home Runs in a Single Season
1 | 3
2 | 0 4 6 7 9
3 | 0 2 4 4 8 9 9
4 | 0 0 4 4 4 4 5 7
Key: 4 | 6 = 46
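The construction steps for a stem and leaf plot translate into a short routine. This is a minimal sketch assuming non-negative whole numbers. The function name `stem_and_leaf` is hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hank Aaron's single-season home run totals, taken from the data table.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

def stem_and_leaf(values):
    """Map each stem (all but the last digit) to its leaves in increasing order."""
    plot = defaultdict(list)
    for v in sorted(values):          # step 1: order least to greatest
        stem, leaf = divmod(v, 10)    # step 2: split into stem and leaf
        plot[stem].append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

plot = stem_and_leaf(aaron)
print("Hank Aaron: Number of Home Runs in a Single Season")
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
print("Key: 4 | 6 = 46")
```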
12. Split Stem and Leaf Plot
If the data in a distribution is concentrated in just a few stems, the picture may be more descriptive if we “split” the stems
When we “split” stems we want the same number of digits to be possible in each stem. This means that each original stem can be split into 2 or 5 new stems.
A good rule of thumb is to have a minimum of 5 stems overall
Let’s look at how splitting stems changes the look of the distribution of Hank Aaron’s home run data.
13. Split each stem into 2 new stems. This means that the first stem includes the leaves 0-4 and the second stem has the leaves 5-9
Splitting the stems helps us to “see” the shape of the distribution in this case.
Hank Aaron
Number of Home Runs in a Single Season
1 | 3
1 |
2 | 0 4
2 | 6 7 9
3 | 0 2 4 4
3 | 8 9 9
4 | 0 0 4 4 4 4
4 | 5 7
Key: 4 | 6 = 46
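Splitting each stem in two only changes how leaves are grouped: leaves 0-4 go on the first copy of the stem and leaves 5-9 on the second. The sketch below shows one way to do that, assuming non-negative whole numbers. `split_stem_and_leaf` is a name invented for this illustration.

```python
# Hank Aaron's single-season home run totals, taken from the data table.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

def split_stem_and_leaf(values):
    """Return (stem, leaves) rows with every stem split into 0-4 and 5-9."""
    ordered = sorted(values)
    rows = []
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        for lowest, highest in ((0, 4), (5, 9)):
            leaves = [v % 10 for v in ordered
                      if v // 10 == stem and lowest <= v % 10 <= highest]
            rows.append((stem, leaves))
    return rows

rows = split_stem_and_leaf(aaron)
for stem, leaves in rows:
    print(f"{stem} | {' '.join(map(str, leaves))}")
print("Key: 4 | 6 = 46")
```

With eight rows instead of four, the gradual build-up toward the 40s is easier to see than in the regular plot.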
14. Back-to-Back Stem and Leaf
Back-to-Back stem and leaf plots allow us to quickly compare two distributions.
Use SOCS to make comparisons between distributions
Number of Home Runs in a Single Season
       Aaron |   | Ruth
           3 | 1 |
             | 1 |
         4 0 | 2 | 2
       9 7 6 | 2 | 5
     4 4 2 0 | 3 | 4
       9 9 8 | 3 | 5
 4 4 4 4 0 0 | 4 | 1 1
         7 5 | 4 | 6 6 6 7 9
             | 5 | 4 4
             | 5 | 9
             | 6 | 0
Key: 4 | 6 = 46
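A back-to-back plot shares one column of stems, with one player's leaves running outward to the left and the other's to the right. The following is a rough sketch under the same split-stem convention (0-4 and 5-9) and assuming non-negative whole numbers. The names `back_to_back` and `leaves_for` are hypothetical.

```python
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

def leaves_for(data, stem, lowest, highest):
    """Sorted leaves of the values that belong to one split stem."""
    return sorted(v % 10 for v in data
                  if v // 10 == stem and lowest <= v % 10 <= highest)

def back_to_back(left, right):
    """Rows of (left leaves, stem, right leaves); left leaves read outward."""
    combined = left + right
    rows = []
    for stem in range(min(combined) // 10, max(combined) // 10 + 1):
        for lowest, highest in ((0, 4), (5, 9)):
            left_leaves = leaves_for(left, stem, lowest, highest)[::-1]
            right_leaves = leaves_for(right, stem, lowest, highest)
            rows.append((left_leaves, stem, right_leaves))
    # Drop trailing stems that neither data set reaches.
    while rows and not rows[-1][0] and not rows[-1][2]:
        rows.pop()
    return rows

rows = back_to_back(aaron, ruth)
print(f"{'Aaron':>12} |   | Ruth")
for left_leaves, stem, right_leaves in rows:
    left_text = " ".join(map(str, left_leaves))
    right_text = " ".join(map(str, right_leaves))
    print(f"{left_text:>12} | {stem} | {right_text}")
print("Key: 4 | 6 = 46")
```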
15. Advantages and Disadvantages of dotplots/stem and leaf plots
Advantages
Preserves each piece of data
Shows features of the distribution with regard to shape, such as clusters, gaps, outliers, etc.
Disadvantages
If creating by hand, large data sets can be cumbersome
Data that is widely varied may be difficult to graph
16. Histograms
A histogram is one of the most common graphs used for quantitative variables.
Although a histogram looks like a bar chart there are some important differences
In a histogram, the “bars” touch each other
Histograms do not necessarily preserve individual data pieces
Changing the “scale” or “bin width” can drastically alter the picture of the distribution, so caution must be used when describing a distribution when only a histogram has been used
17. Creating a histogram
Divide the range of data into classes of equal width. Count the number of observations in each class. (Remember that the width is somewhat arbitrary and you might choose a different width than someone else)
Barry Bonds: Data ranges from 16 to 73, so we choose for our classes
15 ≤ # of HR ≤ 24
.
.
.
65 ≤ # of HR ≤ 74
We can then determine the counts for each “bin”
18. So the frequency distribution looks like:
The horizontal axis represents the variable values, so using the lower bound of each class to scale is appropriate.
The vertical axis can represent
Frequency
Relative frequency
Cumulative frequency
Relative cumulative frequency
We’ll use frequency
Class Frequency
15-24 3
25-34 6
35-44 4
45-54 2
55-64 0
65-74 1
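The class counts in this table can be reproduced by tallying Barry Bonds's seasons into equal-width classes. This is a minimal sketch assuming whole-number data, so that a class of width 10 starting at 15 covers 15 through 24. The function name `frequency_table` is made up for the example.

```python
# Barry Bonds's single-season home run totals, taken from the data table.
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73]

def frequency_table(values, start, width):
    """Count observations in consecutive classes [lower, lower + width - 1]."""
    table = []
    lower = start
    while lower <= max(values):
        upper = lower + width - 1
        count = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, count))
        lower += width
    return table

table = frequency_table(bonds, start=15, width=10)
for lower, upper, count in table:
    # One '#' per season gives a rough sideways histogram.
    print(f"{lower}-{upper}: {'#' * count} ({count})")
```

Calling the same function with a different `width` is a quick way to see how much the picture depends on the bin width.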
19. Label and scale your axes. Title your graph
Draw a bar that represents the frequency for each class. Remember that the bars of the histogram should touch each other.
20. Interpretation
We interpret a histogram in the same way we interpret a dotplot or stem and leaf plot.
ALWAYS use S O C S
Shape  Outliers
Center  Spread
21. Time Plots
Sometimes, our data is collected at intervals over time and we are looking for changes or patterns that have occurred. We use a time plot for this type of data
A time plot uses both the horizontal and vertical axes.
The horizontal axis represents the time intervals
The vertical axis represents the variable values
22. Creating a Time Plot
Label and scale the axes. Title your graph.
Plot a point corresponding to the data taken at each time interval
A line segment drawn between each point may be helpful to see patterns in the data
Year HR Year HR
1986 16 1994 37
1987 25 1995 33
1988 24 1996 42
1989 19 1997 40
1990 33 1998 37
1991 25 1999 34
1992 34 2000 49
1993 46 2001 73
[Time plot: Barry Bonds, home runs (BondsHR, scaled 10 to 80) against Year (1986 to 2002)]
23. Describing Time Plots
When describing time plots, you should look for trends in the data
Although the number of home runs does not show a constant increase from year to year, we note that overall, the number of home runs hit by Barry Bonds has increased over time, with the most notable increase being between 1999 and 2001.
[Time plot: Barry Bonds, home runs (BondsHR) against Year, repeated from the previous slide]
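One simple way to back up a description of a time plot is to look at the change from each year to the next. The sketch below uses the year and home run values from the Barry Bonds table. Treating year-over-year differences as a summary of the trend is an assumption made for illustration, not a method given in the slides.

```python
# Year -> home runs, taken from the Barry Bonds time plot table.
bonds_by_year = {
    1986: 16, 1987: 25, 1988: 24, 1989: 19, 1990: 33, 1991: 25,
    1992: 34, 1993: 46, 1994: 37, 1995: 33, 1996: 42, 1997: 40,
    1998: 37, 1999: 34, 2000: 49, 2001: 73,
}

years = sorted(bonds_by_year)
# Change in home runs between each pair of consecutive seasons.
changes = {
    (prev, curr): bonds_by_year[curr] - bonds_by_year[prev]
    for prev, curr in zip(years, years[1:])
}

largest_jump = max(changes, key=changes.get)
rises = sum(change > 0 for change in changes.values())

print(f"Seasons that improved on the previous one: {rises} of {len(changes)}")
print(f"Largest one-year increase: {largest_jump}, +{changes[largest_jump]}")
print(f"Overall change, {years[0]} to {years[-1]}: "
      f"{bonds_by_year[years[-1]] - bonds_by_year[years[0]]:+d}")
```

The mix of rises and falls matches the observation that the increase is not constant, while the two largest jumps come at the end of the period.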
24. Relative frequency, Cumulative frequency, Percentiles, and Ogives
Sometimes we are interested in describing the relative position of an observation
For example: you have no doubt been told at one time or another that you scored at the 80th percentile. This means that 80% of the people taking the test scored the same as or lower than you did.
How can we model this?
25. Ogive
(Relative cumulative frequency graph)
We first start by creating a frequency table
We’ll look at how each column is created in the next few slides
# of home runs               Relative    Cumulative   Relative Cumulative
in a season      Frequency   Frequency   Frequency    Frequency
15-24            3           0.1875      3            0.1875
25-34            6           0.375       9            0.5625
35-44            4           0.25        13           0.8125
45-54            2           0.125       15           0.9375
55-64            0           0.0         15           0.9375
65-74            1           0.0625      16           1.0000
26. Relative Frequency
The # of home runs… and the frequency are the same columns as we created for the histogram.
To find the values for the “Relative Frequency” column find the following:
Relative Frequency = Frequency Value / Total # of observations
# of home runs               Relative
in a season      Frequency   Frequency *
15-24            3           0.1875
25-34            6           0.375
35-44            4           0.25
45-54            2           0.125
55-64            0           0.0
65-74            1           0.0625
* Within rounding, this column should sum to 1
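The relative frequency column is each class frequency divided by the total number of observations. A short sketch using the frequencies from the table (the variable names are chosen for this example only):

```python
# Class frequencies for Barry Bonds, taken from the table.
frequency = {
    "15-24": 3, "25-34": 6, "35-44": 4,
    "45-54": 2, "55-64": 0, "65-74": 1,
}

total = sum(frequency.values())   # total number of observations (seasons)
relative = {cls: count / total for cls, count in frequency.items()}

for cls, value in relative.items():
    print(f"{cls}: {frequency[cls]:>2}  {value:.4f}")
print(f"Sum of relative frequencies: {sum(relative.values()):.4f}")
```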
27. Cumulative Frequency
Cumulative frequency simply adds the counts in the frequency column that fall in or below the current class level.
For example: to find the “13”, add the frequencies for the classes up to and including 35-44:
3+6+4=13
# of home runs               Relative    Cumulative
in a season      Frequency   Frequency   Frequency
15-24            3           0.1875      3
25-34            6           0.375       9
35-44            4           0.25        13
45-54            2           0.125       15
55-64            0           0.0         15
65-74            1           0.0625      16
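A cumulative frequency column is a running total of the frequency column. In Python, `itertools.accumulate` from the standard library produces exactly that running sum. The sketch below uses the class frequencies from the table.

```python
from itertools import accumulate

classes = ["15-24", "25-34", "35-44", "45-54", "55-64", "65-74"]
frequencies = [3, 6, 4, 2, 0, 1]

# Running total: each entry counts the seasons in or below that class.
cumulative = list(accumulate(frequencies))

for cls, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{cls}: frequency {freq}, cumulative {cum}")
```

The last cumulative value always equals the total number of observations, which is a quick check on the arithmetic.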
28. Relative Cumulative Frequency
Relative cumulative frequency divides the cumulative frequency by the total number of observations
For example:
.8125 = 13/16
# of home runs               Relative    Cumulative   Relative Cumulative
in a season      Frequency   Frequency   Frequency    Frequency
15-24            3           0.1875      3            0.1875
25-34            6           0.375       9            0.5625
35-44            4           0.25        13           0.8125
45-54            2           0.125       15           0.9375
55-64            0           0.0         15           0.9375
65-74            1           0.0625      16           1.0000
Sum              16          1
29. Creating the Ogive
Label and scale the axes
Horizontal: Variable
Vertical: Relative Cumulative Frequency (percentile)
Plot a point corresponding to the relative cumulative frequency in each class interval at the left endpoint of the next class interval
The last point you plot should be at a height of 100%
30.
# of home runs   Relative Cumulative
in a season      Frequency
15-24            0.1875
25-34            0.5625
35-44            0.8125
45-54            0.9375
55-64            0.9375
65-74            1.0000
A line segment from point to point can be added for analysis
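The plotting rule for the ogive (each class's relative cumulative frequency goes at the left endpoint of the next class) can be written out as a list of points. The sketch assumes the class width of 10 used in the table. The names `lower_bounds` and `ogive_points` are chosen for this example.

```python
from itertools import accumulate

lower_bounds = [15, 25, 35, 45, 55, 65]   # left endpoint of each class
frequencies = [3, 6, 4, 2, 0, 1]
width = 10                                 # class width used in the table

total = sum(frequencies)
relative_cumulative = [c / total for c in accumulate(frequencies)]

# Each height is plotted at the left endpoint of the NEXT class,
# which is this class's lower bound plus the class width.
ogive_points = [
    (lower + width, height)
    for lower, height in zip(lower_bounds, relative_cumulative)
]

for x, y in ogive_points:
    print(f"plot ({x}, {y:.4f})")
```

The final point lands at a height of 1.0, that is 100%, as the construction steps require.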
31. Types of Info from Ogives
Finding an individual observation within the distribution
Find the relative standing of a season in which Barry Bonds hit 40 home runs
Reading along the line segments of the ogive, a season with 40 home runs lies at about the 69th percentile, meaning that approximately 69% of his seasons had 40 or fewer home runs
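Reading a relative standing off an ogive amounts to linear interpolation along the line segment that contains the value. The sketch below assumes the ogive points implied by the table and the plotting rule (heights placed at the left endpoint of the next class). A reading taken by eye from a drawn graph may differ somewhat from the interpolated figure. `relative_standing` is a hypothetical helper name.

```python
# (home runs, relative cumulative frequency) points on the Bonds ogive.
ogive_points = [
    (25, 0.1875), (35, 0.5625), (45, 0.8125),
    (55, 0.9375), (65, 0.9375), (75, 1.0),
]

def relative_standing(x, points):
    """Interpolate the ogive height at x along the segment that contains x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted ogive points")

standing = relative_standing(40, ogive_points)
print(f"A 40 home run season sits near the {standing:.0%} mark on the ogive")
```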
32. Locating an observation corresponding to a percentile.
How many home runs must be hit in a season to correspond to the 75th percentile?
To be better than 75% of Mr. Bonds’s seasons, approximately 42 home runs must be hit.
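Going from a percentile back to a number of home runs is the same interpolation run in the other direction: find the segment whose heights bracket the percentile, then solve for the horizontal position. This is a sketch under the same assumed ogive points. `value_at_percentile` is an illustrative name, not something from the slides.

```python
# (home runs, relative cumulative frequency) points on the Bonds ogive.
ogive_points = [
    (25, 0.1875), (35, 0.5625), (45, 0.8125),
    (55, 0.9375), (65, 0.9375), (75, 1.0),
]

def value_at_percentile(p, points):
    """Find the x where the ogive first reaches height p (0 < p <= 1)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Skip flat segments: they have no unique x for a given height.
        if y1 > y0 and y0 <= p <= y1:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("percentile lies outside the plotted ogive points")

home_runs = value_at_percentile(0.75, ogive_points)
print(f"75th percentile: about {home_runs} home runs")
```

The interpolated value falls between 42 and 43, in line with the approximate reading from the graph.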
33. A little History on the word Ogive
(sometimes called an Ogee)
It was first used by Sir Francis Galton, who borrowed a term from architecture to describe the cumulative normal curve (more about that next chapter).
The ogive in architecture was a common decorative element in many of the English Churches around 1400. The picture at right shows the door to the Church of The Holy Cross at the village of Caston in Norfolk. In this image you can see the use of the ogive in the design of the door and repeated in the windows above.
Find more about this term at Mathwords.