The 7 habits of highly effective statisticians

Stephen Senn
Consultant Statistician, Edinburgh, UK
© Stephen Senn 2020

A Question to Keep You Amused
Consider a ‘coin of ignorance’:
$$P(H) = \theta, \quad P(T) = 1 - \theta, \quad f(\theta) = 1, \quad 0 \le \theta \le 1$$
• $\theta$ is the probability of a head
• Every value of $\theta$ is equally likely
The coin is tossed 100 times. If X is the number of heads, which of these two is more likely?
$$P(X = 50) \quad \text{or} \quad P(X = 100)\,?$$
• $X = 50$: $100!/(50!\,50!) \approx 10^{29}$ sequences
• $X = 100$: one sequence

Of course, this is an ironic title
• Any statistician knows that you should think in terms of the three Cs:
• Causation
• Control
• Comparison
• To which a fourth might be added
• Counterfactuals: that which would have happened had you acted differently
• The question of interest is
• What habits have a beneficial effect on your probability of being an effective
statistician?
• Many effective statisticians will be in the habit of taking breakfast. This
doesn’t make taking breakfast a cause of being an effective statistician.

And my advice is hypocritical
• I earn my living as a statistician promoting, using and evaluating
numerical evidence
• Based on studies with
• Control
• Randomisation
• Replication
• I am proposing instead to give you advice based on one uncontrolled
example
• Me

The magnificent seven
• Read: include some classics in your reading
• Listen (& see): fit the answer to the problem, not vice versa
• Understand: this requires some subject-matter comprehension
• Think: it’s not just a matter of mathematics (but it also is)
• Do: the devil is in the detail, and doing discovers it
• Calculate: use calculations to increase understanding, not to replace it
• Communicate: think hard about the simplest honest way to communicate the message

I am not going to go through this list in detail
• Instead, I shall illustrate some of these points with a few examples
• Invalid inversion
• Regression to the mean
• Some statistical ‘howlers’
• These will illustrate between them the value of
• Understand
• Communicate
• Think
• Do
• Read
• Calculate
What happened to Listen?
That’s where you come in!

A Simple Example of ‘Invalid Inversion’
• Most women do not suffer from breast cancer
• It would be a mistake to conclude, however, that most breast cancer
victims are not women
• To do so would be to transpose the conditionals
• This is an example of invalid inversion
• Why is this important?
• People regularly confuse the probability of the data given the
hypothesis with the probability of the hypothesis given the data
• Misinterpretation of P-values is linked to this

Some Plausible Figures for the UK

Probability breast cancer given female = 550/31,418 = 0.018

Probability female given breast cancer = 550/553 = 0.995

The difference is in the denominator
The numerator is the same
Invalid inversion is an error caused by mistaking the relevant marginal class:
550/31,418 or 550/553
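The arithmetic on these slides can be checked in a few lines. This is a minimal sketch that uses only the three counts shown above (the slides do not state their units, so they are treated simply as counts; the variable names are illustrative). Exact fractions make it visible that both conditional probabilities share the numerator 550 and differ only in the marginal class used as the denominator.

```python
from fractions import Fraction

joint = 550        # female AND breast cancer: the shared numerator
n_female = 31_418  # marginal class "female"
n_bc = 553         # marginal class "breast cancer"

p_bc_given_female = Fraction(joint, n_female)
p_female_given_bc = Fraction(joint, n_bc)

# Each conditional probability times its own marginal recovers the same joint count.
assert p_bc_given_female * n_female == p_female_given_bc * n_bc == joint

print(f"P(breast cancer | female) = {float(p_bc_given_female):.3f}")  # 0.018
print(f"P(female | breast cancer) = {float(p_female_given_bc):.3f}")  # 0.995
```

The ratio of the two answers is just the ratio of the two denominators, 31,418/553: the inversion changes the marginal class, not the joint count.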

A Little Maths
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
$$\text{so } P(A \mid B) \neq P(B \mid A) \text{ unless } P(A) = P(B)$$
So invalid inversion is equivalent to a confusion of the marginal probabilities. The
same joint probability is involved in the two conditional probabilities but different
marginal probabilities are involved.

The Regression Analogue
Predicting Y from X is not the same as predicting X from Y.
$$\beta_{Y \mid X} = \frac{\sigma_{XY}}{\sigma_X^2}, \qquad \beta_{X \mid Y} = \frac{\sigma_{XY}}{\sigma_Y^2}$$
Note the similarity with the probability case.
The numerator (the covariance) is a statistic of joint variation.
The denominators (the variances) are statistics of marginal variation. These
marginal statistics are not the same.
The difference is in the denominator
The numerator is the same

Dimensional analysis
• Consider the example of regressing weight on height and vice versa
• Suppose you put height in cm into your ‘black box’ to predict weight in kg
• The input is in cm
• The output is in kg
• You must multiply the cm by a regression coefficient that is in kg/cm
• The covariance is in units of kg × cm and you divide by a variance that is in cm² to get
kg/cm
• Suppose you put weight in kg into your black box to predict height in cm
• You must multiply the kg by a coefficient that is in cm/kg
• The numerator is the covariance in both cases
• A different variance is used for the denominator

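Just to make that perfectly clear
$$\text{weight (kg)} = \text{constant (kg)} + \frac{\text{cov}(\text{cm} \times \text{kg})}{\text{var}(\text{cm} \times \text{cm})} \times \text{height (cm)}$$
$$\text{height (cm)} = \text{constant (cm)} + \frac{\text{cov}(\text{cm} \times \text{kg})}{\text{var}(\text{kg} \times \text{kg})} \times \text{weight (kg)}$$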

Morals
• Think carefully about basic and fundamental concepts in probability and
statistics
• Seek an understanding that is not just mathematical but that reveals why
things have to be the way they are
• Make parallels
• Regression is similar to conditional probability in some ways
• Dimensional analysis (a tool used by physicists and engineers) is very valuable
• Find the simplest way to communicate important points
• Proofs are good but not for this
• Examples are excellent
• Read widely and seek different explanations of the same thing

Regression to the Mean
A Simulated Example
• Diastolic blood pressure (DBP)
• Mean 90 mmHg
• Between-patient variance 50 mmHg²
• Within-patient variance 15 mmHg²
• Boundary for hypertensive 95 mmHg
• Simulation of 1000 patients whose DBP at baseline
and outcome are shown
• Blue: consistently normotensive
• Red: consistently hypertensive
• Orange: hypertensive/normotensive or vice versa

[Figure: what you will see if all patients are followed up]

[Figure: what you will see if only hypertensive patients are followed up]

• All patients followed up: mean at baseline and outcome are the same
• Hypertensive patients followed up: all are hypertensive at baseline, many are not at outcome, and the mean at outcome is lower than at baseline
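The simulation described above can be sketched as follows, using the stated mean, variances, boundary and number of patients. Several details are assumptions not given on the slides: both variance components are normal, baseline and outcome are independent measurements of the same underlying level with no treatment at all, and "hypertensive" means DBP strictly above 95 mmHg. The seed is arbitrary.

```python
import random
import statistics

rng = random.Random(2020)  # arbitrary seed, for reproducibility only

MEAN = 90.0          # mmHg
BETWEEN_VAR = 50.0   # mmHg^2, between-patient variance
WITHIN_VAR = 15.0    # mmHg^2, within-patient variance (per measurement)
THRESHOLD = 95.0     # mmHg, boundary for hypertensive (assumed strict >)
N_PATIENTS = 1000

baseline, outcome = [], []
for _ in range(N_PATIENTS):
    true_level = rng.gauss(MEAN, BETWEEN_VAR ** 0.5)                     # patient's own long-run DBP
    baseline.append(true_level + rng.gauss(0, WITHIN_VAR ** 0.5))
    outcome.append(true_level + rng.gauss(0, WITHIN_VAR ** 0.5))         # no treatment effect

# Follow up everyone.
all_base = statistics.fmean(baseline)
all_out = statistics.fmean(outcome)

# Follow up only those hypertensive at baseline.
selected = [i for i, b in enumerate(baseline) if b > THRESHOLD]
sel_base = statistics.fmean(baseline[i] for i in selected)
sel_out = statistics.fmean(outcome[i] for i in selected)
still_hyper = sum(outcome[i] > THRESHOLD for i in selected) / len(selected)

print(f"All followed up:   baseline {all_base:.1f}, outcome {all_out:.1f} mmHg")
print(f"Hypertensive only: baseline {sel_base:.1f}, outcome {sel_out:.1f} mmHg (n = {len(selected)})")
print(f"Still hypertensive at outcome: {still_hyper:.0%}")
```

Nothing was done to any patient between the two measurements, yet the selected group's mean falls: selecting on a high baseline reading also selects high within-patient error, which does not recur at outcome.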

Probably not the best way to explain this
Who wrote this?
Senn, S. J. (1988). How much of the placebo 'effect' is really statistical
regression? [letter]. Statistics in Medicine, 7(11), 1203

Doing and calculating avoids stupid mistakes
• Stupid mistake: proposing allocation ratios of 7:5:3 for a three-armed trial.
Cure: calculate the minimum block size. Hint: It’s 105.
• Stupid mistake: proposing some software for cross-over trials that could adjust the treatments to which patients are allocated depending on results in earlier periods.
Cure: try doing this in real time. Hint: This may help you learn that patients do not arrive simultaneously in a clinical trial.
• Stupid mistake: claiming that the use of placebos in clinical trials is unethical if there is an effective treatment.
Cure: run a clinical trial in a serious disease where there is a partially effective treatment. Hint: How do you avoid withdrawing the partially effective treatment from some patients?

Advice on Understanding, Thinking, Reading etc.
• Mathematics is important
• But it’s not enough
• Statistics is not a branch of mathematics, although probability theory is
• Applications are important
• Loving your data
• Getting to know the application area
• Biology!
• Pharmacology!
• Reading the classics is good for you
• Especially Fisher

That problem
The two events are equally likely. In fact,
$$P(X = k) = \frac{1}{n+1}, \quad k = 0, 1, \ldots, n.$$
Proof could involve some or all of the following:
marginal, conditional and joint probabilities
calculus
Bayes theorem
posterior probability
predictive distribution
proof by induction
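One short route, offered as a sketch rather than the proof the slide has in mind, combines the first few items on that list: average the binomial probability over the uniform prior. The integral is a Beta function, $\int_0^1 \theta^k (1-\theta)^{n-k}\,d\theta = k!\,(n-k)!/(n+1)!$, and multiplying by the number of sequences $n!/(k!\,(n-k)!)$ leaves $1/(n+1)$. Exact rational arithmetic with `fractions.Fraction` avoids any rounding.

```python
from fractions import Fraction
from math import comb, factorial


def prob_heads(n: int, k: int) -> Fraction:
    """P(X = k) for n tosses of a coin whose probability of a head is uniform on [0, 1]."""
    # Beta integral: int_0^1 theta^k (1 - theta)^(n - k) dtheta = k! (n - k)! / (n + 1)!
    beta_integral = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    # Multiply by the number of sequences that contain exactly k heads.
    return comb(n, k) * beta_integral


n = 100
print(prob_heads(n, 50), prob_heads(n, 100))  # 1/101 1/101
print(comb(n, 50))  # sequences giving 50 heads, about 1.01e29
print(all(prob_heads(n, k) == Fraction(1, n + 1) for k in range(n + 1)))  # True
```

The 10^29 sequences for 50 heads are each so much less probable under the prior than the single all-heads sequence that the two totals come out identical.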

Intuition
Imagine one billion tosses.
Your posterior probability would have to be very close to the
observed relative frequency, which would be close to the
‘true’ value.
But your prior probability says every true value is equally
likely.
Therefore, every observable ratio is equally likely.
But the result is also trivially true for n = 1. It is hardly
surprising, therefore, if the result is true for every value of n
between 1 and 1 billion.
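The claim can also be checked by brute force, in the spirit of trying simple cases. This Monte Carlo sketch draws $\theta$ from the uniform prior, tosses the coin n times and tallies the number of heads; the number of repetitions and the seed are arbitrary choices. For n = 1 and n = 10 every outcome should appear with frequency close to 1/2 and 1/11 respectively.

```python
import random
from collections import Counter


def simulate_heads(n_tosses: int, n_reps: int, rng: random.Random) -> list[float]:
    """Monte Carlo estimate of P(X = k), k = 0..n_tosses, under a uniform prior on theta."""
    counts = Counter()
    for _ in range(n_reps):
        theta = rng.random()  # probability of a head, drawn uniformly from [0, 1)
        heads = sum(rng.random() < theta for _ in range(n_tosses))
        counts[heads] += 1
    return [counts[k] / n_reps for k in range(n_tosses + 1)]


rng = random.Random(2020)  # arbitrary seed
for n in (1, 10):
    freqs = simulate_heads(n, 100_000, rng)
    print(n, [round(f, 3) for f in freqs], "target:", round(1 / (n + 1), 3))
```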

Moral
• It is important to think about your assumptions carefully
• If you do this you can understand what they imply
• Trying simple cases is helpful
• If you do this you can often see what the solution must be
• Extreme cases (one billion tosses) can also be helpful
• The mathematical solution is valuable but it is not a substitute for this
• Statistics is more than just mathematics
• It is also science and philosophy

[Diagram contrasting Statistics and Mathematics. Recoverable labels: real problem, operational problem, idealised problem, solution, application.]

In the mathematical formulation of any problem it is necessary
to base oneself on some appropriate idealizations and
simplification… One loses sight of the original nature of the
problem, falls in love with the idealization, and then blames
reality for not conforming to it.
de Finetti (1975)
It seems a pity that while we statisticians have an opportunity
to rate as first-class scientists we should settle for the rather
dreary role of second-class mathematicians.
George Box (1990)

Statistics is a subject where everything has to
be understood three times
•In terms of mathematics
•In terms of philosophy
•In terms of application

• Finally, I would like to leave you with this question
• Did you know there are only 120 days to Christmas?
Traditional Polish present: Piernik
Alternative suggestion: 3rd edition out soon
