This document provides advice on habits that make statisticians effective. It discusses the importance of understanding causation, control, comparison and counterfactuals when thinking about effectiveness. It warns against proposing habits as causes without proper evaluation. Seven key habits are identified: read, listen, understand, think, do, calculate, and communicate. The document illustrates these habits through examples of invalid inversion, regression to the mean, and statistical mistakes. It emphasizes understanding concepts fundamentally rather than just mathematically and finding simple ways to communicate ideas.
7. A Simple Example of ‘Invalid Inversion’
• Most women do not suffer from breast cancer
• It would be a mistake to conclude, however, that most breast cancer victims are not women
• To do so would be to transpose the conditionals
• This is an example of invalid inversion
• Why is this important?
• People regularly confuse the probability of the data given the hypothesis with the probability of the hypothesis given the data
• Misinterpretation of P-values is linked to this
7(c) Stephen Senn
9. Some Plausible Figures for the UK
Probability breast cancer given female = 550/31,418 = 0.018
10. Some Plausible Figures for the UK
Probability female given breast cancer = 550/553 = 0.995
11. The difference is in the denominator
The numerator is the same
Invalid inversion is an error caused by mistaking the relevant marginal class: 550/31,418 or 550/553.
12. A Little Maths
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
Unless $P(B) = P(A)$, $P(A \mid B) \neq P(B \mid A)$.
So invalid inversion is equivalent to a confusion of the marginal probabilities. The same joint probability is involved in the two conditional probabilities, but different marginal probabilities are involved.
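As a worked check, take $A$ as ‘breast cancer’ and $B$ as ‘female’ and use the UK figures from slides 9 and 10. Dividing one conditional probability by the other cancels the shared joint probability, leaving only the ratio of the two marginals:

```latex
\frac{P(A \mid B)}{P(B \mid A)}
  = \frac{P(A \cap B)/P(B)}{P(A \cap B)/P(A)}
  = \frac{P(A)}{P(B)},
\qquad
\frac{550/31\,418}{550/553} = \frac{553}{31\,418} \approx 0.0176
```

The joint count 550 drops out entirely; the gap between 0.018 and 0.995 is driven only by the two denominators.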
13. The Regression Analogue
Predicting Y from X is not the same as predicting X from Y.
$$\beta_{Y \mid X} = \frac{\sigma_{XY}}{\sigma_X^2}, \qquad \beta_{X \mid Y} = \frac{\sigma_{XY}}{\sigma_Y^2}$$
Note the similarity with the probability case.
The numerator (the covariance) is a statistic of joint variation.
The denominators (the variances) are statistics of marginal variation. These
marginal statistics are not the same.
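To make the regression point concrete, here is a minimal Python sketch using a small made-up data set (the numbers are purely illustrative and do not come from the talk). Both slopes use the same covariance as numerator; only the variance in the denominator changes.

```python
# Illustrative sketch: the two regression slopes share a numerator
# (the covariance, joint variation) but have different denominators
# (the variances, marginal variation). The data are made up.

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance of x and y (divisor n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def variance(values):
    """Sample variance (divisor n - 1)."""
    return covariance(values, values)

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 9]

s_xy = covariance(x, y)             # same numerator for both slopes
beta_y_on_x = s_xy / variance(x)    # slope for predicting Y from X
beta_x_on_y = s_xy / variance(y)    # slope for predicting X from Y

print(f"cov = {s_xy}, slope Y on X = {beta_y_on_x}, slope X on Y = {beta_x_on_y}")
```

Here the slope of Y on X is 1.8 but the slope of X on Y is 0.45, not 1/1.8 ≈ 0.56. The product of the two slopes is the squared correlation (0.81 here), so the two fitted lines coincide only when the correlation is ±1.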
17. Regression to the Mean
A Simulated Example
• Diastolic blood pressure (DBP)
• Mean 90 mmHg
• Between-patient variance 50 mmHg²
• Within-patient variance 15 mmHg²
• Boundary for hypertension 95 mmHg
• Simulation of 1000 patients whose DBP at baseline and outcome are shown
• Blue: consistently normotensive
• Red: consistently hypertensive
• Orange: hypertensive at baseline and normotensive at outcome, or vice versa
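The set-up on slide 17 can be sketched in Python as below. This is not Senn's program (his extract, in GenStat, appears later in these notes) but a minimal re-implementation under stated assumptions: each patient has a true DBP drawn with the between-patient variance, baseline and outcome each add an independent normal error with the within-patient variance, and ‘hypertensive’ means DBP ≥ 95 mmHg, matching `X>=cut` in the GenStat extract. The seed and the helper names `simulate` and `colour` are arbitrary choices for illustration.

```python
import math
import random

MEAN = 90.0          # mmHg
BETWEEN_VAR = 50.0   # mmHg^2, variance of patients' true DBP
WITHIN_VAR = 15.0    # mmHg^2, variance of the error on each occasion
CUT = 95.0           # mmHg; assumed hypertensive if DBP >= CUT
N_PATIENTS = 1000

def simulate(n=N_PATIENTS, seed=1):
    """Return (baseline, outcome) pairs; random.gauss takes an SD, not a variance."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        true_dbp = rng.gauss(MEAN, math.sqrt(BETWEEN_VAR))
        baseline = true_dbp + rng.gauss(0.0, math.sqrt(WITHIN_VAR))
        outcome = true_dbp + rng.gauss(0.0, math.sqrt(WITHIN_VAR))  # assumed independent error
        patients.append((baseline, outcome))
    return patients

def colour(baseline, outcome, cut=CUT):
    """Classify a patient using the colour scheme on the slide."""
    hyper_base = baseline >= cut
    hyper_out = outcome >= cut
    if hyper_base and hyper_out:
        return "red"      # consistently hypertensive
    if not hyper_base and not hyper_out:
        return "blue"     # consistently normotensive
    return "orange"       # changes category between baseline and outcome

counts = {"blue": 0, "red": 0, "orange": 0}
for b, o in simulate():
    counts[colour(b, o)] += 1
print(counts)
```

Because the same true value feeds both occasions, most patients keep their category; the orange ones are those whose measurement error pushed them across the 95 mmHg boundary on one occasion only.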
20. [Figure: simulated DBP at baseline and outcome. Panel annotations: ‘Mean at baseline and outcome are the same’; ‘Mean at outcome is lower than at baseline’; ‘All patients are hypertensive at baseline’; ‘Many are not at outcome’.]
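The pattern on slide 20 can also be checked without simulation. Under the same assumptions as a straightforward simulation (normal true values, independent errors of equal variance at baseline and outcome, hypertensive meaning DBP ≥ 95 mmHg), baseline and outcome are bivariate normal, the regression of outcome on baseline has slope 50/(50 + 15), and the expected baseline mean of the group selected as hypertensive follows from the mean of a truncated normal. A sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Parameters from slide 17 (mmHg and mmHg^2)
MEAN, BETWEEN_VAR, WITHIN_VAR, CUT = 90.0, 50.0, 15.0, 95.0

# Assumption: baseline = true + e1, outcome = true + e2, with independent
# normal errors of variance WITHIN_VAR, so both have variance BETWEEN + WITHIN.
total_var = BETWEEN_VAR + WITHIN_VAR
sd = math.sqrt(total_var)
z = (CUT - MEAN) / sd
std_normal = NormalDist()  # mean 0, sd 1

# Proportion hypertensive (DBP >= CUT) at baseline
p_selected = 1.0 - std_normal.cdf(z)

# Mean baseline DBP in that group: the mean of a normal truncated below at CUT
mean_baseline_selected = MEAN + sd * std_normal.pdf(z) / p_selected

# Regression of outcome on baseline: slope = covariance / variance
# = BETWEEN_VAR / total_var (only the true value is shared by both occasions)
slope = BETWEEN_VAR / total_var
mean_outcome_selected = MEAN + slope * (mean_baseline_selected - MEAN)

print(f"expected number hypertensive at baseline (of 1000): {1000 * p_selected:.0f}")
print(f"mean at baseline in that group: {mean_baseline_selected:.1f} mmHg")
print(f"mean at outcome in that group:  {mean_outcome_selected:.1f} mmHg")
```

For all patients both means are 90 mmHg. In the selected group the expected outcome mean falls between 90 mmHg and that group's baseline mean: it regresses towards the population mean although nothing has been done to the patients.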
If you know why the title of this talk is extremely stupid, then you clearly know something about control, data and reasoning: in short, you have most of what it takes to be a statistician. If you have studied statistics then you will also know that a large amount of anything, and this includes successful careers, is luck.
In this talk I shall try to share some of my experiences of being a statistician in the hope that it will help you make the most of whatever luck life throws at you. In so doing, I shall try my best to overcome the distorting influence of that easiest of sciences, hindsight. Without giving too much away, I shall be recommending that you read, listen, think, calculate, understand, communicate, and do. I shall give you some examples of what I think works and what I think doesn’t.
In all of this you should never forget the power of negativity and also the joy of being able to wake up every day and say to yourself ‘I love the smell of data in the morning’.
30-minute presentation plus 5 minutes for questions
This example is covered in chapter 4 of
Senn, S. J. (2003). Dicing with Death. Cambridge: Cambridge University Press.
See
Senn, S. J. (2013). Invalid inversion. Significance, 10(2), 40-42.
Since we are calculating the probability of having breast cancer given that someone is female, we condition on being ‘female’. We thus strike out the column ‘male’ as being irrelevant.
The probability we require is the joint frequency ‘breast cancer’ and ‘female’ divided by the relevant marginal frequency ‘female’.
Since we are calculating the probability of being female given that someone is suffering from breast cancer, we condition on ‘suffering from breast cancer’. We thus strike out the column ‘not suffering from breast cancer’ as being irrelevant.
The probability we require is the joint frequency ‘breast cancer’ and ‘female’ divided by the relevant marginal frequency ‘suffering from breast cancer’.
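The ‘strike out the irrelevant column’ recipe in these notes can be written directly as code. This is a hedged sketch: the table holds only the cells the slides let us derive (550 women with breast cancer; 553 - 550 = 3 men with breast cancer; 31,418 - 550 = 30,868 women without it), in whatever units the slides use. The male, no-breast-cancer cell is not given and neither calculation needs it. The helper names are illustrative.

```python
from fractions import Fraction

# Joint counts keyed by (sex, breast-cancer status), in the units of the slides.
# Only cells derivable from the slides are included; the
# (male, no breast cancer) cell is not given and is not needed below.
joint = {
    ("female", "breast cancer"): 550,
    ("male", "breast cancer"): 553 - 550,
    ("female", "no breast cancer"): 31_418 - 550,
}

def is_female(cell):
    return cell[0] == "female"

def has_breast_cancer(cell):
    return cell[1] == "breast cancer"

def conditional(event, given, table):
    """P(event | given): joint count of both divided by the marginal count of `given`."""
    # Condition on `given`: strike out every cell outside that marginal class.
    kept = {cell: n for cell, n in table.items() if given(cell)}
    marginal = sum(kept.values())                              # the denominator
    both = sum(n for cell, n in kept.items() if event(cell))   # the numerator
    return Fraction(both, marginal)

p_bc_given_female = conditional(has_breast_cancer, is_female, joint)
p_female_given_bc = conditional(is_female, has_breast_cancer, joint)
print(f"P(breast cancer | female) = {float(p_bc_given_female):.3f}")
print(f"P(female | breast cancer) = {float(p_female_given_bc):.3f}")
```

Swapping the roles of `event` and `given` leaves the numerator at 550 but changes which cells are struck out, and hence the denominator, which is exactly the error invalid inversion makes.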
Extract of GenStat program
"To simulate regression to the mean"
"This version used to try and reproduce the numbers selected (285)in original version
of Significance paper"
"Set parameters"
SCALAR NSIM,mean,betvar,withvar,cut,lower,upper;VALUE=1000,90,50,15,95,60,120
TEXT xlabel,ylabel,title; VALUES='DBP at Baseline (mmHg)','DBP at Outcome (mmHg)','Diastolic blood pressure'
"Begin simulation"
FOR [NTIMES=1000]
GRANDOM [DISTRIBUTION=Normal; NVALUES=NSIM; SEED=0; MEAN=mean; VARIANCE=betvar] True
GRANDOM [DISTRIBUTION=Normal; NVALUES=NSIM; SEED=0; MEAN=0; VARIANCE=withvar] E1
CALCULATE X=True+E1
CALCULATE HBase=X>=cut
CALCULATE Check=SUM(HBase)
IF Check.EQ.285
PRINT Check; DECIMALS=0
EXIT [CONTROL=for]
ENDIF
ENDFOR
VARIATE [NVALUES=2]Xline1,Xline2,Xline3,Yline1,Yline2,Yline3
CALCULATE Xline1=cut
CALCULATE Yline1$[1],Yline1$[2]=lower,upper
CALCULATE Xline2$[1],Xline2$[2]=lower,upper
CALCULATE Yline2=cut
CALCULATE Xline3$[1],Xline3$[2]=lower, upper
CALCULATE Yline3$[1],Yline3$[2]=lower, upper
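For readers without GenStat, the search loop above might be rendered in Python roughly as follows. This is an unofficial translation, not the original program: `random.Random` replaces GenStat's generator, so it will not reproduce the original numbers, and the function name `find_sample` and its parameter names are hypothetical. Like the GenStat loop, it draws up to 1000 samples of 1000 baseline DBP values and stops at the first sample with exactly 285 values at or above the cut.

```python
import math
import random

def find_sample(n_sim=1000, mean=90.0, betvar=50.0, withvar=15.0,
                cut=95.0, target=285, max_tries=1000, seed=None):
    """Return (attempt, true values, baseline values) for the first sample
    with exactly `target` baseline values >= cut, or None if none is found."""
    rng = random.Random(seed)  # one stream, continued across attempts
    for attempt in range(1, max_tries + 1):
        # GenStat's GRANDOM takes a VARIANCE; random.gauss takes an SD.
        true_dbp = [rng.gauss(mean, math.sqrt(betvar)) for _ in range(n_sim)]
        baseline = [t + rng.gauss(0.0, math.sqrt(withvar)) for t in true_dbp]
        check = sum(x >= cut for x in baseline)  # HBase = X >= cut; SUM(HBase)
        if check == target:
            return attempt, true_dbp, baseline
    return None

result = find_sample(seed=2024)
if result is not None:
    attempt, true_dbp, baseline = result
    print(f"found a sample with 285 hypertensive at baseline on attempt {attempt}")
```

The extract stops after selecting the baseline sample; an outcome measurement would be generated from `true_dbp` in the same way as the baseline.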
See
Senn, S. J. (2009). Three things every medical writer should know about statistics. The Write Stuff, 18(3), 159-162.
These are prime numbers, so the minimum block size is their product, 105.
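The notes do not say which numbers are meant, but the only distinct primes whose product is 105 are 3, 5 and 7, so the sketch below assumes those. More generally, and assuming a block must contain a whole multiple of each required number, the minimum block size is the least common multiple; for distinct primes that is simply their product. `min_block_size` is a hypothetical helper (it needs Python 3.9+ for `math.lcm`).

```python
import math
from functools import reduce

def min_block_size(sizes):
    """Smallest block size that is a whole multiple of every number in `sizes`.

    This is the least common multiple; for distinct primes it equals their product.
    """
    return reduce(math.lcm, sizes, 1)

print(min_block_size([3, 5, 7]))  # distinct primes: 3 * 5 * 7
print(min_block_size([4, 6]))     # not coprime: lcm is 12, not 4 * 6 = 24
```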
By the time the last patient has completed period two (say), many of the patients will have completed the whole trial.
The way to run such a trial is as an add-on trial. All patients receive the current therapy as standard and they receive either placebo or the new treatment in addition. Trials in HIV infection were often of this sort and were (correctly) described as placebo-controlled.
Senn, S. J. (2001). The Misunderstood Placebo. Applied Clinical Trials, 10(5), 40-46.
See
Senn, S. J. (1998). Mathematics: governess or handmaiden? Journal of the Royal Statistical Society Series D-The Statistician, 47(2), 251-259.