The Replication Crises and its
Constructive Role in the
Philosophy of Statistics
Deborah G Mayo
November 3, 2018
What’s the constructive role of the
replication crisis?
• High-profile failures of replication have resulted in
much soul-searching among statisticians
• Why do I say it has (or should have) a very
constructive role in philosophy of statistics?
2
What’s failed replication?
• Results found statistically significant are
not found significant by an independent
group, using new subjects, stricter
protocols and preregistration
3
Paradox of Replication
• Crisis of Replication: it’s too difficult to
replicate the small P-values others found
when we use preregistered protocols
• Leading to the complaint: It’s too easy to
get low P-values
4
That it’s too easy when you abuse or cheat teaches
a lot about:
I. Non-fallacious uses of statistical tests
II. Rationale for the role of probability in tests
III. How to reformulate tests
5
Most findings are false?
“Several methodologists have pointed out that the high
rate of nonreplication of research discoveries is a
consequence of the convenient, yet ill-founded strategy
of claiming conclusive research findings solely on the
basis of a single study assessed by formal statistical
significance, typically for a p-value less than 0.05.” …
“It can be proven that most claimed research findings are
false.” (John Ioannidis 2005, 0696)
6
7
I. Non-fallacious tests
“[W]e need, not an isolated record, but a reliable
method of procedure. In relation to the test of
significance, we may say that a phenomenon is
experimentally demonstrable when we know how to
conduct an experiment which will rarely fail to give
us a statistically significant result.” (Fisher 1947, 14)
8
Fisher’s Simple Significance Test
“…to test the conformity of the particular data
under analysis with H0 in some respect:
…we find a function T = t(y) of the data, to be
called the test statistic, such that
• the larger the value of T the more inconsistent
are the data with H0;
• The random variable T = t(Y) has a
(numerically) known probability distribution
when H0 is true.
…the p-value corresponding to any t_obs as
p = p(t) = Pr(T ≥ t_obs; H0)”
(Mayo and Cox 2006, 81)
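As a minimal sketch of the definition above, assume (the quoted passage does not say this) that T is standard Normal under H0. The function name p_value_upper is hypothetical. The p-value is then the upper-tail area beyond the observed statistic:

```python
import math

def p_value_upper(t_obs: float) -> float:
    """Pr(T >= t_obs; H0) when T is standard Normal under H0 (an assumption)."""
    # Upper-tail area of N(0, 1), computed with the complementary error function.
    return 0.5 * math.erfc(t_obs / math.sqrt(2))

p_two_se = p_value_upper(2.0)   # observed statistic 2 standard errors above the null
p_half_se = p_value_upper(0.5)  # observed statistic 0.5 standard errors above the null
print(round(p_two_se, 4), round(p_half_se, 4))
```

The first value is small (about .023), the second is not (about .31): larger values of T are more inconsistent with H0 and so yield smaller p-values.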
9
Testing Reasoning
• If even larger differences than t_obs occur fairly
frequently under H0 (i.e., P-value is not small),
there’s scarcely evidence of incompatibility
with H0
• Small P-value indicates some underlying
discrepancy from H0 because very probably you
would have seen a less impressive difference
than t_obs were H0 true.
• This still isn’t evidence of a genuine statistical
effect H1, let alone a scientific conclusion H*
Stat-Sub fallacy H => H*
10
Fallacy of rejection
• H* makes claims that haven’t been probed by the
statistical test
• The moves from experimental interventions to H*
don’t get enough attention, but your statistical
account should block them.
11
Neyman-Pearson (N-P) tests:
Null and alternative hypotheses H0, H1
that are exhaustive
H0: μ ≤ 0 vs. H1: μ > 0
• So this fallacy of rejection H1 => H* is impossible
• Rejecting H0 only indicates statistical alternatives
H1 (how discrepant from null)
12
Despite philosophical debates
between Fisher & N-P
• They both fall under tools for “appraising and
bounding the probabilities (under respective
hypotheses) of seriously misleading interpretations
of data” (Birnbaum 1970, 1033)–error probabilities
• I place all under the rubric of error statistics
• Confidence intervals, N-P and Fisherian tests,
resampling, randomization.
13
N-P and Fisher showed error
control is lost with selective
reporting
Sufficient finagling—cherry-picking, P-hacking,
significance seeking, multiple testing, look
elsewhere—may practically guarantee a preferred
claim H gets support, even if it’s unwarranted by
evidence
14
Minimal principle for evidence
If the test had little or no capability of finding
flaws with H (even if H is incorrect), then
agreement between data x0 and H provides
poor (or no) evidence for H
Such a test fails a minimal requirement for
evidence (severity principle)
• Holds outside of formal tests, to estimation,
prediction.
15
II. Key to revising roles of error
probabilities
• What bothers you with selective reporting,
cherry picking, stopping when the data look
good (biasing selection effects)?
• Not problems about long-runs—
16
We cannot say the case at hand has done a
good job of avoiding the sources of
misinterpreting data
21 Word Solution: Report Sampling
Plan in Methods Section
• Replication researchers (re)discovered that data-
dependent hypotheses are a major source of
spurious significance levels.
“We report how we determined our sample size, all
data exclusions (if any), all manipulations, and all
measures in the study.”
(Simmons, Nelson, and Simonsohn 2012, 4)
18
Fishing for significance
(nominal vs. actual)
Suppose that twenty sets of differences have
been examined, that one difference seems large
enough to test and that this difference turns out
to be ‘significant at the 5 percent level.’ ….The
actual level of significance is not 5 percent,
but 64 percent! (Selvin 1970, 104)
(Morrison & Henkel’s The Significance Test Controversy,
1970!)
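Selvin's 64 percent can be reproduced with a short calculation. The sketch assumes the twenty tests are independent and each is run at the nominal 5 percent level; the independence assumption and the function name actual_level are illustrative and do not come from the slide.

```python
def actual_level(nominal: float, k: int) -> float:
    """Chance that at least one of k independent tests is nominally significant under H0."""
    # Every test must miss for the search to come up empty: (1 - nominal) ** k.
    return 1 - (1 - nominal) ** k

level_20 = actual_level(0.05, 20)
print(round(level_20, 2))  # 0.64
```

With a single prespecified test, actual_level(0.05, 1) returns the nominal level, so nominal and actual levels agree only when no searching took place.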
19
Spurious P-Value
• He reports: Such results would be difficult to
achieve under the assumption of H0
• When in fact such results are common under
the assumption of H0
• Calls for adjusting the P-value to reflect the
actual error probability
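A small simulation illustrates how common such results are under H0 when only the most impressive of several differences is reported. It is a sketch under stated assumptions: twenty independent standard Normal test statistics, a one-sided 5 percent cutoff, and a hypothetical function name share_with_a_hit.

```python
import random
from statistics import NormalDist

def share_with_a_hit(k: int = 20, alpha: float = 0.05,
                     trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated studies whose largest of k null statistics is nominally significant."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha)  # one-sided cutoff, about 1.645
    hits = 0
    for _ in range(trials):
        # H0 is true for every one of the k differences examined.
        best = max(rng.gauss(0.0, 1.0) for _ in range(k))
        if best >= cutoff:
            hits += 1
    return hits / trials

simulated_rate = share_with_a_hit()
print(round(simulated_rate, 2))
```

The simulated rate is the actual error probability of the hunting procedure; it is far above the nominal .05 that the reported P-value suggests.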
20
Yet some accounts of evidence object
“Two problems that plague frequentist inference:
multiple comparisons and multiple looks, or…data
dredging and peeking at the data. The frequentist
solution to both problems involves adjusting the P-
value…
But adjusting the measure of evidence because
of considerations that have nothing to do with
the data defies scientific sense” (Goodman 1999,
1010)
(To his credit, he’s open about this; heads the Meta-Research
Innovation Center at Stanford)
21
Likelihood Principle (LP)
A pivotal disagreement in the philosophy of statistics
wars:
In classical Bayesian and likelihoodist accounts, the
import of the data is via the ratios of likelihoods of
hypotheses
Pr(x0;H0)/Pr(x0;H1)
Condition on fixed data x0, hypotheses vary
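A minimal sketch of such a ratio, for two point hypotheses about a Normal mean. The numbers (observed mean 0.2, standard error 0.1, H0: μ = 0, H1: μ = 0.2) and the function name likelihood_ratio are assumptions chosen for illustration.

```python
from statistics import NormalDist

def likelihood_ratio(x0: float, mu0: float, mu1: float, se: float) -> float:
    """Pr(x0; H0) / Pr(x0; H1) for two point hypotheses about a Normal mean."""
    # The data x0 are held fixed; only the hypothesized mean changes.
    return NormalDist(mu0, se).pdf(x0) / NormalDist(mu1, se).pdf(x0)

lr = likelihood_ratio(x0=0.2, mu0=0.0, mu1=0.2, se=0.1)
print(round(lr, 3))  # 0.135
```

Nothing in the calculation refers to outcomes other than x0, which is why accounts built on this ratio take the sample space to be irrelevant.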
22
Hacking (1965)
• “Law of Likelihood”: x supports hypothesis H0
less well than H1 if,
Pr(x;H0) < Pr(x;H1)
(abandoned in 1980)
• “there always is such a rival hypothesis viz., that
things just had to turn out the way they actually
did” (Barnard 1972, 129).
23
Error Probability
• Pr(H0 is less well supported than H1 ; H0)
for some H1 or other
24
All error probabilities violate LP
(even without selection effects):
Sampling distributions, significance levels, power, all
depend on something more [than the likelihood
function]–something that is irrelevant in Bayesian
inference–namely the sample space
(Lindley 1971, 436)
The LP implies…the irrelevance of predesignation,
of whether a hypothesis was thought of beforehand
or was introduced to explain known effects
(Rosenkrantz 1977, 122)
25
How might intuitively unwarranted
inferences be blocked (without error
probabilities)?
Give a high prior probability to H0: no effect, in a
Bayesian analysis
26
Harold Jeffreys
“If mere improbability of the observations, given the
hypothesis, was the criterion, any hypothesis
whatever would be rejected. Everybody rejects the
conclusion” (Jeffreys 1939/1961, 385).
Add one of two things: error probabilities of the
method, or prior probabilities in the hypotheses
27
Problems with appealing to priors
to block inferences based on
selection effects
• It still wouldn’t show what researchers had
done wrong—battle of beliefs
• The believability of data-dredged hypotheses
is what makes them so seductive
• Additional source of flexibility, priors and
biasing selection effects
28
No help with our key problem
• How to distinguish the warrant for a single
hypothesis H with different methods
(e.g., one has biasing selection effects, another,
pre-registered results and precautions)?
• Since there’s a single H, its prior would be the
same
29
Criticisms of P-hackers lose force
• Wanting to promote an account that
downplays error probabilities, the researcher
deserving criticism is given a life-raft:
30
Bem’s “Feeling the Future” 2011:
ESP?
• Daryl Bem (2011): subjects do better than chance
at predicting the (erotic) picture shown in the
future
• Some locate the start of the Replication Crisis
with Bem
• Bem admits data dredging
• Bayesian critics resort to a default Bayesian prior
to (a point) null hypothesis
31
Bem’s Response
“Whenever the null hypothesis is sharply defined but
the prior distribution on the alternative hypothesis is
diffused over a wide range of values, as it is [here] it
boosts the probability that any observed data will be
higher under the null hypothesis than under the
alternative.
This is known as the Lindley-Jeffreys paradox: A
frequentist [can always] be contradicted by a
…Bayesian analysis that concludes that the same data
are more likely under the null.” (Bem et al. 2011, 717)
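The effect Bem describes can be sketched numerically. Assume a Normal sample mean m with standard error se, a point null H0: μ = 0, and a Normal(0, τ²) prior on μ under the alternative; these modelling choices, the numbers, and the name bayes_factor_01 are illustrative assumptions and are not taken from Bem's analysis.

```python
from statistics import NormalDist

def bayes_factor_01(m: float, se: float, tau: float) -> float:
    """Bayes factor for H0: mu = 0 against H1: mu ~ Normal(0, tau^2), given sample mean m."""
    like_h0 = NormalDist(0.0, se).pdf(m)
    # Under H1 the sample mean is marginally Normal(0, tau^2 + se^2).
    like_h1 = NormalDist(0.0, (tau ** 2 + se ** 2) ** 0.5).pdf(m)
    return like_h0 / like_h1

# Same data throughout (a result 2 standard errors from 0); only the prior spread changes.
factors = [bayes_factor_01(m=0.2, se=0.1, tau=t) for t in (0.1, 1.0, 10.0)]
print([round(b, 2) for b in factors])
```

As τ grows the factor moves from below 1 (favoring the alternative) to well above 1 (favoring the null), although the data, and hence the P-value, never change.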
32
III Reformulate Tests: P-values don’t
give an effect size
Severity function: SEV(Test T, data x, claim C)
• Tests are reformulated in terms of a discrepancy γ
from H0
• Instead of a binary cut-off (significant or not) the
particular outcome is used to infer discrepancies
that are or are not warranted
33
1-sided Normal test:
H0: μ ≤ 0 vs. H1: μ > 0 (Let σ = 1, n = 100)
Reject H0 whenever M ≥ 2SE: M ≥ 0.2
M is the sample mean (significance level = .025)
Let M = .2, so I reject H0.
1SE = σ/√n = .1
What can you infer?
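The numbers on this slide can be checked with a few lines. This is a sketch using only the slide's stated values (σ = 1, n = 100, cutoff at 2SE); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 1.0, 100
se = sigma / sqrt(n)   # standard error of the sample mean: 0.1
cutoff = 2 * se        # reject H0 when M >= 0.2
# Probability of rejecting when mu = 0, the boundary of H0.
alpha = 1 - NormalDist(0.0, se).cdf(cutoff)
print(se, cutoff, round(alpha, 3))
```

The exact level for a 2SE cutoff is about .023; the slide's .025 corresponds to the familiar 1.96SE cutoff, so the two agree to the rounding used here.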
34
Some ask: Does this mean I can infer μ = .3?
• Inferences not in terms of points, but μ > 0 + γ
• Do we have evidence for μ > .3?
No.
• 84% of the time, M would have been larger than it is
even if μ = .3: SEV(μ > .3) is low (.16)
Pr(M < .2; μ = .3) = .16
35
Even inferring μ > .2 is lousy
SEV(μ > .2) = .5
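The severity values quoted on these slides follow from one formula, SEV(μ > μ1) = Pr(M < observed M; μ = μ1). A minimal sketch with the slides' numbers (observed M = .2, SE = .1); the function name sev_greater is hypothetical.

```python
from statistics import NormalDist

def sev_greater(mu1: float, m_obs: float = 0.2, se: float = 0.1) -> float:
    """SEV(mu > mu1) = Pr(M < m_obs; mu = mu1) for the one-sided Normal test."""
    return NormalDist(mu1, se).cdf(m_obs)

for mu1 in (0.0, 0.1, 0.2, 0.3):
    print(mu1, round(sev_greater(mu1), 2))
# 0.0 0.98
# 0.1 0.84
# 0.2 0.5
# 0.3 0.16
```

The same rejection warrants μ > 0 with high severity, μ > .1 fairly well, and μ > .2 or μ > .3 poorly.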
36
Improves on confidence intervals
which inherit problems of N-P
tests
• We do not fix a single confidence level,
• The evidential warrant for different points
in any interval is distinguished
• Go beyond a “performance goal”
37
Quick sum-up
• Main source of hand-wringing stems from
biasing selection effects
• These alter error probabilities of methods
• They don’t alter evidence in accounts that
obey the Likelihood Principle
• To a follower of the LP, the error
statistician is considering “imaginary data”
and “intentions”
38
• To the severe tester, the LP precludes a key way to
block spurious results:
What’s the value of preregistered reports?
It’s that your appraisal is altered once you consider
the probability that some hypothesis, stopping
point, …or other could have led to a false positive
• Constructive role of replication crisis:
Biasing selection effects impinge on error
probabilities
Error probabilities impinge on well-testedness
39
• Can block inferences without appeal to error
probabilities: background beliefs (probabilism)
• Gives a life-raft to the P-hacker and cherry
picker; puts blame in the wrong place
• Significance tests are a small part of error
statistics, need reformulation and a new
rationale
• Error probabilities used to assess how well-
probed claims are (probativism)
40
41
References
• Barnard, G. (1972). ‘The Logic of Statistical Inference (Review of “The Logic of
Statistical Inference” by Ian Hacking)’, British Journal for the Philosophy of Science
23(2), 123–32.
• Bem, D. J. 2011. “Feeling the Future: Experimental Evidence for Anomalous
Retroactive Influences on Cognition and Affect”, Journal of Personality and Social
Psychology 100(3), 407-425.
• Bem, D. J., Utts, J., and Johnson, W. 2011. “Must Psychologists Change the Way
They Analyze Their Data?”, Journal of Personality and Social Psychology 101(4),
716-719.
• Birnbaum, A. 1970. “Statistical Methods in Scientific Inference (letter to the
Editor).” Nature 225 (5237) (March 14): 1033.
• Fisher, R. A. 1947. The Design of Experiments 4th ed., Edinburgh: Oliver and Boyd.
• Goodman, S. N. 1999. “Toward Evidence-Based Medical Statistics. 2: The Bayes
Factor,” Annals of Internal Medicine 130: 1005–1013.
• Hacking, I. (1965). Logic of Statistical Inference. Cambridge: Cambridge University
Press.
• Hacking, I. (1980). ‘The Theory of Probable Inference: Neyman, Peirce and
Braithwaite’, in Mellor, D. (ed.), Science, Belief and Behavior: Essays in Honour of
R. B. Braithwaite, Cambridge: Cambridge University Press, pp. 141–60.
• Ioannidis, J. (2005). “Why Most Published Research Findings are False”, PLoS
Medicine 2(8), 0696–0701.
• Jeffreys, H. ([1939]/ 1961). Theory of Probability. Oxford: Oxford University
Press.
42
• Lindley, D. V. 1971. “The Estimation of Many Parameters.” In Foundations of
Statistical Inference, edited by V. P. Godambe and D. A. Sprott, 435–455. Toronto:
Holt, Rinehart and Winston.
• Mayo, D. G. 1996. Error and the Growth of Experimental Knowledge. Science and
Its Conceptual Foundation. Chicago: University of Chicago Press.
• Mayo, D. G. 2018. Statistical Inference as Severe Testing: How to Get Beyond the
Statistics Wars, Cambridge: Cambridge University Press.
• Mayo, D. G. and Cox, D. R. (2006). “Frequentist Statistics as a Theory of Inductive
Inference” in Rojo, J. (ed.) The Second Erich L. Lehmann Symposium: Optimality,
2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical
Statistics: 77-97.
• Mayo, D. G., and A. Spanos. 2006. “Severe Testing as a Basic Concept in a
Neyman–Pearson Philosophy of Induction.” British Journal for the Philosophy of
Science 57 (2) (June 1): 323–357.
• Mayo, D. G., and A. Spanos. 2011. “Error Statistics.” In Philosophy of
Statistics, edited by Prasanta S. Bandyopadhyay and Malcolm R. Forster,
7:152–198. Handbook of the Philosophy of Science. The Netherlands:
Elsevier.
• Morrison, D. E., and R. E. Henkel, ed. 1970. The Significance Test
Controversy: A Reader. Chicago: Aldine De Gruyter.
• Pearson, E. S. & Neyman, J. (1930). “On the problem of two samples”, Joint
Statistical Papers by J. Neyman & E.S. Pearson, 99-115 (Berkeley: U. of
Calif. Press). First published in Bul. Acad. Pol.Sci. 73-96.
43
• Rosenkrantz, R. 1977. Inference, Method and Decision: Towards a Bayesian
Philosophy of Science. Dordrecht, The Netherlands: D. Reidel.
• Savage, L. J. 1962. The Foundations of Statistical Inference: A Discussion.
London: Methuen.
• Selvin, H. 1970. “A critique of tests of significance in survey research.” In The
significance test controversy, edited by D. Morrison and R. Henkel, 94-106.
Chicago: Aldine De Gruyter.
• Simmons, J., Nelson, L., and Simonsohn, U. (2012). “A 21 Word Solution”,
Dialogue: The Official Newsletter of the Society for Personality and Social
Psychology, 26(2), 4–7.
• Wagenmakers, E-J., 2007. “A Practical Solution to the Pervasive Problems of P
values”, Psychonomic Bulletin & Review 14(5): 779-804.
44
SEV(μ > .3) = Pr(M < .2; μ = .3)
= Pr(Z < -1) = .16,
where Z = (.2 - .3)/.1 = -1
45
Severity for Test T+:
SEV(T+, d(x0), claim C)
Normal testing: H0: μ ≤ μ0 vs. H1: μ > μ0, known σ;
discrepancy parameter γ; μ1 = μ0 + γ;
d0 = d(x0) = √n(M - μ0)/σ (observed value of test statistic)
SIR: (Severity Interpretation with low P-values)
• (a): (high): If there’s a very low probability that so
large a d0 would have resulted, if μ were no greater
than μ1, then d0 indicates μ > μ1: SEV(μ > μ1) is
high.
• (b): (low) If there is a fairly high probability that d0
would have been larger than it is, even if μ = μ1, then
d0 is not a good indication that μ > μ1: SEV(μ > μ1) is low.
46
SIN: (Severity Interpretation for
Negative results)
• (a): (high) If there is a very high probability
that d0 would have been larger than it is, were
μ > μ1, then μ ≤ μ1 passes the test with high
severity: SEV(μ ≤ μ1) is high.
• (b): (low) If there is a low probability that d0
would have been larger than it is, even if μ >
μ1, then μ ≤ μ1 passes with low severity:
SEV(μ ≤ μ1) is low.
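For a negative result the severity of an upper bound can be sketched the same way, evaluating SEV(μ ≤ μ1) = Pr(M > observed M; μ = μ1) at the boundary μ = μ1. The observed mean of .1 below is a hypothetical non-significant result, and the function name sev_at_most is likewise an assumption.

```python
from statistics import NormalDist

def sev_at_most(mu1: float, m_obs: float, se: float) -> float:
    """SEV(mu <= mu1) = Pr(M > m_obs; mu = mu1) for the one-sided Normal test."""
    return 1 - NormalDist(mu1, se).cdf(m_obs)

# Hypothetical non-significant result: M = 0.1 with SE = 0.1 (one SE above 0).
high = sev_at_most(mu1=0.3, m_obs=0.1, se=0.1)  # case (a): mu <= .3 passes severely
low = sev_at_most(mu1=0.1, m_obs=0.1, se=0.1)   # case (b): mu <= .1 passes poorly
print(round(high, 2), round(low, 2))  # 0.98 0.5
```

The non-significant result rules out discrepancies as large as .3 with high severity, but it gives poor grounds for ruling out a discrepancy of .1.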
47
Jimmy Savage on the LP:
“According to Bayes' theorem,…. if y is the
datum of some other experiment, and if it
happens that P(x|µ) and P(y|µ) are
proportional functions of µ (that is,
constant multiples of each other), then
each of the two data x and y have exactly
the same thing to say about the values of
µ…” (Savage 1962, p. 17)
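A standard textbook illustration of proportional likelihoods, which is not on the slide, compares two sampling plans that both end with 9 successes and 3 failures: fix 12 trials in advance, or sample until the third failure. The sketch below, with hypothetical function names, shows the two likelihoods differ only by a constant factor.

```python
from math import comb

def binomial_lik(theta: float, successes: int = 9, trials: int = 12) -> float:
    """Likelihood when the number of trials was fixed in advance."""
    return comb(trials, successes) * theta ** successes * (1 - theta) ** (trials - successes)

def neg_binomial_lik(theta: float, successes: int = 9, failures: int = 3) -> float:
    """Likelihood when sampling continued until the third failure."""
    # The last trial must be a failure, so only the first 11 outcomes can be arranged freely.
    return comb(successes + failures - 1, successes) * theta ** successes * (1 - theta) ** failures

ratios = [binomial_lik(t) / neg_binomial_lik(t) for t in (0.2, 0.5, 0.8)]
print([round(r, 6) for r in ratios])  # [4.0, 4.0, 4.0]
```

Because the ratio does not depend on θ, the LP says both plans carry the same evidential import; the error statistician replies that the two plans have different sample spaces and so different error probabilities.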
48

More Related Content

What's hot

Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma jemille6
 
Mayo: Day #2 slides
Mayo: Day #2 slidesMayo: Day #2 slides
Mayo: Day #2 slidesjemille6
 
April 3 2014 slides mayo
April 3 2014 slides mayoApril 3 2014 slides mayo
April 3 2014 slides mayojemille6
 
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"jemille6
 
Controversy Over the Significance Test Controversy
Controversy Over the Significance Test ControversyControversy Over the Significance Test Controversy
Controversy Over the Significance Test Controversyjemille6
 
Final mayo's aps_talk
Final mayo's aps_talkFinal mayo's aps_talk
Final mayo's aps_talkjemille6
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...jemille6
 
beyond objectivity and subjectivity; a discussion paper
beyond objectivity and subjectivity; a discussion paperbeyond objectivity and subjectivity; a discussion paper
beyond objectivity and subjectivity; a discussion paperChristian Robert
 
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...jemille6
 
Feb21 mayobostonpaper
Feb21 mayobostonpaperFeb21 mayobostonpaper
Feb21 mayobostonpaperjemille6
 
D. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics WarsD. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics Warsjemille6
 
Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)jemille6
 
Discussion a 4th BFFF Harvard
Discussion a 4th BFFF HarvardDiscussion a 4th BFFF Harvard
Discussion a 4th BFFF HarvardChristian Robert
 
Gelman psych crisis_2
Gelman psych crisis_2Gelman psych crisis_2
Gelman psych crisis_2jemille6
 
Phil 6334 Mayo slides Day 1
Phil 6334 Mayo slides Day 1Phil 6334 Mayo slides Day 1
Phil 6334 Mayo slides Day 1jemille6
 
Mayo &amp; parker spsp 2016 june 16
Mayo &amp; parker   spsp 2016 june 16Mayo &amp; parker   spsp 2016 june 16
Mayo &amp; parker spsp 2016 june 16jemille6
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy jemille6
 
Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13jemille6
 
Senn repligate
Senn repligateSenn repligate
Senn repligatejemille6
 

What's hot (20)

Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma
 
Mayo: Day #2 slides
Mayo: Day #2 slidesMayo: Day #2 slides
Mayo: Day #2 slides
 
April 3 2014 slides mayo
April 3 2014 slides mayoApril 3 2014 slides mayo
April 3 2014 slides mayo
 
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
 
Controversy Over the Significance Test Controversy
Controversy Over the Significance Test ControversyControversy Over the Significance Test Controversy
Controversy Over the Significance Test Controversy
 
Final mayo's aps_talk
Final mayo's aps_talkFinal mayo's aps_talk
Final mayo's aps_talk
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
 
beyond objectivity and subjectivity; a discussion paper
beyond objectivity and subjectivity; a discussion paperbeyond objectivity and subjectivity; a discussion paper
beyond objectivity and subjectivity; a discussion paper
 
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
 
Feb21 mayobostonpaper
Feb21 mayobostonpaperFeb21 mayobostonpaper
Feb21 mayobostonpaper
 
D. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics WarsD. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics Wars
 
Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)
 
Discussion a 4th BFFF Harvard
Discussion a 4th BFFF HarvardDiscussion a 4th BFFF Harvard
Discussion a 4th BFFF Harvard
 
Gelman psych crisis_2
Gelman psych crisis_2Gelman psych crisis_2
Gelman psych crisis_2
 
Phil 6334 Mayo slides Day 1
Phil 6334 Mayo slides Day 1Phil 6334 Mayo slides Day 1
Phil 6334 Mayo slides Day 1
 
Mayo &amp; parker spsp 2016 june 16
Mayo &amp; parker   spsp 2016 june 16Mayo &amp; parker   spsp 2016 june 16
Mayo &amp; parker spsp 2016 june 16
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy
 
Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13
 
Mayod@psa 21(na)
Mayod@psa 21(na)Mayod@psa 21(na)
Mayod@psa 21(na)
 
Senn repligate
Senn repligateSenn repligate
Senn repligate
 

Similar to D. G. Mayo: The Replication Crises and its Constructive Role in the Philosophy of Statistics

P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsificationjemille6
 
“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”jemille6
 
D. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &LearningD. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &Learningjemille6
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilismjemille6
 
Philosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of StatisticsPhilosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of Statisticsjemille6
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...jemille6
 
The Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and CasualtiesThe Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and Casualtiesjemille6
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500jemille6
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualtiesjemille6
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)jemille6
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)jemille6
 
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)jemille6
 
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”jemille6
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)jemille6
 
Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)jemille6
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severityjemille6
 
hypothesis testing overview
hypothesis testing overviewhypothesis testing overview
hypothesis testing overviewi i
 
Hypothesis testing pdf bhavana.pdf
Hypothesis testing pdf bhavana.pdfHypothesis testing pdf bhavana.pdf
Hypothesis testing pdf bhavana.pdfSimhadri Bhavana
 
20200519073328de6dca404c.pdfkshhjejhehdhd
20200519073328de6dca404c.pdfkshhjejhehdhd20200519073328de6dca404c.pdfkshhjejhehdhd
20200519073328de6dca404c.pdfkshhjejhehdhdHimanshuSharma723273
 

Similar to D. G. Mayo: The Replication Crises and its Constructive Role in the Philosophy of Statistics (20)

P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
 
“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
D. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &LearningD. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &Learning
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
 
Philosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of StatisticsPhilosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of Statistics
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
 
The Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and CasualtiesThe Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and Casualties
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualties
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
 
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)
 
Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severity
 
hypothesis testing overview
hypothesis testing overviewhypothesis testing overview
hypothesis testing overview
 
Hypothesis testing pdf bhavana.pdf
Hypothesis testing pdf bhavana.pdfHypothesis testing pdf bhavana.pdf
Hypothesis testing pdf bhavana.pdf
 
20200519073328de6dca404c.pdfkshhjejhehdhd
20200519073328de6dca404c.pdfkshhjejhehdhd20200519073328de6dca404c.pdfkshhjejhehdhd
20200519073328de6dca404c.pdfkshhjejhehdhd
 
Hypothesis Testing.pptx
Hypothesis Testing.pptxHypothesis Testing.pptx
Hypothesis Testing.pptx
 

D. G. Mayo: The Replication Crises and its Constructive Role in the Philosophy of Statistics

  • 1. The Replication Crises and its Constructive Role in the Philosophy of Statistics Deborah G Mayo November 3, 2018
  • 2. What’s the constructive role of the replication crisis? • High profile failures of replication have resulted in much soul-searching among statisticians • Why do I say it has (or should have) a very constructive role in philosophy of statistics? 2
  • 3. What’s failed replication? • Results found statistically significant are not found significant by an independent group, using new subjects, stricter protocols and preregistration 3
  • 4. Paradox of Replication • Crisis of Replication: it’s too difficult to replicate the small P-values others found when we use preregistered protocols • Leading to the complaint: It’s too easy to get low P-values 4
  • 5. That it’s too easy when you abuse or cheat teaches a lot about: I. Non-fallacious uses of statistical tests II. Rationale for the role of probability in tests III. How to reformulate tests 5
  • 6. Most findings are false? “Several methodologists have pointed out that the high rate of nonreplication of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05.” … “It can be proven that most claimed research findings are false.” (John Ioannidis 2005, 0696) 6
  • 7. 7
  • 8. I. Non-fallacious tests “[W]e need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result.” (Fisher 1947, 14) 8
  • 9. Fisher’s Simple Significance Test “…to test the conformity of the particular data under analysis with H0 in some respect: …we find a function T = t(y) of the data, to be called the test statistic, such that • the larger the value of T the more inconsistent are the data with H0; • The random variable T = t(Y) has a (numerically) known probability distribution when H0 is true. …the p-value corresponding to any t_obs as p = p(t) = Pr(T ≥ t_obs; H0)” (Mayo and Cox 2006, 81) 9
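The p-value definition on this slide can be made concrete with a small sketch. It assumes, purely for illustration, that the test statistic T is standard Normal under H0; the slides do not fix a distribution, and the function name `p_value_upper` is hypothetical.

```python
import math


def p_value_upper(t_obs: float) -> float:
    """Return Pr(T >= t_obs; H0), ASSUMING T ~ N(0, 1) when H0 is true."""
    # For a standard Normal variable, Pr(T >= t) = 0.5 * erfc(t / sqrt(2)).
    return 0.5 * math.erfc(t_obs / math.sqrt(2.0))


# Larger observed values of T are more inconsistent with H0,
# so the p-value shrinks as t_obs grows.
for t_obs in (0.5, 1.0, 2.0, 3.0):
    print(f"t_obs = {t_obs:.1f}  p = {p_value_upper(t_obs):.4f}")
```

Any other known null distribution would work the same way: only the tail-area calculation inside the function changes.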
  • 10. Testing Reasoning • If even larger differences than t_obs occur fairly frequently under H0 (i.e., P-value is not small), there’s scarcely any evidence of incompatibility with H0 • Small P-value indicates some underlying discrepancy from H0 because very probably you would have seen a less impressive difference than t_obs were H0 true. • This still isn’t evidence of a genuine statistical effect H1, let alone a scientific conclusion H*. Stat-Sub fallacy: H => H* 10
  • 11. Fallacy of rejection • H* makes claims that haven’t been probed by the statistical test • The moves from experimental interventions to H* don’t get enough attention, but your statistical account should block them. 11
  • 12. Neyman-Pearson (N-P) tests: Null and alternative hypotheses H0, H1 that are exhaustive: H0: μ ≤ 0 vs. H1: μ > 0 • So this fallacy of rejection H1 => H* is impossible • Rejecting H0 only indicates statistical alternatives H1 (how discrepant from null) 12
  • 13. Despite philosophical debates between Fisher & N-P • They both fall under tools for “appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data” (Birnbaum 1970, 1033)–error probabilities • I place all under the rubric of error statistics • Confidence intervals, N-P and Fisherian tests, resampling, randomization. 13
  • 14. N-P and Fisher showed error control is lost with selective reporting Sufficient finagling—cherry-picking, P-hacking, significance seeking, multiple testing, look elsewhere—may practically guarantee a preferred claim H gets support, even if it’s unwarranted by evidence 14
  • 15. Minimal principle for evidence If the test had little or no capability of finding flaws with H (even if H is incorrect), then agreement between data x0 and H provides poor (or no) evidence for H Such a test fails a minimal requirement for evidence (severity principle) • Holds outside of formal tests, to estimation, prediction. 15
  • 16. II. Key to revising roles of error probabilities • What bothers you with selective reporting, cherry picking, stopping when the data look good (biasing selection effects)? • Not problems about long-runs— 16
  • 17. We cannot say the case at hand has done a good job of avoiding the sources of misinterpreting data
  • 18. 21 Word Solution: Report Sampling Plan in Methods Section • Replication researchers (re)discovered that data-dependent hypotheses are a major source of spurious significance levels. “We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.” (Simmons, Nelson, and Simonsohn 2012, 4) 18
  • 19. Fishing for significance (nominal vs. actual) “Suppose that twenty sets of differences have been examined, that one difference seems large enough to test and that this difference turns out to be ‘significant at the 5 percent level.’ … The actual level of significance is not 5 percent, but 64 percent!” (Selvin 1970, 104) (In Morrison & Henkel’s The Significance Test Controversy, 1970!) 19
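Selvin's 64 percent figure can be checked with one line of arithmetic. The sketch below assumes the twenty tests are independent and that every null hypothesis is true (the usual reading of the example, though the slide does not spell it out); the function name is hypothetical.

```python
def actual_significance_level(num_tests: int, alpha: float = 0.05) -> float:
    """Pr(at least one of num_tests tests reaches nominal level alpha),
    ASSUMING independent tests and all null hypotheses true."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Nominal level stays at 5 percent; the actual level grows with the number of looks.
for k in (1, 5, 20):
    print(f"{k:2d} tests: actual level = {actual_significance_level(k):.3f}")
```

With 20 tests the result is about 0.64, matching the quoted "64 percent" against the nominal 5 percent.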
  • 20. Spurious P-Value • He reports: Such results would be difficult to achieve under the assumption of H0 • When in fact such results are common under the assumption of H0 • Calls for adjusting the P-value to reflect the actual error probability 20
  • 21. Yet some accounts of evidence object “Two problems that plague frequentist inference: multiple comparisons and multiple looks, or…data dredging and peeking at the data. The frequentist solution to both problems involves adjusting the P-value… But adjusting the measure of evidence because of considerations that have nothing to do with the data defies scientific sense” (Goodman 1999, 1010) (To his credit, he’s open about this; heads the Meta-Research Innovation Center at Stanford) 21
  • 22. Likelihood Principle (LP) A pivotal disagreement in the philosophy of statistics wars: In classical Bayesian and likelihoodist accounts, the import of the data is via the ratios of likelihoods of hypotheses Pr(x0;H0)/Pr(x0;H1) Condition on fixed data x0, hypotheses vary 22
  • 23. Hacking (1965) • “Law of Likelihood”: x supports hypothesis H0 less well than H1 if Pr(x;H0) < Pr(x;H1) (abandoned in Hacking 1980) • “there always is such a rival hypothesis viz., that things just had to turn out the way they actually did” (Barnard 1972, 129). 23
  • 24. Error Probability • Pr(H0 is less well supported than H1 ; H0) for some H1 or other 24
  • 25. All error probabilities violate LP (even without selection effects): Sampling distributions, significance levels, power, all depend on something more [than the likelihood function]–something that is irrelevant in Bayesian inference–namely the sample space (Lindley 1971, 436) The LP implies…the irrelevance of predesignation, of whether a hypothesis was thought of beforehand or was introduced to explain known effects (Rosenkrantz 1977, 122) 25
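A standard textbook illustration of this dependence on the sample space, not taken from the slides, compares two stopping rules that yield the same data: 9 heads and 3 tails in coin tossing, testing θ = 0.5 against θ > 0.5. The data values and function names below are assumptions chosen for the sketch. The two likelihood functions are proportional, so the LP treats the data identically, yet the P-values differ.

```python
from math import comb


def binomial_likelihood(theta: float, heads: int, tails: int) -> float:
    """Likelihood when the number of tosses (heads + tails) was fixed in advance."""
    return comb(heads + tails, heads) * theta**heads * (1 - theta) ** tails


def negative_binomial_likelihood(theta: float, heads: int, tails: int) -> float:
    """Likelihood when tossing continued until the `tails`-th tail appeared."""
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta) ** tails


def binomial_p_value(heads: int, tails: int, theta0: float = 0.5) -> float:
    """Pr(at least `heads` heads in heads + tails tosses; theta0)."""
    n = heads + tails
    return sum(comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
               for k in range(heads, n + 1))


def negative_binomial_p_value(heads: int, tails: int, theta0: float = 0.5) -> float:
    """Pr(at least `heads` heads occur before the `tails`-th tail; theta0).

    Equivalent event: at most tails - 1 tails in the first heads + tails - 1 tosses.
    """
    m = heads + tails - 1
    return sum(comb(m, j) * (1 - theta0) ** j * theta0 ** (m - j)
               for j in range(tails))


heads, tails = 9, 3  # illustrative data, not from the slides
for theta in (0.3, 0.5, 0.8):
    ratio = (binomial_likelihood(theta, heads, tails)
             / negative_binomial_likelihood(theta, heads, tails))
    print(f"theta = {theta}: likelihood ratio between designs = {ratio:.3f}")
print(f"binomial P-value:          {binomial_p_value(heads, tails):.4f}")
print(f"negative binomial P-value: {negative_binomial_p_value(heads, tails):.4f}")
```

The constant ratio shows the likelihoods are proportional in θ, while the two P-values fall on opposite sides of 0.05 because the sampling distributions differ.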
  • 26. How might intuitively unwarranted inferences be blocked (without error probabilities)? Give a high prior probability to H0: no effect, in a Bayesian analysis 26
  • 27. Harold Jeffreys “If mere improbability of the observations, given the hypothesis, was the criterion, any hypothesis whatever would be rejected. Everybody rejects the conclusion” (Jeffreys 1939/1961, 385). Add one of two things: error probabilities of the method, or prior probabilities in the hypotheses 27
  • 28. Problems with appealing to priors to block inferences based on selection effects • It still wouldn’t show what researchers had done wrong—battle of beliefs • The believability of data-dredged hypotheses is what makes them so seductive • Additional source of flexibility, priors and biasing selection effects 28
  • 29. No help with our key problem • How to distinguish the warrant for a single hypothesis H with different methods (e.g., one has biasing selection effects, another, pre-registered results and precautions)? • Since there’s a single H, its prior would be the same 29
  • 30. Criticisms of P-hackers lose force • Wanting to promote an account that downplays error probabilities, the researcher deserving criticism is given a life-raft: 30
  • 31. Bem’s “Feeling the Future” 2011: ESP? • Daryl Bem (2011): subjects do better than chance at predicting the (erotic) picture shown in the future • Some locate the start of the Replication Crisis with Bem • Bem admits data dredging • Bayesian critics resort to a default Bayesian prior on a (point) null hypothesis 31
  • 32. Bem’s Response “Whenever the null hypothesis is sharply defined but the prior distribution on the alternative hypothesis is diffused over a wide range of values, as it is [here] it boosts the probability that any observed data will be higher under the null hypothesis than under the alternative. This is known as the Lindley-Jeffreys paradox: A frequentist [can always] be contradicted by a …Bayesian analysis that concludes that the same data are more likely under the null.” (Bem et al. 2011, 717) 32
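The effect Bem describes can be sketched numerically. The setup below is an assumption made for illustration, not Bem's or his critics' actual analysis: a Normal sample mean with known σ, a point null μ = 0, and a diffuse N(0, τ²) prior on μ under the alternative. The function name `bayes_factor_null` is hypothetical. Holding the observed z-score at 1.96 (just significant at the two-sided 5 percent level), the Bayes factor in favor of the null grows with n.

```python
import math


def bayes_factor_null(z: float, n: int, sigma: float = 1.0, tau: float = 1.0) -> float:
    """Bayes factor BF01 for H0: mu = 0 against H1: mu ~ N(0, tau^2).

    ASSUMES the sample mean is N(mu, sigma^2 / n) and z = mean * sqrt(n) / sigma.
    Under H1 the mean is marginally N(0, tau^2 + sigma^2 / n), which gives
    BF01 = sqrt(1 + r) * exp(-0.5 * z^2 * r / (1 + r)), with r = n * tau^2 / sigma^2.
    """
    r = n * tau**2 / sigma**2
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z**2 * r / (1.0 + r))


# Same z-score at every sample size; only n changes.
for n in (1, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,d}  BF01 = {bayes_factor_null(1.96, n):.2f}")
```

Values above 1 favor the null, so with a large enough sample a result that the significance test rejects is reported by this Bayesian analysis as supporting H0.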
  • 33. III Reformulate Tests: P-values don’t give an effect size Severity function: SEV(Test T, data x, claim C) • Tests are reformulated in terms of a discrepancy γ from H0 • Instead of a binary cut-off (significant or not) the particular outcome is used to infer discrepancies that are or are not warranted 33
  • 34. 1-sided Normal test: H0: μ ≤ 0 vs. H1: μ > 0 (Let σ = 1, n = 100) Reject H0 whenever M ≥ 2SE: M ≥ 0.2 M is the sample mean (significance level = .025) Let M = .2, so I reject H0. 1SE = σ/√n = .1 What can you infer? 34
  • 35. Some ask: Does this mean I can infer μ = .3? • Inferences not in terms of points, but μ > 0 + γ • Do we have evidence for μ > .3? No. • 84% of the time, M would have been larger than it is even if μ = .3: SEV(μ > .3) is low (.16) Pr(M < .2; μ = .3) = .16 35
  • 36. Even inferring μ > .2 is lousy SEV(μ > .2) = .5 36
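The severity values on the last two slides follow from the Normal test set up on slide 34 (σ = 1, n = 100, observed M = .2). The sketch below computes SEV(μ > μ1) = Pr(M < .2; μ = μ1); the function names are hypothetical, and the extra values of μ1 beyond .2 and .3 are added only for comparison.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard Normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def severity_mu_greater(mu1: float, m_obs: float = 0.2,
                        sigma: float = 1.0, n: int = 100) -> float:
    """SEV(mu > mu1) = Pr(M < m_obs; mu = mu1) for the one-sided Normal test."""
    standard_error = sigma / math.sqrt(n)  # .1 for the slide's values
    return normal_cdf((m_obs - mu1) / standard_error)


# mu1 = .3 and mu1 = .2 are the cases on the slides; 0 and .1 are added for contrast.
for mu1 in (0.0, 0.1, 0.2, 0.3):
    print(f"SEV(mu > {mu1:.1f}) = {severity_mu_greater(mu1):.2f}")
```

Small discrepancies from the null are well warranted by M = .2, while the claims μ > .2 and μ > .3 are not, which is the point of reporting severity rather than a bare "reject".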
  • 37. Improves on confidence intervals which inherit problems of N-P tests • We do not fix a single confidence level • The evidential warrant for different points in any interval is distinguished • Go beyond a “performance goal” 37
  • 38. Quick sum-up • Main source of hand-wringing stems from biasing selection effects • These alter error probabilities of methods • They don’t alter evidence in accounts that obey the Likelihood Principle • To a follower of the LP, the error statistician is considering “imaginary data” and “intentions” 38
  • 39. • To the severe tester, the LP precludes a key way to block spurious results: What’s the value of preregistered reports? It’s that your appraisal is altered once you consider the probability that some hypotheses, stopping point, …or other could have led to a false positive • Constructive role of replication crisis: Biasing selection effects impinge on error probabilities Error probabilities impinge on well-testedness 39
  • 40. • Can block inferences without appeal to error probabilities: background beliefs (probabilism) • Gives a life-raft to the P-hacker and cherry picker; puts blame in the wrong place • Significance tests are a small part of error statistics, need reformulation and a new rationale • Error probabilities used to assess how well-probed claims are (probativism) 40
  • 41. 41
  • 42. References • Barnard, G. (1972). ‘The Logic of Statistical Inference (Review of “The Logic of Statistical Inference” by Ian Hacking)’, British Journal for the Philosophy of Science 23(2), 123–32. • Bem, D. J. 2011. “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect”, Journal of Personality and Social Psychology 100(3), 407–425. • Bem, D. J., Utts, J., and Johnson, W. O. 2011. “Must Psychologists Change the Way They Analyze Their Data?”, Journal of Personality and Social Psychology 101(4), 716–719. • Birnbaum, A. 1970. “Statistical Methods in Scientific Inference (letter to the Editor).” Nature 225 (5237) (March 14): 1033. • Fisher, R. A. 1947. The Design of Experiments 4th ed., Edinburgh: Oliver and Boyd. • Goodman, S. N. 1999. “Toward evidence-based medical statistics. 2: The Bayes factor,” Annals of Internal Medicine 130: 1005–1013. • Hacking, I. (1965). Logic of Statistical Inference. Cambridge: Cambridge University Press. • Hacking, I. (1980). ‘The Theory of Probable Inference: Neyman, Peirce and Braithwaite’, in Mellor, D. (ed.), Science, Belief and Behavior: Essays in Honour of R. B. Braithwaite, Cambridge: Cambridge University Press, pp. 141–60. • Ioannidis, J. (2005). “Why Most Published Research Findings are False”, PLoS Medicine 2(8), 0696–0701. • Jeffreys, H. ([1939]/ 1961). Theory of Probability. Oxford: Oxford University Press. 42
  • 43. • Lindley, D. V. 1971. “The Estimation of Many Parameters.” In Foundations of Statistical Inference, edited by V. P. Godambe and D. A. Sprott, 435–455. Toronto: Holt, Rinehart and Winston. • Mayo, D. G. 1996. Error and the Growth of Experimental Knowledge. Science and Its Conceptual Foundation. Chicago: University of Chicago Press. • Mayo, D. G. 2018. Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, Cambridge: Cambridge University Press. • Mayo, D. G. and Cox, D. R. (2006). "Frequentist Statistics as a Theory of Inductive Inference” in Rojo, J. (ed.) The Second Erich L. Lehmann Symposium: Optimality, 2006, Lecture Notes-Monograph Series, Volume 49, Institute of Mathematical Statistics: 247-275. • Mayo, D. G., and A. Spanos. 2006. “Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction.” British Journal for the Philosophy of Science 57 (2) (June 1): 323–357. • Mayo, D. G., and A. Spanos. 2011. “Error Statistics.” In Philosophy of Statistics, edited by Prasanta S. Bandyopadhyay and Malcolm R. Forster, 7:152–198. Handbook of the Philosophy of Science. The Netherlands: Elsevier. • Morrison, D. E., and R. E. Henkel, ed. 1970. The Significance Test Controversy: A Reader. Chicago: Aldine De Gruyter. • Pearson, E. S. & Neyman, J. (1930). “On the problem of two samples”, Joint Statistical Papers by J. Neyman & E.S. Pearson, 99-115 (Berkeley: U. of Calif. Press). First published in Bul. Acad. Pol.Sci. 73-96. 43
  • 44. References (continued)
    • Rosenkrantz, R. (1977). Inference, Method and Decision: Towards a Bayesian Philosophy of Science. Dordrecht, The Netherlands: D. Reidel.
    • Savage, L. J. (1962). The Foundations of Statistical Inference: A Discussion. London: Methuen.
    • Selvin, H. (1970). “A critique of tests of significance in survey research”, in Morrison, D. and Henkel, R. (eds.), The Significance Test Controversy. Chicago: Aldine De Gruyter, pp. 94–106.
    • Simmons, J., Nelson, L., and Simonsohn, U. (2012). “A 21 Word Solution”, Dialogue: The Official Newsletter of the Society for Personality and Social Psychology 26(2), 4–7.
    • Wagenmakers, E.-J. (2007). “A Practical Solution to the Pervasive Problems of P values”, Psychonomic Bulletin & Review 14(5), 779–804.
  • 45. SEV(μ > μ1) = Pr(M < .2; μ = .3) = Pr(Z < -1) = .16, where μ1 = .3 and Z = (.2 - .3)/.1 = -1
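The arithmetic on this slide can be checked with a short script. This is only a sketch: the variable names are illustrative and not from the slides, and the standard normal CDF is computed from the error function in Python's standard library.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Pr(Z < z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values read off the slide (names are illustrative, not the author's).
observed_mean = 0.2   # M, the observed sample mean
mu1 = 0.3             # the value in the claim "mu > mu1"
standard_error = 0.1  # sigma / sqrt(n), the denominator used on the slide

z = (observed_mean - mu1) / standard_error  # (.2 - .3)/.1 = -1
sev = normal_cdf(z)                         # Pr(Z < -1), about .16

print(f"Z = {z:.2f}, SEV(mu > {mu1}) = {sev:.2f}")
```

A severity of about .16 is low: an observed mean of .2 is a poor indication that μ exceeds .3.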
  • 46. Severity for Test T+: SEV(T+, d(x0), claim C)
    Normal testing: H0: μ ≤ μ0 vs. H1: μ > μ0, known σ; discrepancy parameter γ; μ1 = μ0 + γ; d0 = d(x0) = √n(M - μ0)/σ (observed value of the test statistic)
    SIR: (Severity Interpretation with low P-values)
    • (a) (high): If there’s a very low probability that so large a d0 would have resulted, if μ were no greater than μ1, then d0 indicates μ > μ1: SEV(μ > μ1) is high.
    • (b) (low): If there is a fairly high probability that d0 would have been larger than it is, even if μ = μ1, then d0 is not a good indication that μ > μ1: SEV(μ > μ1) is low.
  • 47. SIN: (Severity Interpretation for Negative results)
    • (a) (high): If there is a very high probability that d0 would have been larger than it is, were μ > μ1, then μ ≤ μ1 passes the test with high severity: SEV(μ ≤ μ1) is high.
    • (b) (low): If there is a low probability that d0 would have been larger than it is, even if μ > μ1, then μ ≤ μ1 passes with low severity: SEV(μ ≤ μ1) is low.
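The SIR and SIN rules can be sketched as two small functions for test T+ with known σ. This is a hedged illustration, not the author's code: the function names are hypothetical, and the example values σ = 1 and n = 100 are assumptions chosen only so that the standard error is .1. The sketch evaluates the probabilities at the boundary μ = μ1, so SEV(μ > μ1) = Pr(d(X) ≤ d0; μ = μ1) and SEV(μ ≤ μ1) = Pr(d(X) > d0; μ = μ1).

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def severity_greater(sample_mean: float, mu1: float, sigma: float, n: int) -> float:
    """SEV(mu > mu1): probability of a result no larger than the one
    observed, computed under mu = mu1 (the SIR case)."""
    standard_error = sigma / math.sqrt(n)
    return normal_cdf((sample_mean - mu1) / standard_error)

def severity_at_most(sample_mean: float, mu1: float, sigma: float, n: int) -> float:
    """SEV(mu <= mu1): probability of a result larger than the one
    observed, computed under mu = mu1 (the SIN case)."""
    return 1.0 - severity_greater(sample_mean, mu1, sigma, n)

# Assumed example values (not from the slides): sigma = 1, n = 100, M = .2.
sigma, n, sample_mean = 1.0, 100, 0.2

sev_small_gap = severity_greater(sample_mean, 0.1, sigma, n)  # about .84: fairly high
sev_large_gap = severity_greater(sample_mean, 0.3, sigma, n)  # about .16: low
sev_upper = severity_at_most(sample_mean, 0.3, sigma, n)      # about .84: fairly high

print(f"SEV(mu > 0.1)  = {sev_small_gap:.2f}")
print(f"SEV(mu > 0.3)  = {sev_large_gap:.2f}")
print(f"SEV(mu <= 0.3) = {sev_upper:.2f}")
```

With these assumed values, the same observed mean warrants μ > .1 fairly well but gives poor grounds for μ > .3, and for any fixed μ1 the two severities sum to 1.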
  • 48. Jimmy Savage on the Likelihood Principle (LP): “According to Bayes' theorem,…. if y is the datum of some other experiment, and if it happens that P(x|µ) and P(y|µ) are proportional functions of µ (that is, constant multiples of each other), then each of the two data x and y have exactly the same thing to say about the values of µ…” (Savage 1962, p. 17)
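What “proportional functions of µ” means can be seen in a standard textbook pair of experiments, which is not taken from the slides. In one, the number of trials is fixed at 12 and 9 successes are observed (binomial). In the other, sampling continues until the 3rd failure, which happens to arrive on trial 12 (negative binomial). The sketch below uses θ for the success probability in place of Savage's µ; all names and numbers are illustrative assumptions.

```python
import math

def binomial_likelihood(theta: float, successes: int = 9, trials: int = 12) -> float:
    """Likelihood of theta when the number of trials is fixed in advance."""
    failures = trials - successes
    return math.comb(trials, successes) * theta**successes * (1 - theta)**failures

def negative_binomial_likelihood(theta: float, successes: int = 9, failures: int = 3) -> float:
    """Likelihood of theta when sampling stops at the last (here 3rd) failure.
    The final trial must be a failure, so only the first
    successes + failures - 1 trials can be arranged freely."""
    return (math.comb(successes + failures - 1, successes)
            * theta**successes * (1 - theta)**failures)

# The ratio of the two likelihoods is the same constant for every theta,
# so the two likelihood functions are proportional.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
ratios = [binomial_likelihood(t) / negative_binomial_likelihood(t) for t in thetas]

for t, r in zip(thetas, ratios):
    print(f"theta = {t:.1f}: likelihood ratio = {r:.4f}")
```

The constant ratio is C(12, 9) / C(11, 9) = 220 / 55 = 4. On the LP the two data sets therefore say the same thing about θ, even though the sampling distributions differ, and with them the error probabilities of a significance test.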