Talk at the 3rd European Conference on Argumentation
ABSTRACT: Specialized fields may at any time invent new inference rules—that is, new warrants—to improve on their stock of resources for drawing and defending conclusions. Yet disagreement over the acceptability of an invented warrant can always be re-opened. Randomized Clinical Trial is widely regarded as the gold standard for making inferences about causal relationships between medical treatments and patient outcomes. Once controversial, RCT achieved broad acceptance within the field as a result of warrant-establishing arguments circulating in the medical literature starting in the 1950s. And RCT has accumulated a very impressive track record of generating new conclusions that withstand critical scrutiny.
Here we look at two emerging innovations whose purpose is to support reasoning about health, offering ways to generate different classes of conclusions. These innovations could be seen as complementary to RCTs, but for both there are also hints of challenge to the enormous prestige of RCTs. We see this most particularly in the gap that has developed between the RCT-generated fact base and the decisions doctors and health policy officials have to make about treatments for patients. We’ve mentioned before that specialized inference methods that become stabilized within an expert community can meet unexpected challenges when they become components of reasoning by other communities. The two innovations considered here each allow us to explore the tensions that arise from the contrasting perspectives of scientists, clinicians, and patients.
Beyond Randomized Clinical Trials: emerging innovations in reasoning about health--ECA2019--2019-06-25
1. Beyond Randomized Clinical
Trials: Emerging Innovations in
Reasoning About Health
Jodi Schneider and Sally Jackson
3rd European Conference on Argumentation – ECA 2019
Groningen, The Netherlands
2019-06-25
2. Claim: Patient P should be given
a 10-day course of penicillin.
Data: Patient P appears to have
streptococcal pharyngitis.
.
What warrants a clinician’s choice of
treatment for a patient with a given ailment?
3. Claim: Patient P should be given
a 10-day course of penicillin.
Data: Patient P appears to have
streptococcal pharyngitis.
.
Warrant:
“A 10-day course of penicillin is the
treatment of choice for streptococcal
pharyngitis.” (Jenicek & Hitchcock, p.
35)
4. Claim: “A 10-day course of
penicillin is the treatment of
choice for streptococcal
pharyngitis.” (Jenicek &
Hitchcock, p. 35)
Data: Observations of some
number of patients given
alternative treatments for
streptococcal pharyngitis.
.
Warrant: Randomized Clinical Trial
“Positive results of well-designed and
well-executed randomized clinical
trials prove causal effectiveness.”
(Jenicek & Hitchcock, p. 35)
5. Claim: Patient P should be given
a 10-day course of penicillin.
Data: Patient P appears to have
streptococcal pharyngitis; and
penicillin is the treatment of
choice (best supported).
.
Warrant:
Use treatments that are supported by
high quality (RCT-based) evidence of
effectiveness.
6. Claim: Patient P should be given
a 10-day course of penicillin.
Data: Patient P appears to have
streptococcal pharyngitis; and
penicillin is the treatment of
choice (best supported).
.
Warrant:
Use treatments that are supported by
high quality (RCT-based) evidence of
effectiveness.
CHALLENGES TO THE WARRANT
8. Simple randomized clinical trial
[Figure: flow diagram. Protocol approved; patients & providers recruited; random allocation to Treatment A or Treatment B; blinding, other controls, monitoring; measurements for each group; statistical comparison: which group did better?]
From Schneider & Jackson ISSA 2018
9. NIH Definitions and Illustrative Factual Claims
Phase I. Tests a new biomedical intervention in a small group of people (e.g. 20-80) for the first time to determine efficacy and evaluate safety (e.g., determine a safe dosage range and identify side effects).
Illustrative factual claim: Drug D can be safely given to healthy people at dose level L.
Phase II. Study the biomedical or behavioral intervention in a larger group of people (several hundred) to determine efficacy and further evaluate safety.
Illustrative factual claim: For a certain class of patients, Drug D given under controlled conditions has “efficacy.”
Phase III. Study to determine efficacy of the biomedical or behavioral intervention in large groups of people (from several hundred to several thousand) by comparing the intervention to other standard or experimental interventions as well as to monitor adverse effects, and to collect information that will allow the interventions to be used safely.
Illustrative factual claim: Drug D is beneficial/harmful on average; Drug D1 is better than Drug D2.
Phase IV. Studies conducted after the intervention has been marketed. These studies are designed to monitor the effectiveness of the approved intervention in the general population and to collect information about any
Illustrative factual claim: Drug D performs well/poorly for the patient population under clinical conditions.
11. Treatment T is good (on average)
for relief of condition C
Synthesis of available evidence
from RCTs supports the efficacy of
Treatment T
12. Means/End Premise:
Treatment T is good (on average)
for relief of condition C
Synthesis of available evidence
from RCTs supports the efficacy of
Treatment T
Conclusion:
Treatment T should
be given to Patient P
13. Circumstance:
Patient P has
Condition C
Means-End:
Treatment T is able to provide
relief of condition C
Conclusion:
Treatment T should
be given to Patient P
Goal:
Patient P seeks
relief from
Condition C
Synthesis of available evidence
from RCTs supports the efficacy of
Treatment T
14. Circumstance:
Patient P has
Condition C
Means-End:
Treatment T is able to provide
relief of condition C
Conclusion:
Treatment T should
be given to Patient P
Goal:
Patient P seeks
relief from
Condition C
Synthesis of available evidence
from RCTs supports the efficacy of
Treatment T
16. Pragmatic Trials
Schwartz & Lellouch (1967): Explanatory and pragmatic attitudes in
therapeutic trials.
• Explanatory: aim is to rigorously establish a causal relationship.
• Pragmatic: aim is to support therapeutic choice.
A good explanation does not always provide adequate support for
therapeutic choice.
17. Pragmatic Trials
Trials designed with a pragmatic attitude differ from those designed
with an explanatory attitude in several ways:
1. Choice of comparison groups.
18. Pragmatic Trials
Trials designed with a pragmatic attitude differ from those designed
with an explanatory attitude in several ways:
1. Choice of comparison groups.
2. Specification of treatments.
19. Pragmatic Trials
Trials designed with a pragmatic attitude differ from those designed
with an explanatory attitude in several ways:
1. Choice of comparison groups.
2. Specification of treatments.
3. Choice of outcome measures.
20. Pragmatic Trials
Early lack of uptake for Schwartz & Lellouch:
• Explanatory attitude defined scientific excellence
• Pragmatic attitude too lacking in specificity and control
2009 republication of Schwartz and Lellouch—a late emerging
recognition that explanatory trials stop short of what a clinician needs
21. Pragmatic Trials
Van der Velden et al. (2019): When Oncologic Treatment Options
Outpace the Existing Evidence: Contributing Factors and a Path Forward
22. Circumstance:
Patient P has
Condition C
Means-End:
Treatment T1 is able to provide
relief of condition C
Conclusion:
Treatment T1 should
be given to Patient P
Goal:
Patient P seeks
relief from
Condition C
Synthesis of available evidence
from RCTs supports the efficacy of
Treatment T
When options outrun evidence.
The same argument could yield multiple
contradictory conclusions if multiple
treatments are known to have efficacy.
• Treatment T1 has efficacy
• Treatment T2 has efficacy
• Treatment T3 has efficacy
23. Circumstance:
Patient P has
Condition C
Means-End:
Any of Treatments T1, T2, or T3
might be most effective for P
Conclusion:
Patient P should be invited
into a pragmatic trial
Goal:
Patient P seeks
relief from
Condition C
Synthesis of available evidence
from RCTs supports the efficacy of
each treatment
When options outrun evidence.
The same argument could yield multiple
contradictory conclusions if multiple
treatments are known to have efficacy.
• Treatment T1 has efficacy
• Treatment T2 has efficacy
• Treatment T3 has efficacy
24. Pragmatic Trials
Van der Velden et al. (2019): When Oncologic Treatment Options
Outpace the Existing Evidence: Contributing Factors and a Path Forward
Action Items:
• Researchers and funders should increase support for pragmatic studies that
can be conducted in routine clinical care settings.
• Researchers and funders should prioritize pragmatic trials that are informed
by broad stakeholder input, including providers, patients, and their families.
25. Pragmatic Trials
Van Staa et al. (2012): Pragmatic randomised trials using routine
electronic health records: putting them to the test
“A revolution is long overdue in the technical and research governance
frameworks for testing widely used interventions whose relative merits
are unknown. Narrowly restricted studies with questionable external
validity need not be the norm.”
27. Circumstance:
Patient P has
Condition C
Means-End:
Treatment T is able to provide
relief of condition C
Conclusion:
Treatment T should
be given to Patient P
Goal:
Patient P seeks
relief from
Condition C
Synthesis of available evidence
from RCTs supports the efficacy of
Treatment T
28. Circumstance:
Patient P has
Condition C
Means-End:
Treatment T is able to provide
relief of condition C
Conclusion:
Treatment T should
be given to Patient P
Goal:
Patient P seeks
relief from
Condition C
“Average” benefit does not assure
individual benefit.
Treatment effects often vary from person
to person; something beneficial on
average may fail for some.
29. N-of-1 Trials
N-of-1: An experiment performed on a single subject.
Replicated N-of-1: The same experiment performed multiple times on
different people.
Defining characteristic: designed to draw conclusions about treatment
effects for each individual subject. Each subject serves as “their own
control.”
31. N-of-1 Trials
"n-of-1 trials can blur the boundaries between clinical practice and
clinical research, making research more like practice and practice more
like research. Making research more like practice is desirable to
increase the relevance and generalizability of clinical research findings.
On the other hand, making practice more like research will create
opportunities for developing the clinical evidence base by enhancing
systematic data collection on the comparative effectiveness of
treatments by real health care professionals treating real patients."
[Kravitz book, pages 7-8].
32. Circumstance:
Patient P has
Condition C
Means-End:
Each of Treatments T1, T2, or T3
is effective on average
Conclusion:
Patient P should conduct an
N-of-1 trial
Goal:
Patient P seeks
relief from
Condition C
Synthesis of available evidence
from RCTs supports the efficacy of
each treatment
33. N-of-1 Trials
Guyatt G, Rennie D, Meade MO, Cook DJ. (2015). Users' Guides to the Medical Literature:
A Manual for Evidence-Based Clinical Practice, 3rd ed. McGraw-Hill Companies.
35. Warranting Devices
[Figure: Data leads to Claim by way of a warranting rule, which is dependable because backed by material assurances, procedural assurances, and institutional assurances.]
Source of figure: Jackson & Schneider 2018. Cochrane Review as a "Warranting Device" for Reasoning About Health.
36.
37. Circumstance:
Patient P has Condition C
Many treatments for C have some
form of support; some without
enough support look promising
Means-End:
Enrolling patients in pragmatic
trials fills gaps in knowledge
while providing good care
Goal:
Patient P seeks
relief from
Condition C
Conclusion:
Patient P should be asked to
enroll in a pragmatic trial
T1-3 supported by RCT 0 or RCT I
T4 supported by RCT II or RCT III
T5 supported by large-scale observational study
T6-7 supported by anecdote
T8-9 supported by traditional practice
T10 supported only conjecturally
Editor's Notes
A practical question, for treating patients, is What justifies the choice of treatment?
There has been huge progress over the past 100 years in the resources available: not just an accumulation of facts but also the emergence of new methods of inference.
Jenicek & Hitchcock suggested that a conclusion of this kind is warranted by a generalization like this one—and that this warrant itself has been previously established by use of another, more general warrant.
Here, the warrant from the previous slide is now a claim.
Jenicek and Hitchcock argue that the warrant for the clinical reasoning about treating strep infections is generated by a very flexible inference method known as a Randomized Clinical Trial, or RCT. Our own prior work on RCTs focuses on how such inference rules are invented, then defended as better than whatever inference methods had been used before, and then fine-tuned in practice as problems are exposed. We call these inventions “warranting devices,” to acknowledge that they include not only the inference rule itself, but also material, procedural, and institutional assurances that become conditions on the use of the rule.
By contrast, in the Jenicek & Hitchcock version of this diagram, the RCT is both a warrant and a warrant generator. Sally & I don’t like the idea that the specific empirical generalization is considered a warrant, so we’ll propose a different way of thinking about this, but otherwise, we are quite aligned with Jenicek and Hitchcock in believing that RCT is in fact an inference rule applied to observations to arrive at causal claims. Many other fields have their own versions of such rules, including psychology and communication.
In our version of this diagram, there are two small but important changes. First, what is known about the treatment (penicillin) is included here as part of the data. Second, there is a more general warrant focusing on the relative quality of methods for generating inferences relevant to clinical care: “Use treatments that are supported by high quality (RCT-based) evidence of effectiveness.” We think this is truer to the idea of a warrant as an inference license, and certainly truer to the distinction now commonly made in computational argumentation between information nodes and inference nodes.
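The distinction between information nodes and inference nodes can be sketched as a small data structure. This is only an illustration of the layout on the slide: the class names `InformationNode` and `InferenceNode` are hypothetical, chosen for this sketch, and are not taken from any particular argumentation toolkit.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InformationNode:
    """A statement that can serve as data or as a claim."""
    text: str


@dataclass(frozen=True)
class InferenceNode:
    """A warrant: a rule licensing the step from premises to conclusion."""
    rule: str
    premises: Tuple[InformationNode, ...]
    conclusion: InformationNode


# The two pieces of data from the slide, both held as information.
diagnosis = InformationNode("Patient P appears to have streptococcal pharyngitis.")
treatment_of_choice = InformationNode(
    "Penicillin is the treatment of choice (best supported)."
)
claim = InformationNode("Patient P should be given a 10-day course of penicillin.")

# The general warrant is the only inference node in the argument.
argument = InferenceNode(
    rule="Use treatments that are supported by high quality (RCT-based) "
         "evidence of effectiveness.",
    premises=(diagnosis, treatment_of_choice),
    conclusion=claim,
)
```

In this sketch the empirical generalization about penicillin sits among the premises as information, while the warrant is reserved for the general rule about evidence quality.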
This diagram helps us to focus on whether it is reasonable in the abstract to choose treatments based on what evidence there may happen to be from RCTs.
RCT is widely regarded as the gold standard for making inferences about causal relationships between medical treatments and patient outcomes.
But patients and clinicians are starting to demand better evidence than is provided by RCTs, and they are starting to question the general idea of finding one preferred treatment for all kinds of patients. The challenges to the warrant as shown here do not have to do with whether RCTs produce valid causal claims. They have to do with how this inference rule works out in actual clinical decision-making. And the literature on these challenges is found in discussion of the two innovations we are discussing: pragmatic trials, and N-of-1 trials.
To understand what limitations there might be on knowledge produced by RCTs, we need to understand quite concretely how medical research is done. The logic of RCT is easily appreciated: The experimenter sets up conditions to try to isolate one possible cause of a response while creating equivalency on all other possible causes. The only part of this that is not completely obvious is the part contributed by statistical methods for deciding whether an observed difference is adequate evidence of treatment effect.
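The statistical step can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the analysis used in any actual trial: the outcome scores are invented, the function names are hypothetical, and a permutation test stands in for whatever method a real protocol would specify. It allocates patients to two arms at random and then asks how often a difference at least as large as the observed one would arise from random allocation alone.

```python
import random
from statistics import mean


def random_allocation(patient_ids, rng):
    """Shuffle patients and split them into two equally sized arms."""
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(outcomes_a, outcomes_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(outcomes_a) - mean(outcomes_b))
    pooled = list(outcomes_a) + list(outcomes_b)
    n_a = len(outcomes_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Re-deal the pooled outcomes as if group labels were arbitrary.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_permutations


# Hypothetical symptom-relief scores (higher is better), invented for illustration.
treatment_a = [7, 8, 6, 9, 8, 7, 9, 8]
treatment_b = [5, 6, 5, 7, 6, 4, 6, 5]

arm_a, arm_b = random_allocation(range(16), random.Random(1))
p_value = permutation_p_value(treatment_a, treatment_b)
```

A small `p_value` says only that the observed group difference would be unusual if the treatments did not differ; it says nothing yet about which individual patients benefited.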
While the logic of the RCT is quite easy to understand, conducting an RCT has become a highly regulated affair, impossible for anyone acting outside complex institutional environments. For example, no one can enroll patients in an experiment of this kind without getting a protocol approved at multiple levels. For a decade or more, researchers have had to register their trials before beginning to recruit, and gradually it has become common for protocols to be published even before any results have been obtained (so that the community can know what things are being tried). And we don’t just conduct one RCT and then start trying to apply its findings to practice. For drug treatments, a long sequence of trials is required, conducted in phases that take years to complete, with each phase producing a different kind of fact.
In the US, many types of medical research must occur in phases that carefully protect the people who will be involved in the research. Anything labelled clinical research means research on human subjects. Approval to conduct research on human subjects may require prior evidence of safety from animal testing or other laboratory methods.
Phase I trials typically aim to determine whether a safe form of the proposed treatment can be found. They may compare dose levels, for example. And they usually don’t give any evidence of the ability of the treatment to help the patient population—because they use healthy volunteers, not sick people.
Phase II trials try to show that the treatment actually does something for people with a relevant condition. But they often restrict participation to people with that condition, and no others, to create the greatest possible clarity in interpretation of the results. Efficacy here means that under well-controlled conditions and with the best possible patients, the treatment seems to work.
Phase III trials require more subjects and may allow for a much broader demonstration of effectiveness.
Drugs that have passed Phase III may apply for FDA approval, and that’s the meaning of what you see in Phase IV about studies conducted after marketing. The main improvement in evidence provided by Phase IV is that observations are made under ordinary clinical conditions, with all kinds of uncontrolled variations.
It takes a very long time to get through all of the work, and all of the bureaucracy, associated with clinical trialing of treatments, and at any point in time, the kinds of claims that are actually warranted vary by which phases have or have not been completed.
Something very important to notice is that regardless of what we know about a drug, a doctor can’t give it to a patient if it hasn’t been approved and marketed.
To look more deeply at the clinician’s reasoning, we’d like to shift from the Toulmin layout to a schematic view of the clinician’s reasoning. We think that level of reasoning is well represented as practical reasoning. We want to get from what is known about available treatments for a condition to a decision about what to do with a patient.
The practical reasoning scheme generates a course of action to achieve a goal under some set of conditions, and its key component is a premise connecting the action to the goal. The kinds of claims generated by RCTs are very suitable as means/end premises.
We add that the patient has the condition and wants it treated, and we now have a complete practical argument for using Treatment T.
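The shape of this practical argument can be sketched in code. The sketch below is an illustration only: the names `MeansEndPremise`, `PatientCase`, and `practical_conclusions` are hypothetical, and the matching rule is deliberately naive. It simply combines a circumstance, a goal, and any means-end premises that fit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MeansEndPremise:
    """A research-derived claim linking a treatment to relief of a condition."""
    treatment: str
    condition: str
    backing: str  # for example, "synthesis of RCT evidence"


@dataclass(frozen=True)
class PatientCase:
    """Circumstance and goal premises for one patient."""
    patient: str
    condition: str      # circumstance: the patient has this condition
    seeks_relief: bool  # goal: the patient wants relief from it


def practical_conclusions(case: PatientCase,
                          premises: List[MeansEndPremise]) -> List[str]:
    """Return every conclusion the scheme licenses for this case."""
    if not case.seeks_relief:
        return []
    return [
        f"{p.treatment} should be given to {case.patient}"
        for p in premises
        if p.condition == case.condition
    ]


case = PatientCase(patient="Patient P", condition="Condition C", seeks_relief=True)
premises = [MeansEndPremise("Treatment T", "Condition C", "synthesis of RCT evidence")]
conclusions = practical_conclusions(case, premises)
```

Supplying several means-end premises for the same condition returns several conclusions at once, which is the situation the talk goes on to describe as options outrunning evidence.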
This view is useful in helping us to understand two sets of proposals that are gaining momentum in clinical research. Here we highlight that what the clinician wants from research is knowledge about what will happen to a patient under each possible treatment option. And what people are starting to notice is that conventional RCTs, even whole piles of them, may provide only a very weak form of means-end premise. Overall, what is known is likely to be that Treatment T has had a positive average effect under the most favorable conditions possible for observing the effect.
A quick preview of the kinds of issues that are surfacing:
1. Treatment T may not be the ONLY one that’s good on average: options may outrun evidence. A decision to use T would not be any better justified than a decision to use something else with similar factual support.
2. There may be no evidence at all for how patients like P react to T. P may have comorbidities that would have made P ineligible for any trial, for example.
3. Even if Treatment T is good on average for patients like P, there may be no way to know whether P’s benefit will be near the average or far below it. The clinician may have no basis for guessing how likely it is that patient P will benefit.
4. And of course, other factors may make T undesirable; for example, it may not be covered by P’s insurance, or may not be something P’s care provider can administer.
We turn now to Pragmatic Trials, and then later, to N-of-1 trials, to see what sorts of innovative strategies are emerging in response to these concerns.
The term pragmatic trial comes from a 50-year-old article by Schwartz and Lellouch. They were arguing that medical research should be designed with therapeutic choice in mind. By contrast, most clinical research was being shaped by explanatory aims that fell short of what clinicians would need in order to make good decisions for their patients. They identified several procedural differences that follow from this difference in aims.
Comparison groups. That comparison groups should be formed at random from a common pool is not disputed by Schwartz and Lellouch. Their concerns are with how the common pool is developed, and with what happens when individuals from this common pool drop out after random assignment to a treatment. They argue that in such cases, statistical analysis may be conducted either on the premise that the dropouts are simply people for whom the treatment was unsuitable (that is, people who have nothing to tell us about the potential efficacy of the treatment), or on the premise that the treatment is problematic in some way (by virtue of failing for some of those it aims to benefit). As they put it, “in the first [explanatory] case the class of patient is defined to fit the predetermined treatments, while in the second [pragmatic] the treatments are defined to fit the predetermined class of patients” (p. 643).
Treatments. When two proposed treatments are to be compared, it will normally be the case that each considered individually is a complex assembly of components, including the form in which the treatment would most conveniently be administered, the time over which it would typically be administered, the setting in which it would ideally be administered, and much more. The explanatory attitude strives toward a contrast in which as many of these components as possible are equalized between the treatments to be compared, while a pragmatic attitude strives for a contrast between the optimal arrangement for each of the treatments. Conducting the comparison between two (artificially) equalized treatments invites the possibility that neither treatment works up to its potential. Conducting the comparison between two optimized treatments allows for all manner of confusion over exactly what makes the better of the two treatments better.
Outcomes. Schwartz and Lellouch point out that a pragmatic attitude prefers outcome measures that are close to what a patient and clinician are trying to accomplish with a course of treatment: a feeling of well-being, a remission of pain, a return to normal activity, an extension of life, or something similar. Some of these outcomes are inconvenient or unethical in research, and others (anything involving patient self-assessment) have known validity problems. All sorts of surrogate measures based on blood samples, biopsies, or various kinds of scans provide more convenient endpoints for trials conducted with an explanatory attitude, and they also look like “harder” evidence of effects. An explanatory attitude toward testing statins can use blood cholesterol levels as evidence that the statin affected something known to correlate with heart health; a pragmatic attitude toward testing statins would want evidence that they extend life or improve its quality. That is not guaranteed by change in the correlated variable unless that variable is known to be on the causal path to heart health.
The medical research community did not really take up the proposals Schwartz and Lellouch made, largely ignoring the article for a decade or two. RCT by then had become virtually synonymous with the explanatory attitude, and the kinds of design choices associated with the explanatory attitude were increasingly associated with scientific rigor and with excellence in research performance. And increasingly, people expected that conventional RCTs would yield the kind of knowledge that would guide clinical practice, even though the studies had not been designed around the questions that arise in practice.
The arguments that have developed more recently suggest that research done with an explanatory attitude may never provide what clinicians need. For example, all of the available research may have been conducted on a restricted class of patients, and the clinician might have a patient who would positively have been excluded. Very commonly, people are excluded for having comorbid conditions. So the clinician may be unable to determine whether a treatment that has worked well in highly controlled conditions will actually work for the patient at hand. And what should a clinician do if a half dozen treatments have shown efficacy under various experimental conditions? How does the clinician decide what to choose? We located many articles from many different medical specialties complaining of just these sorts of problems.
So even though everyone agrees that conventional RCTs do a good job of generating factual conclusions, they don’t always generate the “right” facts for the eventual practical purpose.
It’s important to know that the energy behind pragmatic trials is coming from practical dilemmas faced by doctors and patients. As one cross-disciplinary team of cancer specialists put it, their treatment options outpace the evidence available to choose among them.
So the clinician’s situation looks like this. Each of several treatments has research evidence backing its efficacy, but none has direct evidence of effectiveness for patients like P.
And the solution here COULD be to try to find a non-scientific way of choosing among the 3 options, but what van der Velden et al. argue is that this situation calls for creating the kind of knowledge that is needed by integrating research into clinical practice.
The conclusion does not have to be a choice of one treatment; it can be a decision to enter a process in which a treatment will be assigned, in order to build evidence about how each treatment performs with a wide range of diverse patients under a very wide range of clinical conditions.
Van der Velden et al. argue for incorporation of research into all routine clinical practice. There are all kinds of obstacles to this that need solution, but the idea is straightforward: To assure that research is designed to answer practical needs, situate it within the practice it is supposed to support.
Here are two action items from the “path forward” they advocate:
Researchers and funders should increase support for pragmatic studies that can be conducted in routine clinical care settings.
Researchers and funders should prioritize pragmatic trials that are informed by broad stakeholder input, including providers, patients, and their families.
Schwartz and Lellouch were arguing for something a bit more radical than using pragmatic trials to build on a base of findings from conventional clinical trials. And there are proponents of this more radical view.
Van Staa et al. (2012) proposed launching pragmatic trials around unanswered questions of clinical practice, whether or not there is existing research from clinical trials—when there is no evidence, and maybe no options that have been investigated at all. Their proposal leverages the rise of electronic health records to identify prospective enrollees into pragmatic trials, and one intriguing thing about their idea is that what is lost in rigorous control may be gained back in sheer size of the patient population that can be included in a trial.
They point out something almost invisible outside the research community: the degree to which the search for knowledge is regulated. Innovation in reasoning about health will nearly always include innovating in governance as well as in inference.
So pragmatic trials may contribute to solving the problem of options outrunning evidence, but are likely to be even more important for cases where no options at all have made their way into phased trialing.
To get re-oriented, let’s go back to a diagram we showed earlier, meant to expose the research-based means-end premise as a source of problems in clinical reasoning. One important characteristic of conventional RCTs is that they produce claims about average treatment effects, and the average is often for a group that the current patient does not belong to. A positive average treatment effect tells us that if the treatment is given to many patients, on the average they will benefit. But that is far short of an assurance that any particular patient will benefit.
This little illustration shows why average benefit does not assure individual benefit. In technical terms, if there is a nonzero patient × treatment interaction, then the treatment benefit will differ from patient to patient. In a conventional RCT, where a patient is assigned to one condition or another, we do not have a way to assess what would have happened to that patient under the other treatment condition. Especially when there are multiple options for treatment, the question of which treatment is best for a given patient cannot be answered by looking at which treatment is best on average.
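A toy calculation makes the point concrete. The numbers below are invented for illustration and the function names are hypothetical. The table lists, for each of five patients, the outcome they would have under each of two treatments, something a conventional RCT never observes, since each patient receives only one.

```python
from statistics import mean

# Hypothetical potential outcomes (symptom relief, higher is better).
# In a conventional RCT only one of the two values per patient is observed.
potential_outcomes = {
    "P1": {"T1": 8, "T2": 3},
    "P2": {"T1": 7, "T2": 4},
    "P3": {"T1": 9, "T2": 2},
    "P4": {"T1": 2, "T2": 7},
    "P5": {"T1": 6, "T2": 5},
}


def best_on_average(outcomes):
    """Return the treatment with the highest mean outcome, plus all means."""
    treatments = sorted(next(iter(outcomes.values())))
    averages = {t: mean(o[t] for o in outcomes.values()) for t in treatments}
    return max(averages, key=averages.get), averages


def best_for_each(outcomes):
    """Return, for each patient, the treatment with their own best outcome."""
    return {patient: max(o, key=o.get) for patient, o in outcomes.items()}


winner, averages = best_on_average(potential_outcomes)
individual_best = best_for_each(potential_outcomes)

# Patients for whom the best-on-average treatment is not their own best.
mismatched = [p for p, t in individual_best.items() if t != winner]
```

With these invented numbers T1 is better on average, yet one patient (P4) does better on T2, so choosing by the average alone would give that patient the worse option.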
This is the insight at the heart of what are called N-of-1 trials. In experiments, N conventionally refers to the number of observation units. Conventional RCTs enroll many patients, so N will be some modest number like 20 in a Phase I trial, or a larger number like several hundred in a Phase II or Phase III trial, or a MUCH larger number like several thousand in a Phase IV trial.
An N-of-1 trial may be conducted for the sole purpose of choosing a course of treatment for one individual, but the findings can also be aggregated with results from other individuals. A replicated N-of-1 experiment might look a lot like a conventional RCT—except that it is designed to allow computation of an effect size for each individual, not for each group.
So how do you design to get a treatment effect size computed for each individual? N-of-1 trials involve rotating through experimental conditions, taking measurements over and over from the same person. They let one individual, or any number of individuals, rotate through alternatives, evaluating the result each time. So to decide which of two pain medications works best for a given individual, the person takes both, on different occasions, over and over, according to some schedule for deciding which medication to take on each occasion.
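To make the rotation idea concrete, here is a minimal sketch of one possible schedule and one possible per-person summary. The choices are our assumptions, not a description of any particular published design: the schedule randomizes the order of two medications within each pair of occasions, the effect size is a plain difference in mean pain scores, and the function names `make_schedule` and `individual_effect` and all of the scores are hypothetical.

```python
import random
from statistics import mean


def make_schedule(n_blocks, treatments=("A", "B"), seed=0):
    """Build a schedule in which each block contains every treatment once,
    in an order shuffled independently for each block (an assumed design)."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


def individual_effect(schedule, scores, treatments=("A", "B")):
    """Mean score under the first treatment minus mean score under the
    second, computed from one person's repeated measurements."""
    first, second = treatments
    by_treatment = {t: [] for t in treatments}
    for treatment, score in zip(schedule, scores):
        by_treatment[treatment].append(score)
    return mean(by_treatment[first]) - mean(by_treatment[second])


# One person, four blocks, so eight occasions in total.
schedule = make_schedule(n_blocks=4)

# Invented pain scores (lower is better): 3.0 on A, 5.0 on B.
scores = [3.0 if treatment == "A" else 5.0 for treatment in schedule]

effect = individual_effect(schedule, scores)
print(schedule)
print("mean pain on A minus mean pain on B:", effect)
```

A negative value here means lower pain on medication A for this one person. Running the same procedure for many people would give one effect per individual, which is the quantity a replicated N-of-1 experiment is designed to produce.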
N-of-1 trials give the most direct evidence possible for what works best for the individual patient—at least when it is in fact possible for all options to be tried by the same patient.
Not every condition is suitable for comparative N-of-1 trials. They are best applied to chronic conditions that are relatively stable, where the treatment has a fast onset (and ideally a short half-life) [palliative care, p 473]. As presently conceived, N-of-1 trials are not suitable for areas such as surgery, where an irreversible treatment may be given, or critical care/emergency medicine, where a patient being stabilized cannot serve as their own control but rather should be compared with other patients receiving a different treatment.
Placeholder image clipped from:
https://www.slideshare.net/Cochrane.Collaboration/ida-cochrane-future-sim/22
We like this observation from one of the major proponents of N-of-1 trials. Making practice more like research acknowledges what we don’t know, and in the case of N-of-1 trials, it lets the individual patient try all of the plausible treatments.
We should note that plenty of people are already conducting their own N-of-1 trials, but without the kind of infrastructure we will need if we want to be able to successfully aggregate large numbers of trials.
This should look familiar. Here, what is known about the treatment options is insufficient to justify any choice among them, but it is sufficient to justify a decision to compare them all at the individual level. And as with pragmatic trials, we should be expecting a lot of evolution in design ideas for exactly how to conduct these experiments, especially for cases where the patient cannot actually serve as his or her own control.
Some advocates of evidence-based practice see N-of-1 trials as the highest form of evidence: as the top of an evidence pyramid of individual study designs (UG p11), or as one of the highest forms of evidence on treatment benefits and treatment harms, alongside systematic reviews.
In our prior work we’ve been focused on new inference methods—new ways to draw conclusions that are either better than old ways of drawing conclusions, or that allow us to draw entirely new kinds of conclusions. The central conceptual advance has been the idea of a warranting device—a proposed inference rule that generates conclusions whose quality is partly dependent on various kinds of assurances provided by the community that deploys the device. We aren’t prepared to say whether pragmatic trials and N-of-1 trials are new warranting devices, mainly because the work of building out these assurances has not been done—as it has been for RCTs and for Cochrane Reviews, the devices we’ve studied before.
What we have learned from this study is that a well-stabilized device, even one that has been as successful as RCT, will have limits that are exposed only in argumentative practice. RCT’s weakness is that it takes us only partway toward the practical purpose of choosing treatments for patients, and this is exposed in the gaps experienced by practitioners, as well as in various forms of overreach.
But even so, none of the arguments in favor of pragmatic trials or N-of-1 trials are arguments against RCTs. On the contrary, both are infused with the spirit of experimenting and committed to extending it further and faster. But as may be intuitively clear, both of these innovations share the potential to change the way we look at RCTs.
We observe in closing that both N-of-1 and pragmatic trials benefit from environmental conditions that did not exist when these proposals first appeared: an overall datafication in health, the rise of electronic health records, and the rise of data science. It is perhaps not surprising that strategies known by mid-century are only starting to seem really feasible now.
Something to chew on. We’ve seen that conventional RCTs provide very tenuous support for clinical decisions in all kinds of cases: when there are large numbers of possible treatments, each with its own support; when we know that each of several treatments can be effective but don’t know which is best for any given individual; when there simply isn’t any evidence on how a treatment will work when a patient has multiple conditions; and so on. We’ve been modeling knowledge about treatments as a kind of backing for the means-end premise in practical reasoning. But it seems equally reasonable to treat a characterization of the whole body of current knowledge as part of the circumstantial premise. (dotted arrows show the two possibilities)