This document provides a summary of a presentation given by Nick van Terheyden on how IBM's Watson technology can be applied in healthcare settings. The presentation discusses how Watson can help clinicians access up-to-date medical information and make evidence-based decisions by analyzing large amounts of structured and unstructured data. It also outlines some potential use cases for Watson in areas like differential diagnosis, medication dosing, and treatment recommendations tailored to individual patients. However, the document notes that challenges remain around integrating Watson with existing healthcare systems and addressing concerns that technology may not be able to replace the human aspects of medical care.
Calling Watson™ to Ward the Data Tsunami
1. Calling Watson™ to Ward 8 Stat
Nick van Terheyden, MD
Chief Medical Information Officer – Clinical Language Understanding
Nuance Communications Inc
Wednesday, February 2
9:45 - 10:45 AM
DISCLAIMER: The views and opinions expressed in this presentation are those of the author and do not
necessarily represent official policy or position of HIMSS.
Watson™ and DeepQA™ are trade names of IBM
3. Learning Objectives
• Recognize how technology can bring real-time knowledge and the latest clinical developments into the clinicians' workflow.
• Define IBM's Watson™: an insight into the DeepQA™ process, the complexities and details of the DeepQA™ challenge, and how these tools and techniques can be applied in a clinical context.
• Summarize progress to date on the development and behind-the-scenes implementation of Watson™ in healthcare.
• Demonstrate the data tsunami challenge faced in clinical settings and how artificial intelligence technology like Watson™ can offer new means of rapid access to critical, specific and highly relevant data, with corresponding links to the underlying evidence.
• Identify an interim pathway for attendees to develop their own concrete steps to create an information-rich yet physician-friendly environment.
4. Medicine used to be simple, ineffective and relatively safe. Now it is complex, effective and potentially dangerous
Sir Cyril Chantler, King's Fund. Chantler C. The role and education of doctors in the delivery of health care. Lancet 1999;353:1178-81
5. Lifestyle defines 'Group Health'
60%-80% of Group Health issues may be preventable
– 58% reduction in diabetes with lifestyle modification
– 60% fewer cardiac events
– 60% less cancer
– 44% reduction in total mortality (NNT=16)
– 83% less heart disease
– 91% less diabetes
– 45% reduction in total mortality (NNT=2.4)
– 73% less CHD
– 40% mortality reduction
– 69% less cancer
– 67% mortality reduction
Sources: Tuomilehto, NEJM 2001;344(18):1343-50; Hambrecht, Circulation 2004;109:1371-78; De Lorgeril, Arch Int Med 1998;158:1181-87; Lyon Heart Study, Circulation 1999;99:779-85; Nurses' Health Study, NEJM 2000;343:16-22 and NEJM 2001;345:790-97; Indian Heart Study, BMJ 1992;304:1015-19; GISSI-Prevenzione, Med. Diet, AHA 11/01 (Marchioli); HALE Project, Knoops, JAMA 2004;292:1433-1439; Indo-Med Study, Lancet 2002;360:1455-61
2009 Continua Health Alliance Brigitte Piniewski, MD
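The NNT values on this slide measure something different from the percentage reductions beside them. A minimal sketch of the standard definitions, assuming event rates are given as fractions: relative risk reduction (RRR) = (control rate − treated rate) / control rate, and number needed to treat (NNT) = 1 / absolute risk reduction (ARR). The event rates below are hypothetical, chosen only to show that one relative reduction can correspond to very different NNTs; they are not taken from the cited studies.

```python
import math


def absolute_risk_reduction(control_rate: float, treated_rate: float) -> float:
    """ARR: difference in event rates between control and intervention groups."""
    return control_rate - treated_rate


def relative_risk_reduction(control_rate: float, treated_rate: float) -> float:
    """RRR: the reduction as a fraction of the control event rate (0.45 = '45% reduction')."""
    return absolute_risk_reduction(control_rate, treated_rate) / control_rate


def number_needed_to_treat(control_rate: float, treated_rate: float) -> int:
    """NNT = 1 / ARR, rounded up here to whole patients (a common convention)."""
    arr = absolute_risk_reduction(control_rate, treated_rate)
    if arr <= 0:
        raise ValueError("no absolute benefit, so NNT is undefined")
    return math.ceil(1 / arr)


# Hypothetical event rates: the same 50% relative reduction at two baseline risks.
for control, treated in [(0.25, 0.125), (0.02, 0.01)]:
    print(relative_risk_reduction(control, treated), number_needed_to_treat(control, treated))
# 0.5 8
# 0.5 100
```

The slide reports some NNTs unrounded (for example, NNT=2.4); rounding up is only a convention for expressing whole patients, not part of the definition.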
6. Modifiable Health
[Figure: health trajectory by age (0, 25, 65): Wellness, Pre-Illness, Illness, Death; labels: 60-80% Lifestyle, Unpredictable Health, Predictable (Rules-based) Health]
7. To put it another way…
[Figure: health trajectory by age (0, 25, 65): Wellness, Pre-Illness, Illness, Death; labels: Fun, No Fun]
8. Preventive Medicine – A warning
[Figure: health trajectory by age (0, 25, 65): Wellness, Pre-Illness, Illness, Death; labels: $$$, $$$?, 60-80% Lifestyle, Unpredictable Health, Predictable (Rules-based) Health]
9. Challenge – Clinical Knowledge-Processing Burden
"Current medical practice relies heavily on the unaided mind to recall a great amount of detailed knowledge – a process which, to the detriment of all stakeholders, has repeatedly been shown unreliable"
Crane and Raymond, The Permanente Journal, Winter 2003, Volume 7 No. 1, Kaiser Permanente Institute for Health Policy
[Figure: knowledge processing requirement vs knowledge processing capacity, years ago vs today; "This gap injures patients"]
Slide courtesy of Dr Mike Bainbridge
10. Information Overload – Big Data
• Watson™ can sift through 200 million pages in 3 secs
– Graphic/analogy
• Medical information doubling every 5 years
– Reference
• Brent James, MD, MStat, Chief Quality Officer, Intermountain
Health Care; subject of The New York Times article “If Health
Care is Going to Change, Dr. Brent James Will Lead the Way”
• http://www.nytimes.com/2009/11/08/magazine/08Healthcare-t.html?pagewanted=all
• 1.8 zettabytes of information created this year, the majority of it unstructured (Source: IDC)
– That's enough information to fill 57 billion 32 GB Apple iPads, which could build a mountain of iPads 25 times higher than Mt Fuji
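The IDC comparison (1.8 zettabytes versus 57 billion 32 GB devices) can be checked with a few lines of arithmetic. This sketch assumes decimal (SI) units, 1 GB = 10^9 bytes and 1 ZB = 10^21 bytes, as storage vendors use; it verifies only the arithmetic, not IDC's estimate itself.

```python
GB = 10**9   # decimal gigabyte, as used in device marketing (assumption)
ZB = 10**21  # decimal zettabyte

devices = 57 * 10**9      # 57 billion devices
capacity_bytes = 32 * GB  # 32 GB each

total_zb = devices * capacity_bytes / ZB
print(f"{total_zb:.3f} ZB")  # 1.824 ZB, consistent with the ~1.8 ZB figure
```

With binary units (GiB and ZiB) the same devices hold only about 1.66 ZiB, so the comparison works out in decimal units.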
14. Time To Market
• Studies suggest that it takes an average of 17 years for research evidence to reach clinical practice (it took 25 years for beta-blocker treatment of heart patients to reach practice) (1)
• It takes an estimated average of 17 years for only 14% of new scientific discoveries to enter day-to-day clinical practice (2)
• Roughly 5% of autopsies reveal lethal diagnostic errors for which a correct diagnosis coupled with treatment could have averted death (3)
1. Balas, E. A., & Boren, S. A. (2000). Yearbook of Medical Informatics: Managing Clinical Knowledge for Health Care Improvement. Stuttgart, Germany: Schattauer Verlagsgesellschaft mbH.
2. Westfall, J. M., Mold, J., & Fagnan, L. (2007). Practice-based research: "Blue Highways" on the NIH roadmap. JAMA, 297(4), 403.
3. Shojania, K. G., Burton, E. C., McDonald, K. M., & Goldman, L. (2003). Changes in rates of autopsy-detected diagnostic errors over time: A systematic review. JAMA, 289(21), 2849-2856.
15. Current Rate of Use for Selected Procedures
Clinical procedure          Landmark trial   Current rate of use
Flu vaccination             1968 (7)         55% (8)
Thrombolytic therapy        1971 (9)         20% (10)
Pneumococcal vaccination    1977 (11)        35.6% (8)
Diabetic eye exam           1981 (4)         38.4% (6)
Beta blockers after MI      1982 (12)        61.9% (6)
Mammography                 1982 (13)        70.4% (6)
Cholesterol screening       1984 (14)        65% (15)
Fecal occult blood test     1986 (16)        17% (17)
Diabetic foot care          1983 (18)        20% (19)
1. Balas, E. A., & Boren, S. A. (2000). Yearbook of Medical Informatics: Managing Clinical Knowledge for Health Care Improvement. Stuttgart, Germany: Schattauer Verlagsgesellschaft mbH.
16. Reading to Keep up – Information Overload
• Today's experienced clinician needs close to 2 million pieces
of information to practice medicine
• Doctors subscribe to an average of seven journals
representing over 2,500 new articles each year, making it
literally impossible to keep up-to-date with the latest
information about diagnosis, prognosis and therapy
• Compare the time required for reading (for general medicine, enough to examine 19 articles per day, 365 days per year) with the time available (well under an hour per week for British medical consultants, even by their own reports).
• Furthermore, the interpretation of patient data is difficult
and complicated, mainly because the required expert
knowledge in each of the many different medical fields is
enormous and the information available for the individual
patient is multi-disciplinary, imprecise and very often
incomplete.
17. Meet Gerard Donovan…
[Figure: "About that Bill": hospital charges by department (Cardiology, Radiology, Billing, Plant, Administration, Pharmacy, Food services, Lab), including Intensive Care $17,664 and Operating Room $36,127; other amounts shown: $3,943, $1,290, $1,433, $3,233]
... and his 150 medical staff...
19. Watson™ DeepQA™ Technology
• Analyzes large volumes of structured and unstructured data
• Interprets and understands natural language questions
• Generates and evaluates hypotheses and quantifies confidence in answers
• Supports iterative dialog to refine results
• Adapts and learns over time, improving results
20. DeepQA™: The Technology Behind Watson™
[Figure: DeepQA™ pipeline. Question → Question & Topic Analysis → Question Decomposition (multiple interpretations) → Hypothesis Generation (primary search over 100's of answer sources; candidate answer generation: 100's of possible answers) → Hypothesis and Evidence Scoring (evidence retrieval over evidence sources; deep evidence scoring: 1000's of pieces of evidence, 100,000's of scores from many deep analysis algorithms) → Synthesis → Final Confidence Merging & Ranking (learned models help combine and weigh the evidence) → Answer & Confidence]
21. Architecture
[Figure: user experience by Nuance and partners for a community of consumers, large and small; CLU connected cloud to cloud with DeepQA™; solutions for a community of healthcare EMRs, large institutional providers and content publishers; CASE; community of content partners]
22. Comparison
• Not simple search
• Analysis of multiple concurrent
complex contributing conditions and
factors
23. Question and Answer Sets
Success
• Question: This hormone deficiency is
associated with Kallmann's syndrome.
– Passage: Isolated deficiency of GnRH or its
receptor causes failure of normal pubertal
development and amenorrhea in women. This
disorder is termed Kallmann syndrome when
it is accompanied by anosmia and has also
been termed idiopathic hypogonadotropic
hypogonadism (IHH).
• Answer: GnRH
• Notes: We know from the ontology that "GnRH" is a hormone, which lets us choose it as the most likely answer.
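The note describes type matching: a candidate is favored when an ontology lists it as an instance of the lexical answer type (here "hormone"). A minimal sketch, assuming a tiny hand-written ontology, made-up passage-match scores and a hypothetical blending weight; DeepQA's actual typing algorithms draw on many resources and are not reproduced here.

```python
from typing import Dict, Optional

# Toy ontology: term -> set of types (hypothetical, for illustration only).
ONTOLOGY: Dict[str, set] = {
    "GnRH": {"hormone", "peptide"},
    "anosmia": {"symptom"},
    "amenorrhea": {"symptom"},
    "Kallmann syndrome": {"disorder"},
}


def type_score(candidate: str, lat: str) -> float:
    """1.0 if the ontology lists the candidate as an instance of the lexical answer type."""
    return 1.0 if lat in ONTOLOGY.get(candidate, set()) else 0.0


def rank_by_type(passage_scores: Dict[str, float], lat: str,
                 type_weight: float = 0.6) -> Optional[str]:
    """Blend a passage-match score with the type score and return the top candidate.

    The type score is one weighted feature, not a hard filter: wrong-type
    candidates are ranked lower but not discarded, so an answer is still
    returned when nothing matches the LAT.
    """
    if not passage_scores:
        return None

    def combined(candidate: str) -> float:
        return ((1 - type_weight) * passage_scores[candidate]
                + type_weight * type_score(candidate, lat))

    return max(passage_scores, key=combined)


# Hypothetical passage-match scores for candidates extracted from the passage.
candidates = {"Kallmann syndrome": 0.9, "GnRH": 0.6, "anosmia": 0.5, "amenorrhea": 0.4}
print(rank_by_type(candidates, "hormone"))  # GnRH
```

Without the type feature, "Kallmann syndrome" would win on passage match alone; the ontology lookup is what moves "GnRH" to the top.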
24. Question and Answer Sets
Miss
• Question: Eponym from Victorian literature
for obesity hypoventilation syndrome.
– Correct passage: Obesity-hypoventilation
syndrome is also known as pickwickian
syndrome, in reference to Charles Dickens'…
– Correct answer: Pickwickian syndrome
– Wrong passage: Other clinical features
associated with obesity-hypoventilation
syndrome are daytime hypersomnolence and
cor pulmonale.
– Wrong answer: cor pulmonale
25. Potential Use Cases
• If We Only Knew What We Knew
– Bringing Evidence to the Point of Care
– Consumption of medical records, results etc offering differential diagnosis and
probability analysis with links to underlying literature sources
– Draws on the specifics of a patient case and vast volumes of clinical data and medical
– Highly granular results tailored to a particular patient's conditions, demographics, history
– True personalization of medicine based on large cohort historical data analysis
• Acting on What We Know
– Medication dosage: guidelines, clinical research findings for specific patient
– Adverse drug reactions: computational model + research database populated by
Watson
– Treatment Options: contextualized to patient
– Standard of Care: aligning treatment to standards
– Trending guidelines: recently published, pre-official
– Post-Operative Discharge and Follow up
– Entry of symptoms or symptomatic trends can trigger alerts for follow up
– Ongoing refinement based on dynamic interaction and learning
– Medical avatar for treatment and management of chronic conditions
26. Long Term Objectives
• Creation of a state of the art system oriented to evidence
based decision making in healthcare, where such a system
– Reports the suggested decisions and decision processes
– Reports the aggregated data from clinical processes
– Is defined as a real-time or retrospective system
– Is designed to assist medical professionals involved in the patient life cycle in the diagnosis and treatment of a patient
• Applying and expanding Watson's framework in conjunction with Clinical Language Understanding, medical data and medical ontology
• Integrated into the medical workflow and learning over time
27. Challenges
• Ambiguous human language
• Integration with existing systems – extracting the complete data set for history, results, etc.
– Often in disparate systems
– Non-standard interfaces
– Non-standard format
– Unstructured narrative
• Patient interaction with technology vs
humans
– Telemedicine and consumer trend towards
home based care
28. Replacing the Doctor?
• A study by the Mayo Clinic in 2006 identified the characteristics patients consider most important in a good doctor
• The ideal clinician is
– confident,
– empathetic,
– humane,
– personal,
– forthright,
– respectful, and
– thorough
• These facets are entirely human and will be
hard for technology to replace
Mayo Clin Proc. 2006;81(3):338-344
29. Questions
For more information I can be reached at
Nick van Terheyden, MD
Chief Medical Information Officer,
Nuance Communications
www.nuance.com/healthcare
E-Mail drnick@nuance.com
drnic1@gmail.com
Twitter http://twitter.com/drnic1
Voice of the Doctor http://drvoice.blogspot.com/
LinkedIn http://www.linkedin.com/in/nickvt
Plaxo http://nvt.myplaxo.com
Facebook http://facebook.com/drnic1
Google Voice (301) 355-0877
Editor's Notes
• Background on technology and Watson™/Jeopardy! and the data tsunami we face in healthcare
• How DeepQA™ works
• DeepQA™ applied to healthcare
• Current example of medical intelligence (CTRM)
• Future use cases
15 years to get clinical studies into practice: Balas and Boren studied the increase in use of 9 clinical procedures based on landmark studies and found that the average rate of increase in use was 3.2% per year; thus 15.6 years were required, on average, to reach 50% implementation. Note that Balas and Boren do not estimate how long it takes to conduct the research; they effectively start from when that research is submitted for publication.

Cardiologists hide medical errors. A recent article surveying the professionalism of doctors by specialty found that almost two-thirds of cardiologists admitted they had recently refused to report to any authority a serious medical error of which they had direct personal knowledge (Campbell et al., 2007).
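The 15.6-year figure follows if uptake is assumed to grow linearly from zero at 3.2 percentage points per year, so reaching 50% takes 50 / 3.2 years. A minimal check of that arithmetic; the linear model is the note's implied assumption, not a claim about how adoption actually behaves.

```python
def years_to_reach(target_pct: float, rate_pct_per_year: float) -> float:
    """Years to reach target_pct adoption, assuming linear growth from 0%."""
    return target_pct / rate_pct_per_year


years = years_to_reach(50, 3.2)
print(round(years, 1))  # 15.6
```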
The table lists the 9 landmark studies and the rate of use in the most recent published study, which is indicated by the reference number immediately following the percent rate of use. These figures are almost certainly an underestimate of the time it takes to translate research into impact, and an overestimate of the percentage of studies that survive to contribute to utilization.
• Combines large amounts of unstructured data with structured data so they can be analyzed together
• Understands ambiguous and imprecise questions using sophisticated natural language algorithms
• Identifies many answers to questions, with evidence to "explain" the rationale for the answers
• Enables iterative and interactive question answering to refine and improve results
• Learns from additional evidence, additional questions and mistakes to improve accuracy over time
Massively parallel, probabilistic, evidence-based architecture: generates and scores many hypotheses using a combination of 1000's of Natural Language Processing, Information Retrieval, Machine Learning and Reasoning algorithms. These gather, evaluate, weigh and balance different types of evidence to deliver the answer with the best support it can find.

Watson, the computer system we developed to play Jeopardy!, is based on the DeepQA architecture. Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet.

Remember, the intended meaning of natural language is ambiguous, tacit and highly contextual. The computer needs to consider many possible meanings, attempting to find the evidence and inference paths that are most confidently supported by the data.

So, the primary computational principle supported by the DeepQA architecture is to assume and pursue multiple interpretations of the question, to generate many plausible answers or hypotheses, and to collect and evaluate many different competing evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question might mean, what the content means, what the answer might be, or why it might be correct.

DeepQA is implemented as an extensible architecture and was designed at the outset to support interoperability. For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community. Over 100 different algorithms, implemented as UIMA components, were integrated into this architecture to build Watson.

In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical components. Other algorithms here identify and tag specific semantic entities like names, places or dates.
In particular, the type of thing being asked for, if it is indicated at all, will be identified. We call this the LAT, or Lexical Answer Type, such as “FISH”, “CHARACTER” or “COUNTRY”.

In Query Decomposition, different assumptions are made about if and how the question might be decomposed into sub-questions. The original question and each identified sub-part follow parallel paths through the system.

In Hypothesis Generation, DeepQA does a variety of very broad searches for each of several interpretations of the question. Note that Watson, to compete on Jeopardy!, is not connected to the internet. These searches are performed over a combination of unstructured data (natural language documents) and structured data (available databases and knowledge bases) fed to Watson during training. The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is very little confidence in these possible answers, since little intelligence has been applied to understanding the content that might relate to the question. The focus at this point is on generating a broad set of hypotheses, or, for this application, what we call “Candidate Answers”. To implement this step for Watson we integrated and advanced multiple open-source text and KB search components.

After candidate generation, DeepQA also performs Soft Filtering, where it makes parameterized judgments about which and how many candidate answers are most likely worth investing more computation in, given specific constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and speed, Soft Filtering uses different lightweight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is.
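The contrast between soft and hard filtering can be sketched in a few lines. This is an illustrative sketch only: the lightweight score, the threshold value and the function names are hypothetical stand-ins, not DeepQA's trained components.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    answer: str
    light_score: float  # cheap pre-score from a lightweight algorithm (hypothetical)


def soft_filter(cands: List[Candidate],
                threshold: float) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates: those at or above the threshold get deep evidence scoring;
    the rest are kept and continue through the pipeline as-is, with no extra computation."""
    deep = [c for c in cands if c.light_score >= threshold]
    pass_through = [c for c in cands if c.light_score < threshold]
    return deep, pass_through


def hard_filter(cands: List[Candidate], threshold: float) -> List[Candidate]:
    """For contrast: candidates below the threshold are discarded entirely."""
    return [c for c in cands if c.light_score >= threshold]


cands = [Candidate("GnRH", 0.8), Candidate("anosmia", 0.3), Candidate("LH", 0.55)]
deep, rest = soft_filter(cands, threshold=0.5)
print([c.answer for c in deep])      # ['GnRH', 'LH']
print([c.answer for c in rest])      # ['anosmia']
print(len(hard_filter(cands, 0.5)))  # 2
```

Under soft filtering "anosmia" still reaches final ranking with its cheap score; under hard filtering it is gone for good.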
In contrast, if this were a hard filter, those candidates falling below the threshold would be eliminated from consideration entirely at this point.

In Hypothesis & Evidence Scoring, the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. These may, for example, include Typing Algorithms: algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step, for example Country, Agent, Character, City, Slogan or Book. Many of these algorithms may fire, using different resources and techniques to come up with a score. What is the likelihood that “Washington”, for example, refers to a “General”, a “Capital”, a “State”, a “Mountain”, a “Father” or a “Founder”?

For each candidate answer, many pieces of additional evidence are searched for. Each piece of evidence is subjected to more algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer. These algorithms may consider variations in grammatical structure, word usage and meaning.

In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire. They apply methods for inferring a coherent final answer from the constituent elements derived from the question's sub-parts.

Finally, the last step, Final Merging and Ranking, receives many possible answers, each paired with many pieces of evidence, and each of these scored by many algorithms to produce hundreds of feature scores, all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores.
These models are trained with ML methods to predict, based on past performance, how best to combine all these scores to produce a final, single confidence number for each candidate answer and to produce the final ranking of all candidates. The answer with the strongest confidence would be Watson's final answer, and Watson would try to buzz in provided that the top answer's confidence was above a certain threshold.

The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broad contextual evidence and more credible inferences to support the most likely candidate answers. All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence and score evidence are loosely coupled but work holistically by virtue of DeepQA's pervasive machine learning infrastructure.

No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, and they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, one month later, contribute only 2% to overall performance, due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained and retrained.

DeepQA is a complex system architecture designed to deal extensibly with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge has greatly inspired its design and implementation for the Watson system.
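The final merging and ranking step can be illustrated with a logistic model that turns each candidate's feature scores into one confidence, followed by the buzz-in threshold. The notes say only that the models are trained with ML methods; the logistic form, the feature names, the weights, the case-insensitive merge rule and the 0.5 threshold below are all hypothetical stand-ins for trained components.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical "learned" weights per feature score, plus a bias term. In DeepQA
# these would come from training on past questions; here they are made up.
WEIGHTS = {"type_match": 2.0, "passage_support": 3.0, "source_reliability": 1.0}
BIAS = -3.0

Features = Dict[str, float]


def confidence(features: Features) -> float:
    """Logistic combination of feature scores into a single confidence in (0, 1)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def merge(candidates: List[Tuple[str, Features]]) -> Dict[str, Features]:
    """Merge equivalent answers (here simply: same text ignoring case), keeping
    the first surface form seen and the strongest value of each feature."""
    merged: Dict[str, Tuple[str, Features]] = {}
    for answer, feats in candidates:
        display, slot = merged.setdefault(answer.casefold(), (answer, {}))
        for name, value in feats.items():
            slot[name] = max(slot.get(name, value), value)
    return {display: slot for display, slot in merged.values()}


def final_answer(candidates: List[Tuple[str, Features]],
                 buzz_threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Rank merged candidates by confidence; answer only if the top one clears the threshold."""
    merged = merge(candidates)
    if not merged:
        return None
    ranked = sorted(((confidence(f), a) for a, f in merged.items()), reverse=True)
    top_conf, top_answer = ranked[0]
    return (top_answer, top_conf) if top_conf >= buzz_threshold else None


candidates = [
    ("GnRH", {"type_match": 1.0, "passage_support": 0.4}),
    ("gnrh", {"passage_support": 0.7, "source_reliability": 0.9}),
    ("anosmia", {"type_match": 0.0, "passage_support": 0.9}),
]
answer, conf = final_answer(candidates)
print(f"{answer} {conf:.2f}")                         # GnRH 0.88
print(final_answer(candidates, buzz_threshold=0.9))  # None: not confident enough to buzz in
```

Merging before scoring matters: neither "GnRH" entry alone carries all three features, but the merged candidate does.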
Notes: This is easy if you know that Charles Dickens wrote Victorian literature. That knowledge is not part of medical inference, though, so the system does not cover it, and an incorrect answer is preferred because its passage matched the query better. Without knowing about Victorian literature, there is not enough other information in the question to reliably find the correct answer.
Post-Op Discharge:
• Patient hospital discharge instructions and treatment plan
• Symptoms: set expectations, detect risks
• Augments nurse follow-up and tracks recovery until the follow-up appointment
• Multi-channel options: phone, IM, web, mobile SMS, app