Automating the formalization of clinical
guidelines using information extraction:
an overview of recent lexical approaches

05 August 2011

Phil Gooch
Centre for Health Informatics
City University, London UK
Clinical guidelines

• Contain recommendations for best practice based on systematic
 reviews of clinical evidence, consensus statements and expert opinion.
• Goal is to reduce variation in medical care by promoting the most
 effective treatments, and to provide a means of quality control in clinical
 practice via audit
• Produced by a variety of organizations (e.g. NICE, RCP, SIGN) in a
 variety of document formats usually not conducive to use at the point of
 care.
Clinical decision support (CDS)

•   Aims to provide diagnostic and treatment recommendations and
    advice at the point of care, i.e. information tailored for the specific
    patient under consideration by the clinician during a consultation
•   CDS systems require a knowledge base (KB), usually derived from
    guidelines, consisting of declarative knowledge (penicillin is-a
    antibiotic) and procedural (if…then) rules, and some sort of electronic
    patient record system (EPR)
Computer-interpretable guidelines

•   Early systems ‘computerized’ guidelines by making them available ‘on
    the computer’, e.g. as HTML or PDF
     • Did not lead to improved guideline compliance or use!
•   To standardize the format of the knowledge-base, ease development
    of CDS, and to improve guideline use at the point of care, a number of
    formalisms for representing guidelines have been developed
Computer-interpretable guidelines (CIGs)

Rule-based: ‘if ... then’, e.g. Arden Syntax for individual clinical decisions
   LET Last_HgA1C BE READ LATEST {"HgA1C Value"};
   LET Diabetic_Patient BE READ LATEST {"Problem: Diabetes"};
   if Diabetic_Patient and Last_HgA1C Occurred not within past 6 months and Last_HgA1C is less
      than or equal 7
   then conclude true;

Document based, e.g. GEM, for complete guideline documents in XML
Object-oriented (OO) expression and query languages, e.g. GELLO:
 observation.code == 'SBP' AND observation.value > 140 AND assessment.code == 'LVF'

Task-network models (TNM), e.g. GLIF, Asbru, PROforma, for workflow-like
 modelling of decisions over time
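
For illustration only, the Arden Syntax rule above can be restated in Python. The function name, its parameters and the 183-day approximation of "6 months" are assumptions made for this sketch; they are not part of Arden Syntax or of any CDS product.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "6 months" is approximated as 183 days for this sketch.
SIX_MONTHS = timedelta(days=183)


def hba1c_rule(
    is_diabetic: bool,
    last_value: Optional[float],
    last_time: Optional[datetime],
    now: datetime,
) -> bool:
    """Conclude true when the patient is diabetic, the latest HbA1c result
    is older than six months, and that result was <= 7."""
    # Assumption: missing data means the rule does not fire.
    if not is_diabetic or last_value is None or last_time is None:
        return False
    not_within_past_6_months = (now - last_time) > SIX_MONTHS
    return not_within_past_6_months and last_value <= 7


fires = hba1c_rule(True, 6.8, datetime(2011, 1, 1), datetime(2011, 8, 5))
```

A rule like this covers a single decision, which is why task-network models are used when decisions have to be sequenced over time.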
Formalization of guidelines into a CIG model

•     Declarative: Mapping clinical concepts in the guideline to terms within a
      controlled vocabulary (e.g. UMLS) or ‘virtual medical record’
•     Procedural: Identification and extraction of eligibility criteria, clinical
      actions (tests, treatment regimes, referrals), temporal constraints and
      if…then decision rules
•     Translation to a formal model, e.g. PROforma, GLIF, Asbru
•     Time-consuming, iterative, manual process as the guideline text tends to
      assume background knowledge, is incomplete or contains ambiguity and
      vague terms
Example CIG fragment (Asbru)

<plan name="Doxycycline : 100 mg orally twice a day for 7 days"
   plan_id="plan52769441">
      <cyclical_plan plan_id="plan5675512">
        <frequency value="12" unit="hour"/>
      </cyclical_plan>
      <duration>
        <min value="7" unit="day"/>
        <max value="7" unit="day"/>
      </duration>
   </plan>
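
A minimal sketch of how a dosage phrase might be turned into a fragment shaped like the one above. The phrase-to-hours table, the regular expression and the function name are hypothetical, the `plan_id` attributes are omitted, and the output is not validated against the Asbru schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup from frequency phrases to dosing intervals in hours.
FREQ_HOURS = {"once": 24, "twice": 12, "three times": 8, "four times": 6}

DOSE_PATTERN = re.compile(
    r"(?P<freq>once|twice|three times|four times) a day for (?P<days>\d+) days?"
)


def dose_to_plan(drug: str, instruction: str) -> ET.Element:
    """Build an Asbru-like <plan> element from a simple dosage phrase."""
    match = DOSE_PATTERN.search(instruction)
    if match is None:
        raise ValueError(f"Unrecognised dosage phrase: {instruction!r}")
    hours = FREQ_HOURS[match.group("freq")]
    days = match.group("days")

    plan = ET.Element("plan", name=f"{drug} : {instruction}")
    cyclical = ET.SubElement(plan, "cyclical_plan")
    ET.SubElement(cyclical, "frequency", value=str(hours), unit="hour")
    duration = ET.SubElement(plan, "duration")
    # A fixed-length course has identical minimum and maximum durations.
    ET.SubElement(duration, "min", value=days, unit="day")
    ET.SubElement(duration, "max", value=days, unit="day")
    return plan


plan = dose_to_plan("Doxycycline", "100 mg orally twice a day for 7 days")
xml_text = ET.tostring(plan, encoding="unicode")
```

Real guideline text is rarely this regular, which is what makes the extraction step difficult.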
Examples of vague guideline statements

Underspecification:
• Avoid the use of highly intensive management strategies to achieve
  an HbA1c level less than 6.5% (48 mmol/mol)

•   Monitor HbA1c every 2–6 months (according to individual need) until it
    is stable on unchanging treatment

Qualitative terms requiring mapping to numeric values or ranges:
• The moderate use of alcohol may increase HDL-cholesterol

•   If blood pressure remains uncontrolled on adequate doses of three
    drugs, consider adding a fourth and/or seeking expert advice
Information extraction for guideline formalization

• Helpful to automate
    • Knowledge base construction: text to formal model translation
    • Identification of opportunities for decision support: mapping
      guideline concepts and rules to concepts in the EPR
    • Measurement of guideline compliance
Information extraction approaches

•   Bottom-up: identification of individual clinical terms, temporal
    expressions, units of measure
     • Look-up lists, regular expressions
     • Shallow parsing to identify noun phrases
     • Terminology services: UMLS, MetaMap
     • Co-reference resolution: WordNet

•   Top-down: identification of guideline structure: preamble, eligibility,
    recommendations, ‘action’ sentences and rules
     • Shallow parsing to identify verb phrases
     • Ontologies for semantic relations, e.g. UMLS Semantic Network
     • Use of linguistic guideline patterns (see later)
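
The bottom-up step can be sketched with regular expressions for measurements and temporal expressions. The unit list and both patterns are illustrative assumptions and are far from a complete grammar.

```python
import re

# Assumed, deliberately short list of units of measure.
MEASUREMENT = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>%|mmol/mol|mmHg|mg)"
)
# Matches "every 3 months" and ranges such as "every 2-6 months"
# (hyphen or en dash).
TEMPORAL = re.compile(
    r"every\s+(?P<low>\d+)(?:\s*[\u2013-]\s*(?P<high>\d+))?\s+"
    r"(?P<unit>day|week|month|year)s?"
)


def extract(text):
    """Return (measurements, intervals) found in a guideline sentence."""
    measurements = [
        (float(m["value"]), m["unit"]) for m in MEASUREMENT.finditer(text)
    ]
    intervals = []
    for m in TEMPORAL.finditer(text):
        low = int(m["low"])
        high = int(m["high"]) if m["high"] else low
        intervals.append((low, high, m["unit"]))
    return measurements, intervals


measurements, intervals = extract(
    "Avoid intensive strategies to achieve an HbA1c level less than "
    "6.5% (48 mmol/mol). Monitor HbA1c every 2\u20136 months."
)
```

Look-up lists and terminology services would then attach concept identifiers to the surrounding noun phrases.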
Mapping text to UMLS concepts - problems

• Identification of clinical terms is dependent on context:
- family history of congestive heart failure
- probable diagnosis of congestive heart failure
- no evidence of congestive heart failure
- patient does not have established cardiovascular disease


• Clearly just identifying the raw concepts congestive heart failure and
 cardiovascular disease and mapping them to UMLS terms is
 inadequate.
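
One simple way to capture this context is to look for trigger phrases in front of each concept mention. The sketch below is a hypothetical, much reduced trigger-list approach; the trigger phrases, labels and concept list are assumptions chosen to cover the four examples above.

```python
import re

# Assumed trigger phrases, checked in order; the first match wins.
TRIGGERS = [
    (re.compile(r"\bfamily history of\b"), "family_history"),
    (re.compile(r"\b(?:probable|possible|suspected)\b"), "uncertain"),
    (re.compile(r"\b(?:no evidence of|does not have|without)\b"), "negated"),
]

# Stand-in for a terminology look-up such as UMLS.
CONCEPTS = ["congestive heart failure", "cardiovascular disease"]


def concepts_in_context(phrase):
    """Return (concept, status) pairs, where status reflects any trigger
    phrase found before the concept mention."""
    lowered = phrase.lower()
    results = []
    for concept in CONCEPTS:
        position = lowered.find(concept)
        if position == -1:
            continue
        preceding = lowered[:position]
        status = "affirmed"
        for pattern, label in TRIGGERS:
            if pattern.search(preceding):
                status = label
                break
        results.append((concept, status))
    return results


example = concepts_in_context("no evidence of congestive heart failure")
```

The same raw concept therefore receives a different status in each of the four example phrases.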
Mapping guideline text to UMLS concepts - problems

• Guideline documents are typically large (100 pages), in PDF or XML
 format
• Requires guideline text to be segmented to enable efficient processing
- How best to segment the text so as to maximize contextual clinical concept
 identification?
Solutions: Text segmentation
• Customised phrase chunker to identify candidate terms:
 - Noun phrases (NP), prepositional phrases (PP), verb phrases (VP)
 - Neoclassical combining-form phrases (token groups containing
   Latin/Greek prefixes, roots, suffixes)
 - Past-participle and gerund NPs:
   - 'results in increased blood pressure', 'fasting blood glucose'
 - List expansion:
   - 'mild, moderate and severe hypertension' → 'mild hypertension,
      moderate hypertension and severe hypertension'
   - 'lowering of heart rate and blood pressure' → 'lowering of heart
      rate and lowering of blood pressure'
 - Abbreviation expansion: 'waist circumference (WC)'
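
The two list-expansion examples can be sketched with plain string handling. This is not the chunker described on the slide: it assumes the shared head is the last word of the phrase, and that the shared prefix is supplied by the caller, whereas a real implementation would take both from the parse.

```python
import re


def split_coordination(text):
    """Split on commas and the conjunctions 'and' / 'or'."""
    parts = re.split(r",|\band\b|\bor\b", text)
    return [part.strip() for part in parts if part.strip()]


def expand_shared_head(phrase):
    """'mild, moderate and severe hypertension' -> one phrase per modifier.
    Assumption: the head noun is the final word of the phrase."""
    parts = split_coordination(phrase)
    last_tokens = parts[-1].split()
    if len(parts) < 2 or len(last_tokens) < 2:
        return [phrase]
    head = last_tokens[-1]
    modifiers = parts[:-1] + [" ".join(last_tokens[:-1])]
    return [f"{modifier} {head}" for modifier in modifiers]


def expand_shared_prefix(phrase, prefix):
    """'lowering of heart rate and blood pressure' -> one phrase per item.
    Assumption: the caller already knows the shared prefix."""
    if not phrase.startswith(prefix):
        return [phrase]
    items = split_coordination(phrase[len(prefix):])
    return [f"{prefix} {item}" for item in items]


grades = expand_shared_head("mild, moderate and severe hypertension")
effects = expand_shared_prefix(
    "lowering of heart rate and blood pressure", "lowering of"
)
```

Each expanded phrase can then be sent to the terminology service as a separate candidate term.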
Solutions: GATE-MetaMap Server integration plugin

- Extracts clinical concepts, in context, from large guideline texts in
 multiple formats and encodings (PDF, XML, RTF, ASCII, UTF-8)
- Exchanges data/annotations with a MetaMap server
- Implements Unicode Normalization Forms for UTF-8 → ASCII
- Provides flexible text chunking options
- Optimises input data to MetaMap for mapping to UMLS concepts
- Integrates with other information extraction pipelines
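
The UTF-8 → ASCII step can be illustrated with Python's standard `unicodedata` module. This is a sketch of the general technique, not the plugin's own code; the punctuation table is an assumption covering characters that have no Unicode decomposition.

```python
import unicodedata

# Assumed replacements for punctuation that normalization leaves untouched.
PUNCTUATION = {
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2192": "->",  # rightwards arrow
}


def to_ascii(text):
    """Approximate Unicode text in ASCII before sending it for mapping."""
    for source, target in PUNCTUATION.items():
        text = text.replace(source, target)
    # NFKD splits accented letters into a base letter plus combining marks
    # and expands compatibility characters such as the ellipsis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop whatever still falls outside ASCII (e.g. the combining marks).
    return decomposed.encode("ascii", "ignore").decode("ascii")


cleaned = to_ascii("Sj\u00f6gren\u2019s: monitor every 2\u20136 months")
```

Characters with no ASCII approximation are silently dropped here, so any character offsets used for annotations would need to be tracked separately.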
GATE-MetaMap integration module
Guideline patterns

Serban et al. (2007), examples:

(med_context, target_group, recommendation_operator, med_action)

In the event of [pregnancy]med_context, [patients with diabetes]target_group
   [should]recommendation_op be [prescribed calcium channel blocker]med_action


(target_group, med_context, med_goal)

For [diabetic patients]target_group with [kidney damage]med_context the [blood
   pressure target is 130/80]med_goal
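
As a rough illustration, the two example patterns can be approximated with surface regular expressions. This is a hypothetical simplification and not the method of Serban et al.; the expressions, the list of recommendation operators and the function name are assumptions that fit only sentences shaped like the two examples.

```python
import re

# (med_context, target_group, recommendation_operator, med_action)
PATTERN_ACTION = re.compile(
    r"In the event of (?P<med_context>[^,]+), "
    r"(?P<target_group>.+?) "
    r"(?P<recommendation_operator>should|must|may) be "
    r"(?P<med_action>.+)"
)

# (target_group, med_context, med_goal)
PATTERN_GOAL = re.compile(
    r"For (?P<target_group>.+?) with (?P<med_context>.+?) "
    r"the (?P<med_goal>.+)"
)


def match_patterns(sentence):
    """Return (pattern_name, slots) for the first matching pattern,
    or None when no pattern applies."""
    for name, pattern in (("action", PATTERN_ACTION), ("goal", PATTERN_GOAL)):
        match = pattern.match(sentence)
        if match:
            return name, match.groupdict()
    return None


action = match_patterns(
    "In the event of pregnancy, patients with diabetes should be "
    "prescribed calcium channel blocker"
)
goal = match_patterns(
    "For diabetic patients with kidney damage the blood pressure "
    "target is 130/80"
)
```

Matching on phrase chunks and semantic types rather than on literal words would make such patterns far less brittle.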
Extracting guideline recommendations


… and rules from guideline text
Information extraction from patient data
Patient data: automatic spelling correction
Patient data: WordNet mappings for coreferencing
