Introduction Mining EHR Mining Literature Terminology building by Transfer Conclusion
Natural Language Processing
for biomedical text mining
Thierry Hamon
LIMSI, CNRS, Université Paris-Saclay, Orsay, France
Université Paris 13, Sorbonne Paris Cité, Villetaneuse, France
hamon@limsi.fr
14/06/2017
1/75 Grammarly Meet-up T Hamon
Context
Most of the data are unstructured
about 90% of the data produced in 2011 (1.8 trillion
gigabytes) [Oracle, 2011]
85% of the data produced in companies
Unstructured data: textual data
Important source of information
Accessing and reading are costly, time-consuming and
sometimes impossible
Need for methods for
information retrieval and information extraction
Context
In the biomedical domain, constant increase in the amount of
Scientific medical literature
Scientific papers in digital libraries or portals
Medical, pharmacological, epidemiological reports
Electronic Health Records in hospitals
Discharge summaries
Radiological reports
Patient-related textual data
documents explaining diseases to patients, health
behaviors
social media (online discussion forums, twitter messages)
Context
Example: Scientific article publications
Medline (U.S. National Library of Medicine bibliographic
database) - https://www.ncbi.nlm.nih.gov/pubmed/
Evolution of the number of references to articles in life
sciences
Citations Added to MEDLINE® per Year
Currently: More than 27 million references
What is text mining?
Objective: Extraction of useful and non-trivial knowledge from
texts
Extraction of information
useful for a given application
from textual data, i.e. written in natural language
Collecting and linking this information
Feed databases or knowledge bases with information
extracted from texts
Indirectly: allow data mining on unstructured/textual data
Data mining vs. Text mining
Data mining
Methods and algorithms to explore structured data, drawn
from databases, data warehouses or knowledge bases
Objectives: Highlight rules, identify trends or behaviours
which are invisible to humans
Text mining
Methods and algorithms to explore unstructured data, i.e.
texts written in Natural Language
Objectives: Extraction and categorisation of information
available in the texts
What are text mining applications?
EHR:
Search and find relevant information, Hospital information
system
Provide synthetic views of patient-related information
EHR / Scientific literature:
Information storage in databases for statistics,
epidemiological surveys, hospital information systems, etc.
Formalize information or knowledge
Social media:
Epidemiological analysis, Therapeutic Patient Education,
Potential adverse drug effect identification
What information to identify?
Semantic entities: terms with semantic types
Semantic relations between entities
Temporal information related to events
Numerical information
Modifiers for identifying polarity, modality,
presence/absence, uncertainty
Needs for analysis of biomedical texts
Various resources:
Terminologies, Ontologies, Open Linked Data
Lexica, Consumer Health Vocabularies
Semantic description of entities
NLP approaches and methods:
Rule-based approaches (more or less sophisticated regular
expressions)
Machine Learning approaches (supervised,
semi-supervised, unsupervised)
Evaluation against independent reference data
Difficulties
Textual data may be noisy, sparse, multilingual
Text processing is time-consuming, may require contextual
information
Terminological and semantic variation, semantic ambiguity,
unknown or new words and terms, etc.
→ High and unpredictable number of dimensions
Complex and embedded semantic relations
Difficulties
Ambiguities of the natural language at each level:
lexicon:
spell[N] vs. spell[V], Apple[company] vs. apple[fruit]
гори[V] (a form of burn) vs. гори (inflectional form of
mountain)
syntax:
the doctor examines the patient with a stethoscope
Joe experienced severe shortness of breath and chest pain
at home while having sex, which became more unpleasant
at the emergency room.
Difficulties
Ambiguities of the natural language at each level:
semantics:
a red pencil, He reached the bank.
поділися (form of disappear) vs. поділися (form or lemma of
share)
pragmatics:
The chicken is ready to eat.
Margaret invited Susan for a visit, and she gave her a good
lunch.
a very pleasant patient
Difficulties
Variation in semantically similar wording:
Bayer is buying Monsanto
Bayer clinches Monsanto
Bayer and Monsanto [...] will merge
Bayer's announced acquisition of Monsanto
Monsanto-Bayer merger
Metonymy: the latest Apple/Samsung
Metaphor: Web giants, or noir (black gold in French)
Spelling errors: Appel (call in French) / Apple
Mix of Latin and Ukrainian (Cyrillic) characters (different Unicode
code points): i vs. і, o vs. о, p vs. р, y vs. у...
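Look-alike Latin and Cyrillic letters can be told apart by their Unicode names. The following is a minimal sketch using Python's standard `unicodedata` module; the helper names `scripts_in` and `is_mixed_script` are illustrative and not part of the talk.

```python
import unicodedata

def scripts_in(word):
    """Return the set of scripts (e.g. LATIN, CYRILLIC) used by the letters of a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # Unicode character names start with the script name,
            # e.g. "CYRILLIC SMALL LETTER O" or "LATIN SMALL LETTER O".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(word):
    """True when a single word mixes letters from more than one script."""
    return len(scripts_in(word)) > 1

latin_o = "o"
cyrillic_o = "\u043e"  # visually identical to the Latin letter
print(latin_o == cyrillic_o)
print(scripts_in("m" + cyrillic_o + "use"), is_mixed_script("m" + cyrillic_o + "use"))
```

A check of this kind can be run before dictionary lookup, so that words mixing scripts are normalised or flagged rather than silently treated as unknown words.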
Three experiments in biomedical text mining
1 Recognition of Medication, assertion, temporal information
in EHR
[Hamon and Grabar10, Périnet et al.11, Grouin et al.13, Zweigenbaum et al.13,
Hamon and Grabar14]
Work with Natalia Grabar (CNRS STL - Lille 3), Amandine Périnet
(LIM&BIO - Paris 13), Cyril Grouin, Sophie Rosset, Xavier Tannier,
Pierre Zweigenbaum (LIMSI, CNRS)
2 Mining literature for identifying risk factors
[Hamon et al.10]
Work with Martin Graña, Víctor Raggio and Hugo Naya (Institut Pasteur
de Montevideo), and Natalia Grabar (CNRS STL - Lille 3)
3 Cross-Lingual Transfer Methods for Terminology
Acquisition
[Hamon and Grabar16]
Work with Natalia Grabar (CNRS STL - Lille 3)
Mining Patients' Electronic Health Records
[Hamon and Grabar10, Périnet et al.11, Grouin et al.13, Zweigenbaum et al.13,
Hamon and Grabar14]
Description of the hospitalization
A lot of (personal) information about patients
Problems
Therapies (treatments, drugs, etc.)
Tests and analysis (lab data, etc.)
Assertions regarding facts (certainty, hypothesis, etc.)
Temporal information (useful for the clinical timeline)
The best way to record information (databases are difficult
to maintain)
BUT the texts are written by practitioners:
in a hurry, with mistakes, with little or incorrect syntactic
structure, etc.
Objectives
Identification of
Medication names given to patients
Related information (dosage, duration, frequency, mode of
administration, reason for prescription)
Assertion: certainty and uncertainty of information in
medical texts
focus on the relation {patient / medical problem}
Temporal expressions: date, time and duration of medical
events
Participation in several I2B2 challenges
Drug-related information
[Figure: graph of the information related to a drug (solumedrol in the example).
Relation labels: INN, is a, composition, prescribed for, adverse effects, DDI, FDI,
prescription features (dosage, mode, frequency, reason, duration).
Node labels: methylprednisolone, cortisone, steroidal anti-inflammatory, salt, sodium,
lactosis, phosphate disodique anhydre, phosphate monosodique anhydre, allergic shock,
Quincke oedema, suffocation by larynx oedema, brain oedema, acne, osteoporosis,
swelling of face, arterial hypertension, ulcer of stomach, depression, digitaline, insulin.]
Assertion task
Assertion: degree of certainty
Certainty
Positive certainty: The patient suffers from abdominal pain
Negative certainty: The patient denies suffering from abdominal pain
Possibility: It was thought that the patient might suffer from abdominal pain
Condition: With shrimps, the patient suffers from abdominal pain
Hypothesis: The patient is to call the hospital if he suffers from abdominal pain
Example
Medication name, associated information, assertions and time expressions
The patient is currently off diuretics at this time. Daily
weights should be checked and if her weight increases by
more than 3 pounds Dr. Bockoven should be notified. The
patient was also started on calcitriol given elevation of
parathyroid hormone. Cardiovascular: Rate and rhythm:
The patient has a history of atrial fibrillation with a slow
ventricular response. Two weeks ago, the patient was
started on metoprolol 12.5 mg p.o. q.6 h. for rate control ,
however , this dose was decreased to 12.5 mg p.o. twice a
day, given some bradycardia on her telemetry. The patient
was also started on Flecainide 75 mg p.o. q.12 h. She will
continue on these two medications upon discharge.
Example
Medication name, associated information, assertions and time expressions
RRR , lots of BS's , neuro nonfocal , ext with 1+ edema. On
atenolol , zestril , norvasc , premarin , detrol , lasix 60 qd ,
nebs prn at home. Labs sig for Cr 0.7 , CK 48 , TnI .05 ,
QBC 9.5 , Hct 41.3. From CV point of view , thought to be
CHF exac. ROMI'd without events on monitor and diuresed
2L/day. IV Lasix 80 bid to start transitioned to 60 po bid.
BNP>assay. 6/17 dobut MIBI with mod sized ant septal wall
defect c/w diagonal lesion , 3/22 Echo with EF 55-60% ,
mild LAE/RAE , no WMA , mod large RV. No further CV
studies. Cont previously meds on d/c. From FEN point of
view , 2 L fluid restriction , 2 g Na restriction. Nutrition
consult , but pt very resistant to diet changes. From GI point
of view , GERD; nexium started. From pulm point of view ,
CXR c/w sl fluid overload , no focal findings , no pulm
edema. Given NC O2 and BiPAP at night.
Material
Documents
Discharge summaries: 1,249 documents (provided by the
I2B2 challenges)
2009: 649 docs in the training set, 553 docs in the test set,
17 manually annotated documents (for illustrating the
annotation guidelines)
2010: 349 annotated documents + 827 raw documents in
the training set, 477 in the test set
Assertions: 11,968 in the training set, 18,550 in the test set
2012: 190 docs in the training set, 120 docs in the test set
Material
Terminologies and lexica
Medication names: RxNorm (243,869 entries) and
Therapeutic classes and groups of medication from the
FDA website
Ambiguous medication (red blood cells, magnesium, iron):
specific status during the annotation process
Medical problems: 45,898 terms (Diagnosis and
Morphology axes of the Snomed International), 476 terms
from the training set documents
Medication-related information
Regular expressions for frequency, dosage, duration and
mode of administration
52 identification rules for reasons: characterization of
Snomed Int terms and/or extracted terms as reasons
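The regular-expression approach for medication-related information can be sketched as follows. The patterns below are illustrative assumptions written for the discharge-summary example shown earlier; the expressions actually used in the system are not given in the slides.

```python
import re

# Illustrative patterns only (assumptions), far smaller than a real resource.
DOSAGE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|g|mcg|ml|units?)\b", re.IGNORECASE)
MODE = re.compile(r"\b(?:p\.o\.|po|iv|im|sc)(?!\w)", re.IGNORECASE)
FREQUENCY = re.compile(
    r"\b(?:q\.?\s?\d+\s?h\.?|bid|tid|qid|qd|prn|twice a day)(?!\w)",
    re.IGNORECASE,
)

def extract_medication_info(sentence):
    """Return every dosage, mode and frequency expression found in the sentence."""
    return {
        "dosage": DOSAGE.findall(sentence),
        "mode": MODE.findall(sentence),
        "frequency": FREQUENCY.findall(sentence),
    }

sentence = "the patient was started on metoprolol 12.5 mg p.o. q.6 h. for rate control"
print(extract_medication_info(sentence))
```

Each pattern only recognises the surface form; linking a dosage or frequency to the right medication name is a separate step handled later by the dependency-relation module.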
Material
Terminologies and lexica
Assertions:
Negation: 284 markers from the NegEx resource
[Chapman et al.01]
Lexical clues: on exertion (condition)
Morphological clues: afebrile (negative certainty)
Contextual information (342 markers)
Clues in the sentence, Section headings
... could represent a multifocal pneumonic process
(possible)
ALLERGIES, SOCIAL HISTORY, lists
Lexico-syntactic patterns (137 patterns)
be to (address | request | notify) DT (office | clinic | hospital) if PB (Hypothesis)
TE to (evaluate | check | eval | consult) (from | if | with | against) PB (Possibility)
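A marker-based assertion classifier in the spirit of the resources above can be sketched in a few lines. The marker lists here are tiny, hypothetical stand-ins for the NegEx markers, contextual markers and lexico-syntactic patterns; the rule order and the function name `classify_assertion` are also assumptions.

```python
import re

# Ordered rules: the first matching pattern decides the category.
# These few markers are illustrative; the real resources hold hundreds of entries.
RULES = [
    ("hypothesis", re.compile(r"\bis to (?:call|notify)\b.*\bif\b", re.IGNORECASE)),
    ("absent", re.compile(r"\b(?:denies|no|without)\b", re.IGNORECASE)),
    ("possibility", re.compile(r"\b(?:might|could|possible|possibly)\b", re.IGNORECASE)),
    ("condition", re.compile(r"\bon exertion\b", re.IGNORECASE)),
]

def classify_assertion(sentence):
    """Return the assertion category of a sentence mentioning a medical problem."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    # No marker found: the problem is asserted as present.
    return "present"

for s in [
    "The patient suffers from abdominal pain",
    "The patient denies suffering from abdominal pain",
    "It was thought that the patient might suffer from abdominal pain",
    "The patient is to call the hospital if he suffers from abdominal pain",
]:
    print(classify_assertion(s), "<-", s)
```

This sketch decides at sentence level; a fuller version would restrict each marker to a window around the medical problem and take section headings into account.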
Document processing
Annotation of the documents
Use of terminological and linguistic resources and
selection and disambiguation rules
CRF-based models [Grouin et al., Minard et al.11]
Tuning of the HeidelTime system
[Strotgen and Gertz12, Hamon and Grabar14]
Design of post-processing modules for
Disambiguation and negative contexts of medication names
Computing of dependency relations between patient,
medication names and related information, or assertion
Improving the CRF-based system with extracted terms
[Aubin and Hamon06]
Enriching documents with linguistic information
[Figure: document annotation pipeline.
Input: XML document with structural annotations.
Steps: tokenisation, named entity tagging, word and sentence segmentation,
part-of-speech tagging, lemmatisation, tagging of the terms, semantic tagging,
extraction of terms.
Resources: dictionary of named entities, specialised lexicon, terminology, ontology.
Output: XML document with linguistic and structural annotations.]
Symbolic approach: use of NLP methods
Terminological resources and
disambiguation rules
Concurrent annotations and annotation
selection
Design of post-processing modules for
Annotation disambiguation
Establishment of dependency
relations between patient,
medication names and related
information, or assertion
Annotation based on the Ogmios NLP platform
(developed during the EU Project Alvis)
Enriching document with linguistic information
Identification of the sentences, words, lemma and
part-of-speech, named entities and terms with semantic
types
The/DT patient/NN has/VBZ a/DT history/NN of/IN atrial/JJ fibrillation/NN with/IN a/DT slow/JJ ventricular/JJ response/NN .
Two/CD weeks/NNS ago/RB , the/DT patient/NN was/VBD started/VBN on/IN metoprolol/FW 12.5/CD mg/NN p.o./SYM q.6/FW h./NP for/IN rate/NN control/NN ...
Named entity labels shown above the text: [TIMEX3], [DOSAGE], [MODADM], [FREQ]
Semantic type labels shown above the text: [DISORDER] (three spans), [DRUG]
Concurrent annotation of documents
Preparing material for document annotation
Named Entity Recognition (frequency, duration, dosage,
mode of administration)
+ internal disambiguation (avoid nested annotations of
different types and merge annotations of the same type)
Term and semantic tagging (medication and reasons,
negation and reason marker, assertion)
based on linguistic information (word and sentence
segmentation, lemmatization)
+ internal disambiguation (nested terms, parenthesized
medication names, etc.)
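The internal disambiguation of named entities (no nested annotations of different types, merging of annotations of the same type) can be sketched as below. The annotation format `(start, end, label)` and the function name `resolve` are assumptions made for the example, not the platform's data model.

```python
def resolve(annotations):
    """Merge overlapping annotations of the same type and drop annotations
    nested inside a longer annotation of a different type.

    annotations: list of (start, end, label) tuples, end offset exclusive.
    """
    # Sort by start offset, longest span first, so a container precedes its nested spans.
    ordered = sorted(annotations, key=lambda a: (a[0], -(a[1] - a[0])))
    kept = []
    for start, end, label in ordered:
        if kept:
            p_start, p_end, p_label = kept[-1]
            if start < p_end:  # overlaps the previously kept annotation
                if label == p_label:
                    # Same type: merge the two spans into one.
                    kept[-1] = (p_start, max(p_end, end), p_label)
                    continue
                if end <= p_end:
                    # Different type, fully nested: drop the inner annotation.
                    continue
        kept.append((start, end, label))
    return kept

spans = [(0, 7, "DOSAGE"), (0, 4, "FREQ"), (5, 12, "DOSAGE"), (20, 25, "FREQ")]
print(resolve(spans))
```

Partial overlaps between annotations of different types are left untouched in this sketch; a real system needs an explicit priority between types for that case.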
Time expression identification
[Hamon and Grabar14]
Tuning of the HeidelTime system [Strotgen and Gertz12] for
English and French EHR
Enrichment and encoding of linguistic temporal
expressions specific to medical and clinical domain:
post-operative day #, b.i.d. meaning twice a day, day of life, etc.
Admission date as the reference or starting point for
computing relative dates and their normalised value
if the admission date is 14 June 2017, the normalised value of
2 days later is 16 June 2017.
Additional normalizations of the temporal expressions:
normalization of durations into approximate numerical values to
avoid undefined values
external computation for some durations and frequencies due to
limitations in HeidelTime's internal arithmetic processor
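The use of the admission date as the reference point for relative dates can be illustrated with a short sketch. This is not HeidelTime: it is a standalone Python example with an assumed, very small grammar (digits, days or weeks, "later", "ago" or "earlier") and a hypothetical function name.

```python
import re
from datetime import date, timedelta

UNIT_DAYS = {"day": 1, "week": 7}

def normalise_relative(expression, admission):
    """Turn a relative expression such as '2 days later' into an absolute date,
    using the admission date as the reference point. Returns None when the
    expression is outside the tiny grammar handled here."""
    m = re.fullmatch(r"(\d+) (day|week)s? (later|ago|earlier)", expression.strip())
    if m is None:
        return None
    offset = timedelta(days=int(m.group(1)) * UNIT_DAYS[m.group(2)])
    if m.group(3) == "later":
        return admission + offset
    return admission - offset

admission = date(2017, 6, 14)
print(normalise_relative("2 days later", admission).isoformat())  # 2017-06-16
```

Expressions with spelled-out numbers ("two weeks ago") or clinical forms ("post-operative day #") would need the enriched resources described above.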
Annotation selection
Processing of ambiguous medication names: laboratory
data or medication
If in a list section: status changed to medication
HOME MEDS: methadone 20 bid, imdur 120 bid, hydral taking 25
bid, lasix 20 bid, coumadin, colace, iron, nexium 40 bid
Rejection of medication names: if in allergy sections
ALLERGY: prednisone, penicillins, tamsulosin, simvastatin
Removal of drug names in negative contexts
Guessing new drug names with semantic patterns
m do mo? f [Hamon et al.13]
1 Noun phrases recognized by the term extractor YATEA
2 Stopwords rejected
3 Filtering with typical suffixes of the medication names
Diovan 160mg PO BID, HCTZ 25mg PO QD, Imdur ER 60mg PO
QD, NTG .4mg PRN CP, Norvasc 10mg PO QD, Pavachol 80mg
PO QD.
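The section-based selection rules can be sketched as a small decision function. The ambiguous names come from the material described earlier (red blood cells, magnesium, iron); the section name sets and the function name `keep_medication` are assumptions for illustration.

```python
# Ambiguous names that may denote laboratory data rather than a medication.
AMBIGUOUS = {"iron", "magnesium", "red blood cells"}
# Section headings are assumptions; real documents use many variants.
LIST_SECTIONS = {"HOME MEDS"}
REJECT_SECTIONS = {"ALLERGY", "ALLERGIES"}

def keep_medication(name, section):
    """Decide whether a candidate medication name is kept as a medication."""
    section = section.upper()
    if section in REJECT_SECTIONS:
        # Drugs listed as allergies are not given to the patient.
        return False
    if name.lower() in AMBIGUOUS:
        # Ambiguous names are accepted only inside medication lists.
        return section in LIST_SECTIONS
    return True

print(keep_medication("iron", "HOME MEDS"))
print(keep_medication("prednisone", "ALLERGY"))
print(keep_medication("iron", "LABS"))
```

Removal of drug names in negative contexts and the guessing of new names would be additional filters applied after this one.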
Results
Medication task
Focus on various parameters for reason identification and
guessing medication names
Exact F-measure per run (difference with RUN2 in parentheses)
          RUN2     RUN1               RUN3
System    0.7801   0.7681 (-0.0120)   0.7719 (-0.0082)
m         0.8142   0.8093 (-0.0049)   0.808  (-0.0062)
do        0.8234   0.8172 (-0.0062)   0.821  (-0.0024)
f         0.837    0.8304 (-0.0066)   0.8345 (-0.0025)
mo        0.8655   0.8577 (-0.0078)   0.8624 (-0.0031)
du        0.3575   0.3516 (-0.0059)   0.3505 (-0.0070)
r         0.2867   0.2759 (-0.0108)   0.2666 (-0.0201)
(m: medication, do: dosage, f: frequency, mo: mode, du: duration, r: reason)
RUN1: All reasons
RUN2: All reasons without semantic tagging and reason markers
RUN3:
All reasons without semantic tagging and use of reason markers
Guessing medication names
Results
Medication task
          Exact                       Inexact
          F        P        R         F        P        R
System    0.7801   0.7997   0.7614    0.7792   0.8111   0.7497
m         0.8142   0.8448   0.7858    0.8304   0.8666   0.7971
do        0.8234   0.8728   0.7793    0.8503   0.8799   0.8226
f         0.837    0.8306   0.8435    0.8411   0.8436   0.8386
mo        0.8655   0.8543   0.877     0.863    0.844    0.8828
du        0.3575   0.3483   0.3673    0.3607   0.3669   0.3546
r         0.2867   0.3047   0.2708    0.3386   0.4386   0.2757
Reason: difficult to identify the exact noun phrases (-13%
between inexact and exact precision)
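The F column in the table is the harmonic mean of precision (P) and recall (R). A short sketch, with an assumed function name, reproduces the System scores from their P and R values:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# System row of the table: exact and inexact matching.
print(round(f_measure(0.7997, 0.7614), 4))  # 0.7801
print(round(f_measure(0.8111, 0.7497), 4))  # 0.7792
```

Because the published P and R values are themselves rounded, recomputing F for other rows can differ from the table in the last digit.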
Results
Assertion task and time expression identification
List of markers + section headings
Categories                     Training              Test
                               P      R      F       P      R      F
Associated to somebody else    0.96   0.80   0.88    0.84   0.74   0.79
Hypothesis                     0.71   0.31   0.43    0.63   0.24   0.35
Condition                      0.08   0.40   0.14    0.08   0.33   0.12
Possibility                    0.46   0.57   0.51    0.51   0.47   0.49
Absent                         0.92   0.75   0.82    0.87   0.75   0.81
Present                        0.86   0.90   0.88    0.84   0.87   0.86
Assertions                     0.82   0.82   0.82    0.80   0.80   0.80

                               Precision   Recall   F-measure
Temporal expressions           0.8611      0.8170   0.8385
Conclusion
F-measure of the system: 0.800 (avg)
Analysis of the resource contribution:
Importance of the markers
Need to include syntactic structures
Difficulty in identifying certainty degrees:
few examples for condition and hypothesis
Further improvements
Medication tasks:
Duration extraction: identification of specific prepositional
phrases based on parsing
Medical problem identification: development of a specific
reasoning module
Assertion task:
Enrich resources with synonyms (Wordnet)
Improving the patterns:
using syntactic dependencies
integrating semantic classes
(verbs of evidence, verbs to get in touch with somebody,
etc.)
Mining literature to identify
relations between risk factors and their pathologies
[Hamon et al.10]
Objective: Massive exploitation of Medline bibliographical
database for extracting risk factors and their associations
with health conditions
Risk factors: factors that increase a person's chance of developing a given
disease
Information on risk factors is scattered across the web:
websites, bibliographical databases, ...
Previous works:
Genomic scientific literature (BioCreative, TREC
Genomics), clinical records (I2B2 NLP Challenge 2014),
processing of narratives [Blake04]
Data mining (KDD challenge 2004)
[Ahmad and Bath05, Cerrito04, Kolyshkina and van rooyen06]
Material
Bibliographical database Medline (titles, abstracts)
Selection of potential citations/PMIDs, i.e. those containing the
sequences "risk factors" or "factor of risk"
187,544 citations selected: over 42 million word
occurrences
MeSH (thesaurus for information storage and retrieval)
Disease-related MeSH term recognition in citations
Document processing
1 Annotation of Medline citations with linguistic information
Ogmios NLP platform [Hamon et al.07]
Segmentation, POS-tagging & lemmatization -- Genia
Tagger [Tsuruoka et al.05]
Term recognition but also term extraction -- YATEA
[Aubin and Hamon06]
2 Risk factors identification
Term recognition vs. Term extraction
Term recognition: tagging texts with terms drawn from a
terminology
Use of more or less complex methods (string matching,
terminological variant computing, semantic distances,
ML methods...)
Term extraction: discovering terms in texts
Identification of noun phrases which are potential terms
(term candidates)
Computing of
the strength of the term components (unithood)
the strength of the relation to the domain (termhood)
[Kageura and Umino96]
Term extraction with YATEA
Yet Another Term ExtrActor
(Aubin&Hamon, 2006)
Term extraction from French and English texts
Shallow parsing of texts
Parsing focusing on the parts of the sentence which may
contain terms (usually the noun phrases)
With
recursively applied minimal parsing patterns
endogenous learning
Term candidate decomposition in Head and Modifier
components (component syntactic role in the noun phrase)
Each component of a term candidate is also considered as
a term candidate
Unparseable noun phrases are rejected
YATEA
Yet Another Term ExtrActor
(Aubin and Hamon, 2006)
Several statistical measures are associated with each term
candidate (Number of occurrences, C-Value1, C-Value*,
etc.) [Hamon et al.14]
CPAN module: http://search.cpan.org/~thhamon/Lingua-YaTeA/
Developed during the European project ALVIS
Description of the shallow parsing with configuration files
Possibility of tuning for a domain (BioYATEA) [Golik et al.13]
For other languages: on-going work for Ukrainian and
Arabic
Term extraction with YATEA
Texts → lemmatisation + POS tagging → Term extraction (rule-based approaches)
Identification of chunks thanks to morpho-syntactic
information (frontiers - verbs, adverbs, etc.)
22/CD yo/JJ male/NN ,/, h/NN //SYM o/NN primitive/JJ
neuroectodermal/JJ tumor/NN with/IN mets/NNS to/TO brain/NN
and/CC spine/NN ,/, transferred/VBN from/IN Hospital1/NNP ,/,
initially/RB in/IN Dept1/NNP and/CC then/RB transferred/VBN to/TO
the/DT floor/NN ./. He/PRP was/VBD initially/RB diagnosed/VBN with/IN
a/DT thoracic/JJ gangliogliom/NN // resected/VBN in/IN 2012/CD ./.
He/PRP had/VBD back/JJ pain/NN in/in 2/CD //SYM 04/CD ,/, seen/VBN at/IN
Dept2/NNP ,/, and/CC was/be found/VBN to/TO have/VB mets/NNS to/TO
brain/NN and/CC spine/NN ./.
Term extraction with YATEA
Parsing of the noun phrases to detect term candidates
1. Identification of term candidates described by parsing
patterns
Pattern: JJ<M> NN<H>
(<H>: head of the noun phrase, <M>: modifier of the head)
neuroectodermal tumor → neuroectodermal<M> tumor<H>
shortness of breath → shortness<H> of breath<M>
Term extraction with YATEA
2. Use of the previously parsed term candidates (islands of
reliability) to parse the remaining noun phrases
Example: primitive neuroectodermal tumor
Use of the already parsed term neuroectodermal tumor:
primitive [neuroectodermal<M> tumor<H>]
Temporary simplification (folding): primitive/JJ tumor/NN
Use of the parsing pattern JJ<M> NN<H> → primitive<M> tumor<H>
Unfolding: primitive<M> [neuroectodermal<M> tumor<H>]<H>
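The folding and unfolding steps can be sketched in Python. This is an illustration of the idea only, not the YATEA implementation (a Perl module): it assumes a single parsing pattern (JJ NN), islands whose head is a single word, and parses represented as nested `(modifier, head)` tuples. All function names are hypothetical.

```python
def parse_jj_nn(tokens):
    """Apply the single pattern JJ NN -> (modifier, head); None if it does not match."""
    if len(tokens) == 2 and tokens[0][1] == "JJ" and tokens[1][1] == "NN":
        return (tokens[0][0], tokens[1][0])
    return None

def unfold(parse, head, island_parse):
    """Put the island's full parse back in place of its head word."""
    if parse == head:
        return island_parse
    if isinstance(parse, tuple):
        return tuple(unfold(p, head, island_parse) for p in parse)
    return parse

def parse_with_islands(tokens, islands):
    """tokens: list of (word, tag); islands: dict mapping a tuple of words
    (an already parsed term of two or more words) to its (modifier, head) parse."""
    direct = parse_jj_nn(tokens)
    if direct is not None:
        return direct
    words = [w for w, _ in tokens]
    for island_words, island_parse in islands.items():
        n = len(island_words)
        for i in range(len(words) - n + 1):
            if n >= 2 and tuple(words[i:i + n]) == island_words:
                head = island_parse[1]
                # Folding: temporarily replace the island by its head word.
                folded = tokens[:i] + [(head, "NN")] + tokens[i + n:]
                parsed = parse_with_islands(folded, islands)
                if parsed is not None:
                    return unfold(parsed, head, island_parse)
    return None  # unparseable noun phrase: rejected

islands = {("neuroectodermal", "tumor"): parse_jj_nn([("neuroectodermal", "JJ"), ("tumor", "NN")])}
phrase = [("primitive", "JJ"), ("neuroectodermal", "JJ"), ("tumor", "NN")]
print(parse_with_islands(phrase, islands))
```

The result nests the island inside the larger term, so both "neuroectodermal tumor" and "primitive neuroectodermal tumor" remain available as term candidates.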
Term extraction with YATEA
Textes
lemmatisation
+ POS tagging
Term extraction
rule-based approaches
Candidate
terms
yo male thoracic gangliogliom
h back pain
o mets
primitive neuroectodermal tumor brain
mets spine
brain floor
spine
...
44/75 Grammarly Meet-up T Hamon
Introduction Mining EHR Mining Literature Terminology building by Transfer Conclusion
Term extraction with YATEA
Textes
lemmatisation
+ POS tagging
Term extraction
rule-based approaches
Candidate
terms
Term ranking
frequency
term length
C-Value
Ranked term
candidates
f l Cv1 f l Cv1
yo male 1 1 1.58 spine 2 1 2
h 1 1 1 floor 1 1 1
o 1 1 0 thoracic gangliogliom 1 2 1.58
mets 2 1 2 back pain 1 2 1.58
brain 2 1 2
primitive neuroectodermal tumor 1 3 2.32
...
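As a rough illustration of the ranking step, here is a minimal sketch of the classic C-value measure (Frantzi et al.): a term's frequency is weighted by the logarithm of its length and discounted by the average frequency of the longer candidates that contain it. The `smoothing` parameter is an assumption added here so that single-word terms do not all score zero. The sketch is not claimed to reproduce the Cv1 column of the slide, whose exact variant is not specified.

```python
import math
from collections import defaultdict


def is_nested(short, long):
    """True if the word tuple `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq, smoothing=1):
    """Score candidate terms with a C-value style measure.

    freq      -- dict: term (tuple of words) -> frequency
    smoothing -- added to the term length inside the logarithm
                 (0 gives the original formulation; 1 is an assumption
                 that keeps single-word terms rankable)
    """
    # For each candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for a in freq:
        for b in freq:
            if len(b) > len(a) and is_nested(a, b):
                containers[a].append(b)

    scores = {}
    for a, f_a in freq.items():
        weight = math.log2(len(a) + smoothing)
        if containers[a]:
            # Discount occurrences explained by the longer terms.
            penalty = sum(freq[b] for b in containers[a]) / len(containers[a])
            scores[a] = weight * (f_a - penalty)
        else:
            scores[a] = weight * f_a
    return scores


# Toy frequencies, loosely inspired by the example above (hypothetical).
toy_freq = {
    ("mets",): 2,
    ("back", "pain"): 1,
    ("primitive", "neuroectodermal", "tumor"): 1,
    ("tumor",): 1,
}
toy_scores = c_value(toy_freq)
```

In this toy run, `tumor` only occurs inside the longer candidate, so its score drops to zero, while `mets` keeps its full frequency.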
Document processing
1 Annotation of Medline citations with linguistic information
Ogmios NLP platform [Hamon et al.07]
Segmentation, POS-tagging & lemmatization -- Genia
Tagger [Tsuruoka et al.05]
Term recognition and extraction -- YATEA
[Aubin and Hamon06]
2 Risk factors identification
Risk factor identification
Semantico-syntactic patterns
5 patterns for risk factors and pathologies
12 patterns for handling enumerations
3 patterns for pathologies
<NP-RF> as a risk factor for <NP-P>
where
as a risk factor for: trigger sequence
<NP-RF>: noun phrases corresponding to risk factors
<NP-P>: pathologies
? and *: optional and recurrent elements
MeSH descriptors of citations
Descriptors belonging to the C (Diseases) heading
Risk factor identification
Examples
Pattern: <NP-RF-list> is a risk factor for <NP-P>
...a high intake of calcium and phosphorus is a risk
factor for the development of metabolic acidosis .
(PMID 1435825)
Pattern: risk factors for <NP-P>,? include <NP-RF-list>
...had more than one of the common risk factors for
cerebrovascular accidents , including hypertension ,
advanced age , hyperfibrinogenemia ,
diabetes mellitus , and
past history of cerebrovascular accident. (PMID 1560589)
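The two example patterns can be approximated with regular expressions. This is only a sketch under strong assumptions: in the described system the slots `<NP-RF>` and `<NP-P>` are noun phrases recognized beforehand, whereas here they are approximated by raw text spans, and the names `extract_pairs`, `IS_RISK_FACTOR` and `RISK_FACTORS_INCLUDE` are hypothetical.

```python
import re

# <NP-RF-list> is a risk factor for <NP-P>
IS_RISK_FACTOR = re.compile(
    r"(?P<rf>.+?)\s+(?:is|are)\s+(?:a\s+)?risk\s+factors?\s+for\s+(?P<p>.+)",
    re.IGNORECASE,
)
# risk factors for <NP-P>,? include <NP-RF-list>
RISK_FACTORS_INCLUDE = re.compile(
    r"risk\s+factors\s+for\s+(?P<p>.+?)\s*,?\s*includ(?:e|es|ing)\s+(?P<rf>.+)",
    re.IGNORECASE,
)
# Separator of an enumeration: a comma, optionally followed by "and".
LIST_SEPARATOR = re.compile(r"\s*,\s*(?:and\s+)?")


def clean(span):
    """Trim surrounding spaces and the sentence-final period."""
    return span.strip(" .")


def extract_pairs(sentence):
    """Return (risk factor, pathology) pairs found in one sentence."""
    match = RISK_FACTORS_INCLUDE.search(sentence)
    if match:
        pathology = clean(match.group("p"))
        items = LIST_SEPARATOR.split(match.group("rf"))
        return [(clean(item), pathology) for item in items if clean(item)]
    match = IS_RISK_FACTOR.search(sentence)
    if match:
        return [(clean(match.group("rf")), clean(match.group("p")))]
    return []


pairs_1 = extract_pairs(
    "a high intake of calcium and phosphorus is a risk "
    "factor for the development of metabolic acidosis ."
)
pairs_2 = extract_pairs(
    "had more than one of the common risk factors for "
    "cerebrovascular accidents , including hypertension , "
    "advanced age , hyperfibrinogenemia , diabetes mellitus , and "
    "past history of cerebrovascular accident."
)
```

The trigger sequence anchors the match, and the enumeration after `including` is split into one pair per risk factor, all attached to the same pathology.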
Results
Application of three kinds of patterns
(1) {risk factor, pathology}, (2) risk factors, (3) pathologies
Definition of relations:
direct relations with patterns {risk factor, pathology}
combination of information provided by (2) and (3)
10,445 PMIDs provide information
313 pairs {risk factor, pathology}
15,398 pairs by combination of (2) and (3)
5,873 risk factors (2) not associated with any pathology
MeSH indexing: 5,106 pathologies and health conditions
21,584 triplets {risk factor, pathology_text?, pathology_MeSH?}
17,620 (14,895) pairs only provided by the patterns
5,717 (4,412) pairs contain MeSH descriptors as pathology
Evaluation
Evaluation of precision
ratio of correct extractions among the overall results
Manual evaluation:
no dedicated and comprehensive gold standard is available
Comparison with three relationships provided by Snomed
CT (nomenclature for organizing and exchanging clinical
data)
has causative agent: direct cause of the disorder or finding
(92,807 relations)
bacterial endocarditis has causative agent bacterium
due to: relate a clinical finding directly to its cause (25,309
relations)
acute pancreatitis due to infection
associated with: clinically relevant association between terms
without either asserting or excluding a causal or sequential
relationship between the two (36,134 relations)
fentanyl allergy has causative agent fentanyl
Evaluation
1 Quality and exhaustiveness of risk factors for a given
pathology
Evaluation by a medical doctor of 1,102 risk factors for
coronary heart disease: 88.38% precision
hypertension: {smoking; cigarette smoking; smoking history;
importance of total life consumption of cigarettes}
2 Comparison between text mining results for 20 pathologies
(3,100 extractions, about 25%) and Snomed CT causal
and associative relations (154,130 pairs)
19 extractions (0.6%) considered as already in Snomed CT
Snomed CT not dedicated to risk factors, but they may
occur
acquired immunodeficiency syndrome: {bisexuality, blood
transfusion, intravenous drug abuse }
Conclusion
Extraction of information related to risk factors
Relation with associated pathologies
Text mining approach based on semantico-syntactic
patterns
Evaluation by a medical doctor and a computer scientist
88.38% of risk factors related to coronary heart disease are
correct
about 70% of extracted pathologies are equivalent to
MeSH indexing
Snomed CT is not dedicated to the recording of risk factors,
although they may occur
⇒ Creating a dedicated resource for risk factors is worthwhile
Future work
Use of other patterns, e.g. predictor, precursor ...
Machine learning methods
Knowledge representation:
homogeneous groups of risk factors
environmental, social, clinical, behavioral ...
Characterization of this information
modal, negative contexts
Geographical, demographic variation
Adaptation of Cross-Lingual Transfer Methods
for the Building of Medical Terminology in Ukrainian
[Hamon and Grabar16]
Methods and automatic tools now exist for several
European languages and Japanese
[Kageura and Umino96, Cabre et al.01, Pazienza et al.05]
For many languages:
few NLP tools are available and suitable for automatic
terminology extraction
while textual data exist and terminological resources are
required
Our objective
Design of specific methods for the acquisition of such
terminological resources in Ukrainian
Approaches:
Compilation of terminological resources
Automatic building of terminologies
Observations: increasing availability of parallel bilingual corpora
Methodology: Use of specialized parallel corpora including a
low-resourced language (Ukrainian)
to build bilingual and trilingual terminologies
by means of the cross-lingual transfer principle
Cross-lingual transfer principle
[Yarowsky et al.01, Lopez et al.02]
Hypothesis:
parallel and aligned corpora with two languages L1 and L2
syntactic or semantic annotations and information from L1
Method:
transpose these annotations or information from L1 to L2,
obtain the corresponding annotations and information in L2
An efficient way of [Zeman and Resnik08, McDonald et al.11]
processing multilingual texts from low-resourced
languages
creating various types of annotations: part-of-speech,
semantic categories or even acoustic and prosodic
features
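The transfer principle can be illustrated by projecting term annotations from L1 to L2 through word alignment links. The sketch below is a simplified assumption of how such a projection may work, not the method of the cited papers: `project_terms` is a hypothetical function, the alignment links are written by hand, and projections that do not form a contiguous L2 span are simply discarded.

```python
def project_terms(src_tokens, tgt_tokens, alignment, src_term_spans):
    """Project term spans from the source (L1) to the target (L2) sentence.

    src_tokens, tgt_tokens -- token lists of two aligned sentences
    alignment              -- set of (i, j) links: src_tokens[i] ~ tgt_tokens[j]
    src_term_spans         -- (start, end) spans in L1, end exclusive
    Returns a list of (L1 term, L2 term) pairs.
    """
    pairs = []
    for start, end in src_term_spans:
        # Target positions linked to any word of the source term.
        targets = sorted({j for i, j in alignment if start <= i < end})
        if not targets:
            continue  # nothing to transfer: the term is unaligned
        first, last = targets[0], targets[-1]
        if last - first + 1 != len(targets):
            continue  # discontinuous projection: skipped in this sketch
        pairs.append((" ".join(src_tokens[start:end]),
                      " ".join(tgt_tokens[first:last + 1])))
    return pairs


# Hand-written toy alignment (hypothetical).
english = ["Cancer", "cells", "grow"]
ukrainian = ["Ракові", "клітини", "ростуть"]
links = {(0, 0), (1, 1), (2, 2)}
projected = project_terms(english, ukrainian, links, [(0, 2)])
```

The quality of `projected` depends directly on the quality of the L1 annotations and of the alignment links.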
Drawbacks of the transfer principle
The transfer methodology depends on
the quality of the extracted information and annotation from
L1 texts
the quality of alignment
usually a statistical alignment method
depending on the size of the corpora:
the bigger the better
→ Define an approach to bypass these drawbacks
Material
Medical data in three languages (Ukrainian, French, and
English):
Ukrainian Wikipedia:
source of relevant terms
help for the word-level alignment of the MedlinePlus corpus
MedlinePlus corpus:
a collection of specialized texts
providing the basis for the building of the terminology
Medicine-related articles
from Ukrainian Wikipedia
Selection of the Ukrainian part of the Wikipedia using
medicine-related categories, such as Медицина
(medicine) or Захворювання (disorders)
Potentially covers a wide range of medical notions
Use of information in the infobox
Parallel medical corpus
[Hamon and Grabar17] -- http://natalia.grabar.free.fr/resources.php
Patient-oriented brochures in three languages (Ukrainian,
French, and English) from MedlinePlus
on several medical topics (body systems, disorders and
conditions, diagnosis and therapy, health and wellness)
created in English and then translated into several other
languages (including French and Ukrainian)
About 43,000 words for each language
English | Ukrainian
Cancer cells grow and divide more quickly than healthy cells. Cancer treatments are made to work on these fast growing cells. | Ракові клітини ростуть і діляться швидше, ніж здорові клітини. При лікуванні раку здійснюється вплив на ці клітини, що швидко ростуть.
- Tiredness | - Втома
- Nausea or vomiting | - Нудота або блювота
- Pain | - Біль
- Hair loss called alopecia | - Втрата волосся, що називається алопецією
Extraction of bilingual terminology
from the Wikipedia
Objective: complete and help the alignment method applied to
the MedlinePlus corpus
Use of content of the infoboxes
Ukrainian Wikipedia (medical part)
→ Processing of the infoboxes
→ Medical terms with MeSH codes
→ Querying UMLS
→ Pairs of medical terms (UK/FR and UK/EN)
Example: Цукровий діабет тип 2 → NIDDM; Type 2 Diabetes Mellitus; DID2;
Diabète avec insulinorésistance
Extraction of bilingual terminology
from the MedlinePlus corpus
Illustration of the transfer methods
English | Ukrainian
Cancer cells grow and divide more quickly than healthy cells. Cancer treatments are made to work on these fast growing cells. | Ракові клітини ростуть і діляться швидше, ніж здорові клітини. При лікуванні раку здійснюється вплив на ці клітини, що швидко ростуть.
- Tiredness | - Втома
- Nausea or vomiting | - Нудота або блювота
- Pain | - Біль
- Hair loss called alopecia | - Втрата волосся, що називається алопецією
Extraction of bilingual terminology
from the MedlinePlus corpus
MedlinePlus corpora (UK/FR & UK/EN)
→ Cleaning and manual paragraph alignment
→ POS tagging with TreeTagger and Flemm
→ FR & EN term extraction with YATEA
Transfer 1:
→ Extraction of UK terms corresponding to lines
→ Pairs of candidate terms (UK/FR and UK/EN)
Transfer 2:
→ Giza++ suite (including MkCls)
→ MedlinePlus corpora aligned at the word level
→ UK term extraction by transfer
→ Pairs of candidate terms (UK/FR and UK/EN)
Wikipedia pairs of medical terms
Cross-fertilization with single-word terms (applied twice in the workflow)
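The step "Extraction of UK terms corresponding to lines" can be read as follows: when a whole French or English line (a title or a list item) is itself an extracted term, the aligned Ukrainian line is taken as the corresponding term. The sketch below implements that reading only; it is an assumption about Transfer 1, and `transfer_by_line` as well as the toy term set are hypothetical.

```python
def transfer_by_line(aligned_lines, extracted_terms):
    """Pair Ukrainian lines with source-language terms covering a whole line.

    aligned_lines   -- list of (source line, Ukrainian line) pairs
    extracted_terms -- terms extracted from the source language (FR or EN)
    Returns a list of (Ukrainian term, source term) pairs.
    """
    known = {term.lower() for term in extracted_terms}
    pairs = []
    for source_line, uk_line in aligned_lines:
        # Drop list markers and surrounding spaces before comparing.
        source = source_line.strip(" -").lower()
        if source in known:
            pairs.append((uk_line.strip(" -"), source))
    return pairs


lines = [
    ("- Tiredness", "- Втома"),
    ("- Pain", "- Біль"),
    ("Cancer cells grow and divide more quickly than healthy cells.",
     "Ракові клітини ростуть і діляться швидше, ніж здорові клітини."),
]
# Toy output of an English term extractor (hypothetical).
english_terms = {"tiredness", "pain", "cancer cells"}
line_pairs = transfer_by_line(lines, english_terms)
```

Full sentences are left to the word-alignment route (Transfer 2), since no extracted term covers them entirely.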
Evaluation
Performed by a Ukrainian native speaker with
knowledge of medical informatics
Manual checking of the extracted candidates: correct/incorrect
Validation:
Terms: independently in each language
Bilingual and trilingual relations
Computing of the precision of the results:
precision = correct answers / all the answers
with exact and inexact match (inexact: the correct term is included
in the candidate or includes it)
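The two precision figures can be computed as in the sketch below. In the evaluation described here correctness was judged manually; the set of validated terms used in the code is a stand-in for those judgments, and the function names and toy data are hypothetical.

```python
def contains(longer, shorter):
    """True if the words of `shorter` occur contiguously in `longer`."""
    return f" {shorter} " in f" {longer} "


def precision(candidates, validated, inexact=False):
    """Ratio of correct candidates among all candidates.

    Exact match: the candidate is a validated term.
    Inexact match: the candidate includes, or is included in,
    a validated term.
    """
    if not candidates:
        return 0.0

    def is_correct(candidate):
        if candidate in validated:
            return True
        return inexact and any(
            contains(candidate, term) or contains(term, candidate)
            for term in validated
        )

    return sum(is_correct(c) for c in candidates) / len(candidates)


# Toy data (hypothetical).
toy_candidates = ["you can sleep", "syringe", "floor"]
toy_validated = {"sleep", "syringe"}
exact = precision(toy_candidates, toy_validated)
relaxed = precision(toy_candidates, toy_validated, inexact=True)
```

Here `you can sleep` counts only under the inexact match, because the validated term `sleep` is included in it.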
Results
Bilingual terminology from Wikipedia
357 Ukrainian medical terms (among them 177
single-word terms)
Use of the MeSH codes and UMLS:
1,428 French terms (among them, 339 single-word terms)
3,625 English terms (among them, 448 single-word terms)
Difference with the number of Ukrainian terms due to the MeSH
synonyms
Bilingual pairs:
1,515 Ukrainian/French term pairs (270 pairs between
single-word terms)
3,789 Ukrainian/English term pairs (405 pairs between
single-word terms)
Precision: 1 because of the collecting method
Results
Bilingual terminology from the MedlinePlus - Transfer 1
436 Ukrainian terms with 0.966 precision
associated with 316 French terms and 354 English terms
282 triples between Ukrainian/French/English terms (prec.:
0.954)
63 pairs only between Ukrainian/French terms (prec.:
0.937)
115 pairs only between Ukrainian/English terms (prec.:
0.965)
Relations
involving synonyms: {втома, fatigue/tiredness},
{фаллопієва труба, trompes de fallope/trompe utérine} (fallopian
tube),
{втрата слуху/втрачається слух, hearing loss}
associating several case forms with same English or
French form: {вагітність, pregnancy} and {вагітності,
pregnancy}
Analysis
Bilingual terminology from the MedlinePlus - Transfer 1
Few errors:
mainly partial match between two languages:
{ви можете спати, dormir/sleep} - lit. you can sleep.
{появу виразок у роті, mouth sores} - lit. (appearance of)
mouth sores
Causes of silence:
variation due to the translation, which prevents the
Transfer 1 method from extracting the term in French or English
Догляд: match with the French title Soins but not with the
English title Your care
Problem solved by the Transfer 2 method
errors in the POS tagging or term extraction strategy
Failure of the term extractor to identify French or English
terms
Results
Bilingual terminology from the MedlinePlus - Transfer 2
9,040 Ukrainian extracted terms (prec.: 0.454)
Exact match:
Higher precision of the French (0.674) and English terms (0.761)
But low number of terms: 3,671 for French, 3,597 for English
Due to the rich morphology of the Ukrainian language:
{напад, нападу} - attack, {припадків, припадки} - seizure,
{костей, кістки} - bones
Extraction of synonymous terms:
{биття, удару} - beats,
{приступам, припадків} - attacks/seizures
Relations:
3,724 pairs of Ukrainian/French terms (prec.: 0.309)
4,745 pairs of Ukrainian/English terms (prec.: 0.401)
4,724 triples of Ukrainian/French/English terms (prec.: 0.419)
Inexact match:
Higher precision: +0.40 points for the Ukrainian terms, +0.05 for
the French and English terms.
Due to the alignment quality?
Analysis
Bilingual terminology from the MedlinePlus - Transfer 2
Error analysis
Most of the errors are due to the alignment problems
when the alignment is correct, the Ukrainian terms are correctly
extracted by the transfer
Term analysis
Most of the extracted terms are specific to the medical domain
{шприца, syringe}, {холестерину, cholesterol}, {фактори
ризику, risk factors}, {трахеотомією, tracheostomy}
Other terms: close or approximate notions:
{діти, children}, {здорову їжу, healthy diet}, {серцевий напад,
heart attack}, {склянок рідини, glasses of liquid}
Interesting observation:
French and English terms correspond to phrases in Ukrainian:
undercooked foods: не до кінця приготовлену їжу (lit. food which is
not fully cooked)
indolore (painless): При цьому обстеженні Ви не відчуєте жодного
болю (lit. With this exam you will feel no pain)
Conclusion
Proposal of transfer-based methods to
extract the term candidates in Ukrainian
create term pairs Ukrainian/French and Ukrainian/English
Works on freely available multilingual corpora in French,
English and Ukrainian
Resulting terminological resource: 4,588 Ukrainian medical
terms and 34,267 relations with French and English terms
→ Method suitable for building terminology in
low-resourced languages
Future Work
Bilingual word alignment with Fast-Align [Dyer et al.13]
Use of statistical and morphological cues
Use of transfer method for keyphrase extraction from scientific
papers
⇒ Ongoing work with Kyiv Institute of Cybernetics
Proposing a similar term extraction method to work with
comparable corpora
Overall conclusion
Biomedical text mining: a complex task which involves
several types of information ...
... to link together
many strategies for identifying the information
a lot of terminological and linguistic resources ...
... more or less available or difficult to build according to
languages and areas
Current challenges
concept recognition (disambiguation, normalization)
multilingual approaches
approaches for low-resourced languages
use of information issued from social media
Ongoing funded projects
Mining literature and using Open Linked Data
MIAM Project (French National Agency, 2016)
Mining literature to collect interactions existing between
drugs and food which might lead to adverse drug events
Example: grapefruit inhibits the CYP3A4 enzyme,
which metabolizes many drugs
Objectives:
Aggregating information extracted from unstructured data with
knowledge already recorded in knowledge bases or Linked
Open Data repositories (Drugbank, Thériaque, Sider,
Diseasome, etc.)
Managing certainty and reliability of this information
Formalisation of the interactions in Linked Open Data
Drug-related information
[Figure: graph of drug-related information.
Nodes: solumedrol, methylprednisolone, cortisone, steroidal anti-inflammatory, digitaline, insulin, salt, phosphate disodique anhydre, phosphate monosodique anhydre, sodium, lactosis, acne, osteoporosis, swelling of face, arterial hypertension, ulcer of stomach, depression, allergic shock, Quincke oedema, suffocation by larynx oedema, brain oedema.
Relation labels: composition, is a, prescribed for, adverse effects, INN, DDI, FDI.
Prescription features: dosage, mode, frequency, reason, duration.]
Terminology acquisition for Ukrainian
Use of transfer method for keyphrase extraction from
scientific papers
Tuning of YATEA for Ukrainian
Definition and design of methods for terminological and
semantic relation acquisition
Дякую! (Thank you!)
References
Ahmad (Rabiah) and Bath (Peter A). --
Identification of risk factors for 15-year mortality among community-dwelling older people using Cox
regression and a genetic algorithm. Journal of Gerontology, vol. 60 (8), 2005, pp. 1052--8.
Aubin (Sophie) and Hamon (Thierry). --
Improving Term Extraction with Terminological Resources. In: Advances in Natural Language Processing
(5th International Conference on NLP, FinTAL 2006), ed. by Salakoski (Tapio), Ginter (Filip), Pyysalo
(Sampo) and Pahikkala (Tapio). pp. 380--387. --
Springer.
Blake (Catherine). --
A text mining approach to enable detection of candidate risk factors. In: Medinfo, pp. 1528--1528.
Cabré (MT), Estopà (R) and Vivaldi (J). --
Automatic term detection: a review of current systems, pp. 53--88. --
John Benjamins, 2001.
Cerrito (Patricia). --
Inside text Mining. Health management technology, vol. 25 (3), 2004, pp. 28--31.
Chapman (Wendy), Bridewell (Will), Hanbury (Paul), Cooper (Gregory) and Buchanan (Bruce). --
Evaluation of negation phrases in narrative clinical reports. In: Annual Symposium of the American Medical
Informatics Association (AMIA). --
Washington, 2001.
Dyer (Chris), Chahuneau (Victor) and Smith (Noah A.). --
A Simple, Fast, and Effective Reparameterization of IBM Model 2. In: NAACL/HLT, pp. 644--648.
Golik (Wiktoria), Bossy (Robert), Ratkovic (Zorana) and Nédellec (Claire). --
Improving term extraction with linguistic analysis in the biomedical domain. In: Proceedings of the 14th
International Conference on Intelligent Text Processing and Computational Linguistics (CICLing'13). --
Samos, Greece, March 2013.
Grouin (Cyril), Abacha (Asma Ben), Bernhard (Delphine), Cartoni (Bruno), Deléger (Louise), Grau (Brigitte),
Ligozat (Anne-Laure), Minard (Anne-Lyse), Rosset (Sophie) and Zweigenbaum (Pierre). --
CARAMBA: Concept, Assertion, and Relation Annotation using Machine-learning Based Approaches. In:
Proceedings of the workshop I2B2 2010.
Grouin (Cyril), Grabar (Natalia), Hamon (Thierry), Rosset (Sophie), Tannier (Xavier) and Zweigenbaum
(Pierre). --
Eventual situations for timeline extraction from clinical reports. Journal of American Medical Informatics
Association, vol. 20 (5), September 2013, pp. 820--827. --
(IF: 3.609).
Hamon (Thierry) and Grabar (Natalia). --
Linguistic approach for identification of medication names and related information in clinical narratives.
Journal of American Medical Informatics Association, vol. 17 (5), Sep-Oct 2010, pp. 549--554. --
PMID: 20819862.
Hamon (Thierry) and Grabar (Natalia). --
Tuning HeidelTime for identifying time expressions in clinical texts in English and French. In: Proceedings of
The Fifth International Workshop on Health Text Mining and Information Analysis (LOUHI2014) -- Short
paper/Poster, pp. 101--105. --
Gothenburg, Sweden, April 2014.
Hamon (Thierry) and Grabar (Natalia). --
Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian. In:
Proceedings of the 17th International Conference on Intelligent Text Processing and Computational
Linguistics (CICLING2016). --
Springer.
Hamon (Thierry) and Grabar (Natalia). --
Creation of a multilingual aligned corpus with Ukrainian as the target language and its exploitation. In:
Proceedings of Computational Linguistics and Intelligent Systems (COLINS 2017), pp. 10--19.
Hamon (Thierry), Nazarenko (Adeline), Poibeau (Thierry), Aubin (Sophie) and Derivière (Julien). --
A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis. In: Proceedings of
RIAO 2007. --
Pittsburgh, USA, 2007. 15 pages.
Hamon (Thierry), Graña (Martin), Raggio (Víctor), Grabar (Natalia) and Naya (Hugo). --
Identification of relations between risk factors and their pathologies or health conditions by mining scientific
literature. In: Proceedings of MEDINFO 2010, pp. 964--968. --
PMID: 20841827.
Hamon (Thierry), Grabar (Natalia) and Kokkinakis (Dimitrios). --
Medication Extraction and Guessing in Swedish, French and English. In: Proceedings of MedInfo 2013. --
Copenhagen, Denmark, August 2013.
Hamon (Thierry), Engström (Christopher) and Silvestrov (Sergei). --
Term ranking adaptation to the domain: genetic algorithm based optimisation of the C-Value. In: Proceedings
of PolTAL 2014 -- Advances in Natural Language Processing, ed. by Springer, pp. 71--83.
Kageura (K) and Umino (B). --
Methods of Automatic Term Recognition. In: National Center for Science Information Systems, pp. 1--22.
Kolyshkina (I) and van Rooyen (M). --
Text mining for insurance claim cost prediction, pp. 192--202. --
Springer-Verlag, 2006.
Lopez (Adam), Nossal (Mike), Hwa (Rebecca) and Resnik (Philip). --
Word-Level Alignment for Multilingual Resource Acquisition. In: LREC Workshop on Linguistic Knowledge
Acquisition and Representation: Bootstrapping Annotated Data. --
Las Palmas, Spain, 2002.
McDonald (Ryan), Petrov (Slav) and Hall (Keith). --
Multi-source transfer of delexicalized dependency parsers. In: EMNLP.
Minard (AL), Ligozat (AL), Ben Abacha (A), Bernhard (D), Cartoni (B), Deléger (L), Grau (B), Rosset (S),
Zweigenbaum (P) and Grouin (C). --
Hybrid methods for improving information access in clinical documents: concept, assertion, and relation
identification. J Am Med Inform Assoc, vol. 18 (5), 2011, pp. 588--93.
Pazienza (Maria Teresa), Pennacchiotti (Marco) and Zanzotto (FabioMassimo). --
Terminology Extraction: An Analysis of Linguistic and Statistical Approaches. In: Knowledge Mining, ed. by
Sirmakessis (Spiros), pp. 255--279. --
Springer Berlin Heidelberg, 2005.
Périnet (Amandine), Grabar (Natalia) et Hamon (Thierry). --
Identification des assertions dans les textes médicaux : application à la relation {patient, problème médical}.
Traitement Automatique des Langues (TAL), vol. 52 (1), 2011, pp. 97--132.
Strötgen (Jannik) et Gertz (Michael). --
Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. In : Proceedings of
the Eighth International Conference on Language Resources and Evaluation (LREC'12). pp. 3746--3753. --
ELRA.
Tsuruoka (Yoshimasa), Tateishi (Yuka), Kim (Jin-Dong), Ohta (Tomoko), McNaught (John), Ananiadou
(Sophia) et Tsujii (Jun'ichi). --
Developing a Robust Part-of-Speech Tagger for Biomedical Text. In : Proceedings of Advances in
Informatics - 10th Panhellenic Conference on Informatics, pp. 382--392.
Yarowsky (David), Ngai (Grace) et Wicentowski (Richard). --
Inducing multilingual text analysis tools via robust projection across aligned corpora. In : HLT.
Zeman (D) et Resnik (P). --
Cross-language parser adaptation between related languages. In : NLP for Less Privileged Languages.
Zweigenbaum (Pierre), Lavergne (Thomas), Grabar (Natalia), Hamon (Thierry), Rosset (Sophie) et Grouin
(Cyril). --
Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case
study. Biomedical Informatics Insights, vol. 6 (Suppl. 1), 2013, pp. 51--62.
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Natural Language Processing for biomedical text mining - Thierry Hamon

  • 1. Introduction Mining EHR Mining Literature Terminology building by Transfer Conclusion Natural Language Processing for biomedical text mining Thierry Hamon LIMSI, CNRS, Université Paris-Saclay, Orsay, France Université Paris 13, Sorbonne Paris Cité, Villetaneuse, France hamon@limsi.fr 14/06/2017 1/75 Grammarly Meet-up T Hamon
  • 2. Context. Most data are unstructured: about 90% of the data produced in 2011 (1.8 trillion gigabytes) [Oracle, 2011]; 85% of the data produced in companies. Unstructured data means textual data: an important source of information, but accessing and reading it is costly, time-consuming and sometimes impossible. Hence the need for methods for information retrieval and information extraction.
  • 3. Context. In the biomedical domain, the amount of text is constantly increasing. Scientific and medical literature: scientific papers in digital libraries or portals; medical, pharmacological and epidemiological reports. Electronic Health Records in hospitals: discharge summaries, radiological reports. Patient-related textual data: documents explaining diseases to patients, health behaviors; social media (online discussion forums, Twitter messages).
  • 4. Context. Example: scientific article publications. Medline (U.S. National Library of Medicine bibliographic database) - https://www.ncbi.nlm.nih.gov/pubmed/ Evolution of the number of references to articles in life sciences [figure: Citations Added to MEDLINE® per Year]. Currently: more than 27 million references.
  • 5. What is text mining? Objective: extraction of useful and non-trivial knowledge from texts. Extraction of information useful for a given application from textual data, i.e. data written in natural language. Collecting and linking this information. Feeding databases or knowledge bases with information extracted from texts. Indirectly: allowing data mining on unstructured/textual data.
  • 7. Data mining vs. text mining. Data mining: methods and algorithms to explore structured data drawn from databases, data warehouses or knowledge bases. Objectives: highlight rules, identify trends or behaviours which are invisible to humans. Text mining: methods and algorithms to explore unstructured data, i.e. texts written in natural language. Objectives: extraction and categorisation of information available in the texts.
  • 8. What are text mining applications? EHR: search and find relevant information, hospital information system; provide synthetic views of patient-related information. EHR / scientific literature: information storage in databases for statistics, epidemiologic surveys, hospital information systems, etc.; formalize information or knowledge. Social media: epidemiologic analysis, therapeutic patient education, identification of potential adverse drug effects.
  • 9. What information to identify? Semantic entities: terms with semantic types. Semantic relations between entities. Temporal information related to events. Numerical information. Modifiers for identifying polarity, modality, presence/absence, uncertainty.
  • 10. Needs for the analysis of biomedical texts. Various resources: terminologies, ontologies, Open Linked Data; lexica, Consumer Health Vocabularies; semantic description of entities. NLP approaches and methods: rule-based approaches (more or less sophisticated regular expressions); Machine Learning approaches (supervised, semi-supervised, unsupervised). Evaluation against independent reference data.
  • 11. Difficulties. Textual data may be noisy, sparse, multilingual. Text processing is time-consuming and may require contextual information. Terminological and semantic variation, semantic ambiguity, unknown or new words and terms, etc. → high and unpredictable number of dimensions. Complex and embedded semantic relations.
  • 12. Difficulties. Ambiguities of natural language at each level. Lexicon: spell[N] vs. spell[V]; Apple[company] vs. apple[fruit]; гори[V] (a form of burn) vs. гори (inflectional form of mountain). Syntax: "the doctor examines the patient with a stethoscope"; "Joe experienced severe shortness of breath and chest pain at home while having sex, which became more unpleasant at the emergency room."
  • 13. Difficulties. Ambiguities of natural language at each level. Semantics: "a red pencil"; "He reached the bank."; поділися (form of disappear) vs. поділися (or lemma of share). Pragmatics: "The chicken is ready to eat."; "Margaret invited Susan for a visit, and she gave her a good lunch."; "a very pleasant patient".
  • 14. Difficulties. Variation in semantically similar wording: "Bayer is buying Monsanto", "Bayer clinches Monsanto", "Bayer and Monsanto [...] will merge", "Bayer's announced acquisition of Monsanto", "Monsanto-Bayer merger". Metonymy: the latest Apple/Samsung. Metaphor: Web giants, or noir ("black gold" in French). Spelling errors: Appel ("call" in French) / Apple. Mix of Latin and Ukrainian (Cyrillic) characters, which look alike but have different Unicode code points: i vs. і, o vs. о, p vs. р, y vs. у...
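The mixed-alphabet problem on the last slide can be detected automatically. The following is a minimal sketch, not taken from the talk: it assumes that the first word of a character's Unicode name identifies its script, and the function names `scripts_in` and `is_mixed_script` are hypothetical.

```python
import unicodedata


def scripts_in(word):
    """Return the set of scripts (e.g. LATIN, CYRILLIC) used by the letters of a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER ER".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(word):
    """True when a single word mixes letters from more than one script."""
    return len(scripts_in(word)) > 1


latin_word = "pipe"
# Same appearance, but the second letter is the Cyrillic "і" (U+0456).
mixed_word = "p\u0456pe"

print(is_mixed_script(latin_word), is_mixed_script(mixed_word))
```

Words flagged this way could then be normalised or sent to manual review before terminology matching.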
  • 15. Three experiments in biomedical text mining. 1. Recognition of medication, assertion and temporal information in EHR [Hamon and Grabar10, Périnet et al.11, Grouin et al.13, Zweigenbaum et al.13, Hamon and Grabar14]. Work with Natalia Grabar (CNRS STL - Lille 3), Amandine Périnet (LIM&BIO - Paris 13), Cyril Grouin, Sophie Rosset, Xavier Tannier, Pierre Zweigenbaum (LIMSI, CNRS). 2. Mining literature for identifying risk factors [Hamon et al.10]. Work with Martin Graña, Víctor Raggio and Hugo Naya (Institut Pasteur de Montevideo), and Natalia Grabar (CNRS STL - Lille 3). 3. Cross-lingual transfer methods for terminology acquisition [Hamon and Grabar16]. Work with Natalia Grabar (CNRS STL - Lille 3).
  • 16. Mining Patients' Electronic Health Records [Hamon and Grabar10, Périnet et al.11, Grouin et al.13, Zweigenbaum et al.13, Hamon and Grabar14]. Description of the hospitalization. A lot of (personal) information about patients: problems; therapies (treatments, drugs, etc.); tests and analyses (lab data, etc.); assertions regarding facts (certainty, hypothesis, etc.); temporal information (useful for the clinical timeline). Text is the best way to record information (databases are difficult to maintain), BUT the texts are written by practitioners: in a hurry, with mistakes, with little or incorrect syntactic structure, etc.
  • 17. Objectives. Identification of: medication names given to patients; related information (dosage, duration, frequency, mode of administration, reason for prescription); assertions: certainty and uncertainty of information in medical texts, with a focus on the relation {patient / medical problem}; temporal expressions: date, time and duration of medical events. Participation in several I2B2 Challenges.
  • 18. Drug-related information [figure: graph of drug-related information. Node labels: solumedrol, methylprednisolone; salt, phosphate disodique anhydre, phosphate monosodique anhydre, sodium, lactosis; cortisone, steroidal anti-inflammatory; acne, osteoporosis, swelling of face, arterial hypertension, ulcer of stomach, depression; allergic shock, Quincke oedema, suffocation by larynx oedema, brain oedema; digitaline, insulin. Edge labels: composition, is a, prescribed for, adverse effects, INN, DDI, FDI. Prescription features: dosage, mode, frequency, reason, duration.]
  • 19. Assertion task: degree of certainty [figure]. Example sentences: "With shrimps, the patient suffers from abdominal pain"; "The patient is to call the hospital if he suffers from abdominal pain"; "The patient denies suffering from abdominal pain"; "The patient suffers from abdominal pain"; "It was thought that the patient might suffer from abdominal pain". Labels in the figure: Assertion, Certainty (Positive certainty, Negative certainty), Possibility, Hypothesis, Condition.
  • 20. Example: medication name, associated information, assertions and time expressions. "The patient is currently off diuretics at this time. Daily weights should be checked and if her weight increases by more than 3 pounds Dr. Bockoven should be notified. The patient was also started on calcitriol given elevation of parathyroid hormone. Cardiovascular: Rate and rhythm: The patient has a history of atrial fibrillation with a slow ventricular response. Two weeks ago, the patient was started on metoprolol 12.5 mg p.o. q.6 h. for rate control , however , this dose was decreased to 12.5 mg p.o. twice a day, given some bradycardia on her telemetry. The patient was also started on Flecainide 75 mg p.o. q.12 h. She will continue on these two medications upon discharge."
  • 21. Example: medication name, associated information, assertions and time expressions. "RRR , lots of BS's , neuro nonfocal , ext with 1+ edema. On atenolol , zestril , norvasc , premarin , detrol , lasix 60 qd , nebs prn at home. Labs sig for Cr 0.7 , CK 48 , TnI .05 , QBC 9.5 , Hct 41.3. From CV point of view , thought to be CHF exac. ROMI'd without events on monitor and diuresed 2L/day. IV Lasix 80 bid to start transitioned to 60 po bid. BNP>assay. 6/17 dobut MIBI with mod sized ant septal wall defect c/w diagonal lesion , 3/22 Echo with EF 55-60% , mild LAE/RAE , no WMA , mod large RV. No further CV studies. Cont previously meds on d/c. From FEN point of view , 2 L fluid restriction , 2 g Na restriction. Nutrition consult , but pt very resistant to diet changes. From GI point of view , GERD; nexium started. From pulm point of view , CXR c/w sl fluid overload , no focal findings , no pulm edema. Given NC O2 and BiPAP at night."
  • 22. Material: documents. Discharge summaries: 1,249 documents (provided by the I2B2 challenges). 2009: 649 docs in the training set, 553 docs in the test set, 17 manually annotated documents (for illustrating the annotation guidelines). 2010: 349 annotated documents + 827 raw documents in the training set, 477 in the test set; assertions: 11,968 in the training set, 18,550 in the test set. 2012: 190 docs in the training set, 120 docs in the test set.
  • 23. Material: terminologies and lexica. Medication names: RxNorm (243,869 entries) and therapeutic classes and groups of medication from the FDA website. Ambiguous medications (red blood cells, magnesium, iron): specific status during the annotation process. Medical problems: 45,898 terms (Diagnosis and Morphology axes of the Snomed International), 476 terms from the training set documents. Medication-related information: regular expressions for frequency, dosage, duration and mode of administration; 52 identification rules for reasons (characterization of Snomed Int terms and/or extracted terms as reasons).
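To illustrate what regular expressions for dosage, mode of administration and frequency might look like, here is a minimal sketch. The patterns below are hypothetical and deliberately tiny; they are not the expressions used in the system described in the talk, and they only cover the sentence from the discharge-summary example.

```python
import re

# Hypothetical, deliberately small patterns (the real resources are not shown in the slides).
PATTERNS = {
    "dosage": re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|g|mcg|ml)\b", re.IGNORECASE),
    "mode": re.compile(r"\b(?:p\.o\.|po|iv)(?=\s|$)", re.IGNORECASE),
    "frequency": re.compile(
        r"\b(?:q\.\s?\d+\s?h\.|b\.i\.d\.|bid|qd|prn|twice a day)(?=[\s,.]|$)",
        re.IGNORECASE,
    ),
}


def extract_medication_info(text):
    """Return every match of each pattern, keyed by the kind of information."""
    return {
        kind: [m.group(0) for m in pattern.finditer(text)]
        for kind, pattern in PATTERNS.items()
    }


sentence = "the patient was started on metoprolol 12.5 mg p.o. q.6 h. for rate control"
info = extract_medication_info(sentence)
print(info)
```

Keeping one pattern per information type makes it easy to tune each one separately, which matters because abbreviations such as "po" or "qd" vary a lot between hospitals.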
  • 24. Material: terminologies and lexica. Assertions. Negation: 284 markers from the NegEx resource [Chapman et al.01]. Lexical clues: "on exertion" (condition). Morphological clues: "afebrile" (negative certainty). Contextual information (342 markers): clues in the sentence, e.g. "... could represent a multifocal pneumonic process" (possible); section headings, e.g. ALLERGIES, SOCIAL HISTORY; lists. Lexico-syntactic patterns (137 patterns): "be to (address | request | notify) DT (office | clinic | hospital) if PB" (Hypothesis); "TE to (evaluate | check | eval | consult) (from | if | with | against) PB" (Possibility).
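A marker-based assertion classifier can be sketched in a few lines. This is an illustration under strong assumptions: the marker lists are hypothetical and far smaller than the NegEx-derived resources mentioned above, the label names are chosen for the example, and rules are tried in a fixed order with "present" as the default.

```python
import re

# Hypothetical rules, tried in order; the first matching rule decides the label.
RULES = [
    ("hypothesis", re.compile(r"\bis to (?:call|notify) the (?:office|clinic|hospital) if\b")),
    ("possible", re.compile(r"\b(?:might|could represent|possible)\b")),
    ("absent", re.compile(r"\b(?:denies|no|without)\b")),
]


def classify_assertion(sentence):
    """Return the label of the first rule whose marker occurs in the sentence."""
    lowered = sentence.lower()
    for label, pattern in RULES:
        if pattern.search(lowered):
            return label
    return "present"


examples = [
    "The patient denies suffering from abdominal pain",
    "The patient is to call the hospital if he suffers from abdominal pain",
    "It was thought that the patient might suffer from abdominal pain",
    "The patient suffers from abdominal pain",
]
labels = [classify_assertion(s) for s in examples]
print(labels)
```

Rule order matters in such a design: the hypothesis pattern is tested before the negation markers so that a conditional instruction is not mislabelled by a later, more generic marker.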
  • 25. Document processing. Annotation of the documents: use of terminological and linguistic resources, and of selection and disambiguation rules; CRF-based models [Grouin et al., Minard et al.11]; tuning of the HeidelTime system [Strotgen and Gertz12, Hamon and Grabar14]. Design of post-processing modules for: disambiguation and negative contexts of medication names; computing of dependency relations between patient, medication names and related information, or assertion. Improving the CRF-based system with extracted terms [Aubin and Hamon06].
  • 26. Enriching documents with linguistic information [figure: annotation pipeline. Input: XML document with structural annotations. Steps shown: tokenisation, named entity tagging, word and sentence segmentation, part-of-speech tagging, lemmatisation, tagging of the terms, extraction of terms, semantic tagging. Resources shown: dictionary of named entities, specialised lexicon, terminology, ontology. Output: XML document with linguistic and structural annotations.] Symbolic approach: use of NLP methods; terminological resources and disambiguation rules; concurrent annotations and annotation selection. Design of post-processing modules for: annotation disambiguation; establishment of dependency relations between patient, medication names and related information, or assertion. Annotation based on the Ogmios NLP platform (developed during the EU project Alvis).
  • 31. Enriching documents with linguistic information. Identification of the sentences, words, lemma and part-of-speech, named entities and terms with semantic types. Annotation labels shown above the text: [TIMEX3] [DOSAGE] [MODADM] [FREQ] [DISORDER] [DRUG] [DISORDER] [DISORDER]. Tagged text (word_TAG): The_DT patient_NN has_VBZ a_DT history_NN of_IN atrial_JJ fibrillation_NN with_IN a_DT slow_JJ ventricular_JJ response_NN . Two_CD weeks_NNS ago_RB , the_DT patient_NN was_VBD started_VBN on_IN metoprolol_FW 12.5_CD mg_NN p.o._SYM q.6_FW h._NP for_IN rate_NN control_NN ...
  • 32. Concurrent annotation of documents. Preparing material for document annotation. Named Entity Recognition (frequency, duration, dosage, mode of administration) + internal disambiguation (avoid nested annotations of different types and merge annotations of the same type). Term and semantic tagging (medication and reasons, negation and reason markers, assertion) based on linguistic information (word and sentence segmentation, lemmatization) + internal disambiguation (nested terms, parenthesized medication names, etc.).
  • 33. Time expression identification [Hamon and Grabar14]. Tuning of the HeidelTime system [Strotgen and Gertz12] for English and French EHR. Enrichment and encoding of linguistic temporal expressions specific to the medical and clinical domain: "post-operative day #", "b.i.d." meaning twice a day, "day of life", etc. The admission date is used as the reference or starting point for computing relative dates and their normalised value: if the admission date is 14 June 2017, the normalised value of "2 days later" is 16 June 2017. Additional normalizations of the temporal expressions: normalization of durations into approximate numerical values to avoid undefined values; external computation for some durations and frequencies due to limitations in HeidelTime's internal arithmetic processor.
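The relative-date normalisation described above can be sketched as follows. This is not HeidelTime code: the function name and the single supported pattern ("N days later/earlier") are assumptions made for the illustration.

```python
import re
from datetime import date, timedelta


def normalise_relative_date(expression, admission_date):
    """Resolve 'N day(s) later/earlier' against the admission date; None if unsupported."""
    match = re.fullmatch(r"(\d+) days? (later|earlier)", expression.strip().lower())
    if match is None:
        return None
    offset = timedelta(days=int(match.group(1)))
    if match.group(2) == "later":
        return admission_date + offset
    return admission_date - offset


admission = date(2017, 6, 14)
resolved = normalise_relative_date("2 days later", admission)
print(resolved.isoformat())
```

With the admission date of the slide's example (14 June 2017), "2 days later" resolves to 16 June 2017.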
  • 34. Annotation selection. Processing of ambiguous medication names (laboratory data or medication): if in a list section, the status is changed to medication, e.g. "HOME MEDS: methadone 20 bid, imdur 120 bid, hydral taking 25 bid, lasix 20 bid, coumadin, colace, iron, nexium 40 bid". Rejection of medication names if in allergy sections, e.g. "ALLERGY: prednisone, penicillins, tamsulosin, simvastatin". Removal of drug names in negative contexts. Guessing new drug names with semantic patterns (m do mo? f) [Hamon et al.13]: 1. noun phrases recognized by the term extractor YATEA; 2. stopwords rejected; 3. filtering with typical suffixes of medication names. Example: "Diovan 160mg PO BID, HCTZ 25mg PO QD, Imdur ER 60mg PO QD, NTG .4mg PRN CP, Norvasc 10mg PO QD, Pavachol 80mg PO QD."
  • 35. Results: medication task. Focus on various parameters for reason identification and guessing medication names. Scores per run (difference from RUN2 in parentheses):
              RUN2     RUN1               RUN3
      System  0.7801   0.7681 (-0.0120)   0.7719 (-0.0082)
      m       0.8142   0.8093 (-0.0049)   0.808  (-0.0062)
      do      0.8234   0.8172 (-0.0062)   0.821  (-0.0024)
      f       0.837    0.8304 (-0.0066)   0.8345 (-0.0025)
      mo      0.8655   0.8577 (-0.0078)   0.8624 (-0.0031)
      du      0.3575   0.3516 (-0.0059)   0.3505 (-0.0070)
      r       0.2867   0.2759 (-0.0108)   0.2666 (-0.0201)
    (m: medication name, do: dosage, f: frequency, mo: mode of administration, du: duration, r: reason.) RUN1: all reasons. RUN2: all reasons without semantic tagging and reason markers. RUN3: all reasons without semantic tagging and use of reason markers. Guessing medication names.
  • 36. Results: medication task.
              exact                      inexact
              F       P       R          F       P       R
      System  0.7801  0.7997  0.7614     0.7792  0.8111  0.7497
      m       0.8142  0.8448  0.7858     0.8304  0.8666  0.7971
      do      0.8234  0.8728  0.7793     0.8503  0.8799  0.8226
      f       0.837   0.8306  0.8435     0.8411  0.8436  0.8386
      mo      0.8655  0.8543  0.877      0.863   0.844   0.8828
      du      0.3575  0.3483  0.3673     0.3607  0.3669  0.3546
      r       0.2867  0.3047  0.2708     0.3386  0.4386  0.2757
    Reason: difficult to identify the exact noun phrases (-13% between inexact and exact precision).
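The F column can be related to the P and R columns through the usual balanced F-measure (harmonic mean of precision and recall). The slides do not state which formula the challenge used, so this is an assumption, but the reported exact-match values are consistent with it:

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Exact-match precision and recall copied from the table above.
system_f = f_measure(0.7997, 0.7614)
medication_f = f_measure(0.8448, 0.7858)
print(round(system_f, 4), round(medication_f, 4))
```

Recomputing F from P and R in this way is a cheap consistency check when transcribing result tables.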
  • 37. Results: assertion task and time expression identification. List of markers + section headings.
                                    Training            Test
      Categories                    P     R     F       P     R     F
      Associated to somebody else   0.96  0.80  0.88    0.84  0.74  0.79
      Hypothesis                    0.71  0.31  0.43    0.63  0.24  0.35
      Condition                     0.08  0.40  0.14    0.08  0.33  0.12
      Possibility                   0.46  0.57  0.51    0.51  0.47  0.49
      Absent                        0.92  0.75  0.82    0.87  0.75  0.81
      Present                       0.86  0.90  0.88    0.84  0.87  0.86
      Assertions                    0.82  0.82  0.82    0.80  0.80  0.80

                                    Precision  Recall  F-measure
      Temporal expressions          0.8611     0.8170  0.8385
  • 38. Conclusion. F-measure of the system: 0.800 (avg). Analysis of the resource contribution: importance of the markers; need to include syntactic structures. Difficulty in identifying certainty degrees: few examples for condition and hypothesis.
  • 39. Further improvements. Medication tasks: duration extraction (identification of specific prepositional phrases based on parsing); medical problem identification (development of a specific reasoning module). Assertion task: enrich resources with synonyms (WordNet); improve the patterns by using syntactic dependencies and integrating semantic classes (verbs of evidence, verbs to get in touch with somebody, etc.).
  • 40. Mining literature to identify relations between risk factors and their pathologies [Hamon et al.10]. Objective: massive exploitation of the Medline bibliographical database for extracting risk factors and their associations with health conditions. Risk factors increase people's chance of developing a given disease. Information on risk factors is widespread over the web: websites, bibliographical databases, ... Previous works: genomic scientific literature (BioCreative, TREC Genomics), clinical records (I2B2 NLP Challenge 2014), processing of narratives [Blake04]; data mining (KDD challenge 2004) [Ahmad and Bath05, Cerrito04, Kolyshkina and van rooyen06].
  • 41. Material. Bibliographical database Medline (titles, abstracts). Selection of potential citations/PMIDs, i.e. those containing the sequences "risk factors" or "factor of risk": 187,544 citations selected, over 42 million word occurrences. MeSH (thesaurus for information storage and retrieval): disease-related MeSH term recognition in citations.
  • 42. Document processing. 1. Annotation of Medline citations with linguistic information: Ogmios NLP platform [Hamon et al.07]; segmentation, POS-tagging and lemmatization (Genia Tagger [Tsuruoka et al.05]); term recognition but also term extraction (YATEA [Aubin and Hamon06]). 2. Risk factors identification.
  • 44. Term recognition vs. term extraction. Term recognition: tagging of texts with terms from a terminology; use of more or less complex methods (string matching, terminological variant computing, semantic distances, ML methods...). Term extraction: discovery of terms in texts; identification of noun phrases which are potential terms (term candidates); computing of the strength of the term components (unithood) and of the strength of the relation to the domain (termhood) [Kageura and Umino96].
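The simplest form of term recognition, plain string matching against a terminology, can be sketched as a greedy longest-match lookup. This is a generic illustration, not the matcher used in the talk; the function name, the `max_len` parameter and the tiny terminology are hypothetical.

```python
def recognise_terms(tokens, terminology, max_len=4):
    """Greedy longest-match lookup; returns (start, end, term) spans over the token list."""
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so that nested shorter terms do not win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in terminology:
                spans.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1
    return spans


terminology = {"atrial fibrillation", "fibrillation", "chest pain"}
tokens = "The patient has a history of atrial fibrillation with chest pain".split()
spans = recognise_terms(tokens, terminology)
print(spans)
```

Exact matching of this kind misses terminological variants (inflection, word order, synonyms), which is why the more complex methods listed on the slide exist.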
  • 45. Term extraction with YATEA: Yet Another Term ExtrActor (Aubin & Hamon, 2006). Term extraction from French and English texts. Shallow parsing of texts: parsing focused on the parts of the sentence which may contain terms (usually the noun phrases), with recursively applied minimal parsing patterns; endogenous learning. Term candidate decomposition into Head and Modifier components (syntactic role of the component in the noun phrase). Each component of a term candidate is also considered as a term candidate. Unparseable noun phrases are rejected.
  • 46. YATEA: Yet Another Term ExtrActor (Aubin and Hamon, 2006). Several statistical measures are associated with each term candidate (number of occurrences, C-Value1, C-Value*, etc.) [Hamon et al.14]. CPAN module: http://search.cpan.org/~thhamon/Lingua-YaTeA/ Developed during the European project ALVIS. Description of the shallow parsing with configuration files. Possibility of tuning for a domain (BioYATEA) [Golik et al.13]. For other languages: on-going work for Ukrainian and Arabic.
  • 48. Term extraction with YATEA. Texts → lemmatisation + POS tagging → term extraction (rule-based approaches). Identification of chunks thanks to morpho-syntactic information (frontiers: verbs, adverbs, etc.). Tagged text (word_TAG): 22_CD yo_JJ male_NN ,_, h_NN /_SYM o_NN primitive_JJ neuroectodermal_JJ tumor_NN with_IN mets_NNS to_TO brain_NN and_CC spine_NN ,_, transferred_VBN from_IN Hospital1_NNP ,_, initially_RB in_IN Dept1_NNP and_CC then_RB transferred_VBN to_TO the_DT floor_NN ._. He_PRP was_VBD initially_RB diagnosed_VBN with_IN a_DT thoracic_JJ gangliogliom_NN /_/ resected_VBN in_IN 2012_CD ._. He_PRP had_VBD back_JJ pain_NN in_in 2_CD /_SYM 04_CD ,_, seen_VBN at_IN Dept2_NNP ,_, and_CC was_be found_VBN to_TO have_VB mets_NNS to_TO brain_NN and_CC spine_NN ._.
  • 49. Term extraction with YATEA. Parsing of the noun phrases to detect term candidates. 1. Identification of term candidates described by parsing patterns, e.g. JJ NN parsed as Modifier-Head (<H>: head of the noun phrase, <M>: modifier of the head). Examples: neuroectodermal tumor → (neuroectodermal<M> tumor<H>); shortness of breath → (shortness<H> of breath<M>).
  • 55. Term extraction with YATEA. 2. Use of the previously parsed term candidates (islands of reliability) to parse the remaining noun phrases. Example: primitive neuroectodermal tumor.
      (a) Use of the already parsed term neuroectodermal tumor: neuroectodermal<M> tumor<H>.
      (b) Temporary simplification (folding): primitive_JJ tumor_NN.
      (c) Use of the parsing pattern JJ NN → M H: primitive<M> tumor<H>.
      (d) Unfolding: primitive<M> (neuroectodermal<M> tumor<H>)<H>.
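The folding and unfolding steps can be sketched as follows. This is only an illustration, not YATEA's own code (YATEA is a Perl module driven by configuration files): the pattern table `PATTERNS`, the `(modifier, head)` tuple used as a tree, and the function names are all assumptions made for this example.

```python
# Hypothetical pattern table: POS sequence -> (modifier index, head index).
PATTERNS = {
    ("JJ", "NN"): (0, 1),        # neuroectodermal tumor
    ("NN", "IN", "NN"): (2, 0),  # shortness of breath
}


def head_word(tree):
    """Return the head word of a tree; a tree is a word or (modifier, head)."""
    return tree if isinstance(tree, str) else head_word(tree[1])


def unfold(tree, head, island_tree):
    """Put the parsed island back in place of the head word that stood for it."""
    if isinstance(tree, str):
        return island_tree if tree == head else tree
    return tuple(unfold(sub, head, island_tree) for sub in tree)


def parse(tokens, islands):
    """Parse a list of (word, tag) tokens, reusing already parsed terms."""
    words = tuple(w for w, _ in tokens)
    tags = tuple(t for _, t in tokens)
    if tags in PATTERNS:  # direct match with a parsing pattern
        mod, head = PATTERNS[tags]
        return (words[mod], words[head])
    for island_words, island_tree in islands.items():
        n = len(island_words)
        for i in range(len(words) - n + 1):
            if words[i:i + n] != island_words:
                continue
            # Folding: the island is temporarily replaced by its head word.
            head = head_word(island_tree)
            head_tag = tags[i + island_words.index(head)]
            folded = tokens[:i] + [(head, head_tag)] + tokens[i + n:]
            tree = parse(folded, islands)
            if tree is not None:
                return unfold(tree, head, island_tree)
    return None


islands = {("neuroectodermal", "tumor"): ("neuroectodermal", "tumor")}
phrase = [("primitive", "JJ"), ("neuroectodermal", "JJ"), ("tumor", "NN")]
print(parse(phrase, islands))  # ('primitive', ('neuroectodermal', 'tumor'))
```

The sketch replaces every leaf equal to the island's head when unfolding, which is enough for short noun phrases but would need position tracking for phrases that repeat a word.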
  • 57. Term extraction with YATEA. Pipeline: texts → lemmatisation + POS tagging → term extraction (rule-based approaches) → candidate terms. Candidate terms: yo male; thoracic gangliogliom; h; back pain; o; mets; primitive neuroectodermal tumor; brain; mets; spine; brain; floor; spine; ...
  • 58. Term extraction with YATEA. Pipeline: texts → lemmatisation + POS tagging → term extraction (rule-based approaches) → candidate terms → term ranking (frequency, term length, C-Value) → ranked term candidates (f: frequency, l: term length, Cv1: C-Value):
      Term candidate                     f   l   Cv1
      yo male                            1   1   1.58
      h                                  1   1   1
      o                                  1   1   0
      mets                               2   1   2
      brain                              2   1   2
      spine                              2   1   2
      floor                              1   1   1
      thoracic gangliogliom              1   2   1.58
      back pain                          1   2   1.58
      primitive neuroectodermal tumor    1   3   2.32
      ...
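As an illustration of the ranking step, here is a minimal sketch of a C-value-style score. It assumes the textbook formulation (length weight times frequency, discounted for candidates nested in longer ones) with `log2(length + 1)` as the weight so that single-word terms do not score zero. YATEA's own C-Value1 and C-Value* may be defined differently, so the sketch is not meant to reproduce the figures in the table; the frequencies in the example are illustrative as well.

```python
import math


def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freqs):
    """Score candidates; `freqs` maps a term (tuple of words) to its frequency."""
    scores = {}
    for term, freq in freqs.items():
        # Longer candidates in which this term is nested.
        nesting = [t for t in freqs if len(t) > len(term) and contains(t, term)]
        if nesting:
            # Discount the occurrences explained by the longer candidates.
            freq -= sum(freqs[t] for t in nesting) / len(nesting)
        # log2(length + 1) is an assumed smoothing for single-word terms.
        scores[term] = math.log2(len(term) + 1) * freq
    return scores


# Illustrative frequencies (assumed, not taken from a real corpus).
freqs = {
    ("spine",): 2,
    ("back", "pain"): 1,
    ("neuroectodermal", "tumor"): 1,
    ("primitive", "neuroectodermal", "tumor"): 1,
}
for term, score in sorted(c_value(freqs).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

In this sketch a candidate that only ever occurs inside a longer candidate ends up with a score of zero, which pushes it to the bottom of the ranking.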
  • 59. Document processing. 1. Annotation of Medline citations with linguistic information, using the Ogmios NLP platform [Hamon et al.07]: segmentation, POS tagging and lemmatization with the Genia Tagger [Tsuruoka et al.05]; term recognition and extraction with YATEA [Aubin and Hamon06]. 2. Risk factor identification.
  • 61. Risk factor identification. Semantico-syntactic patterns: 5 patterns for risk factors and pathologies, 12 patterns for handling enumerations, 3 patterns for pathologies. Example: "<NP-RF> as a risk factor for <NP-P>", where "as a risk factor for" is the trigger sequence, <NP-RF> is a noun phrase corresponding to a risk factor, and <NP-P> is a pathology; "?" and "*" mark optional and recurrent elements. MeSH descriptors of citations: descriptors belonging to the C heading (diseases).
  • 62. Risk factor identification: examples.
      Pattern: <NP-RF-list> is a risk factor for <NP-P>. "...a high intake of calcium and phosphorus is a risk factor for the development of metabolic acidosis." (PMID 1435825)
      Pattern: risk factors for <NP-P>,? include <NP-RF-list>. "...had more than one of the common risk factors for cerebrovascular accidents, including hypertension, advanced age, hyperfibrinogenemia, diabetes mellitus, and past history of cerebrovascular accident." (PMID 1560589)
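The two patterns can be approximated with regular expressions, as in the sketch below. In the work presented here, <NP-RF> and <NP-P> are noun phrases provided by the term extractor; capturing them with loose regex groups is a simplifying assumption, and the names `RF_IS`, `RF_INCLUDE`, `LIST_SEP` and `extract` are hypothetical.

```python
import re

# <NP-RF-list> is a risk factor for <NP-P>
RF_IS = re.compile(
    r"(?P<rf>.+?)\s+(?:is|are)\s+(?:a\s+)?risk\s+factors?\s+for\s+(?P<p>.+?)\s*\.",
    re.IGNORECASE,
)
# risk factors for <NP-P>,? include <NP-RF-list>
RF_INCLUDE = re.compile(
    r"risk\s+factors\s+for\s+(?P<p>.+?)\s*,?\s*includ(?:e|es|ing)\s+(?P<rf>.+?)\s*\.",
    re.IGNORECASE,
)
# Separator of an enumeration: a comma, optionally followed by "and".
LIST_SEP = re.compile(r"\s*,\s*(?:and\s+)?")


def extract(sentence):
    """Return (risk factors, pathology) for the first matching pattern, else None."""
    match = RF_INCLUDE.search(sentence)
    if match:
        return LIST_SEP.split(match.group("rf")), match.group("p")
    match = RF_IS.search(sentence)
    if match:
        return [match.group("rf")], match.group("p")
    return None


s1 = ("a high intake of calcium and phosphorus is a risk factor for "
      "the development of metabolic acidosis .")
s2 = ("had more than one of the common risk factors for cerebrovascular "
      "accidents , including hypertension , advanced age , hyperfibrinogenemia , "
      "diabetes mellitus , and past history of cerebrovascular accident.")
print(extract(s1))
print(extract(s2))
```

The enumeration is split on commas only, so a coordinated noun phrase such as "calcium and phosphorus" stays in one piece; handling it properly is what the dedicated enumeration patterns are for.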
  • 64. Results. Three kinds of patterns are applied: (1) {risk factor, pathology}, (2) risk factors, (3) pathologies. Relations are defined either directly, with the {risk factor, pathology} patterns, or by combining the information provided by (2) and (3). 10,445 PMIDs provide information: 313 pairs {risk factor, pathology}; 15,398 pairs by combination of (2) and (3); 5,873 risk factors (2) not associated with any pathology. MeSH indexing: 5,106 pathologies and health conditions. 21,584 triplets {risk factor, pathology_text?, pathology_MeSH?}: 17,620 (14,895) pairs only provided by the patterns; 5,717 (4,412) pairs contain MeSH descriptors as pathology.
  • 65. Evaluation. Evaluation of precision: ratio of correct extractions among the overall results. Manual evaluation, since no dedicated and comprehensive gold standard is available. Comparison with three relationships provided by Snomed CT (nomenclature for organizing and exchanging clinical data):
      has causative agent: direct cause of the disorder or finding (92,807 relations). Example: bacterial endocarditis has causative agent bacterium.
      due to: relates a clinical finding directly to its cause (25,309 relations). Example: acute pancreatitis due to infection.
      associated with: clinically relevant association between terms without either asserting or excluding a causal or sequential relationship between the two (36,134 relations).
      Further example: fentanyl allergy has causative agent fentanyl.
  • 66. Evaluation. 1. Quality and exhaustiveness of the risk factors for a given pathology: evaluation by a medical doctor of 1,102 risk factors for coronary heart disease, 88.38% precision. Example, hypertension: {smoking; cigarette smoking; smoking history; importance of total life consumption of cigarettes}. 2. Comparison between the text mining results for 20 pathologies (3,100 extractions, about 25%) and the Snomed CT causal and associative relations (154,130 pairs): 19 extractions (0.6%) are considered as already in Snomed CT. Snomed CT is not dedicated to risk factors, but they may occur. Example, acquired immunodeficiency syndrome: {bisexuality, blood transfusion, intravenous drug abuse}.
  • 67. Conclusion. Extraction of information related to risk factors and of their relations with the associated pathologies. Text mining approach based on semantico-syntactic patterns. Evaluation by a medical doctor and a computer scientist: 88.38% of the risk factors related to coronary heart disease are correct; about 70% of the extracted pathologies are equivalent to the MeSH indexing. Snomed CT is not dedicated to the recording of risk factors, although they may occur. ⇒ Creating a dedicated resource for risk factors is appropriate.
  • 68. Future work. Use of other patterns, e.g. predictor, precursor... Machine learning methods. Knowledge representation: homogeneous groups of risk factors (environmental, social, clinical, behavioral...). Characterization of this information: modal and negative contexts; geographical and demographic variation.
  • 69. Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian [Hamon and Grabar16]. Methods and automatic tools now exist for several European languages and Japanese [Kageura and Umino96, Cabre et al.01, Pazienza et al.05]. For many languages, few NLP tools are available and suitable for automatic terminology extraction, while textual data exist and terminological resources are required.
  • 70. Our objective. Design of specific methods for the acquisition of such terminological resources in Ukrainian. Approaches: compilation of terminological resources; automatic building of terminologies. Observation: increasing availability of parallel bilingual corpora. Methodology: use of specialized parallel corpora including a low-resourced language (Ukrainian) to build bilingual and trilingual terminologies by means of the cross-lingual transfer principle.
  • 71. Cross-lingual transfer principle [Yarowsky et al.01, Lopez et al.02]. Hypothesis: parallel and aligned corpora in two languages L1 and L2; syntactic or semantic annotations and information available for L1. Method: transpose these annotations or information from L1 to L2 to obtain the corresponding annotations and information in L2. An efficient way [Zeman and Resnik08, McDonald et al.11] of processing multilingual texts from low-resourced languages and of creating various types of annotations: part-of-speech, semantic categories or even acoustic and prosodic features.
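A minimal sketch of the transfer principle: labels attached to L1 tokens are copied to L2 tokens through word alignment links. The function name, the data layout and the hand-written alignment in the example are assumptions made for illustration; a real alignment would come from a statistical aligner.

```python
def project_annotations(l1_tags, alignment, l2_length):
    """Copy L1 token labels onto L2 tokens through word alignment links.

    l1_tags: one label per L1 token.
    alignment: (l1_index, l2_index) pairs.
    l2_length: number of L2 tokens; unaligned L2 tokens get None.
    """
    l2_tags = [None] * l2_length
    for i, j in alignment:
        if l2_tags[j] is None:  # keep the first link when several compete
            l2_tags[j] = l1_tags[i]
    return l2_tags


english = ["Hair", "loss", "called", "alopecia"]
english_tags = ["NN", "NN", "VBN", "NN"]
ukrainian = ["Втрата", "волосся", ",", "що", "називається", "алопецією"]
# Hand-written alignment, for illustration only.
alignment = [(0, 1), (1, 0), (2, 4), (3, 5)]

projected = project_annotations(english_tags, alignment, len(ukrainian))
print(list(zip(ukrainian, projected)))
```

Tokens with no counterpart in L1 (here the comma and the relative pronoun) stay unlabeled, which is one reason the result depends so strongly on alignment quality.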
  • 72. Drawbacks of the transfer principle. The transfer methodology depends on: the quality of the information and annotations extracted from the L1 texts; the quality of the alignment, usually obtained with a statistical alignment method that depends on the size of the corpora (the bigger the better). → Define an approach to bypass these drawbacks.
  • 73. Material. Medical data in three languages (Ukrainian, French, and English). Ukrainian Wikipedia: source of relevant terms; helps the word-level alignment of the MedlinePlus corpus. MedlinePlus corpus: a collection of specialized texts providing the basis for the building of the terminology.
  • 74. Medicine-related articles from Ukrainian Wikipedia. Selection of the Ukrainian part of Wikipedia using medicine-related categories, such as Медицина (medicine) or Захворювання (disorders). Potentially covers a wide range of medical notions. Use of the information in the infoboxes.
  • 75. Parallel medical corpus [Hamon and Grabar17] (http://natalia.grabar.free.fr/resources.php). Patient-oriented brochures in three languages (Ukrainian, French, and English) from MedlinePlus, on several medical topics (body systems, disorders and conditions, diagnosis and therapy, health and wellness), created in English and then translated into several other languages (including French and Ukrainian). About 43,000 words for each language. Example:
      English | Ukrainian
      Cancer cells grow and divide more quickly than healthy cells. Cancer treatments are made to work on these fast growing cells. | Ракові клітини ростуть і діляться швидше, ніж здорові клітини. При лікуванні раку здійснюється вплив на ці клітини, що швидко ростуть.
      - Tiredness | - Втома
      - Nausea or vomiting | - Нудота або блювота
      - Pain | - Біль
      - Hair loss called alopecia | - Втрата волосся, що називається алопецією
  • 80. Extraction of bilingual terminology from the Wikipedia. Objective: complete and help the alignment method applied to the MedlinePlus corpus. Use of the content of the infoboxes. Pipeline: medical part of the Ukrainian Wikipedia → processing of the infoboxes → medical terms with MeSH codes → querying UMLS → pairs of medical terms (UK/FR and UK/EN). Example: Цукровий діабет тип 2 → NIDDM, Type 2 Diabetes Mellitus; DID2, Diabète avec insulinorésistance.
  • 81. Extraction of bilingual terminology from the MedlinePlus corpus. Illustration of the transfer methods:
      English | Ukrainian
      Cancer cells grow and divide more quickly than healthy cells. Cancer treatments are made to work on these fast growing cells. | Ракові клітини ростуть і діляться швидше, ніж здорові клітини. При лікуванні раку здійснюється вплив на ці клітини, що швидко ростуть.
      - Tiredness | - Втома
      - Nausea or vomiting | - Нудота або блювота
      - Pain | - Біль
      - Hair loss called alopecia | - Втрата волосся, що називається алопецією
  • 91. Extraction of bilingual terminology from the MedlinePlus corpus. Common steps: MedlinePlus corpora (UK/FR & UK/EN) → cleaning and manual paragraph alignment → POS tagging with TreeTagger and Flemm → FR & EN term extraction with YATEA. Transfer 1: extraction of UK terms corresponding to lines → pairs of candidate terms (UK/FR and UK/EN). Transfer 2: Giza++ suite (including MkCls) → MedlinePlus corpora aligned at the word level → UK term extraction by transfer → pairs of candidate terms (UK/FR and UK/EN). Cross-fertilization with single-word terms, also using the Wikipedia pairs of medical terms.
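One possible reading of Transfer 1 ("extraction of UK terms corresponding to lines") is that, when a whole French or English line is an extracted term, the aligned Ukrainian line is taken as its equivalent. The sketch below implements that reading only; it is an assumption about the method, and the function name and data layout are hypothetical.

```python
def transfer_by_line(aligned_lines, source_terms):
    """Pair a target line with a source term when the whole source line is that term.

    aligned_lines: (source_line, target_line) pairs from the aligned corpus.
    source_terms: terms extracted from the source language (e.g. by a term extractor).
    """
    known = {term.lower() for term in source_terms}
    pairs = []
    for source, target in aligned_lines:
        # Drop the list marker and surrounding spaces.
        source_text = source.lstrip("- ").strip()
        target_text = target.lstrip("- ").strip()
        if source_text.lower() in known:
            pairs.append((target_text, source_text))
    return pairs


aligned_lines = [
    ("Cancer cells grow and divide more quickly than healthy cells.",
     "Ракові клітини ростуть і діляться швидше, ніж здорові клітини."),
    ("- Tiredness", "- Втома"),
    ("- Pain", "- Біль"),
]
english_terms = ["tiredness", "pain", "cancer cells"]
print(transfer_by_line(aligned_lines, english_terms))
```

Full sentences are skipped because they are not terms as a whole; extracting terms from inside them is the job of the word-level alignment used by Transfer 2.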
  • 92. Evaluation. Performed by a Ukrainian native speaker with knowledge of medical informatics. Manual checking of the extracted candidates: correct/incorrect. Validation of the terms (independently in each language) and of the bilingual and trilingual relations. Precision of the results: number of correct answers divided by the number of all answers, computed with exact and inexact match (the correct term is included in, or includes, the candidate).
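The precision measure can be written as a short function. The three judgement labels used below are hypothetical names for the outcome of the manual check; only the ratio itself comes from the slide.

```python
def precision(judgements, inexact=False):
    """Share of accepted answers among all answers.

    judgements: list of labels, each 'correct', 'inexact' or 'incorrect'.
    inexact: when True, inexact matches (the correct term is included in,
    or includes, the candidate) are also counted as accepted.
    """
    if not judgements:
        return 0.0
    accepted = {"correct", "inexact"} if inexact else {"correct"}
    return sum(label in accepted for label in judgements) / len(judgements)


labels = ["correct", "inexact", "incorrect", "correct"]
print(precision(labels))                # 0.5
print(precision(labels, inexact=True))  # 0.75
```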
  • 93. Results: bilingual terminology from Wikipedia. 357 Ukrainian medical terms (among them, 177 single-word terms). Use of the MeSH codes and UMLS: 1,428 French terms (among them, 339 single-word terms); 3,625 English terms (among them, 448 single-word terms). The difference with the number of Ukrainian terms is due to the MeSH synonyms. Bilingual pairs: 1,515 Ukrainian/French term pairs (270 pairs between single-word terms); 3,789 Ukrainian/English term pairs (405 pairs between single-word terms). Precision: 1, because of the collecting method.
  • 94. Results: bilingual terminology from MedlinePlus, Transfer 1. 436 Ukrainian terms with 0.966 precision, associated with 316 French terms and 354 English terms. 282 Ukrainian/French/English triples (prec.: 0.954); 63 pairs only between Ukrainian and French terms (prec.: 0.937); 115 pairs only between Ukrainian and English terms (prec.: 0.965). Relations involving synonyms: {втома, fatigue/tiredness}, {фаллопієва труба, trompes de fallope/trompe utérine} (fallopian tube), {втрата слуху/втрачається слух, hearing loss}. Relations associating several case forms with the same English or French form: {вагітність, pregnancy} and {вагітності, pregnancy}.
  • 95. Analysis: bilingual terminology from MedlinePlus, Transfer 1. Few errors, mainly partial matches between two languages: {ви можете спати, dormir/sleep} (lit. you can sleep); {появу виразок у роті, mouth sores} (lit. appearance of mouth sores). Causes of silence: variation due to the translation, which prevents the Transfer 1 method from extracting the term in French or English (Догляд matches the French title Soins but not the English title Your care; this problem is solved by the Transfer 2 method); errors in the POS tagging or in the term extraction strategy; inability of the term extractor to identify French or English terms.
  • 96. Results: bilingual terminology from MedlinePlus, Transfer 2. 9,040 extracted Ukrainian terms (prec.: 0.454). Exact match: higher precision for the French (0.674) and English (0.761) terms, but a low number of terms: 3,671 for French, 3,597 for English. This is due to the rich morphology of the Ukrainian language: {напад, нападу} (attack), {припадків, припадки} (seizure), {костей, кістки} (bones). Extraction of synonymous terms: {биття, удару} (beats), {приступам, припадків} (attacks/seizures). Relations: 3,724 pairs of Ukrainian/French terms (prec.: 0.309); 4,745 pairs of Ukrainian/English terms (prec.: 0.401); 4,724 triples of Ukrainian/French/English terms (prec.: 0.419). Inexact match: higher precision, +0.40 points for the Ukrainian terms, +0.05 for the French and English terms. Due to the alignment quality?
  • 97. Analysis: bilingual terminology from MedlinePlus, Transfer 2. Error analysis: most of the errors are due to alignment problems; when the alignment is correct, the Ukrainian terms are correctly extracted by the transfer. Term analysis: most of the extracted terms are specific to the medical domain ({шприца, syringe}, {холестерину, cholesterol}, {фактори ризику, risk factors}, {трахеотомією, tracheostomy}); other terms express close or approximate notions ({діти, children}, {здорову їжу, healthy diet}, {серцевий напад, heart attack}, {склянок рідини, glasses of liquid}). Interesting observation: some French and English terms correspond to phrases in Ukrainian: undercooked foods: не до кінця приготовлену їжу (lit. food which is not fully cooked); indolore (painless): При цьому обстеженні Ви не відчуєте жодного болю (lit. With this exam you will feel no pain).
  • 98. Conclusion. Proposal of transfer-based methods to extract term candidates in Ukrainian and to create Ukrainian/French and Ukrainian/English term pairs. The methods work on freely available multilingual corpora in French, English and Ukrainian. Resulting terminological resource: 4,588 Ukrainian medical terms and 34,267 relations with French and English terms. → Method suitable for building terminology in low-resourced languages.
  • 99. Future work. Bilingual word alignment with Fast-Align [Dyer et al.13]. Use of statistical and morphological cues. Use of the transfer method for keyphrase extraction from scientific papers (⇒ ongoing work with the Kyiv Institute of Cybernetics). Proposing a similar term extraction method that works with comparable corpora.
  • 100. Overall conclusion. Biomedical text mining is a complex task which involves: several types of information, to be linked together; many strategies for identifying the information; a lot of terminological and linguistic resources, more or less available or difficult to build depending on the language and area. Current challenges: concept recognition (disambiguation, normalization); multilingual approaches; approaches for low-resourced languages; use of information issued from social media.
  • 101. Ongoing funded projects. Mining literature and using Linked Open Data: MIAM Project (French National Agency, 2016). Mining literature to collect interactions between drugs and food which might lead to adverse drug events. Example: grapefruit has an adverse effect on the CYP3A4 enzyme, which is involved in the metabolism of many drugs. Objectives: aggregating information issued from unstructured data with knowledge already recorded in knowledge bases or Linked Open Data repositories (Drugbank, Thériaque, Sider, Diseasome, etc.); managing the certainty and reliability of this information; formalisation of the interactions in Linked Open Data.
  • 102. Drug-related information. [Figure: graph of drug-related information around methylprednisolone and solumedrol. Relation labels: adverse effects, composition, is a, prescribed for, INN, DDI, FDI, prescription features (dosage, mode, frequency, reason, duration). Entities shown: acne, osteoporosis, swelling of face, arterial hypertension, ulcer of stomach, depression, salt, phosphate disodique anhydre, phosphate monosodique anhydre, sodium, lactosis, cortisone, steroidal anti-inflammatory, allergic shock, Quincke oedema, suffocation by larynx oedema, brain oedema, digitaline, insulin.]
  • 103. Terminology acquisition for Ukrainian. Use of the transfer method for keyphrase extraction from scientific papers. Tuning of YATEA for Ukrainian. Definition and design of methods for terminological and semantic relation acquisition.
  • 104. Дякую! (Thank you!)
  • 105. Ahmad (Rabiah) and Bath (Peter A). -- Identification of risk factors for 15-year mortality among community-dwelling older people using Cox regression and a genetic algorithm. Journal of Gerontology, vol. 60 (8), 2005, pp. 1052--8.
      Aubin (Sophie) and Hamon (Thierry). -- Improving Term Extraction with Terminological Resources. In: Advances in Natural Language Processing (5th International Conference on NLP, FinTAL 2006), ed. by Salakoski (Tapio), Ginter (Filip), Pyysalo (Sampo) and Pahikkala (Tapio). pp. 380--387. -- Springer.
      Blake (Catherine). -- A text mining approach to enable detection of candidate risk factors. In: Medinfo, pp. 1528--1528.
      Cabré (MT), Estopà (R) and Vivaldi (J). -- Automatic term detection: a review of current systems, pp. 53--88. -- John Benjamins, 2001.
      Cerrito (Patricia). -- Inside text Mining. Health management technology, vol. 25 (3), 2004, pp. 28--31.
      Chapman (Wendy), Bridewell (Will), Hanbury (Paul), Cooper (Gregory) and Buchanan (Bruce). -- Evaluation of negation phrases in narrative clinical reports. In: Annual Symposium of the American Medical Informatics Association (AMIA). -- Washington, 2001.
      Dyer (Chris), Chahuneau (Victor) and Smith (Noah A.). -- A Simple, Fast, and Effective Reparameterization of IBM Model 2. In: NAACL/HLT, pp. 644--648.
      Golik (Wiktoria), Bossy (Robert), Ratkovic (Zorana) and Nédellec (Claire). -- Improving term extraction with linguistic analysis in the biomedical domain. In: Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing'13). -- Samos, Greece, March 2013.
      Grouin (Cyril), Abacha (Asma Ben), Bernhard (Delphine), Cartoni (Bruno), Deléger (Louise), Grau (Brigitte), Ligozat (Anne-Laure), Minard (Anne-Lyse), Rosset (Sophie) and Zweigenbaum (Pierre). -- CARAMBA: Concept, Assertion, and Relation Annotation using Machine-learning Based Approaches. In: Proceedings of the workshop I2B2 2010.
  • 106. Grouin (Cyril), Grabar (Natalia), Hamon (Thierry), Rosset (Sophie), Tannier (Xavier) and Zweigenbaum (Pierre). -- Eventual situations for timeline extraction from clinical reports. Journal of American Medical Informatics Association, vol. 20 (5), September 2013, pp. 820--827. -- (IF: 3.609).
      Hamon (Thierry) and Grabar (Natalia). -- Linguistic approach for identification of medication names and related information in clinical narratives. Journal of American Medical Informatics Association, vol. 17 (5), Sep-Oct 2010, pp. 549--554. -- PMID: 20819862.
      Hamon (Thierry) and Grabar (Natalia). -- Tuning HeidelTime for identifying time expressions in clinical texts in English and French. In: Proceedings of The Fifth International Workshop on Health Text Mining and Information Analysis (LOUHI2014), short paper/poster, pp. 101--105. -- Gothenburg, Sweden, April 2014.
      Hamon (Thierry) and Grabar (Natalia). -- Adaptation of Cross-Lingual Transfer Methods for the Building of Medical Terminology in Ukrainian. In: Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING2016). -- Springer.
      Hamon (Thierry) and Grabar (Natalia). -- Creation of a multilingual aligned corpus with Ukrainian as the target language and its exploitation. In: Proceedings of Computational Linguistics and Intelligent Systems (COLINS 2017), pp. 10--19.
      Hamon (Thierry), Nazarenko (Adeline), Poibeau (Thierry), Aubin (Sophie) and Derivière (Julien). -- A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis. In: Proceedings of RIAO 2007. -- Pittsburgh, USA, 2007. 15 pages.
  • 107. Hamon (Thierry), Graña (Martin), Raggio (Víctor), Grabar (Natalia) and Naya (Hugo). -- Identification of relations between risk factors and their pathologies or health conditions by mining scientific literature. In: Proceedings of MEDINFO 2010, pp. 964--968. -- PMID: 20841827.
      Hamon (Thierry), Grabar (Natalia) and Kokkinakis (Dimitrios). -- Medication Extraction and Guessing in Swedish, French and English. In: Proceedings of MedInfo 2013. -- Copenhagen, Denmark, August 2013.
      Hamon (Thierry), Engström (Christopher) and Silvestrov (Sergei). -- Term ranking adaptation to the domain: genetic algorithm based optimisation of the C-Value. In: Proceedings of PolTAL 2014, Advances in Natural Language Processing, ed. by Springer, pp. 71--83.
      Kageura (K) and Umino (B). -- Methods of Automatic Term Recognition. In: National Center for Science Information Systems, pp. 1--22.
      Kolyshkina (I) and van Rooyen (M). -- Text mining for insurance claim cost prediction, pp. 192--202. -- Springer-Verlag, 2006.
      Lopez (Adam), Nossal (Mike), Hwa (Rebecca) and Resnik (Philip). -- Word-Level Alignment for Multilingual Resource Acquisition. In: LREC Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Data. -- Las Palmas, Spain, 2002.
      McDonald (Ryan), Petrov (Slav) and Hall (Keith). -- Multi-source transfer of delexicalized dependency parsers. In: EMNLP.
      Minard (AL), Ligozat (AL), Ben Abacha (A), Bernhard (D), Cartoni (B), Deléger (L), Grau (B), Rosset (S), Zweigenbaum (P) and Grouin (C). -- Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification. J Am Med Inform Assoc, vol. 18 (5), 2011, pp. 588--93.
      Pazienza (Maria Teresa), Pennacchiotti (Marco) and Zanzotto (Fabio Massimo). -- Terminology Extraction: An Analysis of Linguistic and Statistical Approaches. In: Knowledge Mining, ed. by Sirmakessis (Spiros), pp. 255--279. -- Springer Berlin Heidelberg, 2005.
  • 108. Périnet (Amandine), Grabar (Natalia) and Hamon (Thierry). -- Identification des assertions dans les textes médicaux : application à la relation {patient, problème médical}. Traitement Automatique des Langues (TAL), vol. 52 (1), 2011, pp. 97--132.
      Strötgen (Jannik) and Gertz (Michael). -- Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). pp. 3746--3753. -- ELRA.
      Tsuruoka (Yoshimasa), Tateishi (Yuka), Kim (Jin-Dong), Ohta (Tomoko), McNaught (John), Ananiadou (Sophia) and Tsujii (Jun'ichi). -- Developing a Robust Part-of-Speech Tagger for Biomedical Text. In: Proceedings of Advances in Informatics, 10th Panhellenic Conference on Informatics, pp. 382--392.
      Yarowsky (David), Ngai (Grace) and Wicentowski (Richard). -- Inducing multilingual text analysis tools via robust projection across aligned corpora. In: HLT.
      Zeman (D) and Resnik (P). -- Cross-language parser adaptation between related languages. In: NLP for Less Privileged Languages.
      Zweigenbaum (Pierre), Lavergne (Thomas), Grabar (Natalia), Hamon (Thierry), Rosset (Sophie) and Grouin (Cyril). -- Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case study. Biomedical Informatics Insights, vol. 6 (Suppl. 1), 2013, pp. 51--62.