Using SVMs with the Command Relation Feature
to Identify Negated Events in Biomedical
Literature

Farzaneh Sarafraz
Goran Nenadic

School of Computer Science
University of Manchester
sarafraf@cs.man.ac.uk
g.nenadic@manchester.ac.uk
Outline
•   Motivation & aim
•   Molecular events
•   Data & experiments
•   Methods
•   Discussion
•   Summary



Motivation & aim
• Biomedical literature
     • 2000 papers published every day
•   Biomedical information extraction needed
     • Improve IE by negation information
     • Negative results are interesting and reported
     • “The IKK complex, but not p90 (rsk), is responsible for the in
       vivo phosphorylation of I-kappa-B-alpha.”
• Resources
     • Shared tasks, data
     • Linguistic tools (syntactic parsers)


Problem statement
• Given
  • PubMed abstracts
  • Protein/gene mentions annotated
  • Molecular events annotated

• Wanted for every event
  • Negated or not

• Classification problem
Molecular events
• “We further show that Nmi interacts with all STATs except Stat2.”
• An event consists of a trigger, an event type and participants
   • Event type: {binding, transcription, regulation, expression}
   • Participation type: {theme, cause}
   • Participant type: {gene/protein, event}
Molecular events – class I
• One theme (gene/protein)

• “The effect of this synergism was perceptible at
  the level of induction of the IL-2 gene.”
   • Trigger: induction
   • Type: gene expression
   • Theme: IL-2
• Types: transcription, gene expression, phosphorylation,
  protein catabolism, localization
Molecular events – class II
• One or more themes (gene/protein)

• “We further show that Nmi interacts with all
  STATs except Stat2.”
   • Trigger: interacts
   • Type: binding
   • Themes: Nmi, Stat2
   • Negated
• Types: binding
Molecular events – class III
• 1 theme, 0 or 1 cause
   • may be gene/protein or other events
• “Overexpression of full-length ALG-4 induced
  transcription of FasL and, consequently, apoptosis.”

  Event     Trigger            Type              Theme     Cause
  Event 1   “transcription”    Transcription     FasL
  Event 2   “Overexpression”   Gene expression   ALG-4
  Event 3   “Overexpression”   Regulation        Event 2
  Event 4   “induced”          Regulation        Event 1   Event 3


• Types: regulation types
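The three event classes and the nested events in the ALG-4 example can be sketched as a small data structure. This is an illustration only: the `Event` class, its field names and the `event_class` helper are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Class I event types as listed on the class I slide.
CLASS_I_TYPES = {
    "transcription", "gene expression", "phosphorylation",
    "protein catabolism", "localization",
}


@dataclass(eq=False)
class Event:
    """Hypothetical container for one annotated molecular event."""
    trigger: str
    type: str
    themes: List[Union[str, "Event"]]            # gene/protein names or other events
    cause: Optional[Union[str, "Event"]] = None  # only class III events have a cause
    negated: bool = False


def event_class(event: Event) -> str:
    """Map an event type to class I, II or III as described on the slides."""
    name = event.type.lower()
    if name in CLASS_I_TYPES:
        return "I"
    if name == "binding":
        return "II"
    if "regulation" in name:  # assumption: every regulation type contains this word
        return "III"
    raise ValueError(f"unknown event type: {event.type}")


# The four events from the ALG-4 sentence.
event1 = Event("transcription", "Transcription", ["FasL"])
event2 = Event("Overexpression", "Gene expression", ["ALG-4"])
event3 = Event("Overexpression", "Regulation", [event2])
event4 = Event("induced", "Regulation", [event1], cause=event3)

classes = [event_class(e) for e in (event1, event2, event3, event4)]
```

Storing events as themes or causes of other events keeps the nesting of class III explicit, so a negation classifier can look at the participants of an event as well as its trigger.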
Data: BioNLP’09
• Training: 800 abstracts
• Test: 260 abstracts
• Gold annotations
   • Event trigger, type, participants, negation
   • Negation cue not annotated
    Event class    Training data          Development / test data
                   total     negated      total     negated
    Class I        2,858     131          559       26
    Class II       887       44           249       15
    Class III      4,870     440          987       66
    Total          9,685     615          1,795     107
Methodologies
• Rule-based
   • The command relation
• Classification
   • SVM on event representation
      • Lexical features: negation cue, POS
      • Syntactic features: command
      • Semantic features: event types
• Baseline
   • NegEx: event triggers as “terms”
Evaluation measures
• $\text{Precision} = \dfrac{TP}{TP + FP}$
• $\text{Recall} = \text{Sensitivity} = \dfrac{TP}{TP + FN}$
• $F_1 = 2 \times \dfrac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
• $\text{Specificity} = \dfrac{TN}{TN + FP}$
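The four measures can be computed directly from confusion-matrix counts. The sketch below is a minimal illustration; the function names and the example counts are made up and do not come from the BioNLP'09 data.

```python
from typing import Dict, Optional


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Return the ratio, or None when the denominator is zero (measure undefined)."""
    return numerator / denominator if denominator else None


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Optional[float]]:
    """Precision, recall (sensitivity), F1 and specificity from raw counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    specificity = safe_div(tn, tn + fp)
    return {"P": precision, "R": recall, "F1": f1, "Spec": specificity}


# Hypothetical counts, for illustration only.
scores = evaluate(tp=8, fp=2, fn=12, tn=78)

# A system that never predicts "negated" has no positive predictions,
# so its precision and F1 are undefined.
never_negated = evaluate(tp=0, fp=0, fn=5, tn=95)
```

Returning `None` for an undefined measure corresponds to the "-" entries that a results table shows when a system makes no positive predictions.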
Baseline results


Approach                   P     R     F1    Spec.
No negation detection      -     0%    -     94%
any negation cue present   20%   78%   32%   81%
NegEx                      36%   37%   36%   93%




The command relation
• If a and b are nodes in the constituency parse
  tree of a sentence, then a X-commands b iff the
  lowest ancestor of a with label X is also an
  ancestor of b.




  Ronald Langacker. On Pronominalization and the Chain of Command. In D. Reibel and S. Schane (eds.), Modern Studies in English, Prentice-Hall, Englewood Cliffs, NJ, pp. 160-186, 1969.



Example of the command relation
• Tree: the root S has two children, a and an inner S; b is a child of the inner S.
• a S-commands b.
• b does not S-command a.
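The definition of X-command and the two-node example can be sketched in a few lines. This is a minimal illustration, assuming a parse tree in which every node keeps a pointer to its parent; the `Node` class and the function names are hypothetical, not part of any parser's API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical constituency-tree node with a parent pointer."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child below this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node: Node) -> Iterator[Node]:
    """Yield the proper ancestors of node, lowest first."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


def x_commands(a: Node, b: Node, label: str) -> bool:
    """True iff the lowest ancestor of a labelled `label` is also an ancestor of b."""
    lowest = next((n for n in ancestors(a) if n.label == label), None)
    if lowest is None:
        return False  # a has no ancestor with that label
    return any(n is lowest for n in ancestors(b))


# The example tree: root S with children a and an inner S; b sits under the inner S.
root = Node("S")
a = root.add(Node("a"))
inner = root.add(Node("S"))
b = inner.add(Node("b"))

a_commands_b = x_commands(a, b, "S")
b_commands_a = x_commands(b, a, "S")
```

With this tree, `a_commands_b` is `True` and `b_commands_a` is `False`: the lowest S above b is the inner S, which does not dominate a.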
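X-command in action
• “We now show that a mutant motif that exchanges the terminal 3' C for a G fails to bind the p50 homodimer.”
• [Figure: constituency parse tree of the sentence, with nodes labelled S (root), VP (“show that”), S (“a mutant motif that exchanges the terminal 3' C for a G”) and VP (“fails to bind the p50 homodimer.”)]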
Rule-based method
• An event is negated if a negation cue exists and one of the following holds
  (four alternative rules, evaluated separately):
  •   Negation cue S-commands any participant
  •   Negation cue S-commands trigger
  •   Negation cue S-commands both
  •   Negation cue VP-commands both
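The four rule variants can be sketched as one function that takes the command relation as a parameter. Everything here is illustrative: the function name, the variant names and the reading of "both" as "the trigger and at least one participant" are assumptions, not details confirmed by the slides.

```python
from typing import Callable, Hashable, Iterable, Optional

# commands(a, b, label) should return True iff a label-commands b.
CommandRelation = Callable[[Hashable, Hashable, str], bool]


def is_negated(
    cue: Optional[Hashable],
    trigger: Hashable,
    participants: Iterable[Hashable],
    commands: CommandRelation,
    variant: str = "s_any_participant",
) -> bool:
    """Apply one of the four rule variants to a single event."""
    if cue is None:  # no negation cue in the sentence
        return False
    parts = list(participants)
    if variant == "s_any_participant":
        return any(commands(cue, p, "S") for p in parts)
    if variant == "s_trigger":
        return commands(cue, trigger, "S")
    if variant == "s_both":
        # Assumption: "both" means the trigger and at least one participant.
        return commands(cue, trigger, "S") and any(commands(cue, p, "S") for p in parts)
    if variant == "vp_both":
        return commands(cue, trigger, "VP") and any(commands(cue, p, "VP") for p in parts)
    raise ValueError(f"unknown variant: {variant}")


# Stub command relation backed by a hand-written set of facts (hypothetical).
facts = {("fails", "bind", "S"), ("fails", "p50", "S"), ("fails", "bind", "VP")}


def stub_commands(a: Hashable, b: Hashable, label: str) -> bool:
    return (a, b, label) in facts


s_any = is_negated("fails", "bind", ["p50"], stub_commands, "s_any_participant")
vp_both = is_negated("fails", "bind", ["p50"], stub_commands, "vp_both")
```

Passing the command relation in as a callable keeps the rule logic independent of how the parse tree is represented.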
Results of rule-based method

Approach                                  P     R     F1    Spec.
negation cue S-commands any participant   23%   76%   35%   84%
negation cue S-commands trigger           23%   68%   34%   85%
negation cue S-commands both              23%   68%   35%   86%
negation cue VP-commands both                         42%
SVM features
• Semantic features
   • Event type
• Lexical features
   • Sentence contains negation cue
   • Negation cue
• Syntactic features
   •   POS of neg cue
   •   POS of event trigger
   •   POS of the participants
   •   Parse tree distance between trigger & cue
   •   Type of smallest phrase containing trigger & cue
   •   Cue S-commands any participant
   •   Cue S-commands trigger
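A feature set like this is usually turned into numeric vectors before an SVM is trained. The sketch below shows one plain-Python way to do that; the feature names, the `NONE` placeholder for a missing cue and the one-hot encoding are assumptions for illustration, not the encoding used by the authors.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

Value = Union[str, bool, int]


def event_features(
    event_type: str,
    cue: Optional[str],
    cue_pos: Optional[str],
    trigger_pos: str,
    participant_pos: Sequence[str],
    phrase_type: Optional[str],
    cue_s_commands_participant: bool,
    cue_s_commands_trigger: bool,
    tree_distance: int,
) -> Dict[str, Value]:
    """Collect the semantic, lexical and syntactic features of one event."""
    return {
        "event_type": event_type,
        "has_cue": cue is not None,
        "cue": cue or "NONE",
        "cue_pos": cue_pos or "NONE",
        "trigger_pos": trigger_pos,
        "participant_pos": "+".join(participant_pos),
        "phrase_type": phrase_type or "NONE",
        "cue_s_commands_participant": cue_s_commands_participant,
        "cue_s_commands_trigger": cue_s_commands_trigger,
        "tree_distance": tree_distance,
    }


def one_hot(rows: List[Dict[str, Value]]) -> Tuple[List[str], List[List[float]]]:
    """String features become name=value indicator columns; others stay numeric."""
    columns = sorted(
        {f"{k}={v}" if isinstance(v, str) else k for row in rows for k, v in row.items()}
    )
    index = {name: i for i, name in enumerate(columns)}
    vectors = []
    for row in rows:
        vector = [0.0] * len(columns)
        for key, value in row.items():
            if isinstance(value, str):
                vector[index[f"{key}={value}"]] = 1.0
            else:
                vector[index[key]] = float(value)
        vectors.append(vector)
    return columns, vectors


# Two hypothetical events: one with a cue, one without.
row1 = event_features("Binding", "except", "IN", "VBZ", ["NNP", "NNP"], "VP", True, True, 4)
row2 = event_features("Transcription", None, None, "NN", ["NN"], None, False, False, 0)
columns, vectors = one_hot([row1, row2])
```

The resulting vectors can be passed to any SVM implementation; which implementation and kernel were used is not stated on the slides.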


Results of single SVM, incremental feature sets
Feature set     P     R     F1    Spec.
Features 1-7    43%   8%    14%   99.2%
Features 1-8    73%   19%   30%   99.3%
Features 1-9    71%   38%   49%   99.2%
Features 1-10   76%   38%   51%   99.2%
Results of single SVM, incremental feature sets
1. Event type
2. Sentence contains neg cue
3. Neg cue
4. POS of neg cue
5. POS of event trigger
6. POS of the participants
7. Type of smallest phrase containing trigger & cue
8. Cue S-commands any participant
9. Cue S-commands trigger
10. Parse tree distance between trigger & cue

Feature set     Feature added                                  P     R     F1    Spec.
Features 1-7    (features 1 to 7)                              43%   8%    14%   99.2%
Features 1-8    8. Cue S-commands any participant              73%   19%   30%   99.3%
Features 1-9    9. Cue S-commands trigger                      71%   38%   49%   99.2%
Features 1-10   10. Parse tree distance between trigger & cue  76%   38%   51%   99.2%
Results of separate SVMs for each class
Event class                     P      R     F1    Spec.
Class I (559 events)            94%    65%   77%   99.8%
Class II (249 events)           100%   33%   50%   100%
Class III (987 events)          81%    44%   57%   99.2%
Micro-average (1,795 events)    88%    49%   63%   99.4%
Macro-average (3 classes)       92%    47%   62%   99.7%
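The macro-average row can be reproduced from the three per-class rows. The sketch below averages precision, recall and specificity over the classes and then computes F1 from the averaged precision and recall; that last step is only one reading that matches the reported 62%, since the slides do not say how the macro F1 was obtained. The micro-average needs the raw counts, which are not on the slide, so it is not recomputed here.

```python
# Per-class results as reported in the table (percentages).
per_class = {
    "Class I": {"P": 94.0, "R": 65.0, "Spec": 99.8},
    "Class II": {"P": 100.0, "R": 33.0, "Spec": 100.0},
    "Class III": {"P": 81.0, "R": 44.0, "Spec": 99.2},
}


def macro(metric: str) -> float:
    """Unweighted mean of one metric over the three event classes."""
    return sum(scores[metric] for scores in per_class.values()) / len(per_class)


macro_p = macro("P")
macro_r = macro("R")
macro_spec = macro("Spec")

# Assumption: F1 is taken from the macro-averaged P and R,
# not averaged over the per-class F1 values.
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

summary = {
    "P": round(macro_p),
    "R": round(macro_r),
    "F1": round(macro_f1),
    "Spec": round(macro_spec, 1),
}
```

Because the macro-average weights the three classes equally, the small class II (249 events) influences it as much as class III (987 events), which is why it differs from the micro-average.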
Future work
• Use class-specific features
• Study other variants of command
• Combine negation detection with automatic
  event detection instead of using ‘gold’ events
• Use negation detection on a larger-scale dataset
  (MEDLINE) to find contradictions & contrasts in
  the biomedical literature
Conclusions
• SVM for extracting negated events
   • >99% specificity
   • 63% F-measure (micro average)
• Different classes of events behave differently
• To detect negated molecular events
   • Event trigger & surface distances not enough
   • Semantic & command features useful
   • Event participants as important as triggers
• Apply on large scale data – MEDLINE
Acknowledgements
• Organisers of BioNLP’09
• GN TEAM
• Casey Bergman’s lab – Faculty of Life Sciences,
  University of Manchester
• James Eales – University of Manchester
• Jonathan Caruana – University College London


• Web service soon available at
  http://gnode1.mib.man.ac.uk/negmole

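X-command in action
• “We now show that a mutant motif that exchanges the terminal 3' C for a G fails to bind the p50 homodimer that is upregulated in LPS tolerant human Mono Mac 6 cells.”
• [Figure: constituency parse tree of the sentence, with nodes labelled S (root), VP (“show that”), S (“a mutant motif that exchanges the terminal 3' C for a G”), VP (“fails to bind the p50 homodimer that”) and S (“is upregulated in LPS tolerant human Mono Mac 6 cells.”)]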
