Bionlp09

Published in: Technology, Business

  1. 1. BioNLP'09 Shared Task
      Farzaneh Sarafraz, James Eales, Reza Mohammadi, Goran Nenadic
      26 March 2009
  2. 2. BioNLP'09 Task 1: Events in abstracts
      Given: genes and gene products (proteins)
      Wanted: events, each with a type, a trigger, participant(s), and a cause (if applicable)
  3. 3. Example
      "I kappa B/MAD-3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."
      Event: negative regulation
      Trigger: masks
      Theme1: the first p65
      Cause: MAD-3
  4. 4. Event Types
      Gene expression, Transcription, Protein catabolism, Localisation, Phosphorylation
      Binding
      Regulation, Positive regulation, Negative regulation
  5. 5. Training and Test Data
      Training data: 800 abstracts
      Development data: 150 abstracts
      Test data: 260 abstracts
  6. 6. Our System
      1) Finding trigger and type
      2) Finding participants (themes)
      3) Post-processing
  7. 7. 1) Finding Triggers and Types: CRF
      "I kappa B/MAD-3 masks the nuclear localization..."
      I  kappa  B  MAD-3  masks  the  nuclear  localization
      0  0      0  0      9      0    0        0
      "The binding of I kappa B/MAD-3 to NF-kappa B p65 is sufficient to retarget NF-kappa B p65 from the nucleus to the cytoplasm."
      The  binding  of  I  kappa  B  MAD-3  to  NF-kappa  B  p65  is
      0    0        0   0  0      0  0      0   0         0  0    0
      sufficient  to  retarget  NF-kappa  B  p65  from  the
      0           0   4         0         0  0    0     0
      nucleus  to  the  cytoplasm
      0        0   0    0
      9: negative regulation
      4: localisation
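The per-token labelling on this slide can be read back into triggers with a few lines of Python. This is only a sketch: the codes 9 and 4 come from the slide, while treating 0 as "not a trigger", the tokenisation, and the function name `extract_triggers` are assumptions made for illustration.

```python
# Label codes 4 and 9 are taken from the slide; 0 = "not a trigger" is an assumption.
LABELS = {0: "none", 4: "localisation", 9: "negative regulation"}


def extract_triggers(tokens, labels):
    """Return (index, token, event type) for every token with a non-zero label."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must align one-to-one")
    return [
        (i, tok, LABELS[lab])
        for i, (tok, lab) in enumerate(zip(tokens, labels))
        if lab != 0
    ]


# First example sentence from the slide (tokenisation is illustrative).
tokens = ["I", "kappa", "B", "MAD-3", "masks", "the", "nuclear", "localization"]
labels = [0, 0, 0, 0, 9, 0, 0, 0]
triggers = extract_triggers(tokens, labels)
```

With this input, `triggers` holds a single entry for "masks" as a negative regulation trigger.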
  8. 8. CRF features for each token
      is-protein
      is-PPI-word
      generic POS tag
      log-frequency of token being a trigger for each event type (10 features)
      number of proteins in sentence (sentence-level)
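As a rough illustration of how such a feature set might be assembled per token, the sketch below builds a plain dictionary. Everything here is hypothetical: the function name `token_features`, the PPI word list, the count table, and the `log(1 + count)` smoothing are assumptions, and the sketch uses the nine event types named in the talk rather than the ten log-frequency features mentioned on the slide.

```python
import math

# The nine event types named in the talk (the slide mentions 10 features).
EVENT_TYPES = [
    "gene expression", "transcription", "protein catabolism",
    "localisation", "phosphorylation", "binding",
    "regulation", "positive regulation", "negative regulation",
]


def token_features(token, pos_tag, is_protein, ppi_words, trigger_counts, n_proteins):
    """Build a feature dict for one token.

    trigger_counts maps (token, event_type) to how often the token was a
    trigger of that type in training data (hypothetical input).
    """
    feats = {
        "is_protein": is_protein,
        "is_ppi_word": token.lower() in ppi_words,
        "pos": pos_tag,
        "n_proteins_in_sentence": n_proteins,
    }
    for etype in EVENT_TYPES:
        count = trigger_counts.get((token.lower(), etype), 0)
        # log(1 + count) keeps unseen tokens at 0.0; this smoothing is an assumption.
        feats["logfreq_" + etype] = math.log1p(count)
    return feats


# Illustrative call with made-up counts and word list.
feats = token_features(
    "masks", "VBZ", False, {"binds", "interacts"},
    {("masks", "negative regulation"): 3}, n_proteins=3,
)
```

A dictionary per token is a common input shape for CRF toolkits, but the toolkit and exact encoding used by the authors are not stated in the slides.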
  9. 9. Trigger Detection Post-Processing
      Positive discrimination: manually looking at false negatives; adding recurring triggers
      Negative discrimination: manually looking at false positives; filtering out commonly mistaken tokens
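One way to picture this post-processing step is as two manually curated lists applied on top of the CRF output. The sketch below is an assumption about the mechanics, not the authors' code: the names `post_process`, `always_trigger` and `never_trigger`, and the example entries, are all hypothetical.

```python
def post_process(predictions, always_trigger, never_trigger):
    """Apply manually curated lists to CRF predictions.

    predictions: list of (token, label) pairs, where label 0 means
        "not a trigger" (assumed encoding).
    always_trigger: dict token -> label, built from recurring false negatives.
    never_trigger: set of tokens, built from recurring false positives.
    """
    cleaned = []
    for token, label in predictions:
        key = token.lower()
        if key in never_trigger:
            label = 0  # negative discrimination: drop a commonly mistaken token
        elif label == 0 and key in always_trigger:
            label = always_trigger[key]  # positive discrimination: add a missed trigger
        cleaned.append((token, label))
    return cleaned


# Made-up predictions and lists, for illustration only.
predictions = [("masks", 0), ("the", 4), ("retarget", 4)]
cleaned = post_process(predictions, {"masks": 9}, {"the"})
```

In this toy run, "masks" gains label 9, "the" is reset to 0, and "retarget" is left unchanged.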
  10. 10. Trigger Detection Results
      Event Class          #Gold      R      P  F-score
      Localisation            40   77.5  47.69    59.05
      Binding                180  33.33  54.55    41.38
      Gene expression        282   76.6  58.54    66.36
      Transcription           68  58.82   18.6    28.27
      Protein catabolism      19  84.21  88.89    86.49
      Phosphorylation         40   97.5  81.25    88.64
      Non-reg total          629  63.91  48.73     55.3
      Regulation             138  13.04  62.07    21.56
      Positive regulation    462  13.85  54.24    22.07
      Neg. regulation        153  29.41  45.92    35.86
      All total             1382  38.28  49.44    43.15
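The F-score figures reported here are consistent with the usual harmonic mean of recall (R) and precision (P). The sketch below recomputes three rows from the slide as a sanity check; the function name `f_score` is just an illustrative choice.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (R, P) pairs copied from the trigger detection results on the slide.
rows = {
    "Localisation": (77.5, 47.69),
    "Binding": (33.33, 54.55),
    "All total": (38.28, 49.44),
}
scores = {name: round(f_score(r, p), 2) for name, (r, p) in rows.items()}
```

Rounded to two decimals, the recomputed values agree with the reported 59.05, 41.38 and 43.15.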
  11. 11. 2) Finding Participants
      Type and number of participants:
      1 theme (protein): Gene expression, Transcription, Protein catabolism, Localisation, Phosphorylation
      1 theme and 1 cause (proteins/other events): Regulation, Positive regulation, Negative regulation
      1 or more themes (protein): Binding
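The participant constraints on this slide can be written down as a small lookup table. The sketch below is one possible encoding: the dictionary layout, the name `is_valid`, and the choice to treat the cause as optional for regulation events (following "if applicable" in the task description) are assumptions.

```python
# max_themes=None means "no upper bound"; the layout is illustrative.
SCHEMA = {
    "gene expression": {"max_themes": 1, "cause": False},
    "transcription": {"max_themes": 1, "cause": False},
    "protein catabolism": {"max_themes": 1, "cause": False},
    "localisation": {"max_themes": 1, "cause": False},
    "phosphorylation": {"max_themes": 1, "cause": False},
    "binding": {"max_themes": None, "cause": False},
    "regulation": {"max_themes": 1, "cause": True},
    "positive regulation": {"max_themes": 1, "cause": True},
    "negative regulation": {"max_themes": 1, "cause": True},
}


def is_valid(event_type, themes, cause=None):
    """Check an event's participants against the schema for its type."""
    rule = SCHEMA[event_type]
    if len(themes) < 1:
        return False  # every event type needs at least one theme
    if rule["max_themes"] is not None and len(themes) > rule["max_themes"]:
        return False
    if cause is not None and not rule["cause"]:
        return False  # only regulation-type events may carry a cause
    return True


ok_binding = is_valid("binding", ["p65", "MAD-3"])
ok_regulation = is_valid("negative regulation", ["p65"], cause="MAD-3")
bad_expression = is_valid("gene expression", ["A", "B"])
```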
  12. 12. Parse Tree Distance    
  13. 13. Parse Tree Distance Analysis    
  14. 14. Theme in Subtree
      Single-theme events: theme in subtree = 0.7054; theme not in subtree = 0.2946
      Binding events: any theme in subtree = 0.5435; any theme not in subtree = 0.4565
      Regulation events: either theme or cause in subtree = 0.5919; either theme or cause not in subtree = 0.4081
  15. 15. Distance in Trigger Subtree    
  16. 16. Distances not in Trigger Subtree    
  17. 17. Rules Concerning Parse Tree Analysis
      For "binding", report as themes: up to the two closest proteins in the subtree, and the single closest protein in the rest of the tree.
      "In contrast, gp41 failed to stimulate NF-kappaB binding activity in as much as no NF-kappaB bound to the main NF-kappaB-binding site 2 of the IL-10 promoter after addition of gp41."
      This rule successfully leaves out the final gp41.
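The binding rule can be sketched as a selection over candidate proteins ranked by parse tree distance. The code below assumes each candidate already carries a distance to the trigger and a flag saying whether it lies in the trigger's subtree; the function name `binding_themes`, the tuple layout, and the example candidates are hypothetical and do not reproduce the actual parse of the example sentence.

```python
def binding_themes(candidates, max_in_subtree=2, max_outside=1):
    """Select themes for a binding trigger.

    candidates: list of (protein, tree_distance, in_trigger_subtree) tuples.
    Keeps up to two closest proteins inside the subtree and the single
    closest protein in the rest of the tree, as stated on the slide.
    """
    inside = sorted((c for c in candidates if c[2]), key=lambda c: c[1])
    outside = sorted((c for c in candidates if not c[2]), key=lambda c: c[1])
    chosen = inside[:max_in_subtree] + outside[:max_outside]
    return [name for name, _, _ in chosen]


# Made-up candidates: three proteins inside the subtree, two outside.
candidates = [
    ("A", 2, True),
    ("B", 1, True),
    ("C", 4, True),
    ("D", 3, False),
    ("E", 5, False),
]
themes = binding_themes(candidates)
```

Here the farthest in-subtree protein (C) and the farther out-of-subtree protein (E) are dropped, which mirrors how a distant mention such as the final gp41 would be excluded.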
  18. 18. Example of a Missed (FN) Theme
      For gene expression: all the proteins in the subtree are reported as themes.
      "The 15-lipoxygenase (lox) gene is expressed in a tissue-specific manner, predominantly in erythroid cells but also in airway epithelial cells and eosinophils."
                  is
                 /  \
             gene    expressed
               |
        15-lipoxygenase
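The missed theme can be reproduced with a toy version of the parse fragment on this slide: the protein hangs under "gene", not under the trigger "expressed", so a subtree-only rule finds nothing. The dictionary-of-children tree and the helper names below are assumptions made for illustration.

```python
def subtree_nodes(tree, root):
    """Return all nodes below root in a dict-of-children tree (root excluded)."""
    found = []
    stack = list(tree.get(root, []))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(tree.get(node, []))
    return found


def themes_in_subtree(tree, trigger, proteins):
    """Report every protein found in the trigger's subtree as a theme."""
    return [node for node in subtree_nodes(tree, trigger) if node in proteins]


# Parse fragment from the slide: "is" dominates "gene" and "expressed".
tree = {"is": ["gene", "expressed"], "gene": ["15-lipoxygenase"]}
proteins = {"15-lipoxygenase"}
themes = themes_in_subtree(tree, "expressed", proteins)
```

`themes` comes back empty for the trigger "expressed", which is the false negative the slide describes; searching from "is" would find the protein.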
  19. 19. Evaluation on Development Data
      Event Class          #Gold      R      P  F-score
      Localisation            53  67.92  46.75    55.38
      Binding                312  21.47  63.81    32.13
      Gene expression        356  64.61  76.33    69.98
      Transcription           82  53.66   89.8    67.18
      Protein catabolism      21  90.48  67.86    77.55
      Phosphorylation         47  91.49  53.09    67.19
      Non-reg total          871   50.4  68.44    58.05
      Regulation             172   5.23  33.33     9.05
      Positive regulation    632   3.48  21.36     5.99
      Neg. regulation        201   9.45  15.08    11.62
      Regulatory total      1005   4.98  19.53     7.93
      All total             1876  26.07  54.46    35.26
  20. 20. Evaluation on Test Data
      Event Class          #Gold      R      P  F-score
      Localisation           174  44.83  53.06     48.6
      Binding                347  12.68  40.37     19.3
      Gene expression        722  52.63  69.34    59.84
      Transcription          137  15.33  67.74       25
      Protein catabolism      14  42.86     50    46.15
      Phosphorylation        135  78.52  53.81    63.86
      Non-reg total         1529  41.53  60.82    49.36
      Regulation             291   3.09  19.15     5.33
      Positive regulation    983   1.12   8.87     1.99
      Neg. regulation        379   12.4  20.52    15.46
      Regulatory total      1653   4.05  16.75     6.53
      All total             3182  22.06  48.61    30.35
  21. 21. Results: Ranked 12 out of 24 teams
      Rank      R      P  F-Score    Rank      R      P  F-Score
         1  46.73  58.48    51.95      13  25.96  36.26    30.26
         2  45.82  47.52    46.66      14  20.93   49.3    29.38
         3  34.98  61.59    44.62      15  22.69  40.55     29.1
         4   36.9  55.59    44.35      16  21.53  36.99    27.21
         5  33.41  51.55    40.54      17  17.44  39.99    24.29
         6  28.13  53.56    36.88      18  28.63  20.88    24.15
         7  28.22  45.78    34.92      19  13.45  71.81    22.66
         8  27.75   46.6    34.78      20  22.78  19.03    20.74
         9  21.62  62.21    32.09      21  30.42  14.11    19.28
        10  21.12   56.9     30.8      22  11.25  66.54    19.25
        11   22.5   47.7    30.58      23  11.69  31.42    17.04
        12  22.06  48.61    30.35      24    9.4  61.65    16.31
  22. 22. End.    
  23. 23. Other Tasks
      Event detection and characterization
      Event argument recognition
      Negations and speculations
  24. 24. Example
      "I kappa B/MAD-3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."
      Event: negative regulation
      Trigger: masks
      Theme1: the first p65
      Cause: MAD-3
      Site: nuclear localization signal
  25. 25. Example
      "In contrast, NF-kappa B p50 alone fails to stimulate kappa B-directed transcription, and based on prior in vitro studies, is not directly regulated by I kappa B."
      Event: regulation
      Theme1: this p50
      Trigger: regulated
      Negation: true for this event
      Speculation: none
