  1. i2b2 Challenge 2009 and Our Participation
     Irena Spasic, Farzaneh Sarafraz, Goran Nenadic
     Summer 2009
  2. About i2b2
     • Informatics for Integrating Biology & the Bedside
     • Related to NIH
     • 3 shared tasks so far
  3. The task: Medication Extraction
     • Given: discharge reports
     • Wanted: medication mention, dose, mode of application, frequency, duration, reason, list/narrative
     • Other: event, temporal, certainty
  4. Example
     • Ciprofloxacin 500 mg q.6h. for remaining four doses baby aspirin 81 mg daily , Lasix 40 mg b.i.d. , for three days along with potassium chloride slow release 20 mEq b.i.d. for three days , Motrin 400 mg q.8h. p.r.n. Pain
     • The patient had received a total of five units of packed red blood cells due to blood loss
  5. Regulations/requirements
     • Medical requirements
         • Drug taken by patient
         • No allergies
         • No food, water, diet, tobacco, alcohol, illicit drugs
     • Linguistic requirements
         • The most informative base adjective phrase or the longest base noun phrase as reason
  6. Required output
     • Event-based annotation
         • Repeat individual mention for each event
         • “Aspirin for headache and for leg pain” yields two entries:
             Aspirin … headache
             Aspirin … leg pain
     • Semantic-level expectations
         • NITROGLYCERIN 1/150 ( 0.4 MG ) 1 TAB SL q5min x 3
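The event-based requirement on this slide (one output entry per medication event) can be illustrated with a minimal Python sketch. The `Entry` type and the `expand_events` helper are hypothetical names introduced for illustration only; the slides do not describe how the authors' system implemented this step.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One output entry: a medication mention paired with one reason."""
    medication: str
    reason: str


def expand_events(medication: str, reasons: List[str]) -> List[Entry]:
    """Repeat the medication mention once per reason (hypothetical helper)."""
    return [Entry(medication, reason) for reason in reasons]


# "Aspirin for headache and for leg pain" becomes two separate entries.
entries = expand_events("Aspirin", ["headache", "leg pain"])
for entry in entries:
    print(f"{entry.medication} ... {entry.reason}")
```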
  7. Training and test data
     • Ground truth: 27 records
         • Manually annotated by “PG students”
         • Scrutinised by the community
         • Relative f-score: ~60%
     • Unannotated training data: 620
     • Test data: 260
  8. Our system
     • Linguistic preprocessing
         • Input: plain ASCII
         • Output: XML
     • Rules
         • MinorThird
     • Template filling
  9. Preprocessing
     • Split sentences
         • A sentence and paragraph breaker
         • NaCTeM: sptoolkit.jar
     • POS tagging
         • A part-of-speech tagger for English
         • Tsujii: postagger
     • Parsing (chunking)
         • CFG parser
         • Tsujii: chunkparser
  10. Rules
      • Medication
          • Dictionary (> 1000)
          • Morphological: medication affix (> 100): -bicine, -caine, etc.
          • Precedes a mode: inhaler, supplement, etc.
      • Medication type
          • Cardiac, cardiovascular (~100)
      • Symptoms (~100)
          • Chest discomfort, etc.
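The three medication cues on this slide (dictionary lookup, medication affix, and position before a mode word) can be sketched in Python as follows. This is an illustrative sketch, not the authors' MinorThird rules: the dictionary contents, the cue lists beyond the examples on the slide, and the function name `medication_cue` are assumptions.

```python
from typing import List, Optional

# Hypothetical sample entries; the slide reports a dictionary of > 1000 terms.
DICTIONARY = {"aspirin", "lasix", "motrin"}
# Affixes named on the slide; the full list (> 100) is not given.
SUFFIXES = ("bicine", "caine")
# Mode words named on the slide.
MODE_CUES = {"inhaler", "supplement"}


def medication_cue(tokens: List[str], i: int) -> Optional[str]:
    """Return which cue (if any) marks tokens[i] as a medication."""
    word = tokens[i].lower()
    if word in DICTIONARY:
        return "dictionary"
    if word.endswith(SUFFIXES):
        return "affix"
    if i + 1 < len(tokens) and tokens[i + 1].lower() in MODE_CUES:
        return "precedes-mode"
    return None


tokens = "started lidocaine and albuterol inhaler with aspirin".split()
for i, token in enumerate(tokens):
    cue = medication_cue(tokens, i)
    if cue:
        print(token, cue)
```

Checking the dictionary first keeps the most reliable cue ahead of the looser morphological and contextual ones; the real ordering used by the system is not stated in the slides.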
  11. Word lists and regular expressions
      • Dosage, mode, frequency
      • Duration (while, for, etc.)
      • Reason
          • Head
              • Diseases
              • Symptoms (pain, agitation, etc.) ~20
              • Affixes (hyper-, -emia, etc.)
          • Modifier (acute, chronic, etc.) <100
      • Time phrases, body parts
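As a rough illustration of the word-list and regular-expression approach on this slide, the Python sketch below recognises dosage, frequency, and duration phrases of the kind seen in the earlier example text. The patterns are assumptions written for this illustration and cover only a handful of forms; the authors' actual expressions are not shown in the slides.

```python
import re
from typing import Dict, List

# Hypothetical patterns; each covers only a few common surface forms.
DOSAGE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mEq|units?|tabs?)\b", re.IGNORECASE)
FREQUENCY = re.compile(
    r"\b(?:q\.?\d+h\.?|b\.i\.d\.|t\.i\.d\.|p\.r\.n\.|daily)", re.IGNORECASE
)
DURATION = re.compile(
    r"\b(?:for|while)\s+(?:\w+\s+)?(?:days?|weeks?|months?)\b", re.IGNORECASE
)


def extract_fields(text: str) -> Dict[str, List[str]]:
    """Return every dosage, frequency, and duration phrase found in text."""
    return {
        "dosage": DOSAGE.findall(text),
        "frequency": FREQUENCY.findall(text),
        "duration": DURATION.findall(text),
    }


print(extract_fields("Lasix 40 mg b.i.d. , for three days"))
print(extract_fields("Motrin 400 mg q.8h. p.r.n. Pain"))
```

Note that a phrase such as "for remaining four doses" is not matched by the duration pattern here, which shows why the slide pairs regular expressions with word lists.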
  12. Producing output
      • Remove allergies
      • Remove laboratory results
      • Merge labels
          <m>INSULIN</m> <m>GLARGINE</m>
          <f>after dialysis</f> on <f>Monday</f>-<f>Wednesday</f>-<f>Friday</f>
      • Remove negated medications
          “patient instructed not to take Viagra.” etc.
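Two of the post-processing steps on this slide, merging adjacent labels and dropping negated medications, can be sketched in Python. This is a simplified illustration under stated assumptions: tokens arrive as `(token, label)` pairs, only directly adjacent tokens with the same label are merged, and the negation cue list is hypothetical. The slides do not say how the authors' system detected negation or bridged gaps such as "on" between frequency labels.

```python
import re
from itertools import groupby
from typing import List, Optional, Tuple

Tagged = Tuple[str, Optional[str]]  # (token, label); label None = unlabelled


def merge_labels(tagged: List[Tagged]) -> List[Tagged]:
    """Join runs of adjacent tokens that carry the same label."""
    merged: List[Tagged] = []
    for label, group in groupby(tagged, key=lambda pair: pair[1]):
        tokens = [token for token, _ in group]
        if label is None:
            merged.extend((token, None) for token in tokens)
        else:
            merged.append((" ".join(tokens), label))
    return merged


# Hypothetical cue list for illustration only.
NEGATION_CUES = re.compile(
    r"\b(?:not to take|do not take|no longer taking)\b", re.IGNORECASE
)


def is_negated(sentence: str, medication: str) -> bool:
    """True if a negation cue appears before the medication mention."""
    index = sentence.lower().find(medication.lower())
    return index >= 0 and bool(NEGATION_CUES.search(sentence[:index]))


print(merge_labels([("INSULIN", "m"), ("GLARGINE", "m"), ("after", "f"),
                    ("dialysis", "f"), ("on", None), ("Monday", "f")]))
print(is_negated("patient instructed not to take Viagra.", "Viagra"))
```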
  13. Evaluation process
      • Small training data (27)
          • Organisers
          • Community
      • Gold standard test data (260)
          • Annotated by participants
          • Merge and tie-break
          • Community
      • Silver data (620)
          • Voting
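The slide says only that the silver data were produced by "voting". One plausible reading is a majority vote over the entries submitted by participating systems, sketched below in Python. The threshold, the entry representation, and the function name `majority_vote` are all assumptions; the actual voting scheme is not described in the slides.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set


def majority_vote(
    submissions: List[Iterable[Hashable]], threshold: float = 0.5
) -> Set[Hashable]:
    """Keep entries proposed by more than `threshold` of the submissions."""
    counts = Counter(
        entry for submission in submissions for entry in set(submission)
    )
    needed = threshold * len(submissions)
    return {entry for entry, votes in counts.items() if votes > needed}


# Three hypothetical system outputs for one record.
submissions = [
    {("aspirin", "81 mg")},
    {("aspirin", "81 mg"), ("lasix", "40 mg")},
    {("aspirin", "81 mg"), ("lasix", "20 mg")},
]
silver = majority_vote(submissions)
print(silver)
```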
  14. Evaluation on ground truth

      matching  direction   level          field  score
      inexact   horizontal  system-level   X      0.8776
      inexact   horizontal  patient-level  X      0.8928
      inexact   vertical    system-level   do     0.9150
      inexact   vertical    patient-level  do     0.9160
      inexact   vertical    system-level   f      0.9172
      inexact   vertical    patient-level  f      0.9197
      inexact   vertical    system-level   mo     0.9441
      inexact   vertical    patient-level  mo     0.9471
      inexact   vertical    system-level   m      0.9544
      inexact   vertical    patient-level  m      0.9519
      inexact   vertical    system-level   r      0.5260
      inexact   vertical    patient-level  r      0.3876
      inexact   vertical    system-level   du     0.7958
      inexact   vertical    patient-level  du     0.5846
  15. Preliminary evaluation on test data

      matching  direction   level          field  score
      inexact   horizontal  system-level   X      0.7847
      inexact   horizontal  patient-level  X      0.7755
      inexact   vertical    system-level   do     0.8267
      inexact   vertical    patient-level  do     0.8155
      inexact   vertical    system-level   f      0.8349
      inexact   vertical    patient-level  f      0.8289
      inexact   vertical    system-level   mo     0.8359
      inexact   vertical    patient-level  mo     0.8256
      inexact   vertical    system-level   m      0.8533
      inexact   vertical    patient-level  m      0.8541
      inexact   vertical    system-level   r      0.3881
      inexact   vertical    patient-level  r      0.3883
      inexact   vertical    system-level   du     0.51
      inexact   vertical    patient-level  du     0.4969
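The slides do not define the evaluation labels used in the results above. Assuming the scores are F-measures, that "system-level" pools counts over all records (micro-average), and that "patient-level" averages per-record scores (macro-average), the difference between the two can be sketched in Python. The counts below are made up for illustration and are not taken from the challenge data.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure from raw counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def system_level(records: List[Counts]) -> float:
    """Micro-average: pool counts over all records, then score once."""
    tp = sum(r[0] for r in records)
    fp = sum(r[1] for r in records)
    fn = sum(r[2] for r in records)
    return f_measure(tp, fp, fn)


def patient_level(records: List[Counts]) -> float:
    """Macro-average: score each record, then take the mean."""
    return sum(f_measure(*r) for r in records) / len(records)


# Two hypothetical records: one long and well handled, one short and poor.
records = [(8, 2, 2), (1, 1, 3)]
print(round(system_level(records), 4), round(patient_level(records), 4))
```

Under these assumptions, a field that occurs rarely per record (such as reason or duration) can show a larger gap between the two levels, because a single poorly handled record weighs more in the per-record mean.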