Research paper presented at the 15th International Conference on Advances in ICT for Emerging Regions (ICTer).
Adverse drug reactions (ADRs) have become a leading cause of death worldwide despite post-marketing drug surveillance. Expensive clinical trials do not uncover all ADRs and are also cumbersome for consumers and healthcare professionals. The majority of existing methods rely on patients’ spontaneous self-reports. The recent explosion of microblogging platforms such as Twitter presents a new information source for discovering ADRs. In this study, the authors developed a system that automatically extracts ADRs from Twitter messages using Natural Language Processing (NLP) techniques. First, the authors proposed a novel method to select the drug-related messages from the Twitter data stream. Dictionary-based approaches were used to identify medical terminology, emoticons and slang words. The interpretation of “internet language” was also addressed in this research. The best classifier for ADR classification reached an accuracy of 68% with an F-measure of 69%. The results suggest that there is potential for extracting ADR-related information from Twitter messages to support pharmacovigilance.
Identifying adverse drug reactions by analyzing twitter messages
1. Identifying Adverse Drug Reactions
by Analyzing
Twitter Messages
Presented by - Parinda Rajapaksha
15th International Conference on Advances in ICT for Emerging Regions, ICTer2015
Authors - Parinda Rajapaksha, Ruvwan Weerasinghe
2. “ The person who takes medicine must
recover twice, once from the
disease and once from
the medicine ”
- William Osler, M.D.
3. • Introduction, Motivation & Related works
• Proposed solution, Research Question & limitations
• Design, Implementation & Evaluation
• Discussion
• Future works
ROAD MAP
4. • What is an Adverse Drug Reaction (ADR)?
̶ Harm associated with normal dosage during normal use
̶ Unintended, harmful reaction
̶ Nausea, insomnia, hallucination, headache, depression
• Becoming a dire global problem
– Over 770 000 people are injured or die each year
– Prescription drugs have become the 4th leading medical cause of death in Canada and the US
INTRODUCTION
5. • Some regulatory bodies have begun programs
̶ Surveillance systems
̶ Reporting systems
̶ Conduct clinical trials
• BUT
– Reporting systems are voluntary in most countries
– Spontaneous self-reports do not uncover all aspects of drug safety
– Clinical trials are very cumbersome
INTRODUCTION
Traditional Solutions
6. • Recent explosion of Social Media platforms presents a
valuable information source
• People share personal medical experiences with each other
through online community
MOTIVATION
7. • Gurulingappa et al. used MEDLINE case reports
̶ 5 000 drugs were extracted from nearly 3 000 case reports
̶ Ontology driven methodology
• Eiji et al. extracted clinical information from Electronic health
records
– 3 000 discharge summaries accumulated
in one month at Tokyo hospitals
RELATED WORKS
Medical Case Reports
8. • Robert et al. collected comments from health-related web sites
– DailyStrength web site
– Limited to North America
– Did not consider demographics
– Beneficial and adverse effects are unclear
• Brant et al. investigated ‘Withdrawn’ and ‘watchlist’ drugs
– Yahoo! Groups
– No. of messages for each drug was not evenly distributed
– Did not have adequate data to prove the analysis
RELATED WORKS
Online Health Forums
9. • Jiang et al. analyzed textual and semantic features of Twitter
– 2 billion tweets, 5 cancer drugs
– Used a topic modeling approach
– Performance was limited due to data sparseness and a high level of noise
• Clark et al. extracted 7 million tweets in digital drug safety surveillance research
– Data sample tends to be noisy
– Differences between internet speech, writing patterns and standardized clinical data
RELATED WORKS
Twitter Related
11. • Analyze user experiences through Twitter messages
• Twitter as a micro blogging platform
• WHY Twitter??
̶ Public availability, update frequency and message volume
OUR PROPOSED SOLUTION
Statistic Brain -2014/01/01 (http://www.statisticbrain.com/twitter-statistics/)
Total number of active Twitter users – 645 million
Average number of tweets per day - 58 million
No. of tweets that happen in every second - 9,100
12. RESEARCH QUESTIONS
“ How can drug-related tweets be identified by removing noise from Twitter messages and automatically classified into adverse effects and other effects? ”
13. • Limited to one pharmaceutical name
– The number of drugs in the world is very large and keeps growing
• Only works for Twitter messages with English text
– Language processing becomes very hard without knowing the other languages
SCOPE AND LIMITATION
14. DESIGN
[Pipeline diagram, six numbered stages: Data Acquisition & Filtering; Tweet Preprocessing; Text Processing; Manual Annotation; Feature Extraction; Classification into Adverse Effects and Other Effects]
15. IMPLEMENTATION
Data Acquisition
• Ethical concerns?
– In accordance with the Terms & Conditions of the Twitter API
– NOT from private accounts
• Xanax as the test case
– Used for panic disorders and anxiety disorders
16. • Why filtering method?
̶ Capture more useful data while downloading
IMPLEMENTATION
Data Filtering
[Diagram: the Twitter Data Stream is filtered on the Generic Name, the Trade Names and the Misspelled Drug Names (taken from the Misspelled Words Dictionary), producing the Filtered Drug Related Tweets]
17. • Assumption 1
– The categories below are the only sources of plausible misspellings considered
– Xxnaa for Xanax is NOT considered: such a misspelling is very unlikely
IMPLEMENTATION
Misspelled Word Dictionary
Reason for Misspelling | Examples
Skip Letters | Xnax, Xanx
Double Letters | Xaanax, Xannax
Reversed Letters | Xnaax, Xaanx
Missed Key | Xabax, Xahax, Xajax
Inserted Key | Xabnax, Xamnax
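The five misspelling categories in the table can be sketched in code. This is a minimal illustration, not the tool used in the study (the deck mentions an online keyword typo generator); the function name `typo_variants` and the partial `QWERTY_NEIGHBOURS` map are assumptions made for this example.

```python
# Hypothetical, partial QWERTY adjacency map: only the letters of "xanax" are covered.
QWERTY_NEIGHBOURS = {"x": "zsdc", "a": "qwsz", "n": "bhjm"}


def typo_variants(name, neighbours=QWERTY_NEIGHBOURS):
    """Return plausible misspellings of `name`, grouped by reason for misspelling."""
    word = name.lower()
    variants = {
        "skip_letters": set(),
        "double_letters": set(),
        "reversed_letters": set(),
        "missed_key": set(),
        "inserted_key": set(),
    }
    for i, ch in enumerate(word):
        # Skip letters: one character is dropped.
        variants["skip_letters"].add(word[:i] + word[i + 1:])
        # Double letters: one character is typed twice.
        variants["double_letters"].add(word[:i] + ch + word[i:])
        # Reversed letters: two adjacent characters are swapped.
        if i + 1 < len(word):
            variants["reversed_letters"].add(
                word[:i] + word[i + 1] + ch + word[i + 2:]
            )
        for key in neighbours.get(ch, ""):
            # Missed key: a neighbouring key is hit instead of the intended one.
            variants["missed_key"].add(word[:i] + key + word[i + 1:])
            # Inserted key: a neighbouring key is hit in addition to the intended one.
            variants["inserted_key"].add(word[:i] + key + word[i:])
            variants["inserted_key"].add(word[:i + 1] + key + word[i + 1:])
    # A variant identical to the correct spelling is not a misspelling.
    for group in variants.values():
        group.discard(word)
    return variants
```

Calling `typo_variants("Xanax")` yields, among others, the lowercase forms of every example in the table; the resulting sets could then be used as extra filter keywords alongside the generic and trade names.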
18. IMPLEMENTATION
Data Collection
54 774 messages collected within 7 weeks (14 Aug 2014 - 1 Oct 2014)
Matched on | Messages
Generic Name | 1 829 (3%)
Trade Names | 51 467 (94%)
Misspelled Names | 1 477 (3%)
1 477 Twitter messages with misspelled names were captured
19. • Identifying Twitter specific noisy information
̶ Retweets (RT)
̶ User mentions (@)
̶ Hash tags (#)
IMPLEMENTATION
Pre-processing
20. IMPLEMENTATION
Pre-processing
doc said being of the xanax was giving my
heart major issues and causing problems that
weren't even there in the 1st place
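A rough sketch of how retweets (RT), user mentions (@) and hash tags (#) might be handled in pre-processing. The deck does not state the exact rules, so the choices here are assumptions: retweets are dropped entirely, mentions are deleted, and the `#` sign is stripped while the tag word is kept. The function name `preprocess` is hypothetical.

```python
import re

RETWEET = re.compile(r"^\s*RT\b", re.IGNORECASE)  # message starts with "RT"
MENTION = re.compile(r"@\w+")                     # @username
HASHTAG = re.compile(r"#(\w+)")                   # #word, the word is captured


def preprocess(tweet):
    """Return the cleaned tweet, or None if the tweet is a retweet."""
    if RETWEET.match(tweet):
        return None  # assumption: retweets carry no new first-hand experience
    text = MENTION.sub(" ", tweet)   # delete user mentions
    text = HASHTAG.sub(r"\1", text)  # keep the tag word, drop the '#'
    return " ".join(text.split())    # collapse leftover whitespace
```

For example, `preprocess("@OMGImBoss i WANT MY XANAX #help")` returns `"i WANT MY XANAX help"`, while any message starting with `RT` returns `None`.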
21. IMPLEMENTATION
Text Processing
• Remove advertisements, news and forum posts
• Assumption 2
– The probability that a legitimate Twitter message contains a link is considerably low
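Under Assumption 2, advertisements, news and forum posts can be approximated as "messages that contain a link". A minimal sketch follows; the URL pattern and the names `has_link` and `drop_linked_messages` are illustrative assumptions, not the filter used in the study.

```python
import re

# Simple pattern for the link forms commonly seen in tweets.
URL = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def has_link(message):
    """True if the message contains something that looks like a URL."""
    return URL.search(message) is not None


def drop_linked_messages(messages):
    """Keep only messages without links (treated as personal messages)."""
    return [m for m in messages if not has_link(m)]
```

The trade-off of this heuristic is that a genuine personal experience that happens to include a link is discarded along with the advertisements.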
22. IMPLEMENTATION
Text Processing
• Replace slang words, emoticons and abbreviations
Slang word dictionary (5 242 entries)
Slang Word | Intended Meaning
abt | about
w/o | without
smh | somehow
idk | I don’t know
n2g | not too good
lol | laugh out loud
… | …
Emoticons dictionary (80 entries)
Emoticon | Intended Meaning
:-D | Big grin
:((( | Sad
:’( | Crying
:-@ | Screaming
O.o | Confused
B-) | Cool
… | …
23. IMPLEMENTATION
Text Processing
being doing good on my diet giving up soda and
iced coffee I do not think so laugh out loud
i would need xanax smile
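The dictionary-based replacement of slang words and emoticons can be sketched as a token lookup. The two dictionaries below hold only a handful of entries taken from the slides (the real ones have 5 242 and 80 entries); the function name `normalise` and the punctuation handling are assumptions for illustration.

```python
import string

# Tiny excerpts of the slang and emoticon dictionaries shown on the slides.
SLANG = {"abt": "about", "w/o": "without", "idk": "i do not know",
         "lol": "laugh out loud"}
EMOTICONS = {":-D": "big grin", ":(((": "sad", "O.o": "confused", ":)": "smile"}


def normalise(text):
    """Replace emoticons and slang, lowercase the rest, strip punctuation."""
    out = []
    for token in text.split():
        if token in EMOTICONS:            # emoticons are case-sensitive
            out.append(EMOTICONS[token])
            continue
        lowered = token.lower()
        if lowered in SLANG:              # exact slang match, e.g. "w/o"
            out.append(SLANG[lowered])
            continue
        stripped = lowered.strip(string.punctuation)
        if stripped:                      # drop tokens that were only punctuation
            out.append(SLANG.get(stripped, stripped))
    return " ".join(out)
```

For instance, `normalise("I would need Xanax lol :)")` gives `"i would need xanax laugh out loud smile"`. Emoticons are looked up before punctuation is stripped, since stripping first would destroy them.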
24. IMPLEMENTATION
Text Processing
• Some Twitter messages are NOT related to the context
̶ “ I need some Xanax”
̶ “ Xanax is expensive but I'm worth it”
̶ “@yung i need to buy xanax but the site won't let me ship to canada?”
25. IMPLEMENTATION
Medical Terminologies
• Use of medical terminologies
– MedDRA (Medical Dictionary for Regulatory Activities)
– SIDER (Side Effect Resource)
– Collected 15 205 medical terms
• Data set reduced to 3 334 messages after checking 54 774
messages
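One way to picture the reduction from 54 774 to 3 334 messages is a filter that keeps a message only if it mentions at least one term from the medical lexicon. The sketch below assumes simple whole-word matching against a tiny hypothetical lexicon; the actual matching rules used with MedDRA and SIDER are not described in the slides.

```python
import re


def build_matcher(terms):
    """Compile one regex matching any lexicon term as a whole word or phrase."""
    # Longer terms first so multi-word phrases are preferred over their parts.
    ordered = sorted((t.lower() for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b")


def mentions_medical_term(message, matcher):
    """True if the message contains at least one lexicon term."""
    return matcher.search(message.lower()) is not None


# Hypothetical mini-lexicon; the study collected 15 205 terms.
LEXICON = ["nausea", "insomnia", "headache", "suicidal thoughts"]
MATCHER = build_matcher(LEXICON)
```

With this lexicon, "I need some Xanax" is filtered out, while "This xanax medicine causes suicidal thoughts" is kept.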
26. IMPLEMENTATION
Feature Extraction
• Used Bag-of-words model
– Only considers occurrence (presence or absence)
• Stop words NOT removed
– A Twitter message may contain very few words
– Character limit of 140
• Stemming
– Ex: Takes, Taking → Take
– Used the Porter stemmer available in Python NLTK
– 4 572 features
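A presence/absence bag-of-words representation can be sketched as follows. To keep the example self-contained, the stemmer is passed in as a function and defaults to the identity; NLTK's `PorterStemmer().stem` could be supplied instead. The names `build_vocabulary` and `to_binary_vector` are hypothetical.

```python
def build_vocabulary(messages, stem=lambda w: w):
    """Map every distinct (stemmed) token to a feature index."""
    terms = sorted({stem(tok) for msg in messages for tok in msg.split()})
    return {term: index for index, term in enumerate(terms)}


def to_binary_vector(message, vocabulary, stem=lambda w: w):
    """1 if the feature occurs in the message, 0 otherwise (counts are ignored)."""
    vector = [0] * len(vocabulary)
    for tok in message.split():
        index = vocabulary.get(stem(tok))
        if index is not None:  # tokens outside the vocabulary are skipped
            vector[index] = 1
    return vector


messages = ["xanax makes me sleepy", "xanax xanax helps me"]
vocabulary = build_vocabulary(messages)
vectors = [to_binary_vector(m, vocabulary) for m in messages]
```

Here the vocabulary is `helps, makes, me, sleepy, xanax`, and the second message maps to `[1, 0, 1, 0, 1]`: the repeated "xanax" still contributes a single 1, which is the presence-or-absence behaviour named on the slide.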
27. IMPLEMENTATION
Manual Annotation
• Condition 1:
– A message is labeled Adverse only if a person or a group of people is involved with the drug
• Condition 2:
– Beneficial effects, conditions or indications are labeled Other
• Condition 3:
– The sentence should be affirmative (not interrogative, not subjunctive)
28. IMPLEMENTATION
Annotated Messages
• Adverse Effects
̶ “ People take xanax so lightly like it's nothing. people get addicted and
die from that shit. i blacked out driving while high on it once ”
̶ “This xanax medicine causes suicidal thoughts when first taking it. what
the f*** ”
̶ “ I should stop taking all the drugs man they are obviously ruining your
brain and making you bipolar. xanax is the main contributor ”
29. IMPLEMENTATION
Annotated Messages
• Other Effects
̶ “ I am going to sleep now because this xanax got me feeling good ”
̶ “ I need a prescription to xanax or valium or anything that will help me
chill out and sleep for once ”
̶ “ I am not sure if it is the xanax or lack of sleep but f*** i do not feel
real ”
Beneficial Effect
Suspicious feeling
30. IMPLEMENTATION
Data Distribution
• Data distribution highly unbalanced
• 93% of the data comes from the Other category
[Bar chart: number of observations per class label: Adverse Effects 221, Other Effects 3 113]
31. IMPLEMENTATION
Sampling
• Why Undersampling ??
̶ Reduce observations from Other category
̶ Amount of Adverse effects will not change
̶ It will not add synthetic observations to the Adverse effect class
[Figure: class sizes of Adverse (A) and Other (O) in three panels: Initial Behavior, Undersampling, Oversampling]
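Random undersampling, as motivated above, can be sketched in a few lines. The study performed sub-sampling in WEKA; the function below is only an illustrative stand-in, and its name `undersample` and the fixed seed are assumptions.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        # Sampling without replacement: no synthetic observations are created.
        balanced.extend((item, label) for item in rng.sample(items, target))
    rng.shuffle(balanced)
    return balanced


# Class sizes from the slides: 221 Adverse (A) and 3 113 Other (O) messages.
labels = ["A"] * 221 + ["O"] * 3113
balanced = undersample(list(range(len(labels))), labels)
```

The result contains 442 observations, 221 per class: every Adverse message is kept and only the Other class is reduced.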
32. IMPLEMENTATION
Classification
• Used Naïve Bayes algorithm
– A simple (“naïve”) algorithm that is among the best for text classification according to the literature
– Achieved the highest classifier performance in related works
• Compared with Decision Tree algorithm
• 10-fold cross-validation
• Used the WEKA toolbox
– It supports data pre-processing, regression, classification, clustering and data visualization
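To make the Naïve Bayes idea concrete for presence/absence features, here is a compact Bernoulli-style sketch with Laplace smoothing. It is not WEKA's implementation, and the function names `train_bernoulli_nb` and `predict` are hypothetical; it only illustrates how per-class word probabilities are combined under the independence assumption.

```python
import math


def train_bernoulli_nb(X, y):
    """Estimate log-priors and smoothed per-feature probabilities for each class."""
    model = {}
    n_features = len(X[0])
    for cls in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == cls]
        log_prior = math.log(len(rows) / len(X))
        # Laplace smoothing: add 1 to the count, 2 to the total (two outcomes).
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[cls] = (log_prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest posterior log-score for vector x."""
    def log_score(cls):
        log_prior, probs = model[cls]
        return log_prior + sum(
            math.log(p if present else 1.0 - p) for present, p in zip(x, probs)
        )
    return max(model, key=log_score)


# Toy data: feature 0 is typical of class "A", feature 1 of class "O".
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = ["A", "A", "O", "O"]
model = train_bernoulli_nb(X, y)
```

On this toy data, `predict(model, [1, 0])` returns `"A"` and `predict(model, [0, 1])` returns `"O"`. Working in log space avoids numeric underflow when thousands of features are multiplied together.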
33. • The purpose is to identify as many Adverse effects as possible
EVALUATION
Balanced vs. Unbalanced Data Set
Balanced Data Set (A-221, O-221)
Class | Precision | Recall | F-Measure | AUC
Adverse Effect | 0.67 | 0.71 | 0.69 | 0.71
Other | 0.69 | 0.65 | 0.67 | 0.71
Unbalanced Data Set (A-221, O-3113)
Class | Precision | Recall | F-Measure | AUC
Adverse Effect | 0.18 | 0.19 | 0.18 | 0.71
Other | 0.94 | 0.93 | 0.94 | 0.71
34. • Proposed solution (balanced data set) performs much better on the Adverse Effect class
EVALUATION
Balanced vs. Unbalanced Data Set
Measure | Unbalanced | Balanced
No. of Observations | 3 334 | 442
Training Time (sec) | 20.8 | 1.6
Accuracy | 89% | 68%
Callouts: Contributed from Other category; Reduced data set
35. EVALUATION
NB vs. DT
Naïve Bayes (68% accuracy)
Actual class | Classified as AE | Classified as O
AE | 157 | 64
O | 78 | 143
Decision Tree (62% accuracy)
Actual class | Classified as AE | Classified as O
AE | 137 | 84
O | 83 | 138
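The reported accuracy, precision, recall and F-measure can be recomputed from the two confusion matrices. The sketch below reads each matrix row as the actual class and each column as the predicted class; this reading is an assumption, chosen because each row sums to 221 (the per-class size of the balanced set) and because it reproduces the figures reported in the deck. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Metrics for the positive class (here: Adverse Effect)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Rows = actual class (assumed), columns = classified as; AE is the positive class.
naive_bayes = binary_metrics(tp=157, fn=64, fp=78, tn=143)
decision_tree = binary_metrics(tp=137, fn=84, fp=83, tn=138)
```

Rounded to two decimals, Naïve Bayes gives precision 0.67, recall 0.71, F-measure 0.69 and accuracy 0.68; the Decision Tree gives 0.62 for all four.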
36. EVALUATION
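NB vs. DT
Classifier | Class | Precision | Recall | F-Measure | AUC
Naïve Bayes | Adverse Effect | 0.67 | 0.71 | 0.69 | 0.71
Naïve Bayes | Other | 0.69 | 0.65 | 0.67 | 0.71
Decision Tree | Adverse Effect | 0.62 | 0.62 | 0.62 | 0.72
Decision Tree | Other | 0.62 | 0.62 | 0.62 | 0.72
• Proposed solution (Naïve Bayes) outperforms the Decision Tree on precision, recall and F-measure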
38. EVALUATION
Error Analysis
• Curse of dimensionality
– The performance of the classifier decreases when the dimensionality of the problem becomes too large
– 4 572 features
39. CONCLUSION
• Proposed a method to identify ADRs from Twitter data
• The proposed filtering method captured 1 477 (3%) additional messages
• All performance measurements lie around 70%, with a training time of 1.6 seconds
• The ‘curse of dimensionality’ reduced the performance of the classifier
• Results suggest the potential for extracting ADR-related information from Twitter
40. FUTURE WORKS
1) Degree or level of ADR
– Effects can be categorized as high, normal or low
– Useful in prioritizing the effects in pharmacovigilance
EX:
– “ This medicine causes suicidal thoughts when first taking it ” - Extremely negative
– “ I'm stressed I can't even sleep after using this pills ” - High
– “ Its 2:20 a.m. and I am yawning and shaking my head vamplife ” - Less harm
41. FUTURE WORKS
2) Identifying the geographic distribution of drug users
– Twitter provides the geo-locations of Twitter messages
– Weather conditions and habits in each country may influence drugs and their effects
EX:
– “ My mom asks me to get beer while picking up a Xanax prescription. So that looks good ” Beer + Xanax = ??
42. Thank You
NLP could save a life!
43. PILOT STUDY
2014/04/08 8.30 AM - 11.30 AM
45. SIGNIFICANCE OF THIS RESEARCH
• Filtering method
– Identified drug-related messages with misspelled drug names
– Captured more useful data
• Removing advertisements, forum posts and news-related Twitter messages
• Building a medical corpus to remove unwanted Twitter messages
– SIDER
– MedDRA
46. TOOLS & TECHNOLOGIES
• Data acquisition & filtering
̶ Twitter streaming API
̶ Tweepy library in Python
̶ Key word typo generator online tool
• Text processing
̶ Python NLTK (Natural Language Tool Kit)
̶ Porter stemmer
• Classification, sub sampling , ensemble learning and data
visualization
̶ WEKA tool box
47. DATA DISTRIBUTION
[Pie chart: the collected messages split into slices of 25 289 (46%) and 29 485 (54%), with legend entries Pre-Processed Messages and Retweets (RT); the labels Ads, News, Forum posts and Duplicate Messages and the count 23 176 also appear on the slide]
49. TEXT PROCESSING
Raw message: @utemim @goddess1207 @LadyZ_712 @misscolor63 @Cozyrosy1 There isn't enough Xanax to make me spend an hour w/ a room full of 5yr olds!
After text processing: there is not enough xanax to make me spend an hour with a room full of 5 year olds
Raw message: Being doing good on my diet, giving up soda and iced coffee I don't think so lol I would need Xanax :)
After text processing: being doing good on my diet giving up soda and iced coffee i do not think so laugh out loud i would need xanax smile
Raw message: @OMGImBoss i WANT MY XANAX BITCH :( AND im asking u with who you or they want papakush there?? O.o
After text processing: i want my xanax bitch sad and i am asking you with who you or they want papakush there confused
50. JUST ASSUME…
EBOLA virus has affected millions of people around the world
Doctors found a cure
Tested using monkeys
An adverse drug reaction (ADR) is the harm associated with the use of a given medication at a normal dosage during normal use.
It can be an unintended, considerably harmful reaction resulting from the use of a medicinal product.
The majority of patients may experience an ADR soon after taking the drug.
ADRs are becoming a dire global problem and are causing a growing number of deaths.
According to the annual report of the Agency for Healthcare Research and Quality, over 770,000 people are injured or die each year.
The Journal of the American Medical Association (JAMA) states that prescription drugs have now become the fourth leading medical cause of death in the United States and Canada.
Therefore, identifying ADRs appropriately plays a critical role in pharmacovigilance.
As a solution to this problem, some regulatory bodies such as the FDA have begun programs such as surveillance systems.
Most of the existing methods rely on patients’ “spontaneous” self-reports of problems.
BUT
Clinical trials are expensive and very cumbersome for consumers and healthcare professionals because conducting them takes a lot of time and resources.
A clinical trial is conducted with only a few patients as a sample, but once a drug is released to the market, millions of patients around the world may take it.
The recent explosion of social media platforms such as Facebook, Twitter, and Google+ presents a new information source for discovering and analyzing various forms of social characteristics.
Users in an online community often share a wide variety of personal medical experiences with each other rather than in a clinical research study or with their physician.
In general, sleepiness is treated as an adverse effect term, but someone suffering from insomnia would consider it a beneficial effect.
However, the number of health messages for each drug was not evenly distributed across the data set.
The data sample tends to be extremely noisy due to the difference between internet speech and writing patterns and standardized clinical data.
Recent work has been correlated with prior knowledge on tracking illness over time, measuring behavioral risk factors, localizing illnesses by geographic region and detecting influenza epidemics.
Our proposed solution was therefore to analyze user experiences on Twitter to identify ADRs.
Among the various online communities, Twitter is an online social network that lets millions of people stay connected to every corner of the world.
As a microblogging platform, Twitter has an enormous number of users who frequently share and discuss their personal experiences, opinions, thoughts, and random details of their lives.
Twitter is a valuable data source because of its public availability, update frequency and message volume.
According to Statistic Brain, there are approximately 645 million active Twitter users in the world.
The average number of tweets per day is 58 million.
In addition, approximately 9,100 tweets are posted every second around the world.
Among these users, there may be a significant number who share and discuss their experiences related to health, drugs, and their adverse effects.
Though a single Twitter post contains little information about an individual's health, aggregating thousands of posts may generate valuable knowledge.
Previous researchers have also shown that social media data can be used to identify various social characteristics.
The number of drugs in the world is very large and keeps growing.
The FDA has also investigated a large number of adverse effects.
It is very difficult to assess all these drugs and their effects within this scope.
As a solution, one pharmaceutical name was selected to evaluate the model, reducing the scope to a few outcomes rather than checking all the possibilities.
After the model is built and evaluated, any pharmaceutical can be used as the input.
The extracted Twitter messages may contain text in languages other than English.
It would be very hard to process these messages without knowing all of these languages.
To overcome this problem, the scope of this research was limited to messages in English.
Therefore, this model only works for English text.
Selecting one drug name among thousands of pharmaceuticals is somewhat difficult because some drug names are not frequently discussed on Twitter.
Therefore, understanding how health-related information spreads through Twitter is very helpful in selecting a suitable drug name as input to the proposed system.
To get a clear picture of the data source and its distribution, we conducted an initial experiment.
We chose the 15 most popular pharmaceutical names for this initial experiment.
These names were selected from several well-known web pages related to public health.
I collected more than 700 tweets within 3 hours, and this diagram shows the number of Twitter messages collected.
Nearly half of the users in the above distribution were discussing and sharing information about 'Xanax'.
Xanax is used in the medical treatment of panic disorder and anxiety disorders.