R by example: mining Twitter for consumer
attitudes towards airlines

presented at the

Boston Predictive Analytics
MeetUp
by


Jeffrey Breen
President
Cambridge Aviation Research

jbreen@cambridge.aero

June 2011




  Cambridge Aviation Research   • 245 First Street • Suite 1800 • Cambridge, MA 02142 • cambridge.aero




© Copyright 2010 by Cambridge Aviation Research. All rights reserved.
Airlines top customer satisfaction... alphabetically




http://www.theacsi.org/
Actually, they rank below the Post
    Office and health insurers




which gives us plenty to listen to
• Completely unimpressed with @continental or @united. Poor communication, goofy reservations systems and all to turn my trip into a mess.

• RT @dave_mcgregor: Publicly pledging to never fly @delta again. The worst airline ever. U have lost my patronage forever due to ur incompetence

• @united #fail on wifi in red carpet clubs (too slow), delayed flight, customer service in red carpet club (too slow), hmmm do u see a trend?

• @United Weather delays may not be your fault, but you are in the customer service business. It's atrocious how people are getting treated!

• We were just told we are delayed 1.5 hrs & next announcement on @JetBlue - “We're selling headsets.” Way to capitalize on our misfortune.

• @SouthwestAir I know you don't make the weather. But at least pretend I am not a bother when I ask if the delay will make miss my connection

• @SouthwestAir I hate you with every single bone in my body for delaying my flight by 3 hours, 30mins before I was supposed to board. #hate

• Hey @delta - you suck! Your prices are over the moon & to move a flight a cpl of days is $150.00. Insane. I hate you! U ruined my vacation!
Game Plan
1. Search Twitter for airline mentions & collect tweet text
2. Load sentiment word lists
3. Score sentiment for each tweet
4. Summarize for each airline
5. Scrape ACSI web site for airline customer satisfaction scores
6. Compare Twitter sentiment with ACSI satisfaction score
Searching Twitter in one line
R’s XML and RCurl packages make it easy to grab web data, but Jeff
Gentry’s twitteR package makes searching Twitter almost too easy:

> # load the package
> library(twitteR)
> # get the 1,500 most recent tweets mentioning '@delta':
> delta.tweets = searchTwitter('@delta', n=1500)

See what we got in return:

> length(delta.tweets)
[1] 1500
> class(delta.tweets)
[1] "list"

A “list” in R is a collection of objects and its elements may be
named or just numbered. “[[ ]]” is used to access elements.
Examine the output
Let’s take a look at the first tweet in the output list:

    > tweet = delta.tweets[[1]]
    > class(tweet)
    [1] "status"
    attr(,"package")
    [1] "twitteR"

tweet is an object of type “status” from the “twitteR” package.
It holds all the information about the tweet returned from Twitter.

The help page (“?status”) describes some accessor methods like
getScreenName() and getText() which do what you would expect:

    > tweet$getScreenName()
    [1] "Alaqawari"
    > tweet$getText()
    [1] "I am ready to head home. Inshallah will try to get on the earlier
    flight to Fresno. @Delta @DeltaAssist"
Extract the tweet text
R has several (read: too many) ways to apply functions iteratively.
• The plyr package unifies them all with a consistent naming convention.
• The function name is determined by the input and output data types. We
have a list and would like a simple array output, so we use “laply”:

> library(plyr)
> delta.text = laply(delta.tweets, function(t) t$getText() )

> length(delta.text)
[1] 1500
> head(delta.text, 5)
[1] "I am ready to head home. Inshallah will try to get on the earlier
flight to Fresno. @Delta @DeltaAssist"
[2] "@Delta Releases 2010 Corporate Responsibility Report - @PRNewswire
(press release) : http://tinyurl.com/64mz3oh"
[3] "Another week, another upgrade! Thanks @Delta!"
[4] "I'm not able to check in or select a seat for flight DL223/KL6023 to
Seattle tomorrow. Help? @KLM @delta"
[5] "In my boredom of waiting realized @deltaairlines is now @delta
seriously..... Stil waiting and your not even unloading status yet"
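For comparison, the same "list in, plain vector out" extraction can be sketched in base R with vapply(), which is one of the many built-in apply variants mentioned above. This is not the plyr call used in the deck. The `make.status()` helper and the tweet strings are hypothetical stand-ins for twitteR status objects.

```r
# Hypothetical helper: builds a minimal object that exposes getText(),
# mimicking the accessor on a twitteR status object.
make.status = function(text) list(getText = function() text)

# Invented tweets standing in for searchTwitter() results.
fake.tweets = list(make.status("Thanks @Delta!"),
                   make.status("Still waiting, @delta"))

# Base-R counterpart of laply(): apply a function to each list element
# and collect the results in a character vector.
fake.text = vapply(fake.tweets, function(t) t$getText(), character(1))

length(fake.text)
fake.text
```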
Estimating Sentiment

There are many good papers and resources describing methods to
estimate sentiment. These are very complex algorithms.



For this tutorial, we use a very simple algorithm which assigns a score by
simply counting the number of occurrences of “positive” and “negative”
words in a tweet. The code for our score.sentiment() function can be
found at the end of this deck.


Hu & Liu have published an “opinion lexicon” which categorizes
approximately 6,800 words as positive or negative and which can be
downloaded.


            Positive: love, best, cool, great, good, amazing
            Negative: hate, worst, sucks, awful, nightmare
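The counting idea can be sketched on a single sentence using only the handful of example words listed above. The sentence is invented for illustration, and this is a simplified stand-in for score.sentiment(), not the function itself.

```r
# The example words from this slide (not the full lexicon).
pos = c("love", "best", "cool", "great", "good", "amazing")
neg = c("hate", "worst", "sucks", "awful", "nightmare")

# Invented sentence: lower-case it and split on whitespace.
sentence = "Great crew but the worst delay and awful wifi"
words = strsplit(tolower(sentence), "\\s+")[[1]]

# Score = number of positive words minus number of negative words.
score = sum(words %in% pos) - sum(words %in% neg)
score   # one positive match, two negative matches: -1
```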
Load sentiment word lists
1. Download Hu & Liu’s opinion lexicon:


   http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html


2. Loading data is one of R’s strengths. These are simple text files,
though they use “;” as a comment character at the beginning:

   > hu.liu.pos = scan('../data/opinion-lexicon-English/positive-words.txt',
     what='character', comment.char=';')

   > hu.liu.neg = scan('../data/opinion-lexicon-English/negative-words.txt',
     what='character', comment.char=';')



3. Add a few industry-specific and/or especially emphatic terms:

   > pos.words = c(hu.liu.pos, 'upgrade')
   > neg.words = c(hu.liu.neg, 'wtf', 'wait',
     'waiting', 'epicfail', 'mechanical')

The c() function combines objects into vectors or lists.
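The loading step can be tried without downloading anything. The sketch below writes a small temporary file in the same general shape (comment lines starting with ";", then one word per line) and reads it back with scan(). The file contents are invented for illustration and are not the real lexicon.

```r
# Write a tiny, made-up word list with ";" comment lines to a temp file.
tmp = tempfile(fileext = ".txt")
writeLines(c("; hypothetical header comment",
             ";",
             "",
             "good",
             "great"), tmp)

# comment.char=';' makes scan() skip everything after a ";" on a line.
words = scan(tmp, what = "character", comment.char = ";", quiet = TRUE)
unlink(tmp)

# c() appends extra terms to the vector, as in step 3.
pos.words = c(words, "upgrade")
pos.words
```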
Algorithm sanity check
    > sample = c("You're awesome and I love you",
          "I hate and hate and hate. So angry. Die!",
          "Impressed and amazed: you are peerless in your achievement of unparalleled mediocrity.")
    > result = score.sentiment(sample, pos.words, neg.words)
    > class(result)
    [1] "data.frame"
    > result$score
    [1]  2 -5  4

data.frames hold tabular data so they consist of columns & rows which can
be accessed by name or number. Here, “score” is the name of a column.


So, not so good with sarcasm. Here are a couple of real tweets:

    > score.sentiment(c("@Delta I'm going to need you to get it together.
    Delay on tarmac, delayed connection, crazy gate changes... #annoyed",
    "Surprised and happy that @Delta helped me avoid the 3.5 hr layover I
    was scheduled for. Patient and helpful agents. #remarkable"),
    pos.words, neg.words)$score
    [1] -4   5
Accessing data.frames
Here’s the data.frame just returned from score.sentiment():
   > result
     score                                                                                   text
   1     2                                                          You're awesome and I love you
   2    -5                                               I hate and hate and hate. So angry. Die!
   3     4 Impressed and amazed: you are peerless in your achievement of unparalleled mediocrity.

Elements can be accessed by name or position, and positions can be
ranges:
   > result[1,1]
   [1] 2
   > result[1,'score']
   [1] 2
   > result[1:2, 'score']
   [1]  2 -5
   > result[c(1,3), 'score']
   [1] 2 4
   > result[,'score']
   [1]  2 -5  4
Score the tweets
To score all of the Delta tweets, just feed their text into
score.sentiment():

    > delta.scores = score.sentiment(delta.text, pos.words,
    neg.words, .progress='text')
    |==================================================| 100%

The progress bar is provided by plyr.

Let’s add two new columns to identify the airline for when we
combine all the scores later:
    > delta.scores$airline = 'Delta'
    > delta.scores$code = 'DL'
Plot Delta’s score distribution
R’s built-in hist() function will create and plot histograms of your data:
    > hist(delta.scores$score)
The ggplot2 alternative
ggplot2 is an alternative graphics package which generates more refined
graphics:
    > library(ggplot2)
    > qplot(delta.scores$score)
Lather. Rinse. Repeat
To see how the other airlines fare, collect & score tweets for other
airlines.


Then combine all the results into a single “all.scores” data.frame:

    > all.scores = rbind( american.scores, continental.scores, delta.scores,
    jetblue.scores, southwest.scores, united.scores, us.scores )

rbind() combines rows from data.frames, arrays, and matrices.
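A minimal sketch of the row-binding step, using two tiny data.frames whose scores are invented for illustration. The column layout follows the score, airline and code columns built earlier.

```r
# Made-up per-airline results with identical columns.
delta.scores  = data.frame(score = c(2, -1), airline = "Delta",  code = "DL")
united.scores = data.frame(score = c(0, -3), airline = "United", code = "UA")

# rbind() stacks the rows; the column names must match.
all.scores = rbind(delta.scores, united.scores)

nrow(all.scores)            # 2 + 2 rows
table(all.scores$airline)   # two rows per airline
```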
Compare score distributions
   ggplot2 implements the “grammar of graphics”, building plots in layers:
       > ggplot(data=all.scores) + # ggplot works on data.frames, always
            geom_histogram(mapping=aes(x=score, fill=airline), binwidth=1) +
            facet_grid(airline~.) + # make a separate plot for each airline
            theme_bw() + scale_fill_brewer() # plain display, nicer colors

ggplot2’s faceting capability makes it easy to generate the same graph for
different values of a variable, in this case “airline”.
Ignore the middle
Let’s focus on very negative (<= -2) and very positive (>= 2) tweets:
    > all.scores$very.pos = as.numeric( all.scores$score >= 2 )
    > all.scores$very.neg = as.numeric( all.scores$score <= -2 )


For each airline ( airline + code ), let’s use the percentage of very
positive tweets among all very positive and very negative tweets as the
overall sentiment score:
    > twitter.df = ddply(all.scores, c('airline', 'code'), summarise,
    pos.count = sum( very.pos ), neg.count = sum( very.neg ) )
    > twitter.df$all.count = twitter.df$pos.count + twitter.df$neg.count
    > twitter.df$score = round( 100 * twitter.df$pos.count /
                 twitter.df$all.count )

Sort with orderBy() from the doBy package:
    > library(doBy)
    > orderBy(~-score, twitter.df)
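The arithmetic can be checked by hand on a toy data set. The sketch below swaps in base R's aggregate() and order() for ddply() and orderBy() so that it runs without extra packages. The airline names "A" and "B" and all scores are invented.

```r
# Made-up tweet scores for two hypothetical airlines.
all.scores = data.frame(airline = c("A", "A", "A", "B", "B", "B"),
                        score   = c( 3,  -2,   0,   2,   4,  -3))

# Flag the strongly positive and strongly negative tweets.
all.scores$very.pos = as.numeric(all.scores$score >= 2)
all.scores$very.neg = as.numeric(all.scores$score <= -2)

# Sum the flags per airline (base-R stand-in for the ddply() call).
twitter.df = aggregate(all.scores[, c("very.pos", "very.neg")],
                       by = list(airline = all.scores$airline), FUN = sum)

# Share of very positive tweets among all strongly opinionated ones.
twitter.df$all.count = twitter.df$very.pos + twitter.df$very.neg
twitter.df$score = round(100 * twitter.df$very.pos / twitter.df$all.count)

# Highest score first (base-R stand-in for orderBy(~-score, ...)).
twitter.df[order(-twitter.df$score), ]
```

Airline A has one very positive and one very negative tweet, giving 50. Airline B has two and one, giving round(66.7) = 67. The neutral tweet with score 0 is ignored in both counts.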
Any relation to ACSI’s airline scores?




http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines

Scrape, don’t type
The XML package provides the amazing readHTMLTable() function:
    > library(XML)
    > acsi.url = 'http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines'
    > acsi.df = readHTMLTable(acsi.url, header=T, which=1,
    stringsAsFactors=F)
    > # only keep column #1 (name) and #18 (2010 score)
    > acsi.df = acsi.df[,c(1,18)]
    > head(acsi.df,1)
                         10
    1 Southwest Airlines 79



Well, typing metadata is OK, I guess... clean up column names, etc:

    > colnames(acsi.df) = c('airline', 'score')
    > acsi.df$code = c('WN', NA, 'CO', NA, 'AA', 'DL',
                       'US', 'NW', 'UA')
    > acsi.df$score = as.numeric(acsi.df$score)

NA (as in “n/a”) is supported as a valid value everywhere in R.
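A short sketch of how NA behaves in practice, using invented values rather than the scraped ACSI table. It shows that NA can sit in both character and numeric vectors, and that summaries need to be told to skip it.

```r
# Made-up vectors with a missing entry in each.
code  = c("WN", NA, "CO")
score = c(10, NA, 30)

is.na(code)                # which entries are missing
sum(is.na(code))           # how many are missing

mean(score)                # NA: a missing value propagates by default
mean(score, na.rm = TRUE)  # 20: drop NAs before averaging
```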
Join and compare
merge() joins two data.frames by the specified “by=” fields. You can
specify ‘suffixes’ to rename conflicting column names:

    > compare.df = merge(twitter.df, acsi.df, by='code',
        suffixes=c('.twitter', '.acsi'))

Unless you specify “all=T”, non-matching rows are dropped (like a SQL
INNER JOIN), and that’s what happened to top scoring JetBlue.


With a very low score, and low traffic to boot, soon-to-disappear
Continental looks like an outlier. Let’s exclude:
    > compare.df = subset(compare.df, all.count > 100)
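The difference between the default inner join and “all=T” is easy to see on two toy tables. The codes are reused from the deck, but every score below is a placeholder invented for illustration, not a real result.

```r
# Placeholder tables: "B6" appears only on the Twitter side,
# "WN" only on the ACSI side.
twitter.df = data.frame(code = c("DL", "B6", "UA"), score = c(10, 20, 30))
acsi.df    = data.frame(code = c("DL", "UA", "WN"), score = c(40, 50, 60))

# Default: keep only codes present in both tables (inner join).
inner = merge(twitter.df, acsi.df, by = "code",
              suffixes = c(".twitter", ".acsi"))

# all=TRUE: keep every code and fill the gaps with NA (full outer join).
outer = merge(twitter.df, acsi.df, by = "code", all = TRUE,
              suffixes = c(".twitter", ".acsi"))

inner
outer
```

In `inner` only DL and UA survive. In `outer` the B6 row is kept with NA in score.acsi, which is how a JetBlue-style row could be retained if that were wanted.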
an actual result!
ggplot will even run lm() linear
(and other) regressions for you
with its geom_smooth() layer:

> ggplot( compare.df ) +
    geom_point(aes(x=score.twitter, y=score.acsi,
                   color=airline.twitter), size=5) +
    geom_smooth(aes(x=score.twitter, y=score.acsi, group=1),
                se=F, method="lm") +
    theme_bw() +
    theme(legend.position=c(0.2, 0.85))
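The line drawn by geom_smooth(method="lm") is an ordinary least-squares fit, which can also be computed directly with lm(). The sketch below uses made-up points that lie exactly on y = 2x + 1, so the recovered coefficients are easy to check. It does not use the deck's data.

```r
# Invented points on an exact straight line: y = 2x + 1.
x = c(1, 2, 3, 4, 5)
y = 2 * x + 1

# Fit y as a linear function of x, as geom_smooth(method="lm") does
# behind the scenes for the plotted variables.
fit = lm(y ~ x)

coef(fit)   # intercept 1, slope 2 (up to floating-point rounding)
```

With real data, such as score.acsi against score.twitter, the same call would return the intercept and slope of the plotted trend line.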




http://www.despair.com/cudi.html
R code for example scoring function
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
    require(plyr)
    require(stringr)

    # we got a vector of sentences. plyr will handle a list or a vector as an "l" for us
    # we want a simple array of scores back, so we use "l" + "a" + "ply" = laply:
    scores = laply(sentences, function(sentence, pos.words, neg.words) {

        # clean up sentences with R's regex-driven global substitute, gsub():
        sentence = gsub('[[:punct:]]', '', sentence)
        sentence = gsub('[[:cntrl:]]', '', sentence)
        sentence = gsub('\\d+', '', sentence)
        # and convert to lower case:
        sentence = tolower(sentence)

        # split into words. str_split is in the stringr package
        word.list = str_split(sentence, '\\s+')
        # sometimes a list() is one level of hierarchy too much
        words = unlist(word.list)

        # compare our words to the dictionaries of positive & negative terms
        pos.matches = match(words, pos.words)
        neg.matches = match(words, neg.words)

        # match() returns the position of the matched term or NA
        # we just want a TRUE/FALSE:
        pos.matches = !is.na(pos.matches)
        neg.matches = !is.na(neg.matches)

        # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
        score = sum(pos.matches) - sum(neg.matches)

        return(score)
    }, pos.words, neg.words, .progress=.progress )

    scores.df = data.frame(score=scores, text=sentences)
    return(scores.df)
}
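The cleanup and splitting steps depend on the regex classes for digits and whitespace. In an R string literal these are written with a doubled backslash ('\\d+' and '\\s+'). Without the backslashes the patterns match the literal letters "d" and "s" instead. A small base-R check on an invented phrase, using strsplit() as a stand-in for str_split():

```r
phrase = "delayed 30 mins"   # made-up input

# Without the backslash, 'd+' strips the letter "d", not the digits.
gsub('d+', '', phrase)       # "elaye 30 mins"

# With the escaped class, only the digits are removed.
no.digits = gsub('\\d+', '', phrase)
no.digits                    # "delayed  mins" (two spaces remain)

# '\\s+' treats any run of whitespace as one separator.
strsplit(no.digits, '\\s+')[[1]]
```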

More Related Content

What's hot

Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Rapid Innovation: The Business Case for Modern Application Development (SRV20...Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Rapid Innovation: The Business Case for Modern Application Development (SRV20...Amazon Web Services
Ā 
Webinar: Self-service Analytics con VirtualizaciĆ³n de Datos
Webinar: Self-service Analytics con VirtualizaciĆ³n de DatosWebinar: Self-service Analytics con VirtualizaciĆ³n de Datos
Webinar: Self-service Analytics con VirtualizaciĆ³n de DatosDenodo
Ā 
The Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital TransformationThe Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital TransformationNUS-ISS
Ā 
Sungard Global trading Presentation
Sungard Global trading PresentationSungard Global trading Presentation
Sungard Global trading Presentationahemeury
Ā 
Heroku 101 py con 2015 - David Gouldin
Heroku 101   py con 2015 - David GouldinHeroku 101   py con 2015 - David Gouldin
Heroku 101 py con 2015 - David GouldinHeroku
Ā 
A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...
A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...
A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...Health Catalyst
Ā 
Container Security Using Microsoft Defender
Container Security Using Microsoft DefenderContainer Security Using Microsoft Defender
Container Security Using Microsoft DefenderRahul Khengare
Ā 
Insurance industry trends 2015 and beyond: #3 Cloud Computing
Insurance industry trends 2015 and beyond: #3 Cloud ComputingInsurance industry trends 2015 and beyond: #3 Cloud Computing
Insurance industry trends 2015 and beyond: #3 Cloud ComputingEuro IT Group
Ā 
Bi in telecom through kpiā€™s
Bi in telecom through kpiā€™sBi in telecom through kpiā€™s
Bi in telecom through kpiā€™sSai Venkatesh
Ā 
Digital Integration Hub - Maximise Your APIs
Digital Integration Hub - Maximise Your APIsDigital Integration Hub - Maximise Your APIs
Digital Integration Hub - Maximise Your APIsDaniel Toomey
Ā 
India as a Product Nation - The Next Google can come from India
India as a Product Nation - The Next Google can come from IndiaIndia as a Product Nation - The Next Google can come from India
India as a Product Nation - The Next Google can come from IndiaProductNation/iSPIRT
Ā 
Cyber security analysis presentation
Cyber security analysis presentationCyber security analysis presentation
Cyber security analysis presentationVaibhav R
Ā 
Customer-Centric Marketing
Customer-Centric MarketingCustomer-Centric Marketing
Customer-Centric MarketingDung Tri
Ā 
VMware Partner Program Plan
VMware Partner Program PlanVMware Partner Program Plan
VMware Partner Program PlanElliott Lowe
Ā 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
Ā 
Cyber kill chain
Cyber kill chainCyber kill chain
Cyber kill chainAnkita Ganguly
Ā 
IT Certification Roadmap
IT Certification RoadmapIT Certification Roadmap
IT Certification RoadmapLeslie A. George
Ā 
Threat Modeling 101
Threat Modeling 101Threat Modeling 101
Threat Modeling 101Atlassian
Ā 
(ARC307) Infrastructure as Code
(ARC307) Infrastructure as Code(ARC307) Infrastructure as Code
(ARC307) Infrastructure as CodeAmazon Web Services
Ā 

What's hot (20)

Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Rapid Innovation: The Business Case for Modern Application Development (SRV20...Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Ā 
Webinar: Self-service Analytics con VirtualizaciĆ³n de Datos
Webinar: Self-service Analytics con VirtualizaciĆ³n de DatosWebinar: Self-service Analytics con VirtualizaciĆ³n de Datos
Webinar: Self-service Analytics con VirtualizaciĆ³n de Datos
Ā 
The Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital TransformationThe Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital Transformation
Ā 
Sungard Global trading Presentation
Sungard Global trading PresentationSungard Global trading Presentation
Sungard Global trading Presentation
Ā 
AZURE Data Related Services
AZURE Data Related ServicesAZURE Data Related Services
AZURE Data Related Services
Ā 
Heroku 101 py con 2015 - David Gouldin
Heroku 101   py con 2015 - David GouldinHeroku 101   py con 2015 - David Gouldin
Heroku 101 py con 2015 - David Gouldin
Ā 
A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...
A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...
A Health Catalyst Overview: Learn How a Data First Strategy Can Drive Increas...
Ā 
Container Security Using Microsoft Defender
Container Security Using Microsoft DefenderContainer Security Using Microsoft Defender
Container Security Using Microsoft Defender
Ā 
Insurance industry trends 2015 and beyond: #3 Cloud Computing
Insurance industry trends 2015 and beyond: #3 Cloud ComputingInsurance industry trends 2015 and beyond: #3 Cloud Computing
Insurance industry trends 2015 and beyond: #3 Cloud Computing
Ā 
Bi in telecom through kpiā€™s
Bi in telecom through kpiā€™sBi in telecom through kpiā€™s
Bi in telecom through kpiā€™s
Ā 
Digital Integration Hub - Maximise Your APIs
Digital Integration Hub - Maximise Your APIsDigital Integration Hub - Maximise Your APIs
Digital Integration Hub - Maximise Your APIs
Ā 
India as a Product Nation - The Next Google can come from India
India as a Product Nation - The Next Google can come from IndiaIndia as a Product Nation - The Next Google can come from India
India as a Product Nation - The Next Google can come from India
Ā 
Cyber security analysis presentation
Cyber security analysis presentationCyber security analysis presentation
Cyber security analysis presentation
Ā 
Customer-Centric Marketing
Customer-Centric MarketingCustomer-Centric Marketing
Customer-Centric Marketing
Ā 
VMware Partner Program Plan
VMware Partner Program PlanVMware Partner Program Plan
VMware Partner Program Plan
Ā 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
Ā 
Cyber kill chain
Cyber kill chainCyber kill chain
Cyber kill chain
Ā 
IT Certification Roadmap
IT Certification RoadmapIT Certification Roadmap
IT Certification Roadmap
Ā 
Threat Modeling 101
Threat Modeling 101Threat Modeling 101
Threat Modeling 101
Ā 
(ARC307) Infrastructure as Code
(ARC307) Infrastructure as Code(ARC307) Infrastructure as Code
(ARC307) Infrastructure as Code
Ā 

Viewers also liked

Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSkillspeed
Ā 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in REdureka!
Ā 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? HackerEarth
Ā 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTrilok Sharma
Ā 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Rachit Goel
Ā 
Sentiment analysis of tweets
Sentiment analysis of tweetsSentiment analysis of tweets
Sentiment analysis of tweetsVasu Jain
Ā 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
Ā 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis worksCJ Jenkins
Ā 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in TwitterAyushi Dalmia
Ā 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
Ā 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
Ā 
The Who What Where When And Why Of Social Media Lead Generation
The Who What Where When And Why Of Social Media Lead GenerationThe Who What Where When And Why Of Social Media Lead Generation
The Who What Where When And Why Of Social Media Lead GenerationAbhishek Shah
Ā 
Social Media Secrets
Social Media SecretsSocial Media Secrets
Social Media SecretsGuy Kawasaki
Ā 
SteadyBudget's Seed Funding Pitch Deck
SteadyBudget's Seed Funding Pitch DeckSteadyBudget's Seed Funding Pitch Deck
SteadyBudget's Seed Funding Pitch DeckShape Integrated Software
Ā 
Zenpayroll Pitch Deck Template
Zenpayroll Pitch Deck TemplateZenpayroll Pitch Deck Template
Zenpayroll Pitch Deck TemplateJoseph Hsieh
Ā 
AdPushup Fundraising Deck - First Pitch
AdPushup Fundraising Deck - First PitchAdPushup Fundraising Deck - First Pitch
AdPushup Fundraising Deck - First Pitchadpushup
Ā 
AppVirality.com - Investor Pitch Deck
AppVirality.com - Investor Pitch DeckAppVirality.com - Investor Pitch Deck
AppVirality.com - Investor Pitch DeckLaxman Papineni
Ā 
How Wealthsimple raised $2M in 2 weeks
How Wealthsimple raised $2M in 2 weeksHow Wealthsimple raised $2M in 2 weeks
How Wealthsimple raised $2M in 2 weeksWealthsimple
Ā 
500ā€™s Demo Day Batch 16 >> Podozi
500ā€™s Demo Day Batch 16 >>  Podozi500ā€™s Demo Day Batch 16 >>  Podozi
500ā€™s Demo Day Batch 16 >> Podozi500 Startups
Ā 
Swipes pitch deck for Beta Pitch 2013 Finals in Berlin
Swipes pitch deck for Beta Pitch 2013 Finals in BerlinSwipes pitch deck for Beta Pitch 2013 Finals in Berlin
Swipes pitch deck for Beta Pitch 2013 Finals in BerlinSwipes App
Ā 

Viewers also liked (20)

Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R Programming
Ā 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in R
Ā 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
Ā 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
R by example: mining Twitter for consumer attitudes towards airlines

  • 1. R by example: mining Twitter for consumer attitudes towards airlines. Presented at the Boston Predictive Analytics MeetUp by Jeffrey Breen, President, Cambridge Aviation Research (jbreen@cambridge.aero), June 2011. Cambridge Aviation Research • 245 First Street • Suite 1800 • Cambridge, MA 02142 • cambridge.aero. © Copyright 2010 by Cambridge Aviation Research. All rights reserved.
  • 2. Airlines top customer satisfaction... alphabetically (source: http://www.theacsi.org/)
  • 3. Actually, they rank below the Post Office and health insurers
  • 4. which gives us plenty to listen to:
      - Completely unimpressed with @continental or @united. Poor communication, goofy reservations systems and all to turn my trip into a mess.
      - RT @dave_mcgregor: Publicly pledging to never fly @delta again. The worst airline ever. U have lost my patronage forever due to ur incompetence
      - @united #fail on wifi in red carpet clubs (too slow), delayed flight, customer service in red carpet club (too slow), hmmm do u see a trend?
      - @United Weather delays may not be your fault, but you are in the customer service business. It's atrocious how people are getting treated!
      - We were just told we are delayed 1.5 hrs & next announcement on @JetBlue - “We're selling headsets.” Way to capitalize on our misfortune.
      - @SouthwestAir I know you don't make the weather. But at least pretend I am not a bother when I ask if the delay will make miss my connection
      - @SouthwestAir I hate you with every single bone in my body for delaying my flight by 3 hours, 30mins before I was supposed to board. #hate
      - Hey @delta - you suck! Your prices are over the moon & to move a flight a cpl of days is $150.00. Insane. I hate you! U ruined my vacation!
  • 5. Game Plan: search Twitter for airline mentions & collect tweet text; load sentiment word lists; score sentiment for each tweet; summarize for each airline; scrape ACSI web site for airline customer satisfaction scores; compare Twitter sentiment with ACSI satisfaction score.
  • 6. Game Plan: search Twitter for airline mentions & collect tweet text; load sentiment word lists; score sentiment for each tweet; summarize for each airline; scrape ACSI web site for airline customer satisfaction scores; compare Twitter sentiment with ACSI satisfaction score.
  • 7. Searching Twitter in one line. R’s XML and RCurl packages make it easy to grab web data, but Jeff Gentry’s twitteR package makes searching Twitter almost too easy:
      > # load the package
      > library(twitteR)
      > # get the 1,500 most recent tweets mentioning '@delta':
      > delta.tweets = searchTwitter('@delta', n=1500)
    See what we got in return:
      > length(delta.tweets)
      [1] 1500
      > class(delta.tweets)
      [1] "list"
    A “list” in R is a collection of objects, and its elements may be named or just numbered. “[[ ]]” is used to access elements.
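To make the note about lists and “[[ ]]” concrete, here is a small sketch that needs no Twitter access. The list `flights` and its contents are made up for this illustration and do not come from the deck:

```r
# Hypothetical list mixing a named element with two unnamed ones
flights <- list(carrier = "DL", 223, c("ATL", "SEA"))

n <- length(flights)              # number of elements
kind <- class(flights)            # "list"

first <- flights[[1]]             # [[ ]] returns the element itself
by_name <- flights[["carrier"]]   # the same element, accessed by name
sub <- flights[1]                 # single [ ] returns a list of length 1
```

The difference between `[[ ]]` and `[ ]` matters on the next slide: `delta.tweets[[1]]` yields the tweet object itself, whereas `delta.tweets[1]` would yield a one-element list wrapping it.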
  • 8. Examine the output. Let’s take a look at the first tweet in the output list:
      > tweet = delta.tweets[[1]]
      > class(tweet)
      [1] "status"
      attr(,"package")
      [1] "twitteR"
    tweet is an object of type “status” from the “twitteR” package. It holds all the information about the tweet returned from Twitter. The help page (“?status”) describes some accessor methods like getScreenName() and getText() which do what you would expect:
      > tweet$getScreenName()
      [1] "Alaqawari"
      > tweet$getText()
      [1] "I am ready to head home. Inshallah will try to get on the earlier flight to Fresno. @Delta @DeltaAssist"
  • 9. Extract the tweet text. R has several (read: too many) ways to apply functions iteratively.
      • The plyr package unifies them all with a consistent naming convention.
      • The function name is determined by the input and output data types.
    We have a list and would like a simple array output, so we use “laply”:
      > delta.text = laply(delta.tweets, function(t) t$getText() )
      > length(delta.text)
      [1] 1500
      > head(delta.text, 5)
      [1] "I am ready to head home. Inshallah will try to get on the earlier flight to Fresno. @Delta @DeltaAssist"
      [2] "@Delta Releases 2010 Corporate Responsibility Report - @PRNewswire (press release) : http://tinyurl.com/64mz3oh"
      [3] "Another week, another upgrade! Thanks @Delta!"
      [4] "I'm not able to check in or select a seat for flight DL223/KL6023 to Seattle tomorrow. Help? @KLM @delta"
      [5] "In my boredom of waiting realized @deltaairlines is now @delta seriously..... Stil waiting and your not even unloading status yet"
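For readers without plyr installed, the same list-in, vector-out pattern can be sketched with base R's vapply(). The `make_tweet()` helper and the two mock tweets below are assumptions made for this example; they only imitate the getText() accessor and are not twitteR objects:

```r
# Hypothetical stand-in for a twitteR status object: a list with a getText() accessor
make_tweet <- function(txt) {
  force(txt)                      # evaluate the argument now, not lazily
  list(getText = function() txt)
}

mock.tweets <- lapply(c("Thanks @Delta!", "Still waiting, @delta"), make_tweet)

# Base-R counterpart of laply(): apply a function to each list element
# and collect the results into a character vector
mock.text <- vapply(mock.tweets, function(t) t$getText(), character(1))
```

vapply() checks that every result is a single character string, which catches malformed elements early; laply() is more forgiving and also offers the `.progress` argument used later in the deck.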
  • 10. Game Plan: search Twitter for airline mentions & collect tweet text; load sentiment word lists; score sentiment for each tweet; summarize for each airline; scrape ACSI web site for airline customer satisfaction scores; compare Twitter sentiment with ACSI satisfaction score.
  • 11. Estimating Sentiment. There are many good papers and resources describing methods to estimate sentiment. These are very complex algorithms. For this tutorial, we use a very simple algorithm which assigns a score by simply counting the number of occurrences of “positive” and “negative” words in a tweet. The code for our score.sentiment() function can be found at the end of this deck. Hu & Liu have published an “opinion lexicon” which categorizes approximately 6,800 words as positive or negative and which can be downloaded.
      Positive: love, best, cool, great, good, amazing
      Negative: hate, worst, sucks, awful, nightmare
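The counting idea can be sketched in a few lines of base R. This is not the deck's score.sentiment() function (which appears at the end of the deck); `score_sketch()` and the tiny word lists are hypothetical names chosen for this illustration, and the score is assumed to be the number of positive matches minus the number of negative matches:

```r
# Hypothetical helper: score = (# positive words) - (# negative words)
score_sketch <- function(sentences, pos.words, neg.words) {
  vapply(sentences, function(sentence) {
    # strip punctuation, control characters and digits, then lower-case
    cleaned <- gsub("[[:punct:]]|[[:cntrl:]]|[[:digit:]]", "", sentence)
    cleaned <- tolower(cleaned)
    # split into words on runs of whitespace
    words <- unlist(strsplit(cleaned, "[[:space:]]+"))
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
}

# Tiny made-up lexicons, far smaller than the Hu & Liu lists
demo.pos <- c("love", "awesome", "great")
demo.neg <- c("hate", "angry", "worst")

demo.scores <- score_sketch(c("You're awesome and I love you",
                              "I hate and hate and hate. So angry."),
                            demo.pos, demo.neg)
# demo.scores is c(2, -4): two positive hits, then four negative hits
```

Each occurrence counts, so a repeated word such as "hate" contributes once per appearance. Matching is on whole words only, which is why the lexicon needs separate entries for variants like "wait" and "waiting".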
  • 12. Load sentiment word lists.
    1. Download Hu & Liu’s opinion lexicon: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
    2. Loading data is one of R’s strengths. These are simple text files, though they use “;” as a comment character at the beginning:
      > hu.liu.pos = scan('../data/opinion-lexicon-English/positive-words.txt', what='character', comment.char=';')
      > hu.liu.neg = scan('../data/opinion-lexicon-English/negative-words.txt', what='character', comment.char=';')
    3. Add a few industry-specific and/or especially emphatic terms (the c() function combines objects into vectors or lists):
      > pos.words = c(hu.liu.pos, 'upgrade')
      > neg.words = c(hu.liu.neg, 'wtf', 'wait', 'waiting', 'epicfail', 'mechanical')
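To see what comment.char does without downloading the lexicon, the sketch below feeds scan() an in-memory string through its `text` argument. The string `lexicon.text` is a made-up stand-in for the real file, assumed to have a “;”-prefixed header followed by one word per line:

```r
# Hypothetical miniature lexicon: ";" header lines, a blank line, then one word per line
lexicon.text <- "; Opinion lexicon (header comment)\n;\n\nlove\nbest\ngreat\n"

# comment.char = ";" drops everything from ";" to the end of each line;
# blank lines are skipped by default
demo.words <- scan(text = lexicon.text, what = "character",
                   comment.char = ";", quiet = TRUE)

# Extend the list with a domain-specific term, as in step 3
demo.pos.words <- c(demo.words, "upgrade")
```

Passing `quiet = TRUE` only suppresses the "Read N items" message; the returned character vector is the same either way.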
  • 13. Game Plan: search Twitter for airline mentions & collect tweet text; load sentiment word lists; score sentiment for each tweet; summarize for each airline; scrape ACSI web site for airline customer satisfaction scores; compare Twitter sentiment with ACSI satisfaction score.
  • 14. Algorithm sanity check.
      > sample = c("You're awesome and I love you", "I hate and hate and hate. So angry. Die!", "Impressed and amazed: you are peerless in your achievement of unparalleled mediocrity.")
      > result = score.sentiment(sample, pos.words, neg.words)
      > class(result)
      [1] "data.frame"
      > result$score
      [1]  2 -5  4
    data.frames hold tabular data, so they consist of columns & rows which can be accessed by name or number. Here, “score” is the name of a column.
    So, not so good with sarcasm. Here are a couple of real tweets:
      > score.sentiment(c("@Delta I'm going to need you to get it together. Delay on tarmac, delayed connection, crazy gate changes... #annoyed", "Surprised and happy that @Delta helped me avoid the 3.5 hr layover I was scheduled for. Patient and helpful agents. #remarkable"), pos.words, neg.words)$score
      [1] -4  5
  • 15. Accessing data.frames. Here’s the data.frame just returned from score.sentiment():
      > result
        score  text
      1     2  You're awesome and I love you
      2    -5  I hate and hate and hate. So angry. Die!
      3     4  Impressed and amazed: you are peerless in your achievement of unparalleled mediocrity.
    Elements can be accessed by name or position, and positions can be ranges:
      > result[1,1]
      [1] 2
      > result[1,'score']
      [1] 2
      > result[1:2, 'score']
      [1]  2 -5
      > result[c(1,3), 'score']
      [1] 2 4
      > result[,'score']
      [1]  2 -5  4
  • 16. Score the tweets
    To score all of the Delta tweets, just feed their text into score.sentiment():
    > delta.scores = score.sentiment(delta.text, pos.words, neg.words, .progress='text')
    |==================================================| 100%
    The progress bar is provided by plyr.
    Let's add two new columns to identify the airline for when we combine all the scores later:
    > delta.scores$airline = 'Delta'
    > delta.scores$code = 'DL'
  • 17. Plot Delta's score distribution
    R's built-in hist() function will create and plot histograms of your data:
    > hist(delta.scores$score)
  • 18. The ggplot2 alternative
    ggplot2 is an alternative graphics package which generates more refined graphics:
    > qplot(delta.scores$score)
  • 19. Lather. Rinse. Repeat
    To see how the other airlines fare, collect & score tweets for other airlines. Then combine all the results into a single "all.scores" data.frame:
    > all.scores = rbind( american.scores, continental.scores, delta.scores, jetblue.scores, southwest.scores, united.scores, us.scores )
    rbind() combines rows from data.frames, arrays, and matrices.
  • 20. Compare score distributions
    ggplot2 implements a "grammar of graphics", building plots in layers:
    > ggplot(data=all.scores) + # ggplot works on data.frames, always
        geom_bar(mapping=aes(x=score, fill=airline), binwidth=1) +
        facet_grid(airline~.) + # make a separate plot for each airline
        theme_bw() + scale_fill_brewer() # plain display, nicer colors
    ggplot2's faceting capability makes it easy to generate the same graph for different values of a variable, in this case "airline".
  • 21. Game Plan: Search Twitter for airline mentions & collect tweet text; load sentiment word lists; score sentiment for each tweet; summarize for each airline; scrape ACSI web site for airline customer satisfaction scores; compare Twitter sentiment with ACSI satisfaction score.
  • 22. Ignore the middle
    Let's focus on very negative (<= -2) and very positive (>= 2) tweets:
    > all.scores$very.pos = as.numeric( all.scores$score >= 2 )
    > all.scores$very.neg = as.numeric( all.scores$score <= -2 )
    For each airline ( airline + code ), let's use the percentage of very positive tweets among all very positive and very negative tweets as the overall sentiment score:
    > twitter.df = ddply(all.scores, c('airline', 'code'), summarise, pos.count = sum( very.pos ), neg.count = sum( very.neg ) )
    > twitter.df$all.count = twitter.df$pos.count + twitter.df$neg.count
    > twitter.df$score = round( 100 * twitter.df$pos.count / twitter.df$all.count )
    Sort with orderBy() from the doBy package:
    > orderBy(~-score, twitter.df)
  • 23. Any relation to ACSI's airline scores? http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines
  • 24. Game Plan: Search Twitter for airline mentions & collect tweet text; load sentiment word lists; score sentiment for each tweet; summarize for each airline; scrape ACSI web site for airline customer satisfaction scores; compare Twitter sentiment with ACSI satisfaction score.
  • 25. Scrape, don't type
    The XML package provides the amazing readHTMLTable() function:
    > library(XML)
    > acsi.url = 'http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines'
    > acsi.df = readHTMLTable(acsi.url, header=T, which=1, stringsAsFactors=F)
    > # only keep column #1 (name) and #18 (2010 score)
    > acsi.df = acsi.df[,c(1,18)]
    > head(acsi.df,1)
                         10
    1 Southwest Airlines 79
    Well, typing metadata is OK, I guess... clean up column names, etc:
    > colnames(acsi.df) = c('airline', 'score')
    > acsi.df$code = c('WN', NA, 'CO', NA, 'AA', 'DL', 'US', 'NW', 'UA')
    > acsi.df$score = as.numeric(acsi.df$score)
    NA (as in "n/a") is supported as a valid value everywhere in R.
  • 26. Game Plan: Search Twitter for airline mentions & collect tweet text; load sentiment word lists; score sentiment for each tweet; summarize for each airline; scrape ACSI web site for airline customer satisfaction scores; compare Twitter sentiment with ACSI satisfaction score.
  • 27. Join and compare
    merge() joins two data.frames by the specified "by=" fields. You can specify 'suffixes' to rename conflicting column names:
    > compare.df = merge(twitter.df, acsi.df, by='code', suffixes=c('.twitter', '.acsi'))
    Unless you specify "all=T", non-matching rows are dropped (like a SQL INNER JOIN), and that's what happened to top-scoring JetBlue.
    With a very low score, and low traffic to boot, soon-to-disappear Continental looks like an outlier. Let's exclude it:
    > compare.df = subset(compare.df, all.count > 100)
  • 28. an actual result!
    ggplot will even run lm() linear (and other) regressions for you with its geom_smooth() layer:
    > ggplot( compare.df ) +
        geom_point(aes(x=score.twitter, y=score.acsi, color=airline.twitter), size=5) +
        geom_smooth(aes(x=score.twitter, y=score.acsi, group=1), se=F, method="lm") +
        theme_bw() +
        opts(legend.position=c(0.2, 0.85))
  • 30. R code for example scoring function
    score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
    {
      require(plyr)
      require(stringr)
      # we got a vector of sentences. plyr will handle a list or a vector as an "l" for us
      # we want a simple array of scores back, so we use "l" + "a" + "ply" = laply:
      scores = laply(sentences, function(sentence, pos.words, neg.words) {
        # clean up sentences with R's regex-driven global substitute, gsub():
        sentence = gsub('[[:punct:]]', '', sentence)
        sentence = gsub('[[:cntrl:]]', '', sentence)
        # the backslashes are required: '\\d+' matches runs of digits
        sentence = gsub('\\d+', '', sentence)
        # and convert to lower case:
        sentence = tolower(sentence)
        # split into words on whitespace ('\\s+'). str_split is in the stringr package
        word.list = str_split(sentence, '\\s+')
        # sometimes a list() is one level of hierarchy too much
        words = unlist(word.list)
        # compare our words to the dictionaries of positive & negative terms
        pos.matches = match(words, pos.words)
        neg.matches = match(words, neg.words)
        # match() returns the position of the matched term or NA
        # we just want a TRUE/FALSE:
        pos.matches = !is.na(pos.matches)
        neg.matches = !is.na(neg.matches)
        # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
        score = sum(pos.matches) - sum(neg.matches)
        return(score)
      }, pos.words, neg.words, .progress=.progress )
      scores.df = data.frame(score=scores, text=sentences)
      return(scores.df)
    }
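
For readers who want to try the scoring idea without installing plyr or stringr, here is a minimal base-R sketch of the same steps (strip punctuation, control characters and digits, lower-case, split on whitespace, count matches). It is not the function from the talk: the helper name score.one and the two tiny word lists are assumptions made up for illustration, standing in for the Hu & Liu lexicon.

```r
# Toy word lists (assumed for illustration only; the talk loads the
# much larger Hu & Liu opinion lexicon instead).
pos.words <- c("awesome", "love", "happy", "helpful")
neg.words <- c("hate", "angry", "delayed", "annoyed")

# score.one() is a hypothetical helper: it scores a single sentence as
# (number of positive words) - (number of negative words).
score.one <- function(sentence, pos.words, neg.words) {
  sentence <- gsub("[[:punct:]]", "", sentence)  # drop punctuation
  sentence <- gsub("[[:cntrl:]]", "", sentence)  # drop control characters
  sentence <- gsub("\\d+", "", sentence)         # drop digits
  words <- unlist(strsplit(tolower(sentence), "\\s+"))
  # %in% gives TRUE/FALSE per word; sum() counts the TRUEs
  sum(words %in% pos.words) - sum(words %in% neg.words)
}

sample <- c("You're awesome and I love you",
            "I hate and hate and hate. So angry. Die!")

# vapply() plays the role of laply(): one integer score per sentence
scores <- vapply(sample, score.one, integer(1),
                 pos.words = pos.words, neg.words = neg.words,
                 USE.NAMES = FALSE)

scores.df <- data.frame(score = scores, text = sample)
print(scores.df)
```

With these toy lists the two sentences score 2 and -4; the second differs from the -5 shown on the sanity-check slide only because "die" is not in the toy negative list.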

Editor's Notes

  7. delta.tweets[1]: "0533370677: RT @g: To the @Delta attendant at gate D39 in ATL: your attitude isn't the best, but you did get me on the earlier flight. Thank you. #ncslatl" The original tweet has two parts, the screen name and the text; we need to separate out the text.
  8. This is one tweet; next we will batch process the set.
  9. We can see here that the text has been extracted.