What do you really mean when you tweet?
Challenges for opinion mining on social media
Dr. Diana Maynard
University of Sheffield, UK
The Social Web

Information, thoughts
and opinions are shared
prolifically these days on
the social web
Who cares about social media though?

Isn't Twitter just full of
stupid messages about
Justin Bieber?
Well, social media has other uses too

http://socialmediatoday.com/node/1568271








One in six people have used social media to get information about an
emergency
One in two people would sign up for emails, text alerts, or applications
to receive any of the emergency information.
75% of people would use Facebook to post eyewitness information on
an emergency or newsworthy event; 22% would use blogs, 21% would
use Twitter
During an emergency, one in two people would use social media to let
loved ones know they are safe
It's all a bit new-fangled, isn't it?
● Well actually, social media goes back a long way
● The first email was sent in 1971
● But it really goes back much further
● The first documented postal service was in 550BC, although there was
evidence of written couriers long before that
● However, communication speed is a little faster these days!
Let's rewind a little...
Drowning in information
• It can be difficult to get the
relevant information out of such
large volumes of data in a useful
way
• Social web analysis is all about the
users who are actively engaged
and generate content
• Social networks are pools of a
wide range of articulation
methods, from simple "I like it"
buttons to complete articles
Opinion Mining
• Along with NER, opinion
mining is a key component
in social web analysis
• NER: names of people,
organisations, locations
• Opinion mining: what
sentiments are being
expressed?
Opinion Mining is about finding out what people
think...
Amazon book reviews
TripAdvisor Hotel reviews
And one for the Portuguese speakers :-)
Rotten Tomatoes
Film Reviews
It's not just about product reviews
• Much opinion mining research has been focused around
reviews of films, books, electronics etc.
• But there are many other uses
– companies want to know what people think
– finding out political and social opinions and moods
– investigating how public mood influences the stock market
– investigating and preserving community memories
– drawing inferences from social analytics
And taking it a step further
It allows us to answer questions like:
• What are the opinions on crucial social
events and the key people involved?
• How are these opinions distributed in
relation to demographic user data?
• How have these opinions evolved?
• Who are the opinion leaders?
• What is their impact and influence?
Analysing Public Mood
• Closely related to opinion mining is the
analysis of sentiment and mood
• Mood of the Nation project at Bristol
University
http://geopatterns.enm.bris.ac.uk/mood/
• Mood has proved more useful than
sentiment for things like stock market
prediction (fluctuations are driven mainly
by fear rather than by things like
happiness or sadness)
Derwent Capital Markets
● Derwent Capital Markets launched a £25m fund in 2011 that made its
investments via social media analysis by evaluating whether people
are generally happy, sad, anxious or tired
● DCM Capital used a proprietary algorithm to research the public
sentiment of stock, primarily through Twitter, to attempt to predict the
movements of the Dow Jones Industrial Average.
● Bollen told the Sunday Times: "We recorded the sentiment of the
online community, but we couldn't prove if it was correct. So we
looked at the Dow Jones to see if there was a correlation. We believed
that if the markets fell, then the mood of people on Twitter would
fall."
● "But we realised it was the other way round: that a drop in the mood
or sentiment of the online community would precede a fall in the
market."
But it didn't quite work out as planned...
● It was later suggested that there are actually many flaws in Bollen's
work, and that it's impossible to predict the stock market in this way
● The "Twitter Fund" (formally, The Derwent Absolute Return Fund) was
launched in July 2011, but failed to survive the summer, despite posting
initial returns, and the company was sold for peanuts in Feb 2013
● There's quite a lot of sloppiness in the reporting of methodology and
results, so it's not clear what can really be trusted
● The advertised results are biased by selection (they picked the winners
after the race and tried to show correlation)
● The accuracy claim is too general to be useful (you can't predict
individual stock prices, only the general trend)
● However, most trading companies now use some form of social media
analysis to help with prediction, though it's usually quite shallow
Transatlantic Trends







This annual diplomatic report is a manually collected survey of US
and European public opinion
It informs politicians in international relations by revealing reasoning
behind multilateral negotiations
But it's expensive and time-consuming to create - the kind of thing
that global sentiment analysis can replace, and in real-time, instead
of annually
Twitter Gives you Flu!
● Researchers at the University of Rochester used Twitter analysis to
predict who would get flu
● They looked at the role of interactions between users on social media
on the real-life spread of the disease
● Researchers at Johns Hopkins also reckon they can do better at flu
tracking via Twitter analysis than the CDC.
The Social Oscars 2013
Brandwatch ran a project to investigate how closely public opinion
predicted/mirrored the results of the 2013 Oscars
Tracking opinions over time
● Opinions can be extracted with a time stamp and/or a geo-location
● We can then analyse changes to opinions about the same
entity/event over time, and other statistics
● We can also measure the impact of an entity or event on the overall
sentiment about an entity or another event, over the course of time
(e.g. in politics)
● Also possible to incorporate statistical (non-linguistic) techniques to
investigate dynamics of opinions, e.g. find statistical correlations
between interest in certain topics or entities/events and
number/impact/influence of tweets etc.
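As a rough illustration of tracking opinions over time, the sketch below groups time-stamped opinion scores by entity and day and averages them. The record layout (ISO timestamp, entity name, polarity score between -1 and 1) and the function name are assumptions made for this example, not part of the tools described in the talk.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean_sentiment(records):
    """Group opinion scores by (entity, day) and return the mean score per group.

    `records` is an iterable of (iso_timestamp, entity, score) tuples;
    this layout is a hypothetical one chosen for the example.
    """
    buckets = defaultdict(list)
    for timestamp, entity, score in records:
        day = datetime.fromisoformat(timestamp).date()
        buckets[(entity, day)].append(score)
    return {key: sum(scores) / len(scores) for key, scores in buckets.items()}


# Invented sample data, for illustration only.
sample = [
    ("2013-01-01T10:00:00", "EntityA", 1.0),
    ("2013-01-01T12:00:00", "EntityA", 0.0),
    ("2013-01-02T09:00:00", "EntityA", -1.0),
]
for (entity, day), mean in sorted(daily_mean_sentiment(sample).items()):
    print(entity, day, mean)
```

The same grouping key could just as well use a geo-location instead of a day.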
Viewing opinion changes over time
Mapping dynamics from social media: UK riots demo
Opinion mining is like “Ask the Audience”
But be careful!

Sentiment analysis isn't just about looking at the sentiment words
● “It's a great movie if you have the taste and sensibilities of a 5-year-old
boy.”
● “It's terrible Candidate X did so well in the debate last night.”
● “I'd have liked the film a lot more if it had been a bit shorter.”

Situation is everything. If you and I are best friends, then my graceful
swearing at you is different than if it’s at my boss.
Death confuses opinion mining tools



Opinion mining
tools are good for a
general overview,
but not for some
situations
Whitney Houston wasn't very popular...
Or was she?
Why are many opinion mining tools unsuccessful?
• They don't work well at more than a very basic level
• They mainly use dictionary lookup for positive and negative
words
• They classify the tweets as positive or negative, but not with
respect to the keyword you're searching for
• First, the keyword search just retrieves any tweet mentioning
it, but not necessarily about it as a topic
• Second, there is no correlation between the keyword and the
sentiment: the sentiment refers to the tweet as a whole
• Sometimes this is fine, but it can also go horribly wrong
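To make the weakness concrete, here is a minimal sketch of the kind of tool criticised above: it retrieves any tweet mentioning a keyword and labels the whole tweet by counting dictionary words. The word lists and function names are invented for illustration and do not come from any particular product.

```python
import re

# Tiny illustrative lexicons (assumptions); real tools use much larger lists.
POSITIVE = {"great", "good", "liked", "love"}
NEGATIVE = {"terrible", "bad", "awful", "hate"}


def lexicon_polarity(text):
    """Label a whole text by counting positive and negative dictionary words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def keyword_sentiment(keyword, tweets):
    """Return (tweet, label) for every tweet that merely mentions the keyword."""
    return [(t, lexicon_polarity(t)) for t in tweets if keyword.lower() in t.lower()]


review = ("It's a great movie if you have the taste and "
          "sensibilities of a 5-year-old boy.")
print(keyword_sentiment("movie", [review]))  # labelled "positive"
```

The label is attached to the tweet as a whole, so nothing ties the word "great" to the keyword, and the dismissive review is counted as praise.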
Why bother with opinion mining?
• It depends what kind of information you want
• Don't use opinion mining tools to help you win money on
quiz shows
• Recent research has shown that one knowledgeable
analyst is better than gathering general public sentiment
from lots of analysts and taking the majority opinion
• But only for some kinds of tasks
• If you want a general overview about public sentiment
on a topic like the Olympic Games or Justin Bieber, it'll
probably work out OK
Challenges imposed by social media
• Language: incorrect use of language makes NLP hard
– Solution: specific pre-processing for Twitter; use shallow
analysis techniques with back-off strategies; incorporate
specific subcomponents for swear words, sarcasm etc.
• Relevance: topics and comments can rapidly diverge
– Solution: train a classifier or use clustering techniques
• Lack of context: hard to disambiguate entities
– Solution: use metadata for further information; aggregation
of data can also be useful
Analysing language in social media
● Sumbuddy: Hey, hao es your familie?
Guy: They got crushed by a bus and died.
Sumbuddy: Daz so sad...wanna get iscreem?
● OMMMFG!!! JUST HEARD EMINEM'S “RAPGOD”. SMFH!!!
these other dudes might as well stop rapping if they not on
this level
● @adambation Try reading this article , it looks like it would be
really helpful and not obvious at all #sarcasm
http://t.co/mo3vODoX
Short sentences in tweets
• Social media, and especially tweets, can be problematic because
sentences are very short and/or incomplete
• Typically, linguistic pre-processing tools such as tokenisers, POS
taggers and parsers do badly on such texts
• Even language identification tools can have problems
• Need for special NLP pre-processing tools
Lack of context causes ambiguity
Branching out from Lincoln park after dark ... Hello Russian Navy, it's
like the same thing but with glitter!

??
Getting the NEs right is crucial
Branching out from Lincoln park after dark ... Hello Russian Navy, it's like
the same thing but with glitter!
The Problem with NER
• Running standard IE tools (ANNIE) on 300 news articles: 87% F-measure
• Running ANNIE on some tweets: < 40% F-measure
Example: Persons in news articles
Example: Persons in tweets
TwitIE to the rescue
Language identification is tricky
● Language identification tools such as TextCat need a decent
amount of text (around 20 words at least)
● But Twitter has an average of only 10 tokens/tweet
● Noisy nature of the words (abbreviations, misspellings)
● Due to the length of the text, we can make the assumption that one
tweet is written in only one language
● We have adapted the TextCat language identification plugin
● Provided fingerprints for 5 languages: DE, EN, FR, ES, NL
● You can extend it to new languages easily
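TextCat-style identification compares a ranked list of a text's most frequent character n-grams (its "fingerprint") against one fingerprint per language. The sketch below shows that general idea with an "out-of-place" rank distance. It is not the code of the GATE plugin, and the one-sentence toy fingerprints, function names and parameter values are assumptions for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the text's most frequent character n-grams, ranked by frequency."""
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(padded) - n + 1)
    )
    return [gram for gram, _ in counts.most_common(top_k)]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams unknown to the language get a fixed penalty."""
    ranks = {gram: rank for rank, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(ranks[gram] - i) if gram in ranks else penalty
        for i, gram in enumerate(doc_profile)
    )


def identify(text, fingerprints):
    """Pick the language whose fingerprint is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(fingerprints, key=lambda lang: out_of_place(doc, fingerprints[lang]))


# Toy fingerprints built from one sentence each (an assumption);
# real fingerprints are trained on far more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it goes home",
    "de": "der schnelle braune fuchs springt über den faulen hund und geht dann nach hause",
}
FINGERPRINTS = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

print(identify(SAMPLES["en"], FINGERPRINTS))
```

A ten-token tweet yields only a short, noisy profile, which is one way to see why short texts are hard for this kind of method. Adding a language amounts to adding one more entry to the fingerprint table.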
Language detection examples
Tokenisation
• Plenty of “unusual”, but very important tokens in social
media:
– @Apple – mentions of company/brand/person names
– #fail, #SteveJobs – hashtags expressing sentiment, person
or company names
– :-(, :-), :-P – emoticons (punctuation and optionally letters)
– URLs
• Tokenisation is crucial for entity recognition and opinion
mining
Example
#WiredBizCon #nike vp said when @Apple saw what
http://nikeplus.com did, #SteveJobs was like wow I didn't expect
this at all.

Tokenising on white space doesn't work that well:

Nike and Apple are company names, but if we have tokens such
as #nike and @Apple, this will make the entity recognition
harder, as it will need to look at sub-token level

Tokenising on white space and punctuation characters doesn't
work well either: URLs get separated (http, nikeplus), as are
emoticons and email addresses
The TwitIE Tokeniser
● Treat RTs and URLs as 1 token each
● #nike is two tokens (# and nike) plus a separate annotation
Hashtag covering both. Same for @mentions -> UserID
● Capitalisation is preserved, but an orthography feature is
added: all caps, lowercase, mixCase
● Date and phone number normalisation, lowercasing, and
emoticons are optionally done later in separate modules
● Consequently, tokenisation is faster and more generic
● Also, more tailored to our NER module
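The behaviour listed above can be approximated with a small regular-expression tokeniser. This is only a sketch of the idea, not the TwitIE implementation: the pattern, the dictionary layout of tokens and the rule for assigning the orthography values are all assumptions.

```python
import re

# One alternative per token type, tried left to right at each position.
TOKEN_RE = re.compile(
    r"""
    (?P<url>https?://\S+)                       # URLs stay in one piece
  | (?P<rt>\bRT\b)                              # retweet marker
  | (?P<emoticon>[:;=][-o^]?[()\[\]DPp/\\|])    # a few common emoticons
  | (?P<symbol>[#@])                            # hashtag / mention symbol
  | (?P<word>\w+(?:'\w+)?)                      # words, optional apostrophe part
  | (?P<punct>[^\w\s])                          # any other single character
    """,
    re.VERBOSE,
)


def orthography(word):
    """Assign one of the three orthography values named on the slide."""
    if word.isupper():
        return "allCaps"
    if word.islower():
        return "lowercase"
    return "mixCase"


def tokenise(text):
    """Return (tokens, annotations); '#'+word and '@'+word get a covering annotation."""
    tokens, annotations = [], []
    for match in TOKEN_RE.finditer(text):
        kind, string = match.lastgroup, match.group()
        token = {"string": string, "kind": kind,
                 "start": match.start(), "end": match.end()}
        if kind == "word" and string.isalpha():
            token["orth"] = orthography(string)  # original casing is kept
        tokens.append(token)
        if kind == "word" and len(tokens) >= 2:
            prev = tokens[-2]
            if prev["kind"] == "symbol" and prev["end"] == match.start():
                label = "Hashtag" if prev["string"] == "#" else "UserID"
                annotations.append((label, text[prev["start"]:match.end()]))
    return tokens, annotations


tweet = ("#WiredBizCon #nike vp said when @Apple saw what "
         "http://nikeplus.com did, #SteveJobs was like wow I didn't expect "
         "this at all.")
tokens, annotations = tokenise(tweet)
print([t["string"] for t in tokens][:4])
print(annotations)
```

Keeping `nike` and `Apple` as plain word tokens means a later entity recogniser does not need to look below token level, while the covering annotation still records that they came from a hashtag or mention. A URL followed directly by punctuation would keep that punctuation in this sketch.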
Normalisation
• “RT @Bthompson WRITEZ: @libbyabrego honored?! Everybody
knows the libster is nice with it...lol...(thankkkks a bunch;))”
• OMG! I’m so guilty!!! Sprained biibii’s leg! ARGHHHHHH!!!!!!
• Similar to SMS normalisation
• For some later components to work well (POS tagger, parser), it
is necessary to produce a normalised version of each token
• BUT uppercasing, and letter and exclamation mark repetition
often convey strong sentiment, so we keep both versions of
tokens
• Syntactic normalisation: determine when @mentions and #tags
have syntactic value and should be kept in the sentence, vs
replies, retweets and topic tagging
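One way to keep both versions of a token is to store the normalised form next to the original, together with flags for the cues that carry sentiment. The sketch below uses a simple character-squeezing rule and a three-entry abbreviation list. Both are stand-ins chosen for illustration, not the spelling-correction approach of the actual normaliser.

```python
import re

# Small illustrative abbreviation list (an assumption, not the TwitIE resource).
ABBREVIATIONS = {"lol": "laughing out loud", "omg": "oh my god", "thx": "thanks"}


def normalise_token(token):
    """Return the original token, a normalised form, and emphasis flags."""
    lowered = token.lower()
    # Squeeze runs of three or more identical characters down to one.
    squeezed = re.sub(r"(.)\1{2,}", r"\1", lowered)
    return {
        "original": token,
        "normalised": ABBREVIATIONS.get(squeezed, squeezed),
        "was_shouted": len(token) > 1 and token.isupper(),
        "was_elongated": squeezed != lowered,
    }


for tok in ["thankkkks", "ARGHHHHHH", "OMG"]:
    print(normalise_token(tok))
```

Squeezing to a single character is crude (it would turn "coool" into "col"), which is one reason a real normaliser needs spelling correction as well.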
A normalised example

● Normaliser currently based on spelling correction and some lists of
common abbreviations
● Outstanding issues:
– Some abbreviations which span token boundaries (e.g. gr8, do n’t)
are difficult to handle
– Capitalisation and punctuation normalisation
TwitIE NER Results
Analysing Hashtags
What's in a hashtag?
● Hashtags often contain smushed words
– #SteveJobs
– #CombineAFoodAndABand
– #southamerica
● For NER we want the individual tokens so we can link them to the
right entity
● For opinion mining, individual words in the hashtags often indicate
sentiment, sarcasm etc.
– #greatidea
– #worstdayever
How to analyse hashtags?
● Camelcasing makes it relatively easy to separate the words,
using an adapted tokeniser, but many people don't bother
● We use a simple approach based on dictionary matching the
longest consecutive strings, working L to R
– #lifeisgreat -> #-life-is-great
– #lovinglife -> #-loving-life
● It's not foolproof, however
– #greatstart -> #-greats-tart
● To improve it, we could use contextual information, or we
could restrict matches to certain POS combinations (ADJ+N is
more likely than ADJ+V)
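The longest-match, left-to-right approach can be sketched in a few lines. The toy dictionary below is an assumption chosen so that the examples from the slide, including the failure case, can be reproduced. A real system would use a full word list.

```python
# Toy dictionary (an assumption for illustration).
DICTIONARY = {
    "life", "is", "great", "greats", "loving", "start", "tart",
    "worst", "day", "ever", "idea",
}


def segment_hashtag(hashtag, dictionary=DICTIONARY):
    """Split a hashtag by greedy longest dictionary match, working left to right."""
    text = hashtag.lstrip("#").lower()
    words, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first and shrink until a dictionary word fits.
        for end in range(len(text), pos, -1):
            if text[pos:end] in dictionary:
                words.append(text[pos:end])
                pos = end
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            words.append(text[pos])
            pos += 1
    return words


print(segment_hashtag("#lifeisgreat"))  # ['life', 'is', 'great']
print(segment_hashtag("#lovinglife"))   # ['loving', 'life']
print(segment_hashtag("#greatstart"))   # ['greats', 'tart']
```

Because "greats" is longer than "great" and also in the dictionary, the greedy match takes it first, which reproduces the #greatstart error.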
Irony and sarcasm
• I had never seen snow in Holland before but thanks to twitter and
facebook I now know what it looks like. Thanks guys, awesome!
• Life's too short, so be sure to read as many articles about celebrity
breakups as possible.
• I feel like there aren't enough singing competitions on TV .
#sarcasmexplosion
• I wish I was cool enough to stalk my ex-boyfriend ! #sarcasm
#bitchtweet
• On a bright note if downing gets injured we have Henderson to
come in
Sarcasm is a part of British culture
● So much so that the BBC has its own webpage on sarcasm
designed to teach non-native English speakers how to be
sarcastic successfully in conversation
BBC sarcasm quiz
How do you know when someone is being
sarcastic?
• Use of hashtags in tweets such as #sarcasm, #irony, #whoknew etc.
• Large collections of tweets based on hashtags can be used to make
a training set for machine learning
• But you still have to know what to do with sarcasm once you've
found it
• Although sarcasm generally entails saying the opposite of what you
mean, it doesn't necessarily just invert the polarity of an opinion
• “It's not like I wanted to eat breakfast anyway” is negative when
uttered sarcastically, but non-opinionated when uttered neutrally.
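Building a training set from hashtags might look like the sketch below: each tweet is labelled sarcastic if it carries one of the marker hashtags, and the markers are then removed from the text. The tag set comes from the slide. The function name and the decision to strip the markers are assumptions for illustration.

```python
import re

SARCASM_TAGS = {"#sarcasm", "#irony", "#whoknew"}  # from the slide; extend as needed
HASHTAG_RE = re.compile(r"#\w+")


def label_by_hashtag(tweets):
    """Build (text, is_sarcastic) pairs, removing the marker hashtags from the text."""
    examples = []
    for tweet in tweets:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet)}
        is_sarcastic = bool(tags & SARCASM_TAGS)
        # Strip the marker tags so a classifier cannot simply memorise them.
        cleaned = HASHTAG_RE.sub(
            lambda m: "" if m.group().lower() in SARCASM_TAGS else m.group(),
            tweet,
        )
        examples.append((" ".join(cleaned.split()), is_sarcastic))
    return examples


print(label_by_hashtag([
    "You are really mature. #lying #sarcasm",
    "Lovely weather today #sunny",
]))
```

This only finds the sarcasm. Deciding what the sarcasm does to the opinion is a separate step.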
Identifying the scope of sarcasm
I am not happy that I woke up at 5:15 this morning.

#greatstart #sarcasm

You are really mature. #lying #sarcasm
Experiment with sarcastic hashtags










Collected a corpus of 134 tweets containing the hashtag
#sarcasm
Manually annotated sentences with sentiment

266 sentences, of which 68 opinionated (25%)

62 negative, 6 positive
Also annotated the same corpus as if the sarcasm was absent
Compared how well our applications performed on each, with
and without sarcasm analysis
The results were a little surprising
Even when we KNEW the statement was sarcastic, we didn't
always get the polarity of the opinion right
Effect of sarcasm on sentiment analysis
Sarcastic corpus                Precision   Recall   F1
Opinionated                     74.58       63.77    68.75
Opinion+polarity - Regular      20.34       17.39    18.75
Polarity-only - Regular         27.27       27.27    27.27
Opinion+polarity - Sarcastic    57.63       49.28    53.13
Polarity-only - Sarcastic       77.02       77.28    77.28

Regular corpus                  Precision   Recall   F1
Opinionated                     57.89       58.93    58.41
Opinion+polarity - Regular      45.61       46.43    46.02
Polarity-only - Regular         78.79       78.79    78.79
Opinion+polarity - Sarcastic    22.81       23.21    23.01
Polarity-only - Sarcastic       39.40       39.39    39.39
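For readers checking the figures, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it for the two "Opinionated" rows. The function name is chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(74.58, 63.77), 2))  # Opinionated, sarcastic corpus: 68.75
print(round(f1_score(57.89, 58.93), 2))  # Opinionated, regular corpus: 58.41
```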
What about non-textual content?
We can also do opinion mining on images and
multimedia
Image-opinion identification
• Facial expression analysis/classification
– Helps with facial similarity calculations and face recognition
– Can be used to predict sentiment/polarity
– Can be combined with analysis of the text from the document
• Coarse-grained opinion classification
– Looking at image-feature classification for abstract concepts
(sentiment / privacy / attractiveness)
– e.g. looking at image colours, placement of interesting images
in the picture
Multimodal opinion analysis


Investigate correlation between images and
whole-document opinions








Do documents asserting specific opinions
get illustrated with the same imagery?
e.g. articles about euro-scepticism in the
UK might be illustrated with images of
specific Conservative peers….
Is there correlation between low-level
image features and specific opinions?

Investigate finer-grained (i.e. sub-document)
correlations between imagery and opinions


e.g. sentence-level correlations
incorporating analysis of the document
layout
Demo: extracting opinions from images
So where does this leave us?
● Social media is a tricky but interesting medium to analyse
● Opinion mining is ubiquitous, but it's still far from perfect
● There are lots of linguistic and social quirks that fool sentiment
analysis tools.
● The good news is that this means there are lots of interesting
problems for us to research
● And it doesn’t mean we shouldn’t use existing opinion mining tools
● The benefits of a modular approach mean that we can pick the bits
that are most useful
● Take-away message: it is critical to use the right tool for the right job
Don't be misled by the advertising: caveat emptor!
Acknowledgements
Further information
• Research supported by the EU-funded ARCOMEM, uComp and
TrendMiner projects
• See http://www.arcomem.eu and http://www.trend-miner.eu for
more details
• More information about GATE at http://gate.ac.uk
• Opinion mining demo:
http://demos.gate.ac.uk/arcomem/opinions/
• Learn about the technical details in the STIL 2013 tutorial: Practical
Opinion Mining for social media (Wednesday 11.30am)
Questions?

More Related Content

What's hot

Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real worldDiana Maynard
 
Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Diana Maynard
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATEDiana Maynard
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaDiana Maynard
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Diana Maynard
 
Global Analytics: Text, Speech, Sentiment, and Sense
Global Analytics: Text, Speech, Sentiment, and SenseGlobal Analytics: Text, Speech, Sentiment, and Sense
Global Analytics: Text, Speech, Sentiment, and SenseSeth Grimes
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisDiana Maynard
 
Opinion Mining
Opinion MiningOpinion Mining
Opinion MiningGeorge Ang
 
Sentiment analysis and opinion mining
Sentiment analysis and opinion miningSentiment analysis and opinion mining
Sentiment analysis and opinion miningSumit Sony
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEDiana Maynard
 
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...Yandex
 
Improving Your Surveys and Questionnaires with Cognitive Interviewing
Improving Your Surveys and Questionnaires with Cognitive InterviewingImproving Your Surveys and Questionnaires with Cognitive Interviewing
Improving Your Surveys and Questionnaires with Cognitive InterviewingUXPA International
 
Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisYelena Mejova
 
Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...Kavita Ganesan
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...Prateek Singh
 
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...CSCJournals
 
Fake News Detector
Fake News DetectorFake News Detector
Fake News DetectorIrisYoon5
 

What's hot (20)

Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real world
 
Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATE
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
Cls8 decarbonet
Cls8 decarbonetCls8 decarbonet
Cls8 decarbonet
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Global Analytics: Text, Speech, Sentiment, and Sense
Global Analytics: Text, Speech, Sentiment, and SenseGlobal Analytics: Text, Speech, Sentiment, and Sense
Global Analytics: Text, Speech, Sentiment, and Sense
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media Analysis
 
Opinion Mining
Opinion MiningOpinion Mining
Opinion Mining
 
Sentiment analysis and opinion mining
Sentiment analysis and opinion miningSentiment analysis and opinion mining
Sentiment analysis and opinion mining
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
A Dream of Predicting Elections and Trading Stocks using Twitter - Yelena Mej...
 
Improving Your Surveys and Questionnaires with Cognitive Interviewing
Improving Your Surveys and Questionnaires with Cognitive InterviewingImproving Your Surveys and Questionnaires with Cognitive Interviewing
Improving Your Surveys and Questionnaires with Cognitive Interviewing
 
Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 Analysis
 
Document(2)
Document(2)Document(2)
Document(2)
 
Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
 
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
 
Fake News Detector
Fake News DetectorFake News Detector
Fake News Detector
 

Similar to What do you really mean when you tweet? Challenges for opinion mining on social media.

Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Diana Maynard
 
Social Media for NPO's
Social Media for NPO'sSocial Media for NPO's
Social Media for NPO'sAgency 323
 
Crisis Communications in a Social Media Age
Crisis Communications in a Social Media AgeCrisis Communications in a Social Media Age
Crisis Communications in a Social Media AgeJim Rettew
 
The Media Zones Where People Live And How To Connect With Them
The Media Zones Where People Live And How To Connect With ThemThe Media Zones Where People Live And How To Connect With Them
The Media Zones Where People Live And How To Connect With ThemKDMC
 
Are Filter Bubbles Real?
Are Filter Bubbles Real?Are Filter Bubbles Real?
Are Filter Bubbles Real?Axel Bruns
 
Managing crisis world vision
Managing crisis   world visionManaging crisis   world vision
Managing crisis world visionDavid Phillips
 
ScienceOnline impact workshop
ScienceOnline impact workshop ScienceOnline impact workshop
ScienceOnline impact workshop SpotOnLondon
 
Getting Fresh…Socially. A Social Fresh EAST Recap.
Getting Fresh…Socially. A Social Fresh EAST Recap.Getting Fresh…Socially. A Social Fresh EAST Recap.
Getting Fresh…Socially. A Social Fresh EAST Recap.ClearEdge Marketing
 
Social Media: Why it Matters
Social Media: Why it MattersSocial Media: Why it Matters
Social Media: Why it MattersLloyd Brown
 
Evidence Live Social Media Workshop
Evidence Live Social Media Workshop Evidence Live Social Media Workshop
Evidence Live Social Media Workshop Douglas Badenoch
 
analyzing public sentiments using twitter feeds
 analyzing public sentiments using twitter feeds analyzing public sentiments using twitter feeds
analyzing public sentiments using twitter feedsOrakzay
 
Let's Talk About Social Networking
Let's Talk About Social NetworkingLet's Talk About Social Networking
Let's Talk About Social NetworkingSteve Lowisz
 
[r]evolution: Educating Social Media - Workshop Slides
[r]evolution: Educating Social Media - Workshop Slides[r]evolution: Educating Social Media - Workshop Slides
[r]evolution: Educating Social Media - Workshop SlidesNathanielCarlson2
 
Raising Kids in a Digital World - Oasis Youth Center 2016
Raising Kids in a Digital World - Oasis Youth Center 2016Raising Kids in a Digital World - Oasis Youth Center 2016
Raising Kids in a Digital World - Oasis Youth Center 2016Holly Gerla
 
#JTSMAsocial - a social media workshop
#JTSMAsocial - a social media workshop#JTSMAsocial - a social media workshop
#JTSMAsocial - a social media workshopmedavep
 
Five Social Media Tricks to Grow Your Audience - for Colombia 3.0 Conference
Five Social Media Tricks to Grow Your Audience - for Colombia 3.0 ConferenceFive Social Media Tricks to Grow Your Audience - for Colombia 3.0 Conference
Five Social Media Tricks to Grow Your Audience - for Colombia 3.0 ConferenceDave LaFontaine
 

Similar to What do you really mean when you tweet? Challenges for opinion mining on social media. (20)

Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
What do you really mean when you tweet? Challenges for opinion mining on social media.

  • 1. What do you really mean when you tweet? Challenges for opinion mining on social media Dr. Diana Maynard University of Sheffield, UK
  • 2. The Social Web Information, thoughts and opinions are shared prolifically these days on the social web
  • 3. Who cares about social media though? Isn't Twitter just full of stupid messages about Justin Bieber?
  • 4. Well, social media has other uses too http://socialmediatoday.com/node/1568271
  • 5.
      ● One in six people have used social media to get information about an emergency
      ● One in two people would sign up for emails, text alerts, or applications to receive any of the emergency information
      ● 75% of people would use Facebook to post eyewitness information on an emergency or newsworthy event; 22% would use blogs, 21% would use Twitter
      ● During an emergency, one in two people would use social media to let loved ones know they are safe
  • 6. It's all a bit new-fangled, isn't it?
      ● Well actually, social media goes back a long way
      ● The first email was sent in 1971
      ● But it really goes back much further
      ● The first documented postal service was in 550BC, although there was evidence of written couriers long before that
      ● However, communication speed is a little faster these days!
  • 7. Let's rewind a little...
  • 8.
  • 9. Drowning in information
      • It can be difficult to get the relevant information out of such large volumes of data in a useful way
      • Social web analysis is all about the users who are actively engaged and generate content
      • Social networks are pools of a wide range of articulation methods, from simple "I like it" buttons to complete articles
  • 10. Opinion Mining
      • Along with NER, opinion mining is a key component in social web analysis
      • NER: names of people, organisations, locations
      • Opinion mining: what sentiments are being expressed?
  • 11. Opinion Mining is about finding out what people think...
  • 14. And one for the Portuguese speakers :-)
  • 16. It's not just about product reviews
      • Much opinion mining research has been focused around reviews of films, books, electronics etc.
      • But there are many other uses
          – companies want to know what people think
          – finding out political and social opinions and moods
          – investigating how public mood influences the stock market
          – investigating and preserving community memories
          – drawing inferences from social analytics
  • 17. And taking it a step further
      It allows us to answer questions like:
      • What are the opinions on crucial social events and the key people involved?
      • How are these opinions distributed in relation to demographic user data?
      • How have these opinions evolved?
      • Who are the opinion leaders?
      • What is their impact and influence?
  • 18. Analysing Public Mood
      • Closely related to opinion mining is the analysis of sentiment and mood
      • Mood of the Nation project at Bristol University http://geopatterns.enm.bris.ac.uk/mood/
      • Mood has proved more useful than sentiment for things like stock market prediction (fluctuations are driven mainly by fear rather than by things like happiness or sadness)
  • 19.
  • 20. Derwent Capital Markets
      ● Derwent Capital Markets launched a £25m fund in 2011 that made its investments via social media analysis by evaluating whether people are generally happy, sad, anxious or tired
      ● DCM Capital used a proprietary algorithm to research the public sentiment of stock, primarily through Twitter, to attempt to predict the movements of the Dow Jones Industrial Average.
      ● Bollen told the Sunday Times: "We recorded the sentiment of the online community, but we couldn't prove if it was correct. So we looked at the Dow Jones to see if there was a correlation. We believed that if the markets fell, then the mood of people on Twitter would fall."
      ● "But we realised it was the other way round - that a drop in the mood or sentiment of the online community would precede a fall in the market."
  • 21. But it didn't quite work out as planned...
      ● It was later suggested that there are actually many flaws in Bollen's work, and that it's impossible to predict the stock market in this way
      ● The "Twitter Fund" (formally, The Derwent Absolute Return Fund) was launched in July 2011, but failed to survive the summer, despite posting initial returns, and the company was sold for peanuts in Feb 2013
      ● There's quite a lot of sloppiness in the reporting of methodology and results, so it's not clear what can really be trusted
      ● The advertised results are biased by selection (they picked the winners after the race and tried to show correlation)
      ● The accuracy claim is too general to be useful (you can't predict individual stock prices, only the general trend)
      ● However, most trading companies now use some form of social media analysis to help with prediction, though it's usually quite shallow
  • 22. Transatlantic Trends
      ● This annual diplomatic report is a manually collected survey of US and European public opinion
      ● It informs politicians in international relations by revealing reasoning behind multilateral negotiations
      ● But it's expensive and time-consuming to create - the kind of thing that global sentiment analysis can replace, and in real-time, instead of annually
  • 23. Twitter Gives you Flu!
      ● Researchers at the University of Rochester used Twitter analysis to predict who would get flu
      ● They looked at the role of interactions between users on social media on the real-life spread of the disease
      ● Researchers at Johns Hopkins also reckon they can do better at flu tracking via Twitter analysis than the CDC.
  • 24. The Social Oscars 2013 Brandwatch ran a project to investigate how closely public opinion predicted/mirrored the results of the 2013 Oscars
  • 25. Tracking opinions over time
      ● Opinions can be extracted with a time stamp and/or a geo-location
      ● We can then analyse changes to opinions about the same entity/event over time, and other statistics
      ● We can also measure the impact of an entity or event on the overall sentiment about an entity or another event, over the course of time (e.g. in politics)
      ● Also possible to incorporate statistical (non-linguistic) techniques to investigate dynamics of opinions, e.g. find statistical correlations between interest in certain topics or entities/events and number/impact/influence of tweets etc.
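As a rough illustration of slide 25, time-stamped opinions can be grouped per entity and per day to see how sentiment moves. This is only a sketch: the (timestamp, entity, score) tuple layout, the numeric score scale and the function name are assumptions, not something described in the talk.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean_sentiment(opinions):
    """Average sentiment score per (entity, day).

    `opinions` is an iterable of (iso_timestamp, entity, score) tuples;
    this layout is hypothetical and chosen only for the example.
    """
    buckets = defaultdict(list)
    for timestamp, entity, score in opinions:
        day = datetime.fromisoformat(timestamp).date()
        buckets[(entity, day)].append(score)
    # One mean score per entity per calendar day.
    return {key: sum(scores) / len(scores) for key, scores in buckets.items()}


# Invented example data, not real measurements.
sample = [
    ("2011-08-08T10:00:00", "riots", -1.0),
    ("2011-08-08T22:00:00", "riots", -0.5),
    ("2011-08-09T09:00:00", "riots", 0.5),
]
trend = daily_mean_sentiment(sample)
```

The same grouping key could include a geo-location field if the opinions carry one.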
  • 27. Mapping dynamics from social media: UK riots demo
  • 28. Opinion mining is like “Ask the Audience”
  • 29. But be careful! Sentiment analysis isn't just about looking at the sentiment words
      ● “It's a great movie if you have the taste and sensibilities of a 5-year-old boy.”
      ● “It's terrible Candidate X did so well in the debate last night.”
      ● “I'd have liked the film a lot more if it had been a bit shorter.”
      Situation is everything. If you and I are best friends, then my graceful swearing at you is different than if it’s at my boss.
  • 30. Death confuses opinion mining tools
      ● Opinion mining tools are good for a general overview, but not for some situations
  • 31. Whitney Houston wasn't very popular...
  • 33. Why are many opinion mining tools unsuccessful?
      • They don't work well at more than a very basic level
      • They mainly use dictionary lookup for positive and negative words
      • They classify the tweets as positive or negative, but not with respect to the keyword you're searching for
      • First, the keyword search just retrieves any tweet mentioning it, but not necessarily about it as a topic
      • Second, there is no correlation between the keyword and the sentiment: the sentiment refers to the tweet as a whole
      • Sometimes this is fine, but it can also go horribly wrong
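A minimal sketch of the dictionary-lookup approach that slide 33 criticises, to make the failure mode concrete. The word lists, function names and the "Acme" brand are hypothetical and not taken from any particular tool.

```python
# Hypothetical mini-lexicons, invented for illustration only.
POSITIVE = {"great", "good", "love", "awesome"}
NEGATIVE = {"terrible", "bad", "hate", "awful"}


def lexicon_polarity(text):
    """Label a whole text by counting positive and negative words."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def keyword_sentiment(tweets, keyword):
    """Retrieve every tweet that mentions the keyword and label the whole tweet.

    The label is not tied to the keyword, which is exactly the weakness
    described on the slide.
    """
    return [(t, lexicon_polarity(t)) for t in tweets if keyword.lower() in t.lower()]


review = "It's a great movie if you have the taste and sensibilities of a 5-year-old boy."
label = lexicon_polarity(review)  # the single word "great" decides the label

results = keyword_sentiment(
    ["I love my new phone, unlike that Acme laptop", "Acme is awful"],
    "acme",
)
```

The review from slide 29 comes out as positive because "great" is the only lexicon hit, and the first Acme tweet is labelled positive even though the positive word refers to a different product.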
  • 34. Why bother with opinion mining?
      • It depends what kind of information you want
      • Don't use opinion mining tools to help you win money on quiz shows
      • Recent research has shown that one knowledgeable analyst is better than gathering general public sentiment from lots of analysts and taking the majority opinion
      • But only for some kinds of tasks
      • If you want a general overview about public sentiment on a topic like the Olympic Games or Justin Bieber, it'll probably work out OK
  • 35. Challenges imposed by social media
      • Language: incorrect use of language makes NLP hard
          ● Solution: specific pre-processing for Twitter; use shallow analysis techniques with back-off strategies; incorporate specific subcomponents for swear words, sarcasm etc.
      • Relevance: topics and comments can rapidly diverge
          ● Solution: train a classifier or use clustering techniques
      • Lack of context: hard to disambiguate entities
          ● Solution: use metadata for further information; aggregation of data can also be useful
  • 36. Analysing language in social media
      ● Sumbuddy: Hey, hao es your familie? Guy: They got crushed by a bus and died. Sumbuddy: Daz so sad...wanna get iscreem?
      ● OMMMFG!!! JUST HEARD EMINEM'S “RAPGOD”. SMFH!!! these other dudes might as well stop rapping if they not on this level
      ● @adambation Try reading this article , it looks like it would be really helpful and not obvious at all #sarcasm http://t.co/mo3vODoX
  • 37. Short sentences in tweets
      • Social media, and especially tweets, can be problematic because sentences are very short and/or incomplete
      • Typically, linguistic pre-processing tools such as tokenisers, POS taggers and parsers do badly on such texts
      • Even language identification tools can have problems
      • Need for special NLP pre-processing tools
  • 38. Lack of context causes ambiguity Branching out from Lincoln park after dark ... Hello Russian Navy, it's like the same thing but with glitter! ??
  • 39. Getting the NEs right is crucial Branching out from Lincoln park after dark ... Hello Russian Navy, it's like the same thing but with glitter!
  • 40. The Problem with NER
      • Running standard IE tools (ANNIE) on 300 news articles: 87% F-measure
      • Running ANNIE on some tweets: < 40% F-measure
  • 41. Example: Persons in news articles
  • 43. TwitIE to the rescue
  • 44. Language identification is tricky
      ● Language identification tools such as TextCat need a decent amount of text (around 20 words at least)
      ● But Twitter has an average of only 10 tokens/tweet
      ● Noisy nature of the words (abbreviations, misspellings)
      ● Due to the length of the text, we can make the assumption that one tweet is written in only one language
      ● We have adapted the TextCat language identification plugin
      ● Provided fingerprints for 5 languages: DE, EN, FR, ES, NL
      ● You can extend it to new languages easily
  • 46. Tokenisation
      • Plenty of “unusual”, but very important tokens in social media:
          – @Apple – mentions of company/brand/person names
          – #fail, #SteveJobs – hashtags expressing sentiment, person or company names
          – :-(, :-), :-P – emoticons (punctuation and optionally letters)
          – URLs
      • Tokenisation is crucial for entity recognition and opinion mining
  • 47. Example
      #WiredBizCon #nike vp said when @Apple saw what http://nikeplus.com did, #SteveJobs was like wow I didn't expect this at all.
      ● Tokenising on white space doesn't work that well:
          ● Nike and Apple are company names, but if we have tokens such as #nike and @Apple, this will make the entity recognition harder, as it will need to look at sub-token level
      ● Tokenising on white space and punctuation characters doesn't work well either: URLs get separated (http, nikeplus), as are emoticons and email addresses
  • 48. The TwitIE Tokeniser
      ● Treat RTs and URLs as 1 token each
      ● #nike is two tokens (# and nike) plus a separate annotation Hashtag covering both. Same for @mentions -> UserID
      ● Capitalisation is preserved, but an orthography feature is added: all caps, lowercase, mixCase
      ● Date and phone number normalisation, lowercasing, and emoticons are optionally done later in separate modules
      ● Consequently, tokenisation is faster and more generic
      ● Also, more tailored to our NER module
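One way to picture the tokenisation rules on slide 48 is the regular-expression sketch below. It is not TwitIE's implementation (TwitIE is a GATE pipeline); the pattern, the `tokenise` function and the (label, start, end) annotation tuples are assumptions made for illustration, and the emoticon pattern covers only a handful of cases.

```python
import re

# Alternatives are tried in order, so URLs and emoticons win over punctuation.
TOKEN_RE = re.compile(
    r"""
    (?P<url>https?://\S+)              # URL kept as one token
  | (?P<emoticon>[:;=][-^]?[()DPp])    # a few simple emoticons such as :-) and :-P
  | (?P<hashtag>\#\w+)                 # hashtag, split into two tokens below
  | (?P<mention>@\w+)                  # @mention, split into two tokens below
  | (?P<word>\w+(?:'\w+)?)             # ordinary word, optionally with an apostrophe
  | (?P<punct>[^\w\s])                 # any other single symbol
    """,
    re.VERBOSE,
)


def tokenise(text):
    """Return (tokens, annotations); annotations are (label, start, end) token spans."""
    tokens, annotations = [], []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind in ("hashtag", "mention"):
            start = len(tokens)
            tokens.extend([value[0], value[1:]])  # marker and word as separate tokens
            label = "Hashtag" if kind == "hashtag" else "UserID"
            annotations.append((label, start, start + 2))  # span covering both
        else:
            tokens.append(value)  # capitalisation is left untouched
    return tokens, annotations


tokens, annotations = tokenise(
    "#nike vp said when @Apple saw what http://nikeplus.com did :-)"
)
```

Because `\S+` is greedy, a URL followed directly by punctuation would absorb it; a fuller tokeniser would need to trim that.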
  • 49. Normalisation
      • “RT @Bthompson WRITEZ: @libbyabrego honored?! Everybody knows the libster is nice with it...lol...(thankkkks a bunch;))”
      • OMG! I’m so guilty!!! Sprained biibii’s leg! ARGHHHHHH!!!!!!
      • Similar to SMS normalisation
      • For some later components to work well (POS tagger, parser), it is necessary to produce a normalised version of each token
      • BUT uppercasing, and letter and exclamation mark repetition often convey strong sentiment, so we keep both versions of tokens
      • Syntactic normalisation: determine when @mentions and #tags have syntactic value and should be kept in the sentence, vs replies, retweets and topic tagging
  • 50. A normalised example
      ● Normaliser currently based on spelling correction and some lists of common abbreviations
      ● Outstanding issues:
          ● Some abbreviations which span token boundaries (e.g. gr8, do n’t) are difficult to handle
          ● Capitalisation and punctuation normalisation
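The idea of keeping both versions of a token can be sketched as follows. This is an assumption-laden toy: the abbreviation list and the rule of collapsing any run of three or more identical characters are invented here, and the normaliser described in the slides relies on spelling correction, which this sketch does not attempt.

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"gr8": "great", "thx": "thanks", "u": "you"}


def normalise_token(token):
    """Return (original, normalised) so sentiment cues in the original survive."""
    norm = token.lower()
    # Collapse runs of 3+ identical characters to a single character.
    norm = re.sub(r"(.)\1{2,}", r"\1", norm)
    # Expand known abbreviations.
    norm = ABBREVIATIONS.get(norm, norm)
    return token, norm


pairs = [normalise_token(t) for t in ["thankkkks", "ARGHHHHHH", "gr8", "coooool"]]
```

The last example shows why a real normaliser needs a dictionary or spelling correction: collapsing "coooool" to a single "o" gives "col", not "cool".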
  • 53. What's in a hashtag?
      ● Hashtags often contain smushed words
          ● #SteveJobs
          ● #CombineAFoodAndABand
          ● #southamerica
      ● For NER we want the individual tokens so we can link them to the right entity
      ● For opinion mining, individual words in the hashtags often indicate sentiment, sarcasm etc.
          ● #greatidea
          ● #worstdayever
  • 54. How to analyse hashtags?
      ● Camelcasing makes it relatively easy to separate the words, using an adapted tokeniser, but many people don't bother
      ● We use a simple approach based on dictionary matching the longest consecutive strings, working L to R
          ● #lifeisgreat -> #-life-is-great
          ● #lovinglife -> #-loving-life
      ● It's not foolproof, however
          ● #greatstart -> #-greats-tart
      ● To improve it, we could use contextual information, or we could restrict matches to certain POS combinations (ADJ+N is more likely than ADJ+V)
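The left-to-right longest-match idea from slide 54 can be sketched in a few lines. The toy dictionary and function names are assumptions for illustration; a real system would use a full word list.

```python
import re

# Hypothetical toy dictionary; note it contains both "great" and "greats".
DICTIONARY = {"life", "is", "great", "greats", "start", "tart", "loving", "love"}


def split_camel_case(hashtag):
    """Split a camel-cased hashtag on capital letters."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", hashtag.lstrip("#"))


def segment_hashtag(hashtag, dictionary=DICTIONARY):
    """Greedy segmentation: at each position take the longest dictionary word."""
    body = hashtag.lstrip("#").lower()
    words, i = [], 0
    while i < len(body):
        for j in range(len(body), i, -1):  # try the longest candidate first
            if body[i:j] in dictionary:
                words.append(body[i:j])
                i = j
                break
        else:
            return None  # no dictionary word starts at position i
    return words


camel = split_camel_case("#SteveJobs")
good = segment_hashtag("#lifeisgreat")
also_good = segment_hashtag("#lovinglife")
wrong = segment_hashtag("#greatstart")
```

With this dictionary the greedy match reproduces the slide's failure case: "greats" is longer than "great", so #greatstart becomes "greats" + "tart".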
  • 55. Irony and sarcasm • I had never seen snow in Holland before but thanks to twitter and facebook I now know what it looks like. Thanks guys, awesome! • Life's too short, so be sure to read as many articles about celebrity breakups as possible. • I feel like there aren't enough singing competitions on TV . #sarcasmexplosion • I wish I was cool enough to stalk my ex-boyfriend ! #sarcasm #bitchtweet • On a bright note if downing gets injured we have Henderson to come in
  • 56. Sarcasm is a part of British culture ● So much so that the BBC has its own webpage on sarcasm designed to teach non-native English speakers how to be sarcastic successfully in conversation
  • 58. How do you know when someone is being sarcastic? • Use of hashtags in tweets such as #sarcasm, #irony, #whoknew etc. • Large collections of tweets based on hashtags can be used to make a training set for machine learning • But you still have to know what to do with sarcasm once you've found it • Although sarcasm generally entails saying the opposite of what you mean, it doesn't necessarily just invert the polarity of an opinion • “It's not like I wanted to eat breakfast anyway” is negative when uttered sarcastically, but non-opinionated when uttered neutrally.
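Building a training set from hashtags, as mentioned on this slide, usually means treating the hashtag as a noisy label and removing it from the text so that a classifier cannot simply memorise it. The sketch below assumes a hypothetical marker set `SARCASM_TAGS` (taken from the hashtags the slide lists) and a hypothetical helper `label_tweets`; it only prepares labelled examples and does not train a model.

```python
import re

# Marker hashtags taken from the slide; the exact set is an assumption.
SARCASM_TAGS = {"#sarcasm", "#irony", "#whoknew"}
HASHTAG_RE = re.compile(r"#\w+")


def label_tweets(tweets):
    """Return (text, label) pairs; label is 1 if a marker hashtag was present."""
    examples = []
    for tweet in tweets:
        tags = {t.lower() for t in HASHTAG_RE.findall(tweet)}
        label = 1 if tags & SARCASM_TAGS else 0
        # Strip only the marker hashtags; other hashtags may carry sentiment.
        text = HASHTAG_RE.sub(
            lambda m: "" if m.group().lower() in SARCASM_TAGS else m.group(),
            tweet,
        )
        examples.append((" ".join(text.split()), label))
    return examples
```

Labels obtained this way are noisy: tweets without a marker hashtag can still be sarcastic, so the negative class should be treated with caution.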
  • 59. Identifying the scope of sarcasm • I am not happy that I woke up at 5:15 this morning. #greatstart #sarcasm • You are really mature. #lying #sarcasm
  • 60. Experiment with sarcastic hashtags ● Collected a corpus of 134 tweets containing the hashtag #sarcasm ● Manually annotated sentences with sentiment: 266 sentences, of which 68 opinionated (25%); 62 negative, 6 positive ● Also annotated the same corpus as if the sarcasm was absent ● Compared how well our applications performed on each, with and without sarcasm analysis ● The results were a little surprising ● Even when we KNEW the statement was sarcastic, we didn't always get the polarity of the opinion right
  • 61. Effect of sarcasm on sentiment analysis
      Sarcastic corpus                Precision  Recall  F1
      Opinionated                     74.58      63.77   68.75
      Opinion+polarity - Regular      20.34      17.39   18.75
      Polarity-only - Regular         27.27      27.27   27.27
      Opinion+polarity - Sarcastic    57.63      49.28   53.13
      Polarity-only - Sarcastic       77.02      77.28   77.28

      Regular corpus                  Precision  Recall  F1
      Opinionated                     57.89      58.93   58.41
      Opinion+polarity - Regular      45.61      46.43   46.02
      Polarity-only - Regular         78.79      78.79   78.79
      Opinion+polarity - Sarcastic    22.81      23.21   23.01
      Polarity-only - Sarcastic       39.40      39.39   39.39
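The F1 column in these results is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch of the calculation follows; the function name `f1_score` is a hypothetical helper, and the inputs are percentages as on the slide.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# "Opinionated" row of the sarcastic corpus: P = 74.58, R = 63.77.
opinionated_f1 = f1_score(74.58, 63.77)
```

Rounded to two decimal places, `opinionated_f1` agrees with the 68.75 reported for that row, and the same check works for the "Opinionated" row of the regular corpus (57.89 and 58.93 give 58.41).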
  • 63. We can also do opinion mining on images and multimedia
  • 64. Image-opinion identification • Facial expression analysis/classification – Helps with facial similarity calculations and face recognition – Can be used to predict sentiment/polarity – Can be combined with analysis of the text in the document • Coarse-grained opinion classification – Looking at image-feature classification for abstract concepts (sentiment / privacy / attractiveness) – e.g. looking at image colours, placement of interesting images in the picture
  • 65. Multimodal opinion analysis ● Investigate correlation between images and whole-document opinions – Do documents asserting specific opinions get illustrated with the same imagery? e.g. articles about euro-scepticism in the UK might be illustrated with images of specific Conservative peers… – Is there correlation between low-level image features and specific opinions? ● Investigate finer-grained (i.e. sub-document) correlations between imagery and opinions – e.g. sentence-level correlations incorporating analysis of the document layout
  • 67. So where does this leave us? ● Social media is a tricky but interesting medium to analyse ● Opinion mining is ubiquitous, but it's still far from perfect ● There are lots of linguistic and social quirks that fool sentiment analysis tools ● The good news is that this means there are lots of interesting problems for us to research ● And it doesn’t mean we shouldn’t use existing opinion mining tools ● The benefits of a modular approach mean that we can pick the bits that are most useful ● Take-away message: it is critical to use the right tool for the right job
  • 68. Don't be misled by the advertising: caveat emptor!
  • 70. Further information • Research supported by the EU-funded ARCOMEM, uComp and TrendMiner projects • See http://www.arcomem.eu and http://www.trend-miner.eu for more details • More information about GATE at http://gate.ac.uk • Opinion mining demo: http://demos.gate.ac.uk/arcomem/opinions/ • Learn about the technical details in the STIL 2013 tutorial: Practical Opinion Mining for social media (Wednesday 11.30am)