Abstract:
Sentiment analysis is a rudimentary classification of messages into buckets like positive, negative, and neutral. On its own, sentiment analysis rarely answers key business questions, though it is automatic and scalable.
Now what would you do with unlimited human analysts? You’d ask them to classify messages into categories that enable you to take action. Machine learning models with humans-in-the-loop can power sophisticated classification.
This talk walks through case studies that demonstrate the value of categorizations beyond sentiment: detecting the toxicity/supportiveness of Reddit communities, understanding the effectiveness of Always’ #likeagirl campaign, and routing text messages to UNICEF.
Sentiment is just a stepping stone: Getting more out of Natural Language Processing/Machine Learning
1. Social Media & Web Analytics Innovation
Sentiment is just a stepping stone
2. Social Media & Web Analytics Innovation
Hello online viewers of this slide deck!
• A lot of the content here is visual—you’ll want to download
the full presentation and read the notes fields
• You can also (soon) find the video version by looking at the
Social Media & Web Analytics Innovation site
• You can also stay tuned for more content by checking out
our blog: http://www.idibon.com/blog
• The case studies are partly covered by these blog posts:
• http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-
darkest-depths-of-the-interwebs/
• http://idibon.com/run-fast-as-you-can-likeagirl-advocates-and-
brand-campaign-roi/
• http://idibon.com/idibon-supports-unicef-provide-natural-language-
processing-sms-based-social-monitoring-systems-africa/
3. Social Media & Web Analytics Innovation
What’s ahead
• Quick overview of sentiment analysis
• It’s tricky
• And limited
• Can we do more?
• Yep
• Case studies
• Detecting toxicity/supportiveness of Reddit communities
• Understanding the effectiveness of Always’ #LikeAGirl
campaign
• Routing text messages to different groups in UNICEF
4. Social Media & Web Analytics Innovation
We are not
robots
5. Social Media & Web Analytics Innovation
Though
automation
makes our
lives easier
18. Social Media & Web Analytics Innovation
4 billion web pages
20 million candidates
1-10 words each
178,104 polarity phrases
19. Social Media & Web Analytics Innovation
(In English)
(only)
20. Social Media & Web Analytics Innovation
Dutch
tet
“Underscores the polarity
of the clause and expresses
either irritation or surprise,
as if he or she had
expected the opposite
state of affairs”
21. Social Media & Web Analytics Innovation
Tongan
si’i and si’a
Different determiners
(~the, that, etc) express
sympathy
22. Social Media & Web Analytics Innovation
Cantonese
-k at the end of particles
“An emotion intensifier”
23. Social Media & Web Analytics Innovation
95% of the world’s conversations
are not in English
25. Social Media & Web Analytics Innovation
Different domains have different proportions
[Chart: proportions of Positive, Negative, Conflict, and Neutral sentiment in restaurant reviews vs. laptop reviews, on a 0%–70% scale]
31. Social Media & Web Analytics Innovation
How is sentiment for particular categories?
[Chart: Positive and Negative sentiment by restaurant review category—Food, Price, Service, Ambience, Anecdotes—on a 0%–80% scale]
32. Social Media & Web Analytics Innovation
Setting the bar—at a minimum:
Accuracy
(which is tied to your training data)
+
An ability to do something
33. Social Media & Web Analytics Innovation
BEYOND SENTIMENT
34. Social Media & Web Analytics Innovation
What would you do with unlimited human analysts?
You’d ask them to classify messages into categories that
enable you to take action.
Machine learning models with humans-in-the-loop can
power sophisticated classification.
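The humans-in-the-loop pattern described above can be sketched as a confidence-gated loop. Everything here—the threshold, the toy scorer, the function names—is a hypothetical illustration, not Idibon's actual system:

```python
# Minimal human-in-the-loop sketch: the model keeps what it is confident
# about and escalates the rest to human annotators; the human labels are
# folded back into the training data for the next retraining round.
# All names and values here are illustrative.

def classify_with_humans(items, model_score, ask_human, threshold=0.8):
    """model_score(item) -> (label, confidence); ask_human(item) -> label."""
    auto, escalated, new_training = [], [], []
    for item in items:
        label, conf = model_score(item)
        if conf >= threshold:
            auto.append((item, label))            # machine decides
        else:
            human_label = ask_human(item)         # human annotation step
            escalated.append((item, human_label))
            new_training.append((item, human_label))  # retrain on these
    return auto, escalated, new_training

# Toy usage: a fake scorer that is only confident about exclamation marks
def toy_score(text):
    return ("positive", 0.9) if "!" in text else ("neutral", 0.5)

auto, esc, train = classify_with_humans(
    ["great!", "meh"], toy_score, ask_human=lambda t: "neutral")
```

The point of the loop is the last list: every escalated item becomes training data, so the machine gets smarter from exactly the cases it found hardest.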
38. Social Media & Web Analytics Innovation
Toxicity > sentiment
• People don’t like things; they talk about them
• Negative comments aren’t the same as toxic comments
• Negative can be constructive
• Finding hateful and hate-inciting speech—that’s
important
• To keep people safe
• To keep communities healthy
39. Social Media & Web Analytics Innovation
The importance of definition
• If people can’t agree on what’s-in and what’s-out, it’s
hard to train a machine
41. Social Media & Web Analytics Innovation
Wait a sec! Aren’t these ducks?
(Can we agree to disagree?)
42. Social Media & Web Analytics Innovation
The importance of definition
• If people can’t agree on what’s-in and what’s-out, it’s
hard to train a machine
• In our case toxicity was defined as:
• ad hominem attacks (directed at specific people)
• bigoted comments (e.g., sexist, racist, homophobic, etc)
• Set definitions
• Then see if people are consistent
• Run pilots
• Do inter-annotator agreement
• Iterate
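The inter-annotator agreement step above is often measured with Cohen's kappa, which discounts the agreement two annotators would reach by chance. A minimal pure-Python version for two annotators:

```python
# Cohen's kappa for two annotators labeling the same items:
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently
    chance = sum((freq_a[l] / n) * (freq_b[l] / n) for l in freq_a)
    return (observed - chance) / (1 - chance)

# Perfect agreement -> 1.0; agreement at chance level -> 0.0
print(cohens_kappa(["toxic", "ok", "toxic", "ok"],
                   ["toxic", "ok", "toxic", "ok"]))  # 1.0
```

If kappa stays low after a pilot, the definition (not the annotators) is usually the problem—which is exactly why the iteration above matters.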
43. Social Media & Web Analytics Innovation
Sentiment is not IRRELEVANT
• A lot of comments are Neutral
• So that doesn’t teach us much about hate speech
• And we’ll waste a lot of time and money getting training
data on Neutral
• So we ran an experiment:
• Annotate random data
• Annotate stuff that our sentiment models say is Negative
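The second arm of that experiment—preferring items a sentiment model already flags as Negative—can be sketched as score-ranked sampling. The scores, comments, and mixing ratio below are invented for illustration:

```python
# Sketch of sentiment-guided annotation sampling: instead of annotating a
# random sample, send the items the sentiment model scores as most Negative
# to human annotators first, optionally keeping a small random slice so the
# model still sees the full distribution. All data here is illustrative.
import random

def pick_for_annotation(items, negativity_score, budget, random_mix=0.1):
    """Return `budget` items: mostly the most-negative, plus a random slice."""
    ranked = sorted(items, key=negativity_score, reverse=True)
    n_random = int(budget * random_mix)
    chosen = ranked[:budget - n_random]
    rest = ranked[budget - n_random:]
    chosen += random.sample(rest, min(n_random, len(rest)))
    return chosen

comments = ["you idiot", "nice post", "awful take", "thanks!", "meh"]
fake_scores = {"you idiot": 0.95, "awful take": 0.8, "meh": 0.4,
               "nice post": 0.1, "thanks!": 0.05}
picked = pick_for_annotation(comments, fake_scores.get, budget=2, random_mix=0.0)
```

Because toxic comments cluster in the Negative-scored items, annotators spend far less time labeling Neutral filler—which is where the savings on the next slide come from.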
44. Social Media & Web Analytics Innovation
Work savings!
• Items chosen for review based on our sentiment model
were MUCH more likely to be toxic or supportive
• A 96% decrease in effort
46. Social Media & Web Analytics Innovation
Analyst time savings is a key benefit
[Chart: % analyst time saved vs. % accuracy (compared to humans) for finding relevant business articles across five categories—News category 1, News category 2, Health sciences, News category 4, Manufacturing—with time savings of 73–91% at 80–99% accuracy]
47. Social Media & Web Analytics Innovation
Okay back to community health
48. Social Media & Web Analytics Innovation
Finding healthy communities (supportive)
49. Social Media & Web Analytics Innovation
And unhealthy ones (toxic)
51. Social Media & Web Analytics Innovation
Unstructured data gets structured (bonus: a
system that gets smarter over time)
Adaptive System
Machine
Learning
Optimization
Human
Annotation
Prediction
Engine
Structured Data Reports
Action
52. Social Media & Web Analytics Innovation
By structuring text, you can do all kinds of
visualizations
53. Social Media & Web Analytics Innovation
Learning more about ad campaigns than just “people
liked it”: #LikeAGirl
54. Social Media & Web Analytics Innovation
The most re-shared #LikeAGirl post
55. Social Media & Web Analytics Innovation
60 second ad
= ~ $9 million
114.4 million viewers
= ~ $0.08 per viewer
56. Social Media & Web Analytics Innovation
Always only spent 30%
of what Anheuser-Busch did
But they had twice the tweets
57. Social Media & Web Analytics Innovation
Not all sharers and resharers
are of equal value
59. Social Media & Web Analytics Innovation
Influencers extend the brand a lot
60. Social Media & Web Analytics Innovation
Posts by brand and ad advocates
reach twice as far as posts by @Always
61. Social Media & Web Analytics Innovation
If we lumped everyone who used #LikeAGirl together
We wouldn’t know the difference between
People talking about the ad (and products)
And people talking about the cause
62. Social Media & Web Analytics Innovation
Antagonists mainly posted their sexist content to #LikeABoy
Defenders overwhelmed them with 3-4 times the content (yay!)
63. Social Media & Web Analytics Innovation
Positive sentiment would lump everyone together
And negative sentiment would lump
Antagonists (sexists)
in with
Defenders (anti-sexist)
64. Social Media & Web Analytics Innovation
Routing messages that matter
65. Social Media & Web Analytics Innovation
Processing millions of SMS in 12 African languages
Intent of sender
(e.g. report a problem, ask a question, or make a suggestion)
Categorization
(e.g. orphans and vulnerable children, violence against children, health, nutrition)
Language detection
(e.g. English, Acholi, Karamojong, Luganda, Nkole, Swahili, Lango)
Location
(e.g. village names)
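A pipeline wiring together the four-way SMS processing above might look roughly like this. Every classifier below is a stand-in keyword stub invented for illustration—the real system uses trained models per language:

```python
# Stub pipeline for the four-way SMS processing described above: language,
# intent, and category each feed routing and prioritization. The keyword
# rules below are hypothetical placeholders, not the deployed models.

def detect_language(text):
    # Stand-in: real language ID would use character n-gram models
    return "English" if all(ord(c) < 128 for c in text) else "other"

def detect_intent(text):
    if "?" in text:
        return "question"
    if any(w in text.lower() for w in ("broken", "no ", "lack")):
        return "problem"
    return "suggestion"

def categorize(text):
    keywords = {"clinic": "health", "health": "health",
                "food": "nutrition", "hunger": "nutrition"}
    for kw, cat in keywords.items():
        if kw in text.lower():
            return cat
    return "other"

def process_sms(text):
    return {"language": detect_language(text),
            "intent": detect_intent(text),
            "category": categorize(text)}

result = process_sms("The clinic has no medicine")
```

Each field answers a different routing question: language picks the analyst, category picks the team, and intent sets the priority.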
69. Social Media & Web Analytics Innovation
Top 3 categories in Nigeria
[Chart: Employment, U-report support, and Health at 9.69%, 17.68%, and 39.44% respectively]
70. Social Media & Web Analytics Innovation
Quick conclusion
• Sentiment analysis is pretty rudimentary
• On its own, it rarely answers key business questions
• Though it IS automatic and scalable
• Think of it as an example of natural language processing
• There’s a lot more you can do
• The key is formulating specific questions
• And training the system on RELEVANT data
• For this, you'll need to optimize how you use human analysts
71. Social Media & Web Analytics Innovation
email tyler@idibon.com
twitter @idibon
www idibon.com
THANK YOU!
72. Social Media & Web Analytics Innovation
Accuracy of ~20 teams

                                              Restaurant categories   Restaurant category
                                              (F-score)               polarity (F-score)
Top score                                     88.57                   82.92
Median                                        74.24                   69.75
Baseline ("always guess the most
popular category")                            68.89                   64.09

We care about overall accuracy, so we need to multiply how often the right category goes with the right polarity.
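That multiplication, using the top-score and baseline rows as examples (treating category and polarity as independent, which is a first approximation):

```python
# Combined task accuracy: the right category AND the right polarity must
# both be correct, so (to a first approximation) the F-scores multiply.
top_combined = 0.8857 * 0.8292        # top team
baseline_combined = 0.6889 * 0.6409   # majority-class baseline

print(round(top_combined, 3))         # 0.734
print(round(baseline_combined, 3))    # 0.442
```

Even the best team lands around 73% on the combined task—a useful reality check when a vendor quotes a single impressive-sounding number.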
73. Social Media & Web Analytics Innovation
95% of the world’s conversations are not in
English. Idibon covers 99% of the world’s GDP.
Rapidly tag and filter your chosen topics
and criteria in any language
Monitor how people respond to your brand
differently around the world
One unified system versus data cobbled together
from disparate systems
Idibon works with:
English, Portuguese (Brazilian and
from Portugal), Spanish, Italian,
French, Russian, German, Turkish,
Arabic, Japanese, Greek,
Mandarin Chinese, Persian,
Polish, Dutch, Swedish, Serbian,
Romanian, Korean, Hungarian,
Bulgarian, Hindi, Croatian, Czech,
Ukrainian, Finnish, Hebrew, Urdu,
Catalan, Slovak, Indonesian,
Malay, Vietnamese, Bengali, Thai,
Navajo, Latvian, Estonian,
Lithuanian, Kurdish, Yoruba,
Amharic, Zulu, Hausa, Kazakh,
Sindhi, Punjabi, Tagalog,
Cebuano, Danish and Emoji.
77. Social Media & Web Analytics Innovation
Navajo
=go
Emotional evaluation in
narrative
Editor's Notes
(Part of what happens in this talk is: whoa, people and language are complicated)
One of the reasons we like things like sentiment analysis is because they are scalable/automatic.
But, uh, machines have their limitations
This slide and the next two talk about three aspects of language.
We might expect “referential” to be the easiest for machines…but think about reference. Sometimes we talk about “The JW Marriott in San Francisco” but we also talk about “the Marriott” and “here” and “there” all meaning the same thing. Doh.
One aspect of language is that we use it with audiences…who we often mean to influence.
And we of course express ourselves.
There are a lot of feelings, so there are a lot of ways of talking about them.
From “wefeelfine.org”.
http://purl.stanford.edu/fm335ct1355 for full context of this (lists assembled by various computational linguists, psychologists, and others)
The basic point: there are a fair number of words that seem to carry emotional content
Language is tricky because people do all sorts of stuff.
Language CHANGE is one of the chief rules of language.
This is not a word/spelling that has always existed. But it clearly conveys sentiment.
But what about longer phrases? “Doctors” “ordering” things is usually bad but “just what the doctor ordered” is a positive sentiment.
Perhaps “snug” is positive, but “bug” is usually bad. “Snug as a bug” or “Snug as a bug in a rug” are positive.
(Or pug in an ugg on the rug)
Velikovich et al (2010)
(Craenenbroeck & Haegeman, 2007, p. 175)
Languages do things differently.
This is mostly interesting because if you just took a dictionary of emotional terms in English and translated them into Tongan to get a Tongan list…you’d fail. You might get “happy” but you probably wouldn’t try to get determiners and yet, here they are, being emotional.
Btw, there is evidence that this and that in English DO carry some affective meaning: https://corplinguistics.wordpress.com/2011/11/17/who-is-the-sarah-palin-of-the-canterbury-tales/
Different determiners (the, that, etc) express sympathy to the DP they head (Hendrick, 2005)
The –k that’s appended to Cantonese particles is an “emotion intensifier” (Sybesma & Li, 2007). If you’re a Cantonese speaker, take a look at the particles in the table on page 4 here: http://linguistics.berkeley.edu/~herman/documents/CantoneseFPs_SyntaxCircle_Leung.pdf
This is an example of a sound that is PART of words. That’s also tricky if you want to do translations.
Basically: don’t do translations. Work in the language with training data provided by native speakers who understand cultural context.
Making a bunch of lists is not practical for every language (or every category) you care about.
You can use machine learning but the key will be creating training data in a smart way.
From SemEval 2014
http://www.aclweb.org/anthology/S/S14/S14-2.pdf
Genres make a difference. People speak differently in different contexts. It’s not that different domains/genres/contexts have nothing to do with each other but conventions can vary a lot.
This is part of why you want to train on the data you care about.
You do not, to think about the last slide, want to use laptop reviews to predict restaurant reviews.
People express sentiment differently in different domains. For example, it’s useful to understand the emotional state of people calling or emailing customer support—but people doing that aren’t expressing the whole range of emotions. Mainly you get rage and resignation. You don’t often get much in the way of joy or pride.
So they are mostly unhappy. What is sentiment analysis going to tell you? You minimally need to figure out WHAT they are unhappy about.
You want to route the right content to the right person.
It may be that Lily is the right person to route furious people to but not the best person to route resigned people to.
Or maybe you should optimize on something other than emotion—maybe type of issue.
From SemEval 2014
http://www.aclweb.org/anthology/S/S14/S14-2.pdf
Notice that people are very positive about food, but negative about service.
Check out Dan Jurafsky’s work on Yelp reviews—1-star reviews are never about the food; they’re always about traumatic service. (Literally: people use the language of trauma!)
First case study!
http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
Lately, Reddit has gotten a lot of press for having terrible, awful communities
It’s a topic that even redditors talk about (qualitatively—we wanted to answer it quantitatively)
The important thing is having definitions people will agree with and can be consistent with…and which actually answer organizational objectives. Do you care about whether duck decoys and/or rubber duckies are ducks or not? WHY?
http://blog.ioactive.com/2013/05/security-101-machine-learning-and-big.html
The trickiest thing about ad hominem attacks as a definition is: what to do with trash talk in sports/gaming. Tricky!
Let’s quickly step out to talk about one of the major values of using natural language processing: workload reduction.
In this quick tangent, we’re looking at classifying business news as Relevant or Irrelevant for one of our clients (anonymized!)
As you can see—different categories have different results.
News category 1 is awesome—you really don’t have to show human analysts much data to get all the Relevant stuff (you show them 10% of the data and still get 99% of what the client cares about)
Manufacturing is less awesome. You can reduce your workload by 73%…but you have to accept that you’ll only get 83% of the stuff you care about (you’ll miss 17%). If you want to get more like 90% accuracy, you need to review more documents. You “only” get a workload reduction of ~56%.
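The trade-off in that note is just recall versus review fraction: rank documents by model score, review only the top slice, and measure what share of the Relevant ones you caught. The scores and labels below are invented for illustration:

```python
# Workload-reduction sketch: rank documents by model score, review only
# the top fraction, and compute recall of the Relevant documents.
# Scores and relevance labels below are made up for illustration.

def recall_at_review_fraction(scored_docs, review_fraction):
    """scored_docs: list of (score, is_relevant). Returns (recall, n_reviewed)."""
    ranked = sorted(scored_docs, key=lambda d: d[0], reverse=True)
    n_review = int(len(ranked) * review_fraction)
    reviewed = ranked[:n_review]
    total_relevant = sum(rel for _, rel in scored_docs)
    caught = sum(rel for _, rel in reviewed)
    return caught / total_relevant, n_review

docs = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.3, 0),
        (0.2, 0), (0.15, 1), (0.1, 0), (0.05, 0), (0.01, 0)]
recall, n = recall_at_review_fraction(docs, 0.4)  # review the top 40%
```

Sweeping `review_fraction` over a validation set gives exactly the time-saved vs. accuracy curve in the chart: reviewing less saves analyst time but misses more of what you care about.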
Ideally, you want a system that gets better over time.
This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
The DIY (do it yourself) group is the one that is most supportive and least toxic. This data ties to actual upvote/downvote behavior, meaning that you’re not actually a supportive community if everyone downvotes the supportive comments, nor are you a toxic community if everyone downvotes the toxic comments.
This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
It’s only when everyone upvotes toxic comments that you are a toxic community by our definition here.
We also specifically looked at bigotry.
Indeed, /r/TheRedPill is seen as the most bigoted. It’s a subreddit dedicated to proud male chauvinism.
This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
This is the basic stuff you want. (It’s a little self-serving because Idibon’s adaptive system is what makes us special but we really do believe that optimizing training on relevant data with meaningful categories is THE way to deliver business value.)
By using computers to create an initial understanding of data and elevate specific cases for Human Annotation, we use computers to make human decisions smarter, and humans to make computer decisions smarter. Our system optimizes work by using cutting-edge Machine Learning that improves accuracy and learns iteratively. Our Prediction Engine provides initial conclusions for further evaluation by human analysts and is also what allows us to scale to tens of millions of messages a day. Our Optimization process teaches our algorithm what results to select for, essentially refining its accuracy. The key takeaway here is that we optimize for human analysts’ time: we can cluster data initially and automatically, then escalate specific cases to human annotation. Much of the learning is unsupervised and therefore faster, cheaper, and actually more accurate.
After iterations in our adaptive system, previously unstructured data is now structured.
This structured data can be delivered in different outputs, including CSV file exports for your analysts to build reports or direct routing to customer service agents to take action.
Once you have structured your unstructured data, it’s easy to visualize it—in plot.ly as before, in Excel, or as here, in Tableau (this visualization took 15 minutes for someone who had never used Tableau at all to produce from our data).
You can identify advocates and measure success of marketing campaigns using natural language processing. Idibon analyzed the #likeagirl campaign, which Always ran during and after this year’s Super Bowl. (GO WATCH IT IF YOU HAVEN’T!)
Case study two:
http://idibon.com/run-fast-as-you-can-likeagirl-advocates-and-brand-campaign-roi/
This information is easy to extract. It doesn’t require any Natural Language Processing.
Real money is on the line
And there’s already some ways of charting success without sophisticated text analytics/NLP.
Buuuuuut there’s more you can do with NLP.
This focuses on the 2,519 most influential sharers.
Look at how content creators who aren’t associated with @Always extend the brand!
Case study three:
http://idibon.com/idibon-supports-unicef-provide-natural-language-processing-sms-based-social-monitoring-systems-africa/
Photo: http://unicefaids.tumblr.com/post/37835112363/photo-young-people-in-kitwe-zambia-explore-the
The United Nations Children’s Fund (UNICEF) is a United Nations branch that provides long-term humanitarian and developmental assistance to children and mothers in developing countries. Idibon provides scalable natural language processing and analytics to UNICEF’s multinational U-report applications, enabling UNICEF to process text messages sent from citizens in Uganda and Nigeria “to better understand and empower marginalized communities that are often excluded due to language barriers.” (Evan Wheeler, CTO of UNICEF’s Global Innovation Centre)
UNICEF U-report has only six dedicated analysts to process and respond to millions of messages a month, and Idibon’s technology enables the organization to operate efficiently and at scale.
Specifically, Idibon processes each SMS in four ways:
Intent of sender – to prioritize support/services (UNICEF receives more than a million messages a month and can only respond to about a thousand)
Categorization – to prioritize support/services and to route to appropriate analyst
Language detection – to route to appropriate analyst
Location – to identify where to send support/services
Press release: http://unicefstories.org/2015/02/09/idibon-supports-unicef-to-provide-natural-language-processing-to-sms-based-social-monitoring-systems-in-africa/
Environment is an important issue.
But it looks to be about 1.4% of the data…which means you do have to get enough data to build a model. Note that different countries/languages talk about the environment differently (Uganda: droughts, cows; Nigeria: oil). So you may have more or less heterogeneity in your rarer categories.
Image from http://www.theatlantic.com/photo/2011/06/nigeria-the-cost-of-oil/100082/
For more recent news: http://www.theguardian.com/environment/2015/jan/07/niger-delta-communities-to-sue-shell-in-london-for-oil-spill-compensation
“Environment” is clearly an important issue in Nigeria but only 1.4% of the messages are classified that way.
(One other thing: high/low percentages don’t necessarily correspond to personal or societal importance.)
Each needle found makes the next one easier to find, buuuuuuut some things you want to find are just too rare. You can’t model things that aren’t in the data.
At UNICEF, different people care about different categories—the people who respond to rumors of ebola outbreaks or cures are different than the people trying to keep track of economic issues.
Most actionable is, of course, finding people who specifically require support about participating in the community.
Accuracy in opinion mining is not easy (multiply these rows)
Brands want to support the full range of communications that are important to their consumers all over the world. We can offer insight into multiple languages in one system, as opposed to using multiple systems and then having to consolidate the findings. We do this through a network of crowdsourcing and human analysts around the world, which, through annotation, teaches our algorithm to continuously improve.
Distributions don’t mean what you think they do. People are very positive online. Weird.
https://xkcd.com/1098/
Genres (for fun)
Genres (for fun)
Sorry, this is very linguistic—the point here is that this marker does different things in different syntactic contexts.
In Navajo, =go normally serves as a subordinate marker, but it can also appear in utterances where there is no matrix sentence. When it is used this way (as in narration), it marks emotional evaluation and background information (see Mithun (2008) on Navajo as well as other languages that have similarly behaving subordinate markers).