Abstract:
Sentiment analysis is a rudimentary classification of messages into buckets like positive, negative, and neutral. On its own, sentiment analysis rarely answers key business questions, though it is automatic and scalable.
Now what would you do with unlimited human analysts? You’d ask them to classify messages into categories that enable you to take action. Machine learning models with humans-in-the-loop can power sophisticated classification.
This talk walks through case studies that demonstrate the value of categorizations beyond sentiment: detecting the toxicity/supportiveness of Reddit communities, understanding the effectiveness of Always’ #likeagirl campaign, and routing text messages to UNICEF.
Sentiment is just a stepping stone: Getting more out of Natural Language Processing/Machine Learning
1. Social Media & Web Analytics Innovation
Sentiment is just a stepping stone
2. Social Media & Web Analytics Innovation
Hello online viewers of this slide deck!
• A lot of the content here is visual—you’ll want to download
the full presentation and read the notes fields
• You can also (soon) find the video version by looking at the
Social Media & Web Analytics Innovation site
• You can also stay tuned for more content by checking out
our blog: http://www.idibon.com/blog
• The case studies are partly covered by these blog posts:
• http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-
darkest-depths-of-the-interwebs/
• http://idibon.com/run-fast-as-you-can-likeagirl-advocates-and-
brand-campaign-roi/
• http://idibon.com/idibon-supports-unicef-provide-natural-language-
processing-sms-based-social-monitoring-systems-africa/
3. Social Media & Web Analytics Innovation
What’s ahead
• Quick overview of sentiment analysis
• It’s tricky
• And limited
• Can we do more?
• Yep
• Case studies
• Detecting toxicity/supportiveness of Reddit communities
• Understanding the effectiveness of Always’ #LikeAGirl
campaign
• Routing text messages to different groups in UNICEF
4. Social Media & Web Analytics Innovation
We are not
robots
5. Social Media & Web Analytics Innovation
Though
automation
makes our
lives easier
18. Social Media & Web Analytics Innovation
4 billion web pages
20 million candidates
1-10 words each
178,104 polarity phrases
19. Social Media & Web Analytics Innovation
(In English)
(only)
20. Social Media & Web Analytics Innovation
Dutch
tet
“Underscores the polarity
of the clause and expresses
either irritation or surprise,
as if he or she had
expected the opposite
state of affairs”
21. Social Media & Web Analytics Innovation
Tongan
si’i and si’a
Different determiners
(~the, that, etc) express
sympathy
22. Social Media & Web Analytics Innovation
Cantonese
-k at the end of particles
“An emotion intensifier”
23. Social Media & Web Analytics Innovation
95% of the world’s conversations
are not in English
25. Social Media & Web Analytics Innovation
Different domains have different proportions
[Chart: proportions of Positive, Negative, Conflict, and Neutral sentiment in restaurant reviews vs. laptop reviews, on a 0%–70% scale]
31. Social Media & Web Analytics Innovation
How is sentiment for particular categories?
[Chart: Positive and Negative sentiment by restaurant review category—Food, Price, Service, Ambience, Anecdotes—on a 0%–80% scale]
32. Social Media & Web Analytics Innovation
Setting the bar—at a minimum:
Accuracy
(which is tied to your training data)
+
An ability to do something
33. Social Media & Web Analytics Innovation
BEYOND SENTIMENT
34. Social Media & Web Analytics Innovation
What would you do with unlimited human analysts?
You’d ask them to classify messages into categories that
enable you to take action.
Machine learning models with humans-in-the-loop can
power sophisticated classification.
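The humans-in-the-loop pattern described above can be sketched as a confidence-gated loop. Everything here—the threshold, the toy scorer, the function names—is a hypothetical illustration, not Idibon's actual system:

```python
# Minimal human-in-the-loop sketch: the model keeps what it is confident
# about and escalates the rest to human annotators; the human labels are
# folded back into the training data for the next retraining round.
# All names and values here are illustrative.

def classify_with_humans(items, model_score, ask_human, threshold=0.8):
    """model_score(item) -> (label, confidence); ask_human(item) -> label."""
    auto, escalated, new_training = [], [], []
    for item in items:
        label, conf = model_score(item)
        if conf >= threshold:
            auto.append((item, label))            # machine decides
        else:
            human_label = ask_human(item)         # human annotation step
            escalated.append((item, human_label))
            new_training.append((item, human_label))  # retrain on these
    return auto, escalated, new_training

# Toy usage: a fake scorer that is only confident about exclamation marks
def toy_score(text):
    return ("positive", 0.9) if "!" in text else ("neutral", 0.5)

auto, esc, train = classify_with_humans(
    ["great!", "meh"], toy_score, ask_human=lambda t: "neutral")
```

The point of the loop is the last list: every escalated item becomes training data, so the machine gets smarter from exactly the cases it found hardest.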
38. Social Media & Web Analytics Innovation
Toxicity > sentiment
• People don’t like things; they talk about them
• Negative comments aren’t the same as toxic comments
• Negative can be constructive
• Finding hateful and hate-inciting speech—that’s
important
• To keep people safe
• To keep communities healthy
39. Social Media & Web Analytics Innovation
The importance of definition
• If people can’t agree on what’s-in and what’s-out, it’s
hard to train a machine
41. Social Media & Web Analytics Innovation
Wait a sec! Aren’t these ducks?
(Can we agree to disagree?)
42. Social Media & Web Analytics Innovation
The importance of definition
• If people can’t agree on what’s-in and what’s-out, it’s
hard to train a machine
• In our case toxicity was defined as:
• ad hominem attacks (directed at specific people)
• bigoted comments (e.g., sexist, racist, homophobic, etc)
• Set definitions
• Then see if people are consistent
• Run pilots
• Do inter-annotator agreement
• Iterate
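The inter-annotator agreement step above is often measured with Cohen's kappa, which discounts the agreement two annotators would reach by chance. A minimal pure-Python version for two annotators:

```python
# Cohen's kappa for two annotators labeling the same items:
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently
    chance = sum((freq_a[l] / n) * (freq_b[l] / n) for l in freq_a)
    return (observed - chance) / (1 - chance)

# Perfect agreement -> 1.0; agreement at chance level -> 0.0
print(cohens_kappa(["toxic", "ok", "toxic", "ok"],
                   ["toxic", "ok", "toxic", "ok"]))  # 1.0
```

If kappa stays low after a pilot, the definition (not the annotators) is usually the problem—which is exactly why the iteration above matters.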
43. Social Media & Web Analytics Innovation
Sentiment is not IRRELEVANT
• A lot of comments are Neutral
• So that doesn’t teach us much about hate speech
• And we’ll waste a lot of time and money getting training
data on Neutral
• So we ran an experiment:
• Annotate random data
• Annotate stuff that our sentiment models say is Negative
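The second arm of that experiment—preferring items a sentiment model already flags as Negative—can be sketched as score-ranked sampling. The scores, comments, and mixing ratio below are invented for illustration:

```python
# Sketch of sentiment-guided annotation sampling: instead of annotating a
# random sample, send the items the sentiment model scores as most Negative
# to human annotators first, optionally keeping a small random slice so the
# model still sees the full distribution. All data here is illustrative.
import random

def pick_for_annotation(items, negativity_score, budget, random_mix=0.1):
    """Return `budget` items: mostly the most-negative, plus a random slice."""
    ranked = sorted(items, key=negativity_score, reverse=True)
    n_random = int(budget * random_mix)
    chosen = ranked[:budget - n_random]
    rest = ranked[budget - n_random:]
    chosen += random.sample(rest, min(n_random, len(rest)))
    return chosen

comments = ["you idiot", "nice post", "awful take", "thanks!", "meh"]
fake_scores = {"you idiot": 0.95, "awful take": 0.8, "meh": 0.4,
               "nice post": 0.1, "thanks!": 0.05}
picked = pick_for_annotation(comments, fake_scores.get, budget=2, random_mix=0.0)
```

Because toxic comments cluster in the Negative-scored items, annotators spend far less time labeling Neutral filler—which is where the savings on the next slide come from.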
44. Social Media & Web Analytics Innovation
Work savings!
• Items chosen for review based on our sentiment model
were MUCH more likely to be toxic or supportive
• A 96% decrease in effort
46. Social Media & Web Analytics Innovation
Analyst time savings is a key benefit
[Chart: % analyst time saved vs. % accuracy (compared to humans) for finding relevant business articles across five categories—News category 1, News category 2, Health sciences, News category 4, Manufacturing—with time savings of 73–91% at 80–99% accuracy]
47. Social Media & Web Analytics Innovation
Okay back to community health
48. Social Media & Web Analytics Innovation
Finding healthy communities (supportive)
49. Social Media & Web Analytics Innovation
And unhealthy ones (toxic)
51. Social Media & Web Analytics Innovation
Unstructured data gets structured (bonus: a
system that gets smarter over time)
Adaptive System
Machine
Learning
Optimization
Human
Annotation
Prediction
Engine
Structured Data Reports
Action
52. Social Media & Web Analytics Innovation
By structuring text, you can do all kinds of
visualizations
53. Social Media & Web Analytics Innovation
Learning more about ad campaigns than just “people
liked it”: #LikeAGirl
54. Social Media & Web Analytics Innovation
The most re-shared #LikeAGirl post
55. Social Media & Web Analytics Innovation
60 second ad
= ~ $9 million
114.4 million viewers
= ~ $0.08 per viewer
56. Social Media & Web Analytics Innovation
Always only spent 30%
of what Anheuser-Busch did
But they had twice the tweets
57. Social Media & Web Analytics Innovation
Not all sharers and resharers
are of equal value
59. Social Media & Web Analytics Innovation
Influencers extend the brand a lot
60. Social Media & Web Analytics Innovation
Posts by brand and ad advocates
reach twice as far as posts by @Always
61. Social Media & Web Analytics Innovation
If we lumped everyone who used #LikeAGirl together
We wouldn’t know the difference between
People talking about the ad (and products)
And people talking about the cause
62. Social Media & Web Analytics Innovation
Antagonists mainly posted their sexist content to #LikeABoy
Defenders overwhelmed them with 3-4 times the content (yay!)
63. Social Media & Web Analytics Innovation
Positive sentiment would lump everyone together
And negative sentiment would lump
Antagonists (sexists)
in with
Defenders (anti-sexist)
64. Social Media & Web Analytics Innovation
Routing messages that matter
65. Social Media & Web Analytics Innovation
Processing millions of SMS in 12 African languages
Intent of sender
(e.g. report a problem, ask a question, or make a suggestion)
Categorization
(e.g. orphans and vulnerable children, violence against children, health, nutrition)
Language detection
(e.g. English, Acholi, Karamojong, Luganda, Nkole, Swahili, Lango)
Location
(e.g. village names)
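A pipeline wiring together the four-way SMS processing above might look roughly like this. Every classifier below is a stand-in keyword stub invented for illustration—the real system uses trained models per language:

```python
# Stub pipeline for the four-way SMS processing described above: language,
# intent, and category each feed routing and prioritization. The keyword
# rules below are hypothetical placeholders, not the deployed models.

def detect_language(text):
    # Stand-in: real language ID would use character n-gram models
    return "English" if all(ord(c) < 128 for c in text) else "other"

def detect_intent(text):
    if "?" in text:
        return "question"
    if any(w in text.lower() for w in ("broken", "no ", "lack")):
        return "problem"
    return "suggestion"

def categorize(text):
    keywords = {"clinic": "health", "health": "health",
                "food": "nutrition", "hunger": "nutrition"}
    for kw, cat in keywords.items():
        if kw in text.lower():
            return cat
    return "other"

def process_sms(text):
    return {"language": detect_language(text),
            "intent": detect_intent(text),
            "category": categorize(text)}

result = process_sms("The clinic has no medicine")
```

Each field answers a different routing question: language picks the analyst, category picks the team, and intent sets the priority.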
69. Social Media & Web Analytics Innovation
Top 3 categories in Nigeria
[Chart: Employment, U-report support, and Health at 9.69%, 17.68%, and 39.44% respectively]
70. Social Media & Web Analytics Innovation
Quick conclusion
• Sentiment analysis is pretty rudimentary
• On its own, it rarely answers key business questions
• Though it IS automatic and scalable
• Think of it as an example of natural language processing
• There’s a lot more you can do
• The key is formulating specific questions
• And training the system on RELEVANT data
• For this, you'll need to optimize how you use human analysts
71. Social Media & Web Analytics Innovation
email tyler@idibon.com
twitter @idibon
www idibon.com
THANK YOU!
72. Social Media & Web Analytics Innovation
Accuracy of ~20 teams

                                              Restaurant categories   Restaurant category
                                              (F-score)               polarity (F-score)
Top score                                     88.57                   82.92
Median                                        74.24                   69.75
Baseline ("always guess the most
popular category")                            68.89                   64.09

We care about overall accuracy, so we need to multiply how often the right category goes with the right polarity.
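That multiplication, using the top-score and baseline rows as examples (treating category and polarity as independent, which is a first approximation):

```python
# Combined task accuracy: the right category AND the right polarity must
# both be correct, so (to a first approximation) the F-scores multiply.
top_combined = 0.8857 * 0.8292        # top team
baseline_combined = 0.6889 * 0.6409   # majority-class baseline

print(round(top_combined, 3))         # 0.734
print(round(baseline_combined, 3))    # 0.442
```

Even the best team lands around 73% on the combined task—a useful reality check when a vendor quotes a single impressive-sounding number.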
73. Social Media & Web Analytics Innovation
95% of the world’s conversations are not in
English. Idibon covers 99% of the world’s GDP.
Rapidly tag and filter your chosen topics
and criteria in any language
Monitor how people respond to your brand
differently around the world
One unified system versus data cobbled together
from disparate systems
Idibon works with:
English, Portuguese (Brazilian and
from Portugal), Spanish, Italian,
French, Russian, German, Turkish,
Arabic, Japanese, Greek,
Mandarin Chinese, Persian,
Polish, Dutch, Swedish, Serbian,
Romanian, Korean, Hungarian,
Bulgarian, Hindi, Croatian, Czech,
Ukrainian, Finnish, Hebrew, Urdu,
Catalan, Slovak, Indonesian,
Malay, Vietnamese, Bengali, Thai,
Navajo, Latvian, Estonian,
Lithuanian, Kurdish, Yoruba,
Amharic, Zulu, Hausa, Kazakh,
Sindhi, Punjabi, Tagalog,
Cebuano, Danish and Emoji.
77. Social Media & Web Analytics Innovation
Navajo
=go
Emotional evaluation in
narrative
Editor's Notes
(Part of what happens in this talk is: whoa, people and language are complicated)
One of the reasons we like things like sentiment analysis is because they are scalable/automatic.
But, uh, machines have their limitations
This slide and the next two talk about three aspects of language.
We might expect “referential” to be the easiest for machines…but think about reference. Sometimes we talk about “The JW Marriott in San Francisco” but we also talk about “the Marriott” and “here” and “there” all meaning the same thing. Doh.
One aspect of language is that we use it with audiences…who we often mean to influence.
And we of course express ourselves.
There are a lot of feelings, so there are a lot of ways of talking about them.
From “wefeelfine.org”.
http://purl.stanford.edu/fm335ct1355 for full context of this (lists assembled by various computational linguists, psychologists, and others)
The basic point: there are a fair number of words that seem to carry emotional content
Language is tricky because people do all sorts of stuff.
Language CHANGE is one of the chief rules of language.
This is not a word/spelling that has always existed. But it clearly conveys sentiment.
But what about longer phrases? “Doctors” “ordering” things is usually bad but “just what the doctor ordered” is a positive sentiment.
Perhaps “snug” is positive, but “bug” is usually bad. “Snug as a bug” or “Snug as a bug in a rug” are positive.
(Or pug in an ugg on the rug)
Velikovich et al (2010)
(Craenenbroeck & Haegeman, 2007, p. 175)
Languages do things differently.
This is mostly interesting because if you just took a dictionary of emotional terms in English and translated them into Tongan to get a Tongan list…you’d fail. You might get “happy” but you probably wouldn’t try to get determiners and yet, here they are, being emotional.
Btw, there is evidence that this and that in English DO carry some affective meaning: https://corplinguistics.wordpress.com/2011/11/17/who-is-the-sarah-palin-of-the-canterbury-tales/
Different determiners (the, that, etc) express sympathy to the DP they head (Hendrick, 2005)
The –k that’s appended to Cantonese particles is an “emotion intensifier” (Sybesma & Li, 2007). If you’re a Cantonese speaker, take a look at the particles in the table on page 4 here: http://linguistics.berkeley.edu/~herman/documents/CantoneseFPs_SyntaxCircle_Leung.pdf
This is an example of a sound that is PART of words. That’s also tricky if you want to do translations.
Basically: don’t do translations. Work in the language with training data provided by native speakers who understand cultural context.
Making a bunch of lists is not practical for every language (or every category) you care about.
You can use machine learning but the key will be creating training data in a smart way.
From SemEval 2014
http://www.aclweb.org/anthology/S/S14/S14-2.pdf
Genres make a difference. People speak differently in different contexts. It’s not that different domains/genres/contexts have nothing to do with each other but conventions can vary a lot.
This is part of why you want to train on the data you care about.
You do not, to think about the last slide, want to use laptop reviews to predict restaurant reviews.
People express sentiment differently in different domains. For example, it’s useful to understand the emotional state of people calling or emailing customer support—but people doing that aren’t expressing the whole range of emotions. Mainly you get rage and resignation. You don’t often get much in the way of joy or pride.
So they are mostly unhappy. What is sentiment analysis going to tell you? You minimally need to figure out WHAT they are unhappy about.
You want to route the right content to the right person.
It may be that Lily is the right person to route furious people to but not the best person to route resigned people to.
Or maybe you should optimize on something other than emotion—maybe type of issue.
From SemEval 2014
http://www.aclweb.org/anthology/S/S14/S14-2.pdf
Notice that people are very positive about food, but negative about service.
Check out Dan Jurafsky’s work on Yelp reviews—1-star reviews are never about the food; they’re always about traumatic service. (Literally: people use the language of trauma!)
First case study!
http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
Lately, Reddit has gotten a lot of press for having terrible, awful communities
It’s a topic that even redditors talk about (qualitatively—we wanted to answer it quantitatively)
The important thing is having definitions people will agree with and can be consistent with…and which actually answer organizational objectives. Do you care about whether duck decoys and/or rubber duckies are ducks or not? WHY?
http://blog.ioactive.com/2013/05/security-101-machine-learning-and-big.html
The trickiest thing about ad hominem attacks as a definition is: what to do with trash talk in sports/gaming. Tricky!
Let’s quickly step out to talk about one of the major values of using natural language processing: workload reduction.
In this quick tangent, we’re looking at classifying business news as Relevant or Irrelevant for one of our clients (anonymized!)
As you can see—different categories have different results.
News category 1 is awesome—you really don’t have to show human analysts much data to get all the Relevant stuff (you show them 10% of the data and still get 99% of what the client cares about)
Manufacturing is less awesome. You can reduce your workload by 73%…but you have to accept that you’ll only get 83% of the stuff you care about (you’ll miss 17%). If you want to get more like 90% accuracy, you need to review more documents. You “only” get a workload reduction of ~56%.
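The trade-off in that note is just recall versus review fraction: rank documents by model score, review only the top slice, and measure what share of the Relevant ones you caught. The scores and labels below are invented for illustration:

```python
# Workload-reduction sketch: rank documents by model score, review only
# the top fraction, and compute recall of the Relevant documents.
# Scores and relevance labels below are made up for illustration.

def recall_at_review_fraction(scored_docs, review_fraction):
    """scored_docs: list of (score, is_relevant). Returns (recall, n_reviewed)."""
    ranked = sorted(scored_docs, key=lambda d: d[0], reverse=True)
    n_review = int(len(ranked) * review_fraction)
    reviewed = ranked[:n_review]
    total_relevant = sum(rel for _, rel in scored_docs)
    caught = sum(rel for _, rel in reviewed)
    return caught / total_relevant, n_review

docs = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.3, 0),
        (0.2, 0), (0.15, 1), (0.1, 0), (0.05, 0), (0.01, 0)]
recall, n = recall_at_review_fraction(docs, 0.4)  # review the top 40%
```

Sweeping `review_fraction` over a validation set gives exactly the time-saved vs. accuracy curve in the chart: reviewing less saves analyst time but misses more of what you care about.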
Ideally, you want a system that gets better over time.
This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
The DIY (do it yourself) group is the one that is most supportive and least toxic. This data ties to actual upvote/downvote behavior, meaning that you’re not actually a supportive community if everyone downvotes the supportive comments, nor are you a toxic community if everyone downvotes the toxic comments.
This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
It’s only when everyone upvotes toxic comments that you are a toxic community by our definition here.
We also specifically looked at bigotry.
Indeed, /r/TheRedPill is seen as the most bigoted. It’s a subreddit dedicated to proud male chauvinism.
This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
This is the basic stuff you want. (It’s a little self-serving because Idibon’s adaptive system is what makes us special but we really do believe that optimizing training on relevant data with meaningful categories is THE way to deliver business value.)
By using computers to create an initial understanding of data and elevate specific cases for Human Annotation, we use computers to make human decisions smarter, and humans to make computer decisions smarter. Our system optimizes work by using cutting-edge Machine Learning that improves accuracy and learns iteratively. Our Prediction Engine provides initial conclusions for further evaluation by human analysts and is also what allows us to scale to tens of millions of messages a day. Our Optimization process teaches our algorithm what results to select for, essentially refining its accuracy. The key takeaway here is that we optimize for human analysts’ time: we can cluster data initially and automatically, then escalate specific cases to human annotation. Much of the learning is unsupervised and therefore faster, cheaper, and actually more accurate.
After iterations in our adaptive system, previously unstructured data is now structured.
This structured data can be delivered in different outputs, including CSV file exports for your analysts to build reports or direct routing to customer service agents to take action.
Once you have structured your unstructured data, it’s easy to visualize it—in plot.ly as before, in Excel, or as here, in Tableau (this visualization took 15 minutes for someone who had never used Tableau at all to produce from our data).
You can identify advocates and measure success of marketing campaigns using natural language processing. Idibon analyzed the #likeagirl campaign, which Always ran during and after this year’s Super Bowl. (GO WATCH IT IF YOU HAVEN’T!)
Case study two:
http://idibon.com/run-fast-as-you-can-likeagirl-advocates-and-brand-campaign-roi/
This information is easy to extract. It doesn’t require any Natural Language Processing.
Real money is on the line
And there’s already some ways of charting success without sophisticated text analytics/NLP.
Buuuuuut there’s more you can do with NLP.
This focuses on the 2,519 most influential sharers.
Look at how content creators who aren’t associated with @Always extend the brand!
Case study three:
http://idibon.com/idibon-supports-unicef-provide-natural-language-processing-sms-based-social-monitoring-systems-africa/
Photo: http://unicefaids.tumblr.com/post/37835112363/photo-young-people-in-kitwe-zambia-explore-the
The United Nations Children’s Fund (UNICEF) is a United Nations branch that provides long-term humanitarian and developmental assistance to children and mothers in developing countries. Idibon provides scalable natural language processing and analytics to UNICEF’s multinational U-report applications, enabling UNICEF to process text messages sent from citizens in Uganda and Nigeria “to better understand and empower marginalized communities that are often excluded due to language barriers.” (Evan Wheeler, CTO of UNICEF’s Global Innovation Centre)
UNICEF U-report has only six dedicated analysts to process and respond to millions of messages a month, and Idibon’s technology enables the organization to operate efficiently and at scale.
Specifically, Idibon processes each SMS in four ways:
Intent of sender – to prioritize support/services (UNICEF receives more than a million messages a month and can only respond to about a thousand)
Categorization – to prioritize support/services and to route to appropriate analyst
Language detection – to route to appropriate analyst
Location – to identify where to send support/services
Press release: http://unicefstories.org/2015/02/09/idibon-supports-unicef-to-provide-natural-language-processing-to-sms-based-social-monitoring-systems-in-africa/
Environment is an important issue.
But it looks to be about 1.4% of the data…which means you do have to get enough data to build a model. Note that different countries/languages talk about the environment differently (Uganda: droughts, cows; Nigeria: oil). So you may have more or less heterogeneity in your rarer categories.
Image from http://www.theatlantic.com/photo/2011/06/nigeria-the-cost-of-oil/100082/
For more recent news: http://www.theguardian.com/environment/2015/jan/07/niger-delta-communities-to-sue-shell-in-london-for-oil-spill-compensation
“Environment” is clearly an important issue in Nigeria but only 1.4% of the messages are classified that way.
(One other thing: high/low percentages don’t necessarily correspond to personal or societal importance.)
Each needle found makes the next one easier to find, buuuuuuut some things you want to find are just too rare. You can’t model things that aren’t in the data.
At UNICEF, different people care about different categories—the people who respond to rumors of ebola outbreaks or cures are different than the people trying to keep track of economic issues.
Most actionable is, of course, finding people who specifically require support about participating in the community.
Accuracy in opinion mining is not easy (multiply these rows)
Brands want to support the full range of communications that are important to their consumers all over the world. We can offer insight into multiple languages in one system, as opposed to using multiple systems and then having to consolidate the findings. We do this through a network of crowdsourcing and human analysts around the world, which, through annotation, teaches our algorithm to continuously improve.
Distributions don’t mean what you think they do. People are very positive online. Weird.
https://xkcd.com/1098/
Genres (for fun)
Genres (for fun)
Sorry, this is very linguistic—the point here is that this marker does different things in different syntactic contexts.
In Navajo, =go normally serves as a subordinate marker, but it can also appear in utterances where there is no matrix sentence. When it is used this way (as in narration), it marks emotional evaluation and background information (see Mithun (2008) on Navajo as well as other languages that have similarly behaving subordinate markers).