Social Media & Web Analytics Innovation
Sentiment is just a stepping stone: Getting more out of Natural Language Processing/Machine Learning
Hello online viewers of this slide deck!
• A lot of the content here is visual, so you’ll want to download the full presentation and read the speaker notes
• You can also (soon) find the video version on the Social Media & Web Analytics Innovation site
• You can also stay tuned for more content by checking out our blog: http://www.idibon.com/blog
• The case studies are partly covered in these blog posts:
  • http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
  • http://idibon.com/run-fast-as-you-can-likeagirl-advocates-and-brand-campaign-roi/
  • http://idibon.com/idibon-supports-unicef-provide-natural-language-processing-sms-based-social-monitoring-systems-africa/
What’s ahead
• Quick overview of sentiment analysis
  • It’s tricky
  • And limited
• Can we do more?
  • Yep
• Case studies
  • Detecting toxicity/supportiveness of Reddit communities
  • Understanding the effectiveness of Always’ #LikeAGirl campaign
  • Routing text messages to different groups in UNICEF
We are not robots
Though automation makes our lives easier
Referential
Persuasive
Expressive
How do you feel?
13 expert polarity lexicons
Words on 2 or more of them = 10,592 affective words
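The 10,592 figure comes from keeping only words that at least two of the 13 lexicons agree on. A minimal Python sketch of that counting step, assuming each lexicon is simply a set of words; the toy lexicons below are hypothetical stand-ins, not the real ones:

```python
from collections import Counter


def words_in_at_least(lexicons, k=2):
    """Return the words that appear in at least k of the given lexicons."""
    counts = Counter()
    for lexicon in lexicons:
        # Deduplicate within a lexicon so each lexicon counts a word at most once.
        counts.update(set(lexicon))
    return {word for word, n in counts.items() if n >= k}


# Hypothetical toy lexicons standing in for the 13 expert lexicons.
toy_lexicons = [
    {"good", "bad", "happy"},
    {"good", "awful", "happy"},
    {"bad", "joyful"},
]
print(sorted(words_in_at_least(toy_lexicons)))  # ['bad', 'good', 'happy']
```

Raising `k` trades coverage for agreement: the stricter the threshold, the fewer (but less contested) affective words survive.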
We don’t stand still
Yasssssss!
Snug as a bug in a rug
4 billion web pages
20 million candidate phrases, 1-10 words each
178,104 polarity phrases
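The slide doesn't say how 20 million candidates were narrowed to 178,104 polarity phrases. One common approach for web-scale lexicons is to spread polarity from a few seed words across a graph of similar phrases. The sketch below is a simplified label-propagation version of that idea, not the method behind these numbers; the graph, seeds, weights, and iteration count are all hypothetical:

```python
def propagate_polarity(graph, seeds, iterations=10):
    """Spread seed polarity (+1 / -1) over a weighted similarity graph.

    graph: dict mapping phrase -> dict of neighbour phrase -> edge weight
    seeds: dict mapping seed phrase -> fixed polarity (+1.0 or -1.0)
    Seeds stay clamped; every other phrase takes the weighted average
    of its neighbours' scores from the previous iteration.
    """
    scores = {phrase: seeds.get(phrase, 0.0) for phrase in graph}
    for _ in range(iterations):
        updated = {}
        for phrase, neighbours in graph.items():
            if phrase in seeds:
                updated[phrase] = seeds[phrase]
                continue
            total_weight = sum(neighbours.values())
            if total_weight == 0:
                updated[phrase] = 0.0
                continue
            updated[phrase] = sum(
                weight * scores.get(nbr, 0.0) for nbr, weight in neighbours.items()
            ) / total_weight
        scores = updated
    return scores


# Hypothetical toy similarity graph (symmetric edges).
graph = {
    "good": {"snug as a bug in a rug": 1.0},
    "bad": {"a total letdown": 1.0},
    "snug as a bug in a rug": {"good": 1.0, "cozy": 0.5},
    "cozy": {"snug as a bug in a rug": 0.5},
    "a total letdown": {"bad": 1.0},
}
scores = propagate_polarity(graph, seeds={"good": 1.0, "bad": -1.0})
print(sorted(p for p, s in scores.items() if s > 0.5))   # ['cozy', 'good', 'snug as a bug in a rug']
print(sorted(p for p, s in scores.items() if s < -0.5))  # ['a total letdown', 'bad']
```

Note that "cozy" never touches a seed directly; it picks up positive polarity through "snug as a bug in a rug", which is how multi-word idioms like the one on the earlier slide can end up in such a lexicon.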
(In English only)
Dutch: tet
“Underscores the polarity of the clause and expresses either irritation or surprise, as if he or she had expected the opposite state of affairs”
Tongan: si’i and si’a
Determiner variants (roughly “the”, “that”, etc.) that express sympathy
Cantonese: -k at the end of particles
“An emotion intensifier”
95% of the world’s conversations are not in English
Different domains have different proportions
[Chart: share of Positive, Negative, Conflict, and Neutral labels (0% to 70%) in the Restaurants and Laptops domains]
“Okay, okay. Sentiment is complicated”
Real question: Can you take action?
How does sentiment vary across categories?
[Chart: share of Positive and Negative (0% to 80%) for the categories Anecdotes, Ambience, Service, Price, and Food]
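A per-category chart like this comes from counting polarity labels within each aspect category. A minimal sketch, assuming the input is a list of hypothetical (category, polarity) pairs produced by annotators or a model:

```python
from collections import Counter, defaultdict


def polarity_shares(labels):
    """Return {category: {polarity: share}} from (category, polarity) pairs."""
    counts = defaultdict(Counter)
    for category, polarity in labels:
        counts[category][polarity] += 1
    shares = {}
    for category, polarity_counts in counts.items():
        total = sum(polarity_counts.values())
        shares[category] = {
            polarity: n / total for polarity, n in polarity_counts.items()
        }
    return shares


# Hypothetical labelled review snippets.
labels = [
    ("Food", "Positive"), ("Food", "Positive"), ("Food", "Negative"),
    ("Price", "Negative"), ("Service", "Positive"), ("Service", "Negative"),
]
shares = polarity_shares(labels)
print(shares["Service"])  # {'Positive': 0.5, 'Negative': 0.5}
print(shares["Price"])    # {'Negative': 1.0}
```

Splitting by category is what turns "reviews are 60% positive" into something actionable, such as "people like the food but complain about price".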
Setting the bar, at a minimum:
Accuracy (which depends on your training data)
+
The ability to do something with the results
BEYOND SENTIMENT
What would you do with unlimited human analysts?
You’d ask them to classify messages into categories that enable you to take action.
Machine learning models with humans-in-the-loop can power sophisticated classification.
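One common way to keep humans in the loop without wasting their time is uncertainty sampling: send analysts the messages the model is least sure about. The slide does not say which selection strategy was used; this is a minimal sketch assuming a hypothetical model that returns the probability that a message belongs to a category (e.g. "actionable"):

```python
def pick_for_review(messages, predict_proba, budget):
    """Return the `budget` messages whose predicted probability is closest to 0.5.

    predict_proba: callable mapping a message to P(category | message).
    Scores near 0.5 mean the model is least certain, so a human label
    there is likely to teach it the most.
    """
    ranked = sorted(messages, key=lambda m: abs(predict_proba(m) - 0.5))
    return ranked[:budget]


# Hypothetical model scores for four messages.
fake_scores = {"msg-a": 0.97, "msg-b": 0.52, "msg-c": 0.10, "msg-d": 0.45}
print(pick_for_review(list(fake_scores), fake_scores.get, budget=2))
# ['msg-b', 'msg-d']
```

Confident predictions ("msg-a", "msg-c") are left to the machine, while analysts only see the borderline cases.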
Toxicity > sentiment
• People don’t just like or dislike things; they talk about them
• Negative comments aren’t the same as toxic comments
  • Negative can be constructive
• Finding hateful and hate-inciting speech is what’s important
  • To keep people safe
  • To keep communities healthy
The importance of definition
• If people can’t agree on what’s in and what’s out, it’s hard to train a machine
Wait a sec! Aren’t these ducks?
(Can we agree to disagree?)
The importance of definition
• If people can’t agree on what’s in and what’s out, it’s hard to train a machine
• In our case, toxicity was defined as:
  • ad hominem attacks (directed at specific people)
  • bigoted comments (e.g., sexist, racist, homophobic)
• Set definitions
  • Then see if people are consistent
  • Run pilots
  • Measure inter-annotator agreement
  • Iterate
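The slide doesn't name an agreement statistic. Cohen's kappa is a common choice for two annotators because it corrects raw agreement for the agreement you'd expect by chance: kappa = (observed - expected) / (1 - expected). A minimal sketch with hypothetical labels:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability that both annotators independently pick the same label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both used one identical label throughout; kappa is undefined, treat as perfect
    return (observed - expected) / (1 - expected)


# Hypothetical pilot: two annotators label six comments.
annotator_1 = ["toxic", "ok", "ok", "toxic", "ok", "ok"]
annotator_2 = ["toxic", "ok", "toxic", "toxic", "ok", "ok"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.667
```

Here the annotators agree on 5 of 6 items (83%), but because half of that agreement is expected by chance, kappa is only about 0.67. A low kappa in a pilot is the signal to tighten the definitions and iterate.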
Sentiment is not IRRELEVANT
• A lot of comments are Neutral
  • Neutral comments don’t teach us much about hate speech
  • And we’d waste a lot of time and money collecting training data on them
• So we ran an experiment:
  • Annotate random data
  • Annotate items that our sentiment models label Negative
Work savings!
• Items chosen for review based on our sentiment model were MUCH more likely to be toxic or supportive
• A 96% decrease in effort
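If "effort" means the number of items reviewed to collect a given number of toxic or supportive examples, a 96% decrease corresponds to a filtered pool roughly 25 times richer in relevant items than a random sample. A small sketch of that arithmetic; the hit rates below are hypothetical, since the slide does not report them:

```python
def effort_reduction(random_hit_rate, filtered_hit_rate):
    """Fraction of review effort saved by sampling from a richer pool.

    Finding N relevant items takes about N / hit_rate reviews, so the
    saving is 1 - (N / filtered) / (N / random) = 1 - random / filtered.
    """
    if not 0 < random_hit_rate <= filtered_hit_rate:
        raise ValueError("expected 0 < random_hit_rate <= filtered_hit_rate")
    return 1 - random_hit_rate / filtered_hit_rate


# Hypothetical: 2% of random comments are relevant vs. 50% after sentiment filtering.
print(f"{effort_reduction(0.02, 0.50):.0%}")  # 96%
```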
Analyst time savings is a key benefit
[Chart: % analyst time saved and % accuracy (compared to humans) when finding relevant business articles in five categories (News category 1, News category 2, Health sciences, News category 4, Manufacturing); values shown: 73%, 83%, 88%, 80%, 91%, 81%, 87%, 85%, 90%, 99%]
Okay, back to community health
Finding healthy communities (supportive)
And unhealthy ones (toxic)
Unstructured data gets structured (bonus: a system that gets smarter over time)
[Diagram: an adaptive system (machine learning optimization, human annotation, prediction engine) that turns text into structured data for reports and action]
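The diagram describes a loop in which human annotation and machine learning feed each other. Below is a toy sketch of one pass of such a loop: predict, send low-confidence items to a human, and fold their labels back into the training data. The keyword-count "model", the `ask_human` callback, and the 0.8 threshold are all hypothetical; a real system would use a proper classifier.

```python
from collections import Counter


class KeywordModel:
    """Toy classifier: scores a message by how often its words appeared under each label."""

    def __init__(self):
        self.word_counts = {}  # label -> Counter of words seen with that label

    def train(self, examples):
        self.word_counts = {}
        for text, label in examples:
            self.word_counts.setdefault(label, Counter()).update(text.lower().split())

    def predict(self, text):
        """Return (best_label, confidence); confidence is the best label's share of votes."""
        words = text.lower().split()
        votes = {label: sum(c[w] for w in words) for label, c in self.word_counts.items()}
        total = sum(votes.values())
        if total == 0:
            return None, 0.0
        best = max(votes, key=votes.get)
        return best, votes[best] / total


def adaptive_pass(model, training_data, unlabeled, ask_human, threshold=0.8):
    """One loop iteration: train, predict, route unsure items to a human, emit structured rows."""
    model.train(training_data)
    rows = []
    for text in unlabeled:
        label, confidence = model.predict(text)
        if confidence < threshold:
            label = ask_human(text)              # human annotation
            training_data.append((text, label))  # used when the next pass retrains
        rows.append({"text": text, "label": label})
    return rows


training = [("great service", "supportive"), ("you are an idiot", "toxic")]
rows = adaptive_pass(KeywordModel(), training, ["great job", "idiot", "hello there"],
                     ask_human=lambda text: "neutral")
print([row["label"] for row in rows])  # ['supportive', 'toxic', 'neutral']
```

The `rows` list is the "structured data" side of the diagram, and the growing `training` list is what makes the next pass smarter.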
By structuring text, you can do all kinds of visualizations
Learning more about ad campaigns than just “people liked it”: #LikeAGirl
The most re-shared #LikeAGirl post
60-second ad ≈ $9 million
114.4 million viewers ≈ $0.08 per viewer
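The per-viewer figure is simply the ad cost divided by the audience. A quick check using the slide's own numbers:

```python
ad_cost_usd = 9_000_000  # ~ $9 million for the 60-second ad (from the slide)
viewers = 114_400_000    # 114.4 million viewers (from the slide)

cost_per_viewer = ad_cost_usd / viewers
print(f"${cost_per_viewer:.4f} per viewer")  # $0.0787 per viewer
```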
Always spent only 30% of what Anheuser-Busch did
But they got twice as many tweets
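Put another way, spending 30% as much while getting twice the tweets works out to roughly 6.7 times as many tweets per dollar. The arithmetic below uses only the slide's relative figures; no absolute budgets or tweet counts are assumed:

```python
relative_spend = 0.30  # Always spent 30% of what Anheuser-Busch did
relative_tweets = 2.0  # ...but got twice as many tweets

tweets_per_dollar_ratio = relative_tweets / relative_spend
print(f"{tweets_per_dollar_ratio:.1f}x the tweets per dollar")  # 6.7x the tweets per dollar
```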
Not all sharers and resharers are of equal value
Influencers extend the brand a lot
Posts by brand and ad advocates reach twice as far as posts by @Always
If we lumped everyone who used #LikeAGirl together, we wouldn’t know the difference between people talking about the ad (and products) and people talking about the cause
Antagonists mainly posted their sexist content to #LikeABoy
Defenders overwhelmed them with 3-4 times the content (yay!)
Positive sentiment would lump everyone together, and negative sentiment would lump Antagonists (sexists) in with Defenders (anti-sexists)
Routing messages that matter
Processing millions of SMS in 12 African languages
• Intent of sender (e.g., report a problem, ask a question, or make a suggestion)
• Categorization (e.g., orphans and vulnerable children, violence against children, health, nutrition)
• Language detection (e.g., English, Acholi, Karamojong, Luganda, Nkole, Swahili, Lango)
• Location (e.g., village names)
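Each incoming SMS gets several labels at once (intent, category, language, location), and those labels decide where it goes. A minimal routing sketch, assuming the labels come from upstream classifiers; the team names, routing table, and confidence threshold are hypothetical and not UNICEF's actual setup:

```python
from dataclasses import dataclass


@dataclass
class LabelledSms:
    text: str
    intent: str        # e.g. "report a problem", "ask a question", "make a suggestion"
    category: str      # e.g. "health", "nutrition"
    language: str      # e.g. "Luganda"
    location: str      # e.g. a village name
    confidence: float  # classifier confidence in the category label


# Hypothetical routing table: category -> responsible team.
ROUTES = {
    "orphans and vulnerable children": "child-protection-team",
    "violence against children": "child-protection-team",
    "health": "health-team",
    "nutrition": "nutrition-team",
}


def route(sms, min_confidence=0.7):
    """Pick a destination team; unsure or unknown categories go to human triage."""
    if sms.confidence < min_confidence or sms.category not in ROUTES:
        return "human-triage"
    return ROUTES[sms.category]


msg = LabelledSms("...", "ask a question", "health", "Luganda", "(village)", 0.91)
print(route(msg))  # health-team
```

Sending low-confidence messages to people rather than guessing keeps urgent reports from being silently misrouted.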
1.4%
Top 3 categories in Nigeria: Employment, U-report support, Health
[Chart values shown: 9.69%, 17.68%, 39.44%]
Quick conclusion
• Sentiment analysis is pretty rudimentary
  • On its own, it rarely answers key business questions
  • Though it IS automatic and scalable
• Think of it as one example of natural language processing
  • There’s a lot more you can do
  • The key is formulating specific questions
  • And training the system on RELEVANT data
  • For this, you’ll need to make the most of your human annotators
email tyler@idibon.com
twitter @idibon
www.idibon.com
THANK YOU!
Accuracy of ~20 teams (F-scores)
Result | Restaurant categories | Restaurant category polarity
Top score | 88.57 | 82.92
Median | 74.24 | 69.75
Baseline (~ “let’s always guess the most popular category”) | 68.89 | 64.09
We care about overall accuracy, so we need to multiply the two scores: an item is only right when the right category goes with the right polarity.
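Multiplying the two scores treats category detection and polarity as independent, which is only a rough approximation, but it shows how quickly errors compound. Using the table's numbers as fractions:

```python
# (restaurant category F-score, category polarity F-score) from the table above.
results = {
    "Top score": (0.8857, 0.8292),
    "Median": (0.7424, 0.6975),
    "Baseline": (0.6889, 0.6409),
}

for name, (category_f, polarity_f) in results.items():
    # Rough joint score: right category AND right polarity, assuming independence.
    print(f"{name}: {category_f * polarity_f:.1%}")
# Top score: 73.4%
# Median: 51.8%
# Baseline: 44.2%
```

Even the best of ~20 teams gets roughly three in four items fully right, and the median system is close to a coin flip.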
95% of the world’s conversations are not in English. Idibon covers 99% of the world’s GDP.
• Rapidly tag and filter your chosen topics and criteria in any language
• Monitor how people respond to your brand differently around the world
• One unified system versus data cobbled together from disparate systems
Idibon works with: English, Portuguese (Brazilian and European), Spanish, Italian, French, Russian, German, Turkish, Arabic, Japanese, Greek, Mandarin Chinese, Persian, Polish, Dutch, Swedish, Serbian, Romanian, Korean, Hungarian, Bulgarian, Hindi, Croatian, Czech, Ukrainian, Finnish, Hebrew, Urdu, Catalan, Slovak, Indonesian, Malay, Vietnamese, Bengali, Thai, Navajo, Latvian, Estonian, Lithuanian, Kurdish, Yoruba, Amharic, Zulu, Hausa, Kazakh, Sindhi, Punjabi, Tagalog, Cebuano, Danish, and Emoji.
Navajo: =go
Emotional evaluation in narrative


Sentiment is just a stepping stone: Getting more out of Natural Language Processing/Machine Learning

  • 1. Social Media & Web Analytics Innovation Sentiment is just a stepping stone
  • 2. Social Media & Web Analytics Innovation Hello online viewers of this slide deck! • A lot of the content here is visual—you’ll want to download the full presentation and read the notes fields • You can also (soon) find the video version by looking at the Social Media & Web Analytics Innovation site • You can also stay tuned for more content by checking out our blog: http://www.idibon.com/blog • The case studies are, partly, covered by these blog posts btw: • http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the- darkest-depths-of-the-interwebs/ • http://idibon.com/run-fast-as-you-can-likeagirl-advocates-and- brand-campaign-roi/ • http://idibon.com/idibon-supports-unicef-provide-natural-language- processing-sms-based-social-monitoring-systems-africa/
  • 3. Social Media & Web Analytics Innovation What’s ahead • Quick overview of sentiment analysis • It’s tricky • And limited • Can we do more? • Yep • Case studies • Detecting toxicity/supportiveness of Reddit communities • Understanding the effectiveness of Always’ #LikeAGirl campaign • Routing text messages to different groups in UNICEF
  • 4. Social Media & Web Analytics Innovation We are not robots
  • 5. Social Media & Web Analytics Innovation Though automation makes our lives easier
  • 6. Social Media & Web Analytics Innovation
  • 7. Social Media & Web Analytics Innovation Referential
  • 8. Social Media & Web Analytics Innovation Persuasive
  • 9. Social Media & Web Analytics Innovation Expressive
  • 10. Social Media & Web Analytics Innovation How do you feel?
  • 11. Social Media & Web Analytics Innovation 13 expert polarity lexicons Words on 2 or more = 10,592 affective words
  • 12. Social Media & Web Analytics Innovation We don’t stand still
  • 13. Social Media & Web Analytics Innovation
  • 14. Social Media & Web Analytics Innovation Yasssssss!
  • 15. Social Media & Web Analytics Innovation
  • 16. Social Media & Web Analytics Innovation Snug as a bug in a rug
  • 17. Social Media & Web Analytics Innovation
  • 18. Social Media & Web Analytics Innovation 4 billion web pages 20 million candidates 1-10 words each 178,104 polarity phrases
  • 19. Social Media & Web Analytics Innovation (In English) (only)
  • 20. Social Media & Web Analytics Innovation Dutch tet “Underscores the polarity of the clause and expresses either irritation or surprise, as if he or she had expected the opposite state of affairs”
  • 21. Social Media & Web Analytics Innovation Tongan si’i and si’a Different determiners (~the, that, etc) express sympathy
  • 22. Social Media & Web Analytics Innovation Cantonese -k at the end of particles “An emotion intensifier”
  • 23. Social Media & Web Analytics Innovation 95% of the world’s conversations are not in English
  • 24. Social Media & Web Analytics Innovation
  • 25. Social Media & Web Analytics Innovation Different domains have different proportions 0% 10% 20% 30% 40% 50% 60% 70% Positive Negative Conflict Neutral Restaurants Laptops
  • 26. Social Media & Web Analytics Innovation
  • 27. Social Media & Web Analytics Innovation “Okay, okay. Sentiment is complicated”
  • 28. Social Media & Web Analytics Innovation Real question: Can you take action?
  • 29. Social Media & Web Analytics Innovation
  • 30. Social Media & Web Analytics Innovation
  • 31. Social Media & Web Analytics Innovation How is sentiment for particular categories? 0% 10% 20% 30% 40% 50% 60% 70% 80% Positive Negative Anecdotes Ambience Service Price Food
  • 32. Social Media & Web Analytics Innovation Setting the bar—at a minimum: Accuracy (which is tied to your training data) + An ability to do something
  • 33. Social Media & Web Analytics Innovation BEYOND SENTIMENT
  • 34. Social Media & Web Analytics Innovation What would you do with unlimited human analysts? You’d ask them to classify messages into categories that enable you to take action. Machine learning models with humans-in-the-loop can power sophisticated classification.
  • 35. Social Media & Web Analytics Innovation
  • 36. Social Media & Web Analytics Innovation
  • 37. Social Media & Web Analytics Innovation
  • 38. Social Media & Web Analytics Innovation Toxicity > sentiment • People don’t like things; they talk about them • Negative comments aren’t the same as toxic comments • Negative can be constructive • Finding hateful and hate-inciting speech—that’s important • To keep people safe • To keep communities healthy
  • 39. Social Media & Web Analytics Innovation The importance of definition • If people can’t agree on what’s-in and what’s-out, it’s hard to train a machine
  • 40. Social Media & Web Analytics Innovation
  • 41. Social Media & Web Analytics Innovation Wait a sec! Aren’t these ducks? (Can we agree to disagree?)
  • 42. Social Media & Web Analytics Innovation The importance of definition • If people can’t agree on what’s-in and what’s-out, it’s hard to train a machine • In our case toxicity was defined as: • ad hominem attacks (directed at specific people) • bigoted comments (e.g., sexist, racist, homophobic, etc) • Set definitions • Then see if people are consistent • Run pilots • Do inter-annotator agreement • Iterate
  • 43. Social Media & Web Analytics Innovation Sentiment is not IRRELEVANT • A lot of comments are Neutral • So that doesn’t teach us much about hate speech • And we’ll waste a lot of time and money getting training data on Neutral • So we ran an experiment: • Annotate random data • Annotate stuff that our sentiment models say is Negative
  • 44. Social Media & Web Analytics Innovation Work savings! • Items chosen for review based on our sentiment model were MUCH more likely to be toxic or supportive • A decrease of 96% of effort
  • 45. Social Media & Web Analytics Innovation
  • 46. Social Media & Web Analytics Innovation Analyst time savings is a key benefit 73% 83% 88% 80% 91% 81% 87% 85% 90% 99% % analyst time saved % accuracy (compared to humans) Finding relevant business articles News category 1 News category 2 Health sciences News category 4 Manufacturing
  • 47. Social Media & Web Analytics Innovation Okay back to community health
  • 48. Social Media & Web Analytics Innovation Finding healthy communities (supportive)
  • 49. Social Media & Web Analytics Innovation And unhealthy ones (toxic)
  • 50. Social Media & Web Analytics Innovation
  • 51. Social Media & Web Analytics Innovation Unstructured data gets structured (bonus: a system that gets smarter over time) Adaptive System Machine Learning Optimization Human Annotation Prediction Engine Structured Data Reports Action
  • 52. Social Media & Web Analytics Innovation By structuring text, you can do all kinds of visualizations
  • 53. Social Media & Web Analytics Innovation Learning more about ad campaigns than just “people liked it”: #LikeAGirl
  • 54. Social Media & Web Analytics Innovation The most re-shared #LikeAGirl post
  • 55. Social Media & Web Analytics Innovation 60 second ad = ~ $9 million 114.4 million viewers = ~ $0.08 per viewer
  • 56. Social Media & Web Analytics Innovation Always only spent 30% of what Anheuser-Busch did But they had twice the tweets
  • 57. Social Media & Web Analytics Innovation Not all sharers and resharers are of equal value
  • 58. Social Media & Web Analytics Innovation
  • 59. Social Media & Web Analytics Innovation Influencers extend the brand a lot
  • 60. Social Media & Web Analytics Innovation Posts by brand and ad advocates reach twice as far as posts by @Always
  • 61. Social Media & Web Analytics Innovation If we lumped everyone who used #LikeAGirl together We wouldn’t know the difference between People talking about the ad (and products) And people talking about the cause
  • 62. Social Media & Web Analytics Innovation Antagonists mainly posted their sexist content to #LikeABoy Defenders overwhelmed them with 3-4 times the content (yay!)
  • 63. Social Media & Web Analytics Innovation Positive sentiment would lump everyone together And negative sentiment would lump Antagonists (sexists) in with Defenders (anti-sexist)
  • 64. Social Media & Web Analytics Innovation Routing messages that matter
  • 65. Social Media & Web Analytics Innovation Processing millions of SMS in 12 African languages Intent of sender (i.e. report a problem, ask a question or make a suggestion) Categorization (i.e. orphans and vulnerable children, violence against children, health, nutrition) Language detection (i.e. English, Acholi, Karamojong, Luganda, Nkole, Swahili, Lango) Location (i.e. village names)
  • 66. Social Media & Web Analytics Innovation
  • 67. Social Media & Web Analytics Innovation 1.4%
  • 68. Social Media & Web Analytics Innovation
  • 69. Social Media & Web Analytics Innovation Top 3 categories in Nigeria 9.69% 17.68% 39.44% Employment U-report support Health
  • 70. Social Media & Web Analytics Innovation Quick conclusion • Sentiment analysis is pretty rudimentary • On its own, it rarely answers key business questions • Though it IS automatic and scalable • Think of it as an example of natural language processing • There’s a lot more you can do • The key is formulating specific questions • And training the system on RELEVANT data • For this, you’ll need to optimize humans
  • 71. Social Media & Web Analytics Innovation email tyler@idibon.com twitter @idibon www idibon.com THANK YOU!
  • 72. Accuracy of ~20 teams (F-score)
    • Top score: restaurant categories 88.57; restaurant category polarity 82.92
    • Median: restaurant categories 74.24; restaurant category polarity 69.75
    • Baseline (≈ “let’s always guess the most popular category”): restaurant categories 68.89; restaurant category polarity 64.09
    We care about overall accuracy, so we need to multiply how often the right category goes with the right polarity.
  • 73. 95% of the world’s conversations are not in English. Idibon covers 99% of the world’s GDP.
    • Rapidly tag and filter your chosen topics and criteria in any language
    • Monitor how people respond to your brand differently around the world
    • One unified system versus data cobbled together from disparate systems
    Idibon works with: English, Portuguese (Brazilian and from Portugal), Spanish, Italian, French, Russian, German, Turkish, Arabic, Japanese, Greek, Mandarin Chinese, Persian, Polish, Dutch, Swedish, Serbian, Romanian, Korean, Hungarian, Bulgarian, Hindi, Croatian, Czech, Ukrainian, Finnish, Hebrew, Urdu, Catalan, Slovak, Indonesian, Malay, Vietnamese, Bengali, Thai, Navajo, Latvian, Estonian, Lithuanian, Kurdish, Yoruba, Amharic, Zulu, Hausa, Kazakh, Sindhi, Punjabi, Tagalog, Cebuano, Danish and Emoji.
  • 77. Navajo =go: emotional evaluation in narrative
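
Slide 72’s point (and note 51’s “multiply these rows”) can be made concrete with a few lines of Python. This is a rough sketch: it treats each F-score as if it were a probability of being right and assumes category errors and polarity errors are independent, which is a simplification rather than how SemEval 2014 actually scored the subtasks. The function name `combined_accuracy` is ours.

```python
# Slide 72 figures (SemEval 2014 restaurant subtasks), F-scores in percent:
# (restaurant categories, restaurant category polarity)
scores = {
    "Top score": (88.57, 82.92),
    "Median": (74.24, 69.75),
    "Baseline": (68.89, 64.09),
}


def combined_accuracy(category_f: float, polarity_f: float) -> float:
    """Rough end-to-end estimate, in percent.

    Simplifying assumption: getting the category right and getting its
    polarity right are independent events, so their rates multiply.
    """
    return (category_f / 100) * (polarity_f / 100) * 100


for name, (cat_f, pol_f) in scores.items():
    print(f"{name}: {combined_accuracy(cat_f, pol_f):.1f}%")
```

Under those assumptions even the top team lands around 73%, the median team around 52% and the baseline around 44%.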

Editor's Notes

  1. (Part of what happens in this talk is: whoa, people and language are complicated)
  2. One of the reasons we like things like sentiment analysis is that they are scalable/automatic.
  3. But, uh, machines have their limitations
  4. This slide and the next two talk about three aspects of language. We might expect “referential” to be the easiest for machines…but think about reference. Sometimes we talk about “The JW Marriott in San Francisco” but we also talk about “the Marriott” and “here” and “there” all meaning the same thing. Doh.
  5. One aspect of language is that we use it with audiences…who we often mean to influence.
  6. And we of course express ourselves.
  7. There are a lot of feelings, so there are a lot of ways of talking about them. From “wefeelfine.org”.
  8. See http://purl.stanford.edu/fm335ct1355 for the full context of this (lists assembled by various computational linguists, psychologists, and others). The basic point: there are a fair number of words that seem to carry emotional content.
  9. Language is tricky because people do all sorts of stuff.
  10. Language CHANGE is one of the chief rules of language.
  11. This is not a word/spelling that has always existed. But it clearly conveys sentiment.
  12. But what about longer phrases? “Doctors” “ordering” things is usually bad but “just what the doctor ordered” is a positive sentiment.
  13. Perhaps “snug” is positive, but “bug” is usually bad. “Snug as a bug” or “Snug as a bug in a rug” are positive.
  14. (Or pug in an ugg on the rug)
  15. Velikovich et al. (2010)
  16. (Craenenbroeck & Haegeman, 2007, p. 175) Languages do things differently.
  17. This is mostly interesting because if you just took a dictionary of emotional terms in English and translated them into Tongan to get a Tongan list…you’d fail. You might get “happy” but you probably wouldn’t try to get determiners and yet, here they are, being emotional. Btw, there is evidence that “this” and “that” in English DO carry some affective meaning: https://corplinguistics.wordpress.com/2011/11/17/who-is-the-sarah-palin-of-the-canterbury-tales/ Different determiners (the, that, etc.) express sympathy toward the DP they head (Hendrick, 2005).
  18. The –k that’s appended to Cantonese particles is an “emotion intensifier” (Sybesma & Li, 2007). If you’re a Cantonese speaker, take a look at the particles in the table on page 4 here: http://linguistics.berkeley.edu/~herman/documents/CantoneseFPs_SyntaxCircle_Leung.pdf This is an example of a sound that is PART of words. That’s also tricky if you want to do translations. Basically: don’t do translations. Work in the language with training data provided by native speakers who understand cultural context.
  19. Making a bunch of lists is not practical for every language (or every category) you care about. You can use machine learning but the key will be creating training data in a smart way.
  20. From SemEval 2014 http://www.aclweb.org/anthology/S/S14/S14-2.pdf
  21. Genres make a difference. People speak differently in different contexts. It’s not that different domains/genres/contexts have nothing to do with each other but conventions can vary a lot. This is part of why you want to train on the data you care about. You do not, to think about the last slide, want to use laptop reviews to predict restaurant reviews.
  22. People express sentiment differently in different domains. For example, it’s useful to understand the emotional state of people calling or emailing customer support—but people doing that aren’t expressing the whole range of emotions. Mainly you get rage and resignation. You don’t often get much in the way of joy or pride. So they are mostly unhappy. What is sentiment analysis going to tell you? You minimally need to figure out WHAT they are unhappy about.
  23. You want to route the right content to the right person. It may be that Lily is the right person to route furious people to but not the best person to route resigned people to. Or maybe you should optimize on something other than emotion—maybe type of issue.
  24. From SemEval 2014 http://www.aclweb.org/anthology/S/S14/S14-2.pdf Notice that people are very positive about food, but negative about service. Check out Dan Jurafsky’s work on Yelp reviews: 1-star reviews are never about the food; they’re always about traumatic service. (Literally: people use the language of trauma!)
  25. First case study! http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
  26. Lately, Reddit has gotten a lot of press for having terrible, awful communities
  27. It’s a topic that even redditors talk about (qualitatively—we wanted to answer it quantitatively)
  28. http://blog.ioactive.com/2013/05/security-101-machine-learning-and-big.html
  29. The important thing is having definitions people will agree with and can be consistent with…and which actually answer organizational objectives. Do you care about whether duck decoys and/or rubber duckies are ducks or not? WHY? http://blog.ioactive.com/2013/05/security-101-machine-learning-and-big.html
  30. The trickiest thing about ad hominem attacks as a definition is: what to do with trash talk in sports/gaming. Tricky!
  31. Let’s quickly step out to talk about one of the major values of using natural language processing: workload reduction. In this quick tangent, we’re looking at classifying business news as Relevant or Irrelevant for one of our clients (anonymized!)
  32. As you can see, different categories have different results. News category 1 is awesome: you really don’t have to show human analysts much data to get all the Relevant stuff (you show them 10% of the data and still get 99% of what the client cares about). Manufacturing is less awesome. You can reduce your workload by 73%…but you have to accept that you’ll only get 83% of the stuff you care about (you’ll miss 17%). If you want to get more like 90% of it, you need to review more documents, and you “only” get a workload reduction of ~56%. Ideally, you want a system that gets better over time.
  33. This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/ The DIY (do it yourself) group is the one that is most supportive and least toxic. This data ties to actual upvote/downvote behavior, meaning that you’re not actually a supportive community if everyone downvotes the supportive comments, nor are you a toxic community if everyone downvotes the toxic comments.
  34. This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/ It’s only when everyone upvotes toxic comments that you are a toxic community by our definition here.
  35. We also specifically looked at bigotry. Indeed, /r/TheRedPill is seen as the most bigoted. It’s a subreddit dedicated to proud male chauvinism. This is interactive, check out: http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/
  36. This is the basic stuff you want. (It’s a little self-serving because Idibon’s adaptive system is what makes us special, but we really do believe that optimizing training on relevant data with meaningful categories is THE way to deliver business value.) By using computers to create an initial understanding of data and elevate specific cases for human annotation, we use computers to make human decisions smarter, and humans to make computer decisions smarter. Our system optimizes work by using cutting-edge machine learning that improves accuracy and learns iteratively. Our Prediction Engine provides initial conclusions for further evaluation by human analysts and is also what allows us to scale to tens of millions of messages a day. Our Optimization process teaches our algorithm what results to select for, essentially refining its accuracy. The key takeaway here is that we optimize for human analysts’ time: we can cluster data initially and automatically, then escalate specific cases to human annotation. Much of the learning is unsupervised and therefore faster, cheaper and actually more accurate. After iterations in our adaptive system, previously unstructured data is now structured. This structured data can be delivered in different outputs, including CSV file exports for your analysts to build reports or direct routing to customer service agents to take action.
  37. Once you have structured your unstructured data, it’s easy to visualize it—in plot.ly as before, in Excel, or as here, in Tableau (this visualization took 15 minutes for someone who had never used Tableau at all to produce from our data).
  38. You can identify advocates and measure success of marketing campaigns using natural language processing. Idibon analyzed the #likeagirl campaign, which Always ran during and after this year’s Super Bowl. (GO WATCH IT IF YOU HAVEN’T!) Case study two: http://idibon.com/run-fast-as-you-can-likeagirl-advocates-and-brand-campaign-roi/
  39. This information is easy to extract. It doesn’t require any Natural Language Processing.
  40. Real money is on the line
  41. And there are already some ways of charting success without sophisticated text analytics/NLP.
  42. Buuuuuut there’s more you can do with NLP.
  43. This focuses on the 2,519 most influential sharers.
  44. Look at how content creators who aren’t associated with @Always extend the brand!
  45. Case study three: http://idibon.com/idibon-supports-unicef-provide-natural-language-processing-sms-based-social-monitoring-systems-africa/ Photo: http://unicefaids.tumblr.com/post/37835112363/photo-young-people-in-kitwe-zambia-explore-the
  46. The United Nations Children’s Fund (UNICEF) is a United Nations branch that provides long-term humanitarian and developmental assistance to children and mothers in developing countries. Idibon provides scalable natural language processing and analytics to UNICEF’s multinational U-report applications, enabling UNICEF to process text messages sent from citizens in Uganda and Nigeria “to better understand and empower marginalized communities that are often excluded due to language barriers.” (Evan Wheeler, CTO of UNICEF’s Global Innovation Centre) UNICEF’s U-report has only six dedicated analysts to process and respond to millions of messages a month, and Idibon’s technology enables the organization to operate efficiently and at scale. Specifically, Idibon processes each SMS in four ways: (1) intent of sender, to prioritize support/services (UNICEF receives more than a million messages a month and can only respond to about a thousand); (2) categorization, to prioritize support/services and to route to the appropriate analyst; (3) language detection, to route to the appropriate analyst; (4) location, to identify where to send support/services. Press release: http://unicefstories.org/2015/02/09/idibon-supports-unicef-to-provide-natural-language-processing-to-sms-based-social-monitoring-systems-in-africa/
  47. Environment is an important issue. But it looks to be about 1.4% of the data…which means you do have to get enough data to build a model. Note that different countries/languages talk about the environment differently (Uganda: droughts, cows; Nigeria: oil). So you may have more or less heterogeneity in your rarer categories. Image from http://www.theatlantic.com/photo/2011/06/nigeria-the-cost-of-oil/100082/ For more recent news: http://www.theguardian.com/environment/2015/jan/07/niger-delta-communities-to-sue-shell-in-london-for-oil-spill-compensation
  48. “Environment” is clearly an important issue in Nigeria but only 1.4% of the messages are classified that way. (One other thing: high/low percentages don’t necessarily correspond to personal or societal importance.)
  49. Each needle found makes the next one easier to find, buuuuuuut some things you want to find are just too rare. You can’t model things that aren’t in the data.
  50. At UNICEF, different people care about different categories: the people who respond to rumors of Ebola outbreaks or cures are different from the people trying to keep track of economic issues. Most actionable is, of course, finding people who specifically require support about participating in the community.
  51. Accuracy in opinion mining is not easy (multiply these rows)
  52. Brands want to support the full range of communications that are important to their consumers all over the world. We can offer insight into multiple languages in one system, as opposed to using multiple systems and then having to consolidate the findings. We do this through a worldwide network of crowdsourced annotators and human analysts, whose annotations teach our algorithm to continuously improve.
  53. Distributions don’t mean what you think they do. People are very positive online. Weird. https://xkcd.com/1098/
  54. Genres (for fun)
  55. Genres (for fun)
  56. Sorry, this is very linguistic—the point here is that this marker does different things in different syntactic contexts. In Navajo, =go normally serves as a subordinate marker, but it can also appear in utterances where there is no matrix sentence. When it is used this way (as in narration), it marks emotional evaluation and background information (see Mithun (2008) on Navajo as well as other languages that have similarly behaving subordinate markers).
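
Notes 8 and 13 (together with the “13 expert polarity lexicons / words on 2 or more” slide) can be sketched in Python. The three tiny lexicons below are made up for illustration, and the merge rule (keep a word if at least two lexicons list it, and give it the sign of its summed votes) is our assumption about how such a combined list might be built, not a description of the Stanford lists. The bag-of-words scorer then shows why “snug as a bug in a rug” trips up word-level sentiment.

```python
from collections import Counter


def merge_lexicons(lexicons: list[dict[str, int]], min_votes: int = 2) -> dict[str, int]:
    """Keep words listed in at least `min_votes` lexicons.

    Each lexicon maps word -> +1 (positive) or -1 (negative). A kept word gets
    the sign of its summed votes; words whose votes cancel out are dropped.
    """
    listed_in = Counter()
    vote_total = Counter()
    for lexicon in lexicons:
        for word, polarity in lexicon.items():
            listed_in[word] += 1
            vote_total[word] += polarity
    return {
        word: (1 if vote_total[word] > 0 else -1)
        for word, count in listed_in.items()
        if count >= min_votes and vote_total[word] != 0
    }


def naive_score(text: str, lexicon: dict[str, int]) -> int:
    """Bag-of-words sentiment: add up the polarity of every known token."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())


# Three made-up lexicons standing in for the expert lists (hypothetical).
lex_a = {"snug": 1, "bug": -1, "happy": 1}
lex_b = {"snug": 1, "bug": -1, "awful": -1}
lex_c = {"happy": 1, "yasssssss": 1}

merged = merge_lexicons([lex_a, lex_b, lex_c])
print(sorted(merged.items()))
print(naive_score("Snug as a bug in a rug", merged))  # positive idiom, scored 0
print(naive_score("Yasssssss!", merged))  # new coinage, not in the merged list
```

Both the idiom and the coinage come out neutral: the first because “snug” and “bug” cancel, the second because “yasssssss” only appears in one list (and the raw token still carries its “!”). That is the problem note 19 raises about relying on lists instead of training on relevant data.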
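
The workload-reduction trade-off in note 32 boils down to ranking documents by a classifier score and asking how far down the ranking an analyst has to read to reach a target recall (the share of Relevant documents found). A minimal Python sketch, assuming you already have a relevance score per document and gold labels to evaluate against; the scores and labels below are invented toy data, not the client figures.

```python
from typing import Sequence


def review_fraction_for_recall(scores: Sequence[float], relevant: Sequence[bool],
                               target_recall: float) -> tuple[float, float]:
    """Return (fraction_reviewed, workload_reduction) needed for target_recall.

    Documents are ranked by score, highest first; an analyst reads down the
    ranking until the share of relevant documents found reaches target_recall.
    """
    total_relevant = sum(relevant)
    if total_relevant == 0:
        raise ValueError("need at least one relevant document")
    ranked = sorted(zip(scores, relevant), key=lambda pair: pair[0], reverse=True)
    found = 0
    for reviewed, (_, is_relevant) in enumerate(ranked, start=1):
        found += is_relevant
        if found / total_relevant >= target_recall:
            fraction = reviewed / len(ranked)
            return fraction, 1 - fraction
    return 1.0, 0.0  # only reachable if target_recall > 1


# Toy data: 10 documents with model scores and gold relevance labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
relevant = [True, True, False, True, False, False, True, False, False, False]

for target in (0.75, 1.0):
    fraction, reduction = review_fraction_for_recall(scores, relevant, target)
    print(f"recall {target:.0%}: review {fraction:.0%}, workload reduction {reduction:.0%}")
```

On this toy data, reaching 75% recall means reading 40% of the documents (a 60% workload reduction), while 100% recall pushes that to 70% (only 30% saved): the same shape of trade-off as the Manufacturing example.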
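
Escalating “specific cases” to human annotation (note 36) is often done with uncertainty sampling: send people the items the model is least sure about, and accept the model’s labels for the rest. The sketch below is a generic Python illustration of that technique, not Idibon’s actual selection method; `select_for_annotation`, the message IDs and the probabilities are all hypothetical.

```python
def select_for_annotation(predictions: dict[str, float], budget: int) -> list[str]:
    """Uncertainty sampling for a binary classifier.

    predictions maps a message ID to the model's probability that the message
    belongs to the positive class; the `budget` messages closest to 0.5 are
    the ones the model is least sure about, so they go to human annotators.
    """
    ranked = sorted(predictions, key=lambda msg_id: abs(predictions[msg_id] - 0.5))
    return ranked[:budget]


# Hypothetical model output for five incoming messages.
predictions = {"msg-a": 0.97, "msg-b": 0.52, "msg-c": 0.08, "msg-d": 0.45, "msg-e": 0.70}
to_humans = select_for_annotation(predictions, budget=2)
auto_labeled = {m: p >= 0.5 for m, p in predictions.items() if m not in to_humans}
print("send to annotators:", to_humans)
print("accept model labels:", auto_labeled)
```

Once annotators label the escalated messages, those labels go back into the training data and the model is retrained, which is the loop note 36 describes as humans and computers making each other’s decisions smarter.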
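
Notes 47 to 49 make a sampling point: a category at 1.4% prevalence needs a lot of raw messages before you have enough examples to train on. Assuming messages are drawn at random, the expected counts are simple arithmetic; the 500-example target below is a hypothetical figure, not a recommendation.

```python
import math


def expected_count(num_messages: int, prevalence: float) -> float:
    """Expected number of messages in a category when sampling at random."""
    return num_messages * prevalence


def messages_needed(target_examples: int, prevalence: float) -> int:
    """Smallest random sample whose *expected* count reaches target_examples."""
    return math.ceil(target_examples / prevalence)


ENVIRONMENT_PREVALENCE = 0.014  # 1.4% of Nigerian messages (slide 67, note 48)

print(f"{expected_count(10_000, ENVIRONMENT_PREVALENCE):.0f} environment messages per 10,000")
print(messages_needed(500, ENVIRONMENT_PREVALENCE), "messages to expect 500 examples")
```

With these numbers, 10,000 random messages yield about 140 environment messages, and roughly 35,700 are needed to expect 500. These are averages; actual counts vary from sample to sample, and random sampling is exactly what a system that uses each “needle” to find the next one (note 49) tries to beat.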