The document discusses how MeaningCloud's customization tools can improve the accuracy of text analysis. It describes how precision and recall work and how custom dictionaries and classification models tailored to specific domains can boost accuracy levels. The webinar demonstrates MeaningCloud's graphical tools for creating custom entities, concepts, and models to analyze topics, sentiments, and mentions within text according to the customized resources. MeaningCloud aims to make high-quality semantic analysis affordable and customizable for both technical and non-technical users.
Boost Your Text Analytics Accuracy - MeaningCloud Webinar
1. Better Text Analytics with MeaningCloud
How our customization tools can boost text analysis accuracy
Webinar - Daedalus / MeaningCloud, May 14, 2015
2. Introduction
Presenter
Jarred McGinnis, PhD
Business Development, UK
Logistics
Send text questions, or “raise your hand” to speak and we’ll open your mic
A link to the recorded webinar will be published
3. Agenda
Text analytics: accuracy, precision, recall
Customized linguistic resources for improved accuracy
MeaningCloud customization tools
Conclusions, Q&A
4. Text analytics
Extract meaning and actionable insights from unstructured content
Automation of costly manual activities
Semantic analysis: opinions, facts, concepts, organizations, people, relationships, themes
5. Just how precise is precise?
Precision is relative
Even experts aren’t 100% precise
Tests involving human analysts: 85-95% agreement
Along with precision, recall is also important
[Figure: items identified by the algorithm in three scenarios: high precision and high recall; high precision and low recall; low precision and high recall]
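To make the two measures concrete: precision is the share of items identified by the algorithm that are actually relevant, and recall is the share of relevant items that the algorithm identified. A minimal sketch in Python follows; the function name and the document IDs are made up for illustration and are not part of MeaningCloud.

```python
def precision_recall(identified, relevant):
    """Return (precision, recall) for two collections of item IDs."""
    identified, relevant = set(identified), set(relevant)
    true_positives = len(identified & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(identified) if identified else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: 4 relevant mentions exist; the algorithm flags 5 items.
relevant = {"d1", "d2", "d3", "d4"}
identified = {"d1", "d2", "d3", "d7", "d9"}

precision, recall = precision_recall(identified, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the five flagged items are relevant (precision 0.60) and three of the four relevant items were found (recall 0.75).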
6. Accuracy: precision & recall
Precision and recall tend to be inversely related: raising one usually lowers the other
A trade-off is needed
Requirements are application-specific
Brand monitoring in social media: high precision, low recall
Counter-terrorism: high recall, low precision
[Figure: precision-recall curve]
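The trade-off can be illustrated by sweeping an acceptance threshold over scored items. This is a sketch with invented scores and labels, not output from any real classifier: a strict threshold gives high precision and low recall, a lenient one the reverse.

```python
# Hypothetical (confidence score, is_relevant) pairs from some classifier.
scored = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.50, True), (0.40, False), (0.30, False),
]

def precision_recall_at(threshold, scored_items):
    """Precision and recall when accepting items with score >= threshold."""
    accepted = [rel for score, rel in scored_items if score >= threshold]
    total_relevant = sum(1 for _, rel in scored_items if rel)
    true_positives = sum(accepted)
    # Convention chosen here: precision is 1.0 when nothing is accepted.
    precision = true_positives / len(accepted) if accepted else 1.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall

# Each entry is (threshold, precision, recall): points on the curve.
curve = [(t, *precision_recall_at(t, scored)) for t in (0.9, 0.7, 0.5, 0.3)]
for threshold, precision, recall in curve:
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

A brand-monitoring application would sit near the strict end of this sweep, and a counter-terrorism application near the lenient end.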
7. State of the Art for Text Analysis
Precision Measurements
Topic Extraction: 70-85%
Classification: 70-80%
Sentiment Analysis: 60-70%
Quality improvement depends on adapting the tools and resources to the application or task
13. VoC / Customer Insights scenario
Sources: social networks and forums; survey verbatims; contact center interactions (voice, email…)
Analysis: structure and extract meaning
What companies/brands are they mentioning?
What are they talking about?
What’s their opinion?
Insights
14. Customized linguistic resources improve accuracy
Example: analysis of a bank’s customer opinions
Opinions
The sentence “The highest interest rate in industry!” is…
Positive, if talking about savings
Negative, if talking about mortgages
Mentions
Names of banks and financial companies, e.g., Citibank, BBVA
Product names, e.g., Your Waysm Account, Compass Account…
Themes
Products: Accounts (Checking, Savings), Borrowing (Credit, Mortgage)
Channel: Office, Phone, Internet
18. Creating a new dictionary
A dictionary can be imported from a file
19. Creating a new entity
Aliases: it is NOT necessary to explicitly include “trivial” aliases, as the engine generates typical variants
Use your own ontology
Possible to include additional semantic info
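As a rough illustration of what a custom entity entry carries (a canonical form, aliases, an ontology type and extra semantic info), here is a minimal sketch. The `Entity` class, its field names, the ontology labels and the matcher are all hypothetical; they do not reproduce MeaningCloud's dictionary format or its variant generation, and only case-insensitive matching is simulated.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Entity:
    """Hypothetical dictionary entry; field names are illustrative only."""
    form: str                                      # canonical name
    ontology_type: str                             # node in the user's own ontology
    aliases: list = field(default_factory=list)
    extra: dict = field(default_factory=dict)      # additional semantic info

    def variants(self):
        """Canonical form plus aliases, lower-cased for matching."""
        return {name.lower() for name in [self.form, *self.aliases]}

def find_mentions(text, dictionary):
    """Return (matched text, canonical form) pairs in order of appearance."""
    mentions = []
    for entity in dictionary:
        for variant in entity.variants():
            pattern = r"\b" + re.escape(variant) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                mentions.append((match.start(), match.group(0), entity.form))
    return [(found, form) for _, found, form in sorted(mentions)]

dictionary = [
    Entity("Citibank", "Organization/Bank", aliases=["Citi"]),
    Entity("BBVA", "Organization/Bank",
           aliases=["Banco Bilbao Vizcaya Argentaria"],
           extra={"country": "Spain"}),
]

mentions = find_mentions("I moved my savings from Citi to BBVA.", dictionary)
print(mentions)
```

The alias “Citi” resolves to the canonical entity Citibank, so mentions under different names can be counted together.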
26. Defining a new category: hybrid approach
Rule-based
Training-based
Possible to opt for one of the approaches, or to combine both, depending on the application
27. Defining a category: training
Fed with pre-labeled training texts
Based on machine learning technology
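The slide does not say which learning algorithm is used. As a generic illustration of training a category model from labeled texts, here is a tiny naive Bayes classifier with add-one smoothing, written with the standard library only. The training sentences and category names are invented.

```python
import math
from collections import Counter, defaultdict

def train(labeled_texts):
    """Count words per category from (text, category) training pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_texts:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Return the category with the highest log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)   # prior
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            count = word_counts[category][word] + 1           # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical pre-labeled training texts for two themes.
training = [
    ("my mortgage rate is too high", "Borrowing"),
    ("applied for a credit card and a loan", "Borrowing"),
    ("opened a savings account online", "Accounts"),
    ("my checking account has no fees", "Accounts"),
]
model = train(training)
predicted = classify("the fees on my savings account", model)
print(predicted)
```

With so few examples the model is only a toy; in practice accuracy depends on the amount and quality of the labeled texts.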
28. Defining a category: rules
Terms that
Are indispensable
Are banned
Increase relevance
Reduce relevance
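The four kinds of terms can be sketched as a small scoring function. This assumes a simple bag-of-words match; the rule structure, the 0.5 weights and the example terms are hypothetical and do not reflect MeaningCloud's rule syntax.

```python
def rule_score(text, rule):
    """Score a text against one category rule; None means 'not this category'."""
    words = set(text.lower().split())
    if not rule["indispensable"] <= words:
        return None                      # a required term is missing
    if rule["banned"] & words:
        return None                      # a banned term is present
    score = 1.0                          # base relevance once the filters pass
    score += 0.5 * len(rule["increase"] & words)
    score -= 0.5 * len(rule["reduce"] & words)
    return score

# Hypothetical rule for a "Mortgage" category.
mortgage_rule = {
    "indispensable": {"mortgage"},
    "banned": {"insurance"},
    "increase": {"rate", "interest", "repayment"},
    "reduce": {"advert", "promotion"},
}

print(rule_score("the mortgage interest rate went up", mortgage_rule))
print(rule_score("mortgage insurance promotion", mortgage_rule))
print(rule_score("great interest rate on savings", mortgage_rule))
```

Only the first text is assigned to the category: the second contains a banned term and the third lacks the indispensable one.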
29. Improving precision and recall using rules and training
Statistical
Benefits: fast, provided tagged texts are available; good accuracy for long texts
Disadvantages: “black box” approach; false positives difficult to correct; bias in results, depending on training
Rules
Benefits: no false positives; very good accuracy for limited environments
Disadvantages: costly if starting from zero; false negatives, depending on rule quality; difficult to scale
Hybrid
Benefits: can be easily started with training texts; does not need exhaustive definition of rules
Disadvantages: requires deep domain knowledge
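One possible way to combine the two approaches is to let rules act as hard filters around a statistical score. This is an assumption about how a hybrid could work, not a description of MeaningCloud's engine; the word weights, terms and threshold below are invented stand-ins for a trained model.

```python
def statistical_score(text):
    """Stand-in for a trained model: hypothetical per-word weights."""
    weights = {"mortgage": 0.6, "loan": 0.5, "rate": 0.2, "account": -0.3}
    raw = sum(weights.get(word, 0.0) for word in text.lower().split())
    return max(0.0, min(1.0, raw))       # clamp to the range [0, 1]

def hybrid_decision(text,
                    banned=frozenset({"insurance"}),
                    required_any=frozenset({"mortgage", "loan"}),
                    threshold=0.5):
    """Rules filter first; the statistical score decides what remains."""
    words = set(text.lower().split())
    if banned & words:
        return False                     # rule removes a likely false positive
    if not required_any & words:
        return False                     # no anchor term for this category
    return statistical_score(text) >= threshold

print(hybrid_decision("my mortgage rate is too high"))
print(hybrid_decision("mortgage insurance offer"))
print(hybrid_decision("my account rate"))
```

The rules make individual errors easy to correct, while the statistical score avoids having to enumerate every relevant term by hand.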
34. Sentiment: use of custom entity and concept dictionaries
Polarity associated with Barclays
35. Custom sentiment dictionaries (COMING SOON)
Not all terms have the same polarity in all domains
E.g., in the luxury goods domain the term “cheap” doesn’t necessarily have a positive polarity (as it does in other domains)
Define a luxury goods custom sentiment dictionary where “cheap” has polarity N (negative)
A given term can have different polarities, according to context
Term | Context | Polarity
close | stock market | Neutral
close | deal, contract | Positive
close | company | Negative
We’re presently testing this feature. If you want to take part in the private beta, send an email to support@meaningcloud.com
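The context-dependent entries for “close” and the luxury-goods entry for “cheap” can be sketched as a lookup keyed on term and context words. The data structure and function below are hypothetical illustrations; the feature is described on the slide as still in beta and its actual format is not shown.

```python
# Hypothetical custom sentiment dictionary:
# term -> list of (context words, polarity); an empty context always applies.
SENTIMENT_DICTIONARY = {
    "close": [
        ({"stock", "market"}, "NEUTRAL"),
        ({"deal", "contract"}, "POSITIVE"),
        ({"company"}, "NEGATIVE"),
    ],
    "cheap": [(set(), "NEGATIVE")],      # luxury-goods domain
}

def term_polarity(term, sentence, dictionary=SENTIMENT_DICTIONARY):
    """Return the polarity of term in sentence, or None if no entry applies."""
    cleaned = sentence.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    for context, polarity in dictionary.get(term, []):
        # The first entry whose context words appear in the sentence wins.
        if not context or context & words:
            return polarity
    return None

print(term_polarity("close", "We expect to close the deal on Friday."))
print(term_polarity("close", "The stock market will close early."))
print(term_polarity("close", "They had to close the company."))
print(term_polarity("cheap", "This handbag looks cheap."))
```

A sentence with none of the listed context words, such as “Please close the door”, returns None, so a general-purpose dictionary could then be used as a fallback.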
36. Conclusions
How to improve accuracy?
Graphical tools
Possibility to include own dictionaries and models
Broad coverage: mentions, themes, opinions…
Empowered users
High accuracy analysis is within your reach.
37. Democratizing the extraction of meaning
High-quality semantic analysis
Optimized technology mix
Continuously updated semantic resources
High-level APIs, e.g., Corporate Reputation
Customizable to customer domain: models, dictionaries, sentiment
Affordable, no risks
Mature, tested technology
Test and use for FREE (40,000 requests per month)
Pay per use
No commitment or lock-in period
Commercial plans beginning at $99/mo
For developers and non-technical users
Add-in for Excel
Standard web services APIs
Plug-ins and SDKs for diverse environments and languages
Plug-and-play approach
38. Thank you for your attention!
Questions, suggestions...
Jarred McGinnis, PhD
Business Development, UK
jarred@daedalus.es
http://www.meaningcloud.com
http://www.daedalus.es