Text mining of Social Network Data for Business Intelligence - iLabs camp
1. Text Mining of Social Network Data
for Business Applications
ANKIT SHARMA, DATA SCIENCE PRACTICES, IMPETUS
2. Content
Data: unstructured data as business opportunity
Text mining: learning from textual data
Social media: learning from social media
Sentiment analysis and opinion mining
Topic modeling
Tools for text mining
Use cases: hotel review demo, advertising campaign analysis
Data Science Practices at Impetus
Thursday, August 7, 2014 DATA SCIENCE PRACTICES, IMPETUS - INDIA 2
3. Data
Structured data: Tables, Records
Semi-structured data: XML, JSON
Unstructured data: Text, Audio, Video, conversations, Web, Wikis, Documents, Web logs…
Social Media data: Tweets, Blogs, Facebook, other social platforms
5. Unstructured data as business opportunity
“Unstructured” data, such as natural language, is distinguished from the “structured” information found in conventional spreadsheets and databases.
Unstructured data constitutes 80% of all enterprise data (Gartner Research)
Unstructured text can contain business-critical information, untapped opportunities and latent risks
Example:
Consumers’ thoughts and opinions, found in communications such as emails, web pages, reports, surveys, contracts, blogs, and wikis.
Whether it is customer complaints, employee feedback, analyst opinions, or competitors' intentions, this valuable and actionable information lies hidden in unstructured text repositories
6. Text mining
Text mining integrates text analytics approaches, tools and solutions to extract value from unstructured data
Typical text mining tasks include:
Text categorization
Text clustering
Concept/entity extraction
Production of granular taxonomies
Sentiment analysis
Document summarization
Entity relation modeling
For a company, the successful management of unstructured information may lead to more
profitable decisions and business opportunities
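As an illustration of the first task in the list, text categorization, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the method used in this work; the function names, the whitespace tokenizer and the toy training set are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents per label and words per label from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior under add-one smoothing."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        label_total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            smoothed = (word_counts[label][word] + 1) / (label_total + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, invented for illustration only.
training_data = [
    ("win a free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
model = train_naive_bayes(training_data)
predicted = classify("free prize offer", model)
```

With this toy data the message "free prize offer" is assigned to the "spam" category. A real categorizer would need far more training text and a better tokenizer.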
7. Learning from text mining
Classification: Spam detection, Document organization
Clustering: Trend analysis, Topic identification
Web mining: Trend analysis, Ontology creation, Opinion mining
Natural Language Processing: Text summarization, Question answering, Information extraction
8. Logical view of documents
9. Sentiment Analysis and Opinion Mining
Opinion mining, sentiment analysis, and subjectivity analysis all refer to the computational analysis of opinion, sentiment, and subjectivity in online text
Subjectivity analysis, or subjectivity classification, automatically separates opinion-bearing text from objective text that presents factual information
Sentiment analysis originated in machine learning (ML), information retrieval (IR) and natural language processing (NLP)
Opinion mining originated in the Web search and IR community and involved processing search results for a given product, retrieving attributes and aggregating users’ opinions
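Subjectivity classification can be sketched with a simple lexicon lookup: a sentence is treated as opinion-bearing if it contains at least one word from a list of subjective terms. The word list, the function name and the sample sentences below are hypothetical and only illustrate the idea; practical systems typically use larger lexicons or trained classifiers.

```python
import re

# Hypothetical, tiny subjectivity lexicon; a real system would use a much larger resource.
SUBJECTIVE_WORDS = {"love", "hate", "great", "terrible", "awful", "amazing", "bad", "good"}


def is_subjective(sentence, lexicon=SUBJECTIVE_WORDS):
    """Return True if any token of the sentence appears in the subjectivity lexicon."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(token in lexicon for token in tokens)


# Invented sample sentences mixing factual statements and opinions.
sentences = [
    "The hotel has 120 rooms.",
    "The breakfast was amazing.",
    "Check-in starts at 2 pm.",
    "I hate the noisy lobby.",
]
opinions = [s for s in sentences if is_subjective(s)]
facts = [s for s in sentences if not is_subjective(s)]
```

Here `opinions` holds the two sentences containing "amazing" and "hate", and `facts` holds the other two.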
10. Topic Modeling
A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents
Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections
Identifying emerging topics in communities, trending topics in social media, and hot topics in online discussions may be critical for businesses
LDA (Latent Dirichlet Allocation) is a generative model in which sets of observations are explained by unobserved groups, which account for why some parts of the data are similar. It was developed by David Blei, Andrew Ng, and Michael Jordan in 2003.
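The generative view behind LDA can be sketched as follows: each document draws its own topic mixture from a Dirichlet distribution, and each word is produced by first picking a topic from that mixture and then picking a word from that topic. In this sketch the two topics, the vocabulary and the word probabilities are fixed, hypothetical values; in full LDA the per-topic word distributions are themselves drawn from a Dirichlet prior and everything is inferred from data rather than specified by hand.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_probs, vocabulary, alpha, num_words, rng):
    """Generate one document: a topic mixture, then one (topic, word) draw per position."""
    num_topics = len(topic_word_probs)
    theta = sample_dirichlet([alpha] * num_topics, rng)  # per-document topic mixture
    words = []
    for _ in range(num_words):
        topic = rng.choices(range(num_topics), weights=theta)[0]
        word = rng.choices(vocabulary, weights=topic_word_probs[topic])[0]
        words.append(word)
    return theta, words


# Hypothetical vocabulary and two hand-written topics ("music" and "cars").
vocabulary = ["guitar", "album", "concert", "engine", "road", "wheel"]
topic_word_probs = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # music topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # cars topic
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
theta, words = generate_document(topic_word_probs, vocabulary, alpha=1.0, num_words=10, rng=rng)
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topic mixtures and the per-topic word distributions.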
11. Topic Modeling Contd…
“I listen to Motorhead, Pink Floyd and Metallica whenever I’m travelling in my car.”
A topic model might describe this text as 75% about music and 25% about cars
"dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will
appear in documents about cats, and "the" and "is" will appear equally in both.
Types of analysis LDA can perform:
◦ Topic identification
◦ Which topics are similar?
◦ Which documents are similar based on topic allocations?
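The last question, document similarity based on topic allocations, can be sketched by comparing per-document topic proportions with cosine similarity. The proportions below are hypothetical (the first row reuses the 75% music / 25% cars split from the example above); in practice they would come from a fitted topic model, and other measures such as Jensen-Shannon divergence are also used.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


# Hypothetical topic proportions per document, in the order (music, cars, pets).
doc_topics = {
    "road_trip_playlist": [0.75, 0.25, 0.00],
    "concert_review": [0.90, 0.05, 0.05],
    "puppy_training": [0.05, 0.05, 0.90],
}


def most_similar(name, doc_topics):
    """Return the other document whose topic proportions are closest to `name`."""
    others = (other for other in doc_topics if other != name)
    return max(others, key=lambda other: cosine_similarity(doc_topics[name], doc_topics[other]))


closest = most_similar("road_trip_playlist", doc_topics)
```

With these numbers the playlist document is closest to the concert review, since both are dominated by the music topic.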
12. Social Media based Sentiment Analysis
Data from Internet and Web 2.0
• Buzz Monitoring
• Sentiment Analysis
• Content Categorization
Trends Detection and Recommendation
• Brand Image Monitoring
• Sentiment trends in customer comments
• Discovering undercurrents & recommending adjustments
• Overall vs. service-attribute-based sentiment extraction
• Real-time monitoring of consumer perceptions
• Identification of data sources (Twitter/Facebook/discussion boards)
• Collection of consumer-expressed text
13. Solution framework
Data from Internet and Web 2.0
Overall/Attribute-level Sentiment Analysis
Trends Detection and Recommendation
Service Attributes Identification
Buzz Monitoring and Summary Report
Content Categorization
Sentiment Trends Summary
Classified Content/Topics*
* Not included in this work
14. Tools for Text mining
15. Use Case
- HOTEL REVIEW
- ADVERTISING CAMPAIGN
16. Hotel Review demo
Objective:
Analyze hotel review text data
Calculate each hotel’s rating from sentiment analysis of its reviews
Visualize the data on maps in a web-based platform with features like zooming, clicking and hover
Design a web-based user interface for larger data
Data:
254,574 reviews of 2,560 hotels
6 Countries: UAE, Canada, China, India, UK, USA
10 Cities: Beijing, Dubai, England, Illinois, Montreal, Nevada, New Delhi, New York City, Quebec, San Francisco, Shanghai
Data Science :
LSI (Latent Semantic Indexing)
Sentiment Analysis
Part of speech tagging
Feature extraction
Feature based opinion mining
Open search : Apache Solr
Database : Apache Cassandra
Maps : Google Maps API
Sentiment score for each hotel, based on sentiment analysis of its reviews: the polarity of each review is calculated from its positive and negative words
Hotel feature-based opinion mining for the following features: Food, Room, Location, Service, Price, etc.
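A minimal sketch of both ideas, assuming a simple word-counting approach: review polarity is taken as (positive - negative) / (positive + negative) over lexicon words, a hotel's score is the average polarity of its reviews, and a feature's score is the average polarity of the sentences that mention it. The word lists, feature keywords, scoring formula and sample reviews are all hypothetical; the slides do not specify the exact lexicons or formula used in the demo.

```python
import re
from collections import defaultdict

# Hypothetical opinion lexicons and feature keywords, invented for illustration.
POSITIVE_WORDS = {"good", "great", "clean", "friendly", "tasty", "comfortable"}
NEGATIVE_WORDS = {"bad", "dirty", "rude", "noisy", "bland", "expensive"}
FEATURE_KEYWORDS = {
    "Food": {"food", "breakfast", "dinner"},
    "Room": {"room", "bed", "bathroom"},
    "Location": {"location"},
    "Service": {"service", "staff"},
    "Price": {"price", "rate"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def polarity(tokens):
    """(positive - negative) / (positive + negative); 0.0 when no opinion words occur."""
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def hotel_score(reviews):
    """Average polarity over all reviews of one hotel."""
    scores = [polarity(tokenize(review)) for review in reviews]
    return sum(scores) / len(scores) if scores else 0.0


def feature_scores(reviews):
    """Average polarity of the sentences that mention each feature."""
    collected = defaultdict(list)
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review):
            tokens = tokenize(sentence)
            for feature, keywords in FEATURE_KEYWORDS.items():
                if keywords.intersection(tokens):
                    collected[feature].append(polarity(tokens))
    return {feature: sum(vals) / len(vals) for feature, vals in collected.items()}


# Invented reviews for a single hotel.
reviews = [
    "The room was clean and comfortable. The staff were rude.",
    "Great location. The food was bland and expensive.",
]
overall = hotel_score(reviews)
by_feature = feature_scores(reviews)
```

For these two reviews the overall score balances out to zero, while the feature view separates positive opinions on Room and Location from negative ones on Service and Food. This is the kind of contrast that an overall-versus-attribute comparison is meant to surface.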
19. Advertising campaign buzz monitoring
Social Media Monitoring and Analysis for an advertising campaign
Feature-level Buzz Summary for Company name, Campaign and other
hidden features
Blogs Analysis for campaign name
Data collected from:
Tweets for 1 month
4000+ Tweets (including 1440 re-tweets)
43 Blogs and comments were crawled and analyzed
Features
20. Results for blogs
Pie charts (legend: Negative, Positive, Neutral)
Blogs: 83%, 17%, 0%
Comments: 33%, 43%, 24%
Blogs & Comments: 39%, 40%, 21%
Bar chart: Feature-level Buzz Summary for 3 features (y-axis: Number of Blogs; series: Neutral, Positive, Negative)
21. Insights
The ad campaign has an overall negative sentiment associated with it
People used hashtags such as #bad and #worstadsever to express negative sentiment
The attribute/feature “********” has also been rated negatively by users
There is a spike in associated tweets due to frequent advertisements on a particular day
This analysis is based on only 30 days of tweets
In the long run, more visible trends can be monitored
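Two of these observations, the hashtags people use and the day on which tweet volume spikes, can be sketched with simple counting. The tweets, dates and function names below are hypothetical and stand in for the collected data, which is not reproduced in the slides.

```python
import re
from collections import Counter

# Hypothetical (date, text) tweets, invented for illustration only.
tweets = [
    ("2014-07-01", "This ad again #bad"),
    ("2014-07-02", "Seen it five times today #worstadsever #bad"),
    ("2014-07-02", "RT Seen it five times today #worstadsever #bad"),
    ("2014-07-02", "Catchy jingle though"),
    ("2014-07-03", "Still running #bad"),
]


def hashtag_counts(tweets):
    """Count how often each hashtag (case-insensitive) appears across all tweets."""
    counts = Counter()
    for _, text in tweets:
        counts.update(tag.lower() for tag in re.findall(r"#\w+", text))
    return counts


def daily_volume(tweets):
    """Count tweets per calendar day."""
    return Counter(date for date, _ in tweets)


def busiest_day(tweets):
    """Return the date with the most tweets."""
    return daily_volume(tweets).most_common(1)[0][0]


tags = hashtag_counts(tweets)
peak = busiest_day(tweets)
```

Re-tweets are counted like any other tweet here; whether to include them is a design choice that affects both the hashtag totals and the daily volume.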
22. Data Science Practices at Impetus
DSP
Statistical model development
Text mining
Financial data analysis
Healthcare data analysis
Manufacturing data analysis
Web analytics
Funnel Analysis