Knowledge discovery in social media mining for market analysis

By: Senuri Wijenayake

 Introduction
Problem Addressed: Three research areas in Social Media
Mining
 Predictive Power
 Community Detection
 Influence Propagation
Focus: Analyze the existing literature and identify
applications of knowledge discovery in social media for
market analysis

 Background
Fact 1: Facebook had over 1.55 billion active users as of
November 2015
(extracted from Statistics Portal – November 2015)
Fact 2: All adults spend at least 2 hours a day on some
form of social media network

 Focus of Research
 A rich source of data with human sentiment and behavior
 Developed online relationships and groups
 Online interactions where people voice their ideas
 Understand customer satisfaction and changing customer requirements
 Focused marketing campaigns for better results
 Influencing consumer behavior effectively via influential users

 Using Social Media to make Predictions
Progress So Far:
 Human Intuition – Can’t be duplicated
 Data Based Models – Inadequate data to represent
human cognitive process
SOLUTION: Use data available on social media for
predictive analysis.

 Using Social Media to make Predictions
Progress So Far:
 Yahoo Finance Message Board – Stock market
variability (Antweiler & Frank 2004)
 Google Search Queries – Track disease outbreaks
(Ginsberg et al. 2009)
 Amazon Reviews – Predicting product sales (Ghose &
Ipeirotis 2011)

 General Framework for SMM for Predictions
Stage 1: Preprocessing
 Social media data are unstructured
 Convert them into high quality structured data, suitable for data mining
 Quality: Strong et al. (1997)
   Objectivity
   Completeness
   Sufficiency
Stage 2: Predictive Analysis
 Develop a model to make accurate predictions on a new set of data (Harold 2013)
 Methodologies:
   Market Models
   Survey Models
   Statistical Models

 Data Preprocessing
Data Cleaning
 Problem: Missing values, noise, outliers
 Solution: Substitution, regression
Data Integration
 Problem: Entity identification, redundancy
 Solution: Schema based entity identification, duplicate detection
Data Transformation
 Problem: Data can’t be used straight away for mining
 Solution: Generalization, attribute construction
Data Reduction
 Problem: Large amounts of data require significant processing power
 Solution: Data cube aggregation, attribute selection
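
Two of the cleaning and integration steps listed above (duplicate detection and substitution of missing values) can be sketched as follows. This is a minimal illustration only: the record layout (`user`, `text`, `likes`), the function name `clean_records`, and the choice of the mean as the substitute value are assumptions and do not come from the slides.

```python
from statistics import mean


def clean_records(records, field):
    """Drop exact duplicate posts, then fill missing `field` values with the mean.

    The (user, text) pair is used as a hypothetical duplicate key.
    """
    seen = set()
    unique = []
    for record in records:
        key = (record["user"], record["text"])
        if key in seen:
            continue  # duplicate detection: keep only the first copy
        seen.add(key)
        unique.append(dict(record))

    # Substitution: replace missing values with the mean of the observed ones.
    observed = [r[field] for r in unique if r[field] is not None]
    fill_value = mean(observed)
    for record in unique:
        if record[field] is None:
            record[field] = fill_value
    return unique


# Hypothetical sample data, for illustration only.
posts = [
    {"user": "a", "text": "great movie", "likes": 4},
    {"user": "b", "text": "boring plot", "likes": None},
    {"user": "a", "text": "great movie", "likes": 4},
    {"user": "c", "text": "loved it", "likes": 8},
]
cleaned = clean_records(posts, "likes")
```

Duplicates are removed before the mean is computed so that repeated posts do not skew the substituted value.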

 Application of Predictions in Market Analysis
Objective: How can the available knowledge be used to
make predictions for market analysis, and how
successful are those predictions?
 Microblogging (Twitter) is most popular
Focus: Twitter data for predicting box
office performance of movies

Literature:
 Asur & Huberman (2010) used correlation and
regression based models on Twitter data
 Leskovec (2011) rectified imperfections which could arise
due to incomplete data
 Vasu Jain (2013) used sentiment analysis for predictions
 Gaikar & Marakarkandy (2015) introduced a framework for
using Twitter data for sentiment analysis and making
predictions

Gaikar & Marakarkandy (2015)
 Predict box office performance of a Bollywood movie as a hit, flop, or average
 Predict the opening weekend revenue collection

 Twitter for Predictions: Methodology
Module 1: Data Extraction
 The most trending hashtag on Twitter and
related hashtags are extracted (HashTags.org)
 Twitter4j API used to connect and extract
tweets from Twitter servers
 Stored in a MySQL database
 Movie star ratings taken from Times Celebex
A complete set of most relevant data has been
extracted

Module 2: Sentiment Analysis

Module 3: Predictive Analysis
 Predicting movie performance
 Input: Sentiment score + Movie Star Rating
 Process: Fuzzy Inference based model is
created
 Output: Box office movie performance as Hit,
Flop or Average
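
The slides do not give the membership functions or rule base of the fuzzy inference model, so the following is only a rough sketch of how such a model could map the two inputs to a label. The triangular membership functions, the three rules, and the assumption that both inputs are normalized to the range 0 to 1 are all hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x):
    """Degrees of membership in hypothetical low / medium / high sets on [0, 1]."""
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "medium": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }


def predict_performance(sentiment_score, star_rating):
    """Apply three illustrative rules and return the label with the strongest firing."""
    s, r = fuzzify(sentiment_score), fuzzify(star_rating)
    strength = {
        "Hit": min(s["high"], r["high"]),          # both inputs high
        "Flop": min(s["low"], r["low"]),           # both inputs low
        "Average": max(s["medium"], r["medium"]),  # either input medium
    }
    return max(strength, key=strength.get)


print(predict_performance(0.9, 0.9))
print(predict_performance(0.1, 0.1))
```

A full fuzzy inference system would also aggregate and defuzzify the rule outputs; this sketch stops at picking the strongest rule.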

Module 3: Predictive Analysis
 Predicting weekend collection
 Input: Hype factor, Shows per day on all
screens, average full house collection
 Process:
 Output: Estimated opening weekend collection

 Twitter for Predictions: Findings & Evaluation
 10269 tweets for 14 movies released in a
period of six months (relevant, complete,
sufficient) were considered
 Actor ratings in the month of release were
considered
 Predictions compared against the real ratings
extracted from IMDB (near perfect predictions)
 Mean Square Error used to evaluate the
effectiveness of the predictive model (<7%
error rate)
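
For reference, mean squared error averages the squared differences between actual and predicted values. The sketch below shows the standard calculation only; the slides do not say how the reported error rate was derived from it, and the sample numbers are made up.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, for illustration only.
error = mean_squared_error([3.0, 5.0], [2.0, 7.0])
```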

 Twitter for Predictions: Applications
 If the predicted revenue < budgeted revenue,
increase marketing and publicity efforts
 Can determine the maximum allowable
promotional budget
 Limitations:
 Only two predictor variables used to predict
box office performance (sentiment score +
actor rating)
 Use more variables

 Using Social Media for Community Detection
 60% of the American population chose social media as
their first choice for information seeking (Scot et al.
2014)
 Social relationships transferred to the internet
 Online communities based on similar interests and
opinions have been created
 Opinion based community detection can be used to
identify such online communities

 Using Social Media for Community Detection
Literature:
 Park & Cho (2012) identified online communities as an
information source for apparel shopping
 Dev (2014) proposed an algorithm for community
detection in social media based on different interaction
methods (no opinion mining)
 Kavoura (2014) identified the impact of online
communities for communication
 Dinsoreanu & Potolea introduced a framework for
opinion based community detection in social media

 Community Detection: Methodology
 Data Preparation:
 Extracted user comments from blog posts and forums
 A classification model for opinion mining was created using a set of
labelled documents and the 5 grammar rules introduced by
Turney (2002).
 Extracted tokens (after filtering) are classified into positive
and negative opinions using SVM and NB. A sentiment
score is assigned to each token.
 Tokens are stored in a structured format (includes the id,
holder, opinion keyword, polarity score etc.)
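
The NB (Naive Bayes) step of the data preparation can be illustrated with a small multinomial classifier using Laplace smoothing. This is a sketch under assumptions: the training phrases are invented, the function names are hypothetical, and the framework's actual features (such as the Turney grammar rules) are not reproduced.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns the counts needed for scoring."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def classify_nb(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in tokens:
            if token in vocab:  # ignore words never seen in training
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled documents, for illustration only.
training = [
    ("good great love".split(), "positive"),
    ("great value".split(), "positive"),
    ("bad poor awful".split(), "negative"),
    ("poor quality".split(), "negative"),
]
model = train_nb(training)
print(classify_nb(model, ["great", "love"]))
```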

 Opinion based Community Detection:
 Identifying communities based on similar interests in
multiple targets
 Aggregate functions to represent the similarity of
opinions in multiple targets
 Similarity graphs based on Euclidean distance were drawn

 Opinion based Community Detection:
 Similarity Functions:

 Community Detection: Findings & Evaluation
 1000 labelled documents were used as the training
set for NB and SVM
 Near perfect classification of opinions can be
obtained
 A user generated data set was used to apply
community detection algorithms
 Findings:
 Linear functions perform poorly when the number of
targets increases
 Exponential functions with cutoff perform best with
increasing opinions

 Community Detection: Limitations
 A practical application of community
detection was not conducted
 Suggestion: The proposed framework can be
applied in the pharmaceutical industry for
online community detection
 Background Literature:
 “CyberRx” by Radar & Subhan (2013)

 Community Detection: Potential Application
Data Collection
 CyberRx: Forums and blogs using Google Alerts
 New Approach: Additional sources such as bulletin boards
Keywords Used
 CyberRx: Formal names and language
 New Approach: More popular brand names and consumer driven language
Opinion Mining
 CyberRx: Manual
 New Approach: Automated (SVM classifier)
Community Detection
 CyberRx: Manual
 New Approach: Aggregated functions and similarity graphs
Findings
 CyberRx: Two main communities (side effects and medications; changing medication)
 New Approach: More specific communities can be identified

 Community Detection: Potential Application
 Knowledge such as,
 Most prevalent diseases classified based on
geography and demography
 Most popularly used brands of drugs
 Competing alternatives for a given drug
 Information of specifications, variations,
duration and personal experience of side effects
(both normal and abnormal)

 Using Social Media for Influence Propagation
 People influence each other via online interactions and
communications
 Purchase decisions are heavily influenced by eWoM in
social media networks
 34% of Twitter users post product related opinions at
least once a week (ROI Research Institute)
 Objective: Target the most influential users on social media
to activate a chain of influence driven by eWoM

 Using Social Media for Influence Propagation
 Literature:
 Khobzi (2014) conducted a basic content based
analysis on Facebook posts, to identify the connection
between the sentiment and the popularity of the post
 Kaiser et al. (2012) analyzed opinion formation and
influential users based on data collected on iPhone
reviews
 Okazaki et al. (2014) explored the different types of
customer engagement in social media networks and
their impact on influence propagation

 Influence Propagation: Methodology
 Focus group: IKEA customers
 Training set included 300 preprocessed Tweets
 Classified manually based on customer emotional
status and content
 Emotional Status: Satisfied, Dissatisfied, Neutral
 Content: Information, Sharing, Opinion, Question, Reply
 Trained NB, KNN, SVM classifiers
 NB performed best

 Influence Propagation: Application
 New data set: 4000 tweets
 Users were seen as nodes and tweets as their
relationships
 Google’s PageRank algorithm to determine the relative
importance of each user
 Findings:
 One satisfied user sharing information (positive eWoM)
 Three dissatisfied users spreading negative opinions
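
The ranking step can be illustrated with a basic power-iteration version of PageRank over a user graph. This is a sketch under assumptions: the slides do not say how edges were directed, so here an edge from one user to another is assumed to mean the first user's tweet references the second; the user names, damping factor, and iteration count are illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps each user to the users they point to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling user with no outgoing edges: spread rank evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical interaction graph, for illustration only.
links = {
    "bob": ["alice"],
    "carol": ["alice"],
    "dave": ["alice", "bob"],
    "alice": [],
}
ranks = pagerank(links)
most_influential = max(ranks, key=ranks.get)
```

Users with the highest scores would be the candidates for targeted eWoM campaigns.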

 Influence Propagation: Suggestions
 Conclusion:
 Influential Users can be identified
 Different customer satisfaction levels are crucial
 Suggestions:
 Using celebrities and converting their followers into influence
makers.
 Additional incentives could be provided to encourage
engagement in discussions
 Closely monitor for dissatisfied customers online and
occasionally mediate in retweets, suggesting feasible solutions
and demonstrating commitment

 Knowledge Discovery in SMM: Conclusions
 Consolidates the potential knowledge areas that could
be exploited for market analysis via the predictive
power of, community detection in, and influence
propagation in social media.
 Properly preprocessed social media data, with
acceptable quality when applied to robust statistical
models could predict future market trends with
considerable accuracy.
 Social media has taken social relationships to the digital
platform and has created opinion based communities
online. These can be used to identify genuine
consumer requirements.

 Knowledge Discovery in SMM: Conclusions
 People express their genuine consumer experiences on
social media networks which clearly influence
purchasing decisions of other potential consumers.
 An efficient framework can identify influential users
online and trigger a chain of positive eWoM promoting
viral marketing.
