Knowledge Discovery in
Social Media Mining for
Market Analysis
By: Senuri Wijenayake
 Introduction
Problem Addressed: Three research areas in Social Media
Mining
 Predictive Power
 Community Detection
 Influence Propagation
Focus: Analyze the existing literature and find
applications of knowledge discovery in social media for
market analysis
 Background
Fact 1: Facebook had over 1.55 billion active users as of
November 2015
(extracted from Statistics Portal – November 2015)
Fact 2: All adults spend at least 2 hours a day on some
form of social media network
 Focus of Research
A rich source of data with human sentiment and behavior
Developed online relationships and groups
Online interactions where people voice their ideas
Understand customer satisfaction and changing customer requirements
Focused marketing campaigns for better results
Influencing consumer behavior effectively via influential users
 Using Social Media to make Predictions
Progress So Far:
 Human Intuition – Can’t be duplicated
 Data Based Models – Inadequate data to represent
human cognitive process
SOLUTION: Use data available on social media for
predictive analysis.
 Using Social Media to make Predictions
Progress So Far:
 Yahoo Finance Message Board – Stock market
variability (Antweiler & Frank 2004)
 Google Search Queries – Track disease outbreaks
(Ginsberg et al. 2009)
 Amazon Reviews – Predicting product sales (Ghose &
Ipeirotis 2011)
 General Framework for SMM for Predictions
Stage 1: Preprocessing
 Social Media data are
unstructured
 Convert them into
high quality structured
data, suitable for data
mining
 Quality: Strong et al.
(1997)
 Objectivity
 Completeness
 Sufficiency
Stage 2: Predictive
Analysis
 Develop a model to
make accurate
predictions on a new
set of data (Harold
2013)
 Methodologies:
 Market Models
 Survey Models
 Statistical Models
 Data Preprocessing
Step | Problem | Solution
Data Cleaning | Missing values; noise; outliers | Substitution; regression
Data Integration | Entity identification; redundancy | Schema based entity identification; duplicate detection
Data Transformation | Data can't be used straight away for mining | Generalization; attribute construction
Data Reduction | Large amounts of data require significant processing power | Data cube aggregation; attribute selection
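Two of the preprocessing steps above, duplicate detection and substitution of missing values, can be sketched in a few lines. This is only an illustration, not the method of any cited paper: the function name `clean_records`, the record layout and the choice of the mean as the substitute value are all assumptions.

```python
from statistics import mean


def clean_records(records, field):
    """Drop exact duplicate records, then fill missing `field` values with the mean.

    `records` is a list of flat dicts (hypothetical layout); the input is not modified.
    """
    # Duplicate detection: keep the first occurrence of each identical record.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))

    # Substitution: replace missing values with the mean of the observed ones.
    observed = [r[field] for r in unique if r.get(field) is not None]
    fill = mean(observed) if observed else None
    for r in unique:
        if r.get(field) is None:
            r[field] = fill
    return unique
```

Mean substitution is the simplest option; the regression approach named in the table would instead predict the missing value from the other attributes.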
 Application of Predictions in Market Analysis
Objective: Examine how the available knowledge can be used to
make predictions for market analysis, and how successful
such predictions are
 Microblogging (Twitter) is most popular
Focus: Twitter data for predicting box
office performance of movies
 Application of Predictions in Market Analysis
Literature:
 Asur & Huberman (2010) used correlation and
regression based models on Twitter data
 Leskovec (2011) rectified imperfections which could arise
due to incomplete data
 Vasu Jain (2013) used sentiment analysis for predictions
 Gaikar & Marakarkandy (2015) introduced a framework for
using Twitter data for sentiment analysis and making
predictions
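To make the idea of a correlation and regression based model concrete, here is a minimal sketch of a one-variable least-squares fit. It does not reproduce the model of any paper listed above; the function names `fit_line` and `pearson`, and the idea of using a single predictor such as tweet volume, are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)
```

In use, `xs` could hold a per-movie Twitter measure and `ys` the observed revenue; the fitted line then gives a prediction for a new movie as `slope * x + intercept`.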
 Application of Predictions in Market Analysis
Gaikar & Marakarkandy (2015)
Predict box office
performance of a
Bollywood movie as
a hit, flop or an
average
Predict the opening
weekend revenue
collection
 Twitter for Predictions: Methodology
Module 1: Data Extraction
 The most trending hashtag on Twitter and
related hashtags are extracted (HashTags.org)
 Twitter4j API used to connect and extract
tweets from Twitter servers
 Stored in a MySQL database
 Movie star ratings taken from Times Celebex
A complete set of most relevant data has been
extracted
 Twitter for Predictions: Methodology
Module 2: Sentiment Analysis
 Twitter for Predictions: Methodology
Module 3: Predictive Analysis
 Predicting movie performance
 Input: Sentiment score + Movie Star Rating
 Process: Fuzzy Inference based model is
created
 Output: Box office movie performance as Hit,
Flop or Average
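A fuzzy inference step of this kind can be sketched as follows. The slides do not give the membership functions or the rule base, so everything here is hypothetical: both inputs are assumed to be normalized to [0, 1], the triangular memberships and the nine rules are invented for illustration, and the class with the strongest rule activation is returned instead of applying a full defuzzification step.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzify(x):
    """Degrees of 'low', 'medium' and 'high' for a value in [0, 1]."""
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "medium": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }


# Hypothetical rule base: (sentiment level, star rating level) -> outcome.
RULES = [
    ("high", "high", "Hit"), ("high", "medium", "Hit"), ("medium", "high", "Hit"),
    ("medium", "medium", "Average"), ("low", "high", "Average"), ("high", "low", "Average"),
    ("medium", "low", "Flop"), ("low", "medium", "Flop"), ("low", "low", "Flop"),
]


def predict(sentiment, star_rating):
    """Return the outcome whose rules fire most strongly (min for AND, max for OR)."""
    s, r = fuzzify(sentiment), fuzzify(star_rating)
    strength = {}
    for s_level, r_level, outcome in RULES:
        fired = min(s[s_level], r[r_level])
        strength[outcome] = max(strength.get(outcome, 0.0), fired)
    return max(strength, key=strength.get)
```

For example, `predict(0.9, 0.9)` activates the "high and high" rule most strongly and yields `"Hit"`.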
 Twitter for Predictions: Methodology
Module 3: Predictive Analysis
 Predicting weekend collection
 Input: Hype factor, Shows per day on all
screens, average full house collection
 Process:
 Output: Estimated opening weekend collection
 Twitter for Predictions: Findings & Evaluation
 10269 tweets for 14 movies released in a
period of six months (relevant, complete,
sufficient) were considered
 Actor ratings in the month of release were
considered
 Predictions compared against the real ratings
extracted from IMDb (near perfect predictions)
 Mean Square Error used to evaluate the
effectiveness of the predictive model (<7%
error rate)
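For reference, mean square error is the average of the squared differences between actual and predicted values. A minimal sketch follows; the function name is an assumption and no figures from the study are reproduced.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between paired actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

A lower value means the predictions sit closer to the observed values; a perfect model scores 0.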
 Twitter for Predictions: Applications
 If the predicted revenue < budgeted revenue,
increase marketing and publicity efforts
 Can determine the maximum allowable
promotional budget
 Limitations:
 Only two predictor variables used to predict
box office performance (sentiment score +
actor rating)
 Use more variables
 Using Social Media for Community Detection
 60% of the American population choose social media as
their first source of information (Scot et al.
2014)
 Social relationships transferred to the internet
 Online communities based on similar interests and
opinions have been created
 Opinion based community detection can be used to
identify such online communities
Literature:
 Park & Cho (2012) identified online communities as an
information source for apparel shopping
 Dev (2014) proposed an algorithm for community
detection in social media based on different interaction
methods (no opinion mining)
 Kavoura (2014) identified the impact of online
communities for communication
 Dinsoreanu & Potolea introduced a framework for
opinion based community detection in social media
 Using Social Media for Community Detection
 Data Preparation:
 Extracted user comments from blog posts and forums
 A classification model for opinion mining was created using a set of
labelled documents and the 5 grammar rules introduced by
Turney (2002).
 Extracted tokens (after filtering) are classified into positive
and negative opinions using SVM and NB. A sentiment
score assigned to each token.
 Tokens stored in a structured format (includes the id,
holder, opinion keyword, polarity score etc.)
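The NB part of the classification step can be illustrated with a small multinomial Naive Bayes classifier. This is a sketch under assumptions, not the framework's implementation: the class name, whitespace tokenization and Laplace (add-one) smoothing are choices made here, and the grammar-rule filtering and SVM are not shown.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

After `fit` on labelled opinions, `predict` returns the more probable polarity label for a new piece of text.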
 Community Detection: Methodology
 Opinion based Community Detection:
 Identifying communities based on similar interests in
multiple targets
 Aggregate functions to represent the similarity of
opinions in multiple targets
 Similarity graphs based on Euclidean distance were drawn
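A similarity graph built on Euclidean distance can be sketched as below. This is a simplification: each user is assumed to have one polarity score per target, an edge is drawn when the distance falls under a threshold, and communities are taken to be the connected components. The function names, the threshold and the use of connected components are assumptions; the framework's aggregate similarity functions are not reproduced.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equally long opinion vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def communities(opinions, max_distance):
    """Group users whose opinion vectors are linked by edges of small distance.

    `opinions` maps a user name to a list of polarity scores, one per target.
    """
    # Build the similarity graph: an edge joins users with close opinions.
    graph = {user: set() for user in opinions}
    for a, b in combinations(opinions, 2):
        if euclidean(opinions[a], opinions[b]) <= max_distance:
            graph[a].add(b)
            graph[b].add(a)

    # Each connected component of the graph is reported as one community.
    seen, groups = set(), []
    for user in graph:
        if user in seen:
            continue
        stack, group = [user], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups
```

Users with similar scores across all targets end up in the same group, while users with opposing opinions are separated.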
 Community Detection: Methodology
 Opinion based Community Detection:
 Similarity Functions:
 Community Detection: Methodology
 1000 labelled documents used as the training
set for NB and SVM
 Near perfect classification of opinions can be
obtained
 A user generated data set was used to apply
community detection algorithms
 Findings:
 Linear functions perform poorly when the number of
targets increases
 Exponential functions with cutoff perform best with
increasing opinions
 Community Detection: Findings & Evaluation
 A practical application of community
detection was not conducted
 Suggestion: The proposed framework can be
applied in the pharmaceutical industry for
online community detection
 Background Literature:
 “CyberRx” by Radar & Subhan (2013)
 Community Detection: Limitations
 Community Detection: Potential Application
Aspect | CyberRx | New Approach
Data Collection | Forums and blogs using Google Alerts | Additional sources such as bulletin boards
Keywords Used | Formal names and language | More popular brand names and consumer driven language
Opinion Mining | Manual | Automated (SVM classifier)
Community Detection | Manual | Aggregate functions and similarity graphs
Findings | Two main communities: side effects and medications; changing medication | More specific communities can be identified
 Community Detection: Potential Application
 Knowledge such as,
 Most prevalent diseases classified based on
geography and demography
 Most popularly used brands of drugs
 Competing alternatives for a given drug
 Information on specifications, variations,
duration and personal experience of side effects
(both normal and abnormal)
 Using Social Media for Influence Propagation
 People influence each other via online interactions and
communications
 Purchase decisions are heavily influenced by electronic word of
mouth (eWoM) in social media networks
 34% of Twitter users post product related opinions at
least once a week (ROI Research Institute)
 Objective: Target the most influential users on social media
to activate a chain of influence driven by eWoM
 Literature:
 Khobzi (2014) conducted a basic content based
analysis on Facebook posts, to identify the connection
between the sentiment and the popularity of the post
 Kaiser et al. (2012) analyzed opinion formation and
influential users based on data collected on iPhone
reviews
 Okazaki et al. (2014) explored the different types of
customer engagement in social media networks and
their impact on influence propagation
 Using Social Media for Influence Propagation
 Influence Propagation: Methodology
 Focus group: IKEA customers
 Training set included 300 preprocessed Tweets
 Classified manually based on customer emotional
status and content
 Emotional Status: Satisfied, Dissatisfied, Neutral
 Content: Information, Sharing, Opinion, Question, Reply
 Trained NB, KNN, SVM classifiers
 NB performed best
 Influence Propagation: Application
 New data set: 4000 tweets
 Users were seen as nodes and tweets as their
relationships
 Google’s PageRank algorithm to determine the relative
importance of each user
 Findings:
 One satisfied user sharing information (positive eWoM)
 Three dissatisfied users spreading negative opinions
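The ranking step can be illustrated with a basic power-iteration PageRank over a user graph. This is a sketch under assumptions: an edge `(a, b)` is taken to mean that user `a` retweeted or mentioned user `b`, the damping factor 0.85 and the iteration count are conventional defaults, and users without outgoing edges spread their rank evenly. None of these details are given in the slides.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return a dict mapping each user to a PageRank score; scores sum to 1."""
    nodes = sorted({user for edge in edges for user in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives a baseline share from the random jump.
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A user with no outgoing edges distributes rank to everyone.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank
```

Sorting users by score in descending order gives a list of candidate influential users, which can then be cross-checked against their emotional status and content labels.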
 Influence Propagation: Suggestions
 Conclusion:
 Influential Users can be identified
 Different customer satisfaction levels are crucial
 Suggestions:
 Using celebrities and converting their followers into influence
makers.
 Additional incentives could be provided to encourage
engagement in discussions
 Closely monitor for dissatisfied customers online and
occasionally mediate in retweets, suggesting feasible solutions
and demonstrating commitment
 Knowledge Discovery in SMM: Conclusions
 This review consolidates the knowledge areas that could
be exploited for market analysis through three aspects of
social media: predictive power, community detection and
influence propagation.
 Properly preprocessed social media data of
acceptable quality, when applied to robust statistical
models, could predict future market trends with
considerable accuracy.
 Social media have taken social relationships to the digital
platform and have created opinion based communities
online. These can be used to identify genuine
consumer requirements.
 Knowledge Discovery in SMM: Conclusions
 People express their genuine consumer experiences on
social media networks which clearly influence
purchasing decisions of other potential consumers.
 An efficient framework can identify influential users
online and trigger a chain of positive eWoM promoting
viral marketing.
Questions & Answers

 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Recently uploaded (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Knowledge discovery in social media mining for market analysis

  • 1. Knowledge Discovery in Social Media Mining for Market Analysis By: Senuri Wijenayake
  • 2. Introduction. Problem addressed: three research areas in Social Media Mining: predictive power, community detection, and influence propagation. Focus: analyze the existing literature and identify applications of knowledge discovery in social media for market analysis.
  • 3. Background. Fact 1: Facebook had over 1.55 billion active users as of November 2015 (Statistics Portal, November 2015). Fact 2: All adults spend at least 2 hours a day on some form of social media network.
  • 4. Focus of Research. Social media offers: a rich source of data with human sentiment and behavior; developed online relationships and groups; online interactions where people voice their ideas. Uses for market analysis: understanding customer satisfaction and changing customer requirements; focused marketing campaigns for better results; influencing consumer behavior effectively via influential users.
  • 5. Using Social Media to make Predictions. Progress so far: human intuition (can't be duplicated); data-based models (inadequate data to represent the human cognitive process). Solution: use data available on social media for predictive analysis.
  • 6. Using Social Media to make Predictions. Progress so far: Yahoo Finance message board, stock market variability (Antweiler & Frank 2004); Google search queries, tracking disease outbreaks (Ginsberg et al. 2009); Amazon reviews, predicting product sales (Ghose & Ipeirotis 2011).
  • 7. General Framework for SMM for Predictions. Stage 1, Preprocessing: social media data are unstructured; convert them into high-quality structured data suitable for data mining; quality dimensions (Strong et al. 1997): objectivity, completeness, sufficiency. Stage 2, Predictive Analysis: develop a model to make accurate predictions on a new set of data (Harold 2013); methodologies: market models, survey models, statistical models.
  • 8. Data Preprocessing
        Step | Problem | Solution
        Data Cleaning | Missing values, noise, outliers | Substitution, regression
        Data Integration | Entity identification, redundancy | Schema-based entity identification, duplicate detection
        Data Transformation | Data can't be used straight away for mining | Generalization, attribute construction
        Data Reduction | Large amounts of data require significant processing power | Data cube aggregation, attribute selection
  • 9. Application of Predictions in Market Analysis. Objective: how can the available knowledge be used to make predictions for market analysis, and how successful is it? Microblogging (Twitter) is the most popular source. Focus: Twitter data for predicting the box office performance of movies.
  • 10. Application of Predictions in Market Analysis. Literature: Asur & Huberman (2010) used correlation- and regression-based models on Twitter data; Leskovec (2011) rectified imperfections that could arise from incomplete data; Vasu Jain (2013) used sentiment analysis for predictions; Gaikar & Marakarkandy (2015) introduced a framework for using Twitter data for sentiment analysis and making predictions.
  • 11. Application of Predictions in Market Analysis. Gaikar & Marakarkandy (2015): predict the box office performance of a Bollywood movie as a hit, flop or average; predict the opening weekend revenue collection.
  • 12. Twitter for Predictions: Methodology. Module 1: Data Extraction. The most trending hashtag on Twitter and related hashtags are extracted (HashTags.org); the Twitter4j API is used to connect to Twitter servers and extract tweets; tweets are stored in a MySQL database; movie star ratings are taken from Times Celebex. Result: a complete set of the most relevant data has been extracted.
  • 13.  Twitter for Predictions: Methodology Module 2: Sentiment Analysis
  • 14. Twitter for Predictions: Methodology. Module 3: Predictive Analysis, predicting movie performance. Input: sentiment score + movie star rating. Process: a fuzzy-inference-based model is created. Output: box office movie performance as Hit, Flop or Average.
  • 15.  Twitter for Predictions: Methodology Module 3: Predictive Analysis  Predicting weekend collection  Input: Hype factor, Shows per day on all screens, average full house collection  Process:  Output: Estimated opening weekend collection
  • 16. Twitter for Predictions: Findings & Evaluation. 10,269 tweets for 14 movies released over a period of six months (relevant, complete, sufficient) were considered; actor ratings in the month of release were considered; predictions were compared against the real ratings extracted from IMDb (near-perfect predictions); mean squared error was used to evaluate the effectiveness of the predictive model (<7% error rate).
  • 17.  Twitter for Predictions: Findings & Evaluation
  • 18. Twitter for Predictions: Applications. If the predicted revenue < budgeted revenue, increase marketing and publicity efforts; the model can determine the maximum allowable promotional budget. Limitations: only two predictor variables are used to predict box office performance (sentiment score + actor rating); use more variables.
  • 19. Using Social Media for Community Detection. 60% of the American population chose social media as their first choice for information seeking (Scot et al. 2014); social relationships have been transferred to the internet; online communities based on similar interests and opinions have been created; opinion-based community detection can be used to identify such online communities.
  • 20. Using Social Media for Community Detection. Literature: Park & Cho (2012) identified online communities as an information source for apparel shopping; Dev (2014) proposed an algorithm for community detection in social media based on different interaction methods (no opinion mining); Kavoura (2014) identified the impact of online communities on communication; Dinsoreanu & Potolea introduced a framework for opinion-based community detection in social media.
  • 21. Community Detection: Methodology. Data preparation: user comments were extracted from blog posts and forums; a classification model for opinion mining was created using a set of labelled documents and 5 grammar rules introduced by Turney (2002); extracted tokens (after filtering) are classified into positive and negative opinions using SVM and NB, and a sentiment score is assigned to each token; tokens are stored in a structured format (including the id, holder, opinion keyword, polarity score etc.).
  • 22. Community Detection: Methodology. Opinion-based community detection: identifying communities based on similar interests in multiple targets; aggregate functions represent the similarity of opinions across multiple targets; similarity graphs based on Euclidean distance were drawn.
  • 23. Community Detection: Methodology. Opinion-based community detection: similarity functions.
  • 24. Community Detection: Findings & Evaluation. 1000 labelled documents were used as the training set for NB and SVM; near-perfect classification of opinions can be obtained; a user-generated data set was used to apply the community detection algorithms. Findings: linear functions perform poorly when the number of targets increases; exponential functions with cutoff perform best with increasing opinions.
  • 25. Community Detection: Limitations. A practical application of community detection was not conducted. Suggestion: the proposed framework can be applied in the pharmaceutical industry for online community detection. Background literature: “CyberRx” by Radar & Subhan (2013).
  • 26. Community Detection: Potential Application
        Aspect | CyberRx | New Approach
        Data Collection | Forums and blogs using Google Alerts | Additional sources such as bulletin boards
        Keywords Used | Formal names and language | More popular brand names and consumer-driven language
        Opinion Mining | Manual | Automated (SVM classifier)
        Community Detection | Manual | Aggregated functions and similarity graphs
        Findings | Two main communities: side effects and medications; changing medication | More specific communities can be identified
  • 27. Community Detection: Potential Application. Knowledge such as: the most prevalent diseases classified by geography and demography; the most popularly used brands of drugs; competing alternatives for a given drug; information on specifications, variations, duration and personal experience of side effects (both normal and abnormal).
  • 28. Using Social Media for Influence Propagation. People influence each other via online interactions and communications; purchase decisions are heavily influenced by eWoM in social media networks; 34% of Twitter users post product-related opinions at least once a week (ROI Research Institute). Objective: target the most influential users on social media to activate a chain of influence driven by eWoM.
  • 29. Using Social Media for Influence Propagation. Literature: Khobzi (2014) conducted a basic content-based analysis of Facebook posts to identify the connection between the sentiment and the popularity of a post; Kaiser et al. (2012) analyzed opinion formation and influential users based on data collected from iPhone reviews; Okazaki et al. (2014) explored the different types of customer engagement in social media networks and their impact on influence propagation.
  • 30. Influence Propagation: Methodology. Focus group: IKEA customers; the training set included 300 preprocessed tweets, classified manually by customer emotional status and content. Emotional status: Satisfied, Dissatisfied, Neutral. Content: Information, Sharing, Opinion, Question, Reply. NB, KNN and SVM classifiers were trained; NB performed best.
  • 31. Influence Propagation: Application. New data set: 4000 tweets; users were seen as nodes and tweets as their relationships; Google’s PageRank algorithm was used to determine the relative importance of each user. Findings: one satisfied user sharing information (positive eWoM); three dissatisfied users spreading negative opinions.
  • 32. Influence Propagation: Suggestions. Conclusion: influential users can be identified; different customer satisfaction levels are crucial. Suggestions: use celebrities and convert their followers into influence makers; provide additional incentives to encourage engagement in discussions; closely monitor dissatisfied customers online and occasionally intervene in retweets, suggesting feasible solutions and demonstrating commitment.
  • 33. Knowledge Discovery in SMM: Conclusions. The study consolidates the potential knowledge areas that could be exploited for market analysis via the predictive power of, community detection in, and influence propagation in social media. Properly preprocessed social media data of acceptable quality, when applied to robust statistical models, could predict future market trends with considerable accuracy. Social media have taken social relationships to the digital platform and created opinion-based communities online; these can be used to identify genuine consumer requirements.
  • 34. Knowledge Discovery in SMM: Conclusions. People express their genuine consumer experiences on social media networks, which clearly influence the purchasing decisions of other potential consumers. An efficient framework can identify influential users online and trigger a chain of positive eWoM, promoting viral marketing.
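
The influence-ranking step on slide 31 (users as nodes, tweets as relationships, PageRank for relative importance) can be illustrated with a small sketch. This is not the code used in the study: the edge direction (a user who retweets or replies points to the original author), the damping factor of 0.85, the fixed iteration count, and the user names are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank users by power iteration.

    edges: list of (source_user, target_user) pairs. Here an edge is
    assumed to mean "source retweeted or replied to target".
    """
    nodes = sorted({user for edge in edges for user in edge})
    out_links = {user: [] for user in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {user: 1.0 / n for user in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical interactions: three users retweet alice, alice replies to bob.
edges = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("alice", "bob")]
ranks = pagerank(edges)
most_influential = max(ranks, key=ranks.get)
print(most_influential)  # prints: alice
```

Combining each user's rank with the emotional status assigned by the classifier (satisfied, dissatisfied, neutral) would then separate influential promoters from influential detractors, which is the distinction the findings on slide 31 draw.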

Editor's Notes

  1. My focus is to understand how the unstructured data in social media could be transformed into valuable knowledge via the application of social media mining techniques, and how it can be applied in a real world application for market analysis.
  2. Attribute Construction : Create new attributes by combining many
  3. Objective: how can the available knowledge be used to make predictions for market analysis, and how successful is it?
  4. The methodology, findings, limitations and suggestions are presented
  5. Input: the populated DB file and the keyword DB file. Process: a PLSA classifier implemented in MATLAB was used for sentiment classification, comparing each word with the keywords in the dictionary file. Output: a sentiment score was assigned to each tweet, and a total score was taken over all the tweets (cell2mat). Why Probabilistic Latent Semantic Analysis?
  6. Why FIS?
  7. Hype Factor was obtained using the number of distinct users tweeting and their average follower count
  8. (director rating, producer rating, impact of promotional and production budgets)
  9. Features of the data set: Number of key opinion, number of distinct users, degree of membership, maximum allowable similarity between two communities
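
The similarity-graph idea from slide 22 and the "maximum allowable similarity" feature in note 9 can be sketched as follows. The similarity functions from the deck are not reproduced in the transcript, so this sketch substitutes a plain Euclidean distance with a fixed threshold and treats connected components as communities. The function names, the threshold value, and the sample polarity scores are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two opinion vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def communities(opinions, max_distance):
    """Group users whose opinion vectors lie within max_distance.

    opinions: dict mapping user -> list of polarity scores, one per target.
    Returns the connected components of the similarity graph.
    """
    users = sorted(opinions)
    neighbours = {u: set() for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if euclidean(opinions[u], opinions[v]) <= max_distance:
                neighbours[u].add(v)
                neighbours[v].add(u)

    seen, groups = set(), []
    for u in users:
        if u in seen:
            continue
        # Depth-first walk collects everyone reachable from u.
        stack, group = [u], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical polarity scores for two targets (e.g. two products).
opinions = {
    "u1": [0.9, -0.8],
    "u2": [0.8, -0.7],
    "u3": [-0.9, 0.6],
    "u4": [-0.7, 0.7],
}
groups = communities(opinions, max_distance=0.5)
print(groups)  # prints: [['u1', 'u2'], ['u3', 'u4']]
```

A hard threshold is the simplest choice here; the findings on slide 24 suggest that an exponential similarity function with a cutoff scales better as the number of targets grows.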