Social Media Brand
Positioning:
Perceptual Mapping using Twitter Data
David Gerson
Gersondave@gmail.com
Why do this?
•It’s a known business standard for
enabling stakeholders, clients, and
decision makers to easily see and
compare similar and dissimilar elements.
Problems
•Even at their most numerical, perceptual maps
are typically qualitative in nature.
•Many are designed using a
“scorecard” approach.
•If a scorecard is used, it is limited to only a
few points and relies on a qualitative assessment to
define the numeric measures.
•Distances and positioning are defined
using a “human element”.
Perceptual Maps
•Perceptual mapping is
a diagrammatic technique used by
asset marketers that attempts to visually
display the perceptions of customers or
potential customers. [wikipedia]
•Perceptual maps enable you to find
opportunities in the market for a new product
or to identify potentially competitive products.
Attributed to http://npdbook.com/
The Current Process
Twitter Extraction → Tokenizing → Stemming → TFIDF → Stopwords → Word Count Matrix → MDS Plotting
Twitter Extraction
•Implementation
• The Streaming API was left running for one week, pulling tweets about a target group of fast food companies. This Twitter
data is used to generate our feature set.
• A second set of data, collected without filters, serves as a control set that helps determine
which words are key to food in general and which words are most important in the context of fast food.
Twitter Extraction
•Packages:
• The Twython library provides an easily accessible wrapper for the Twitter Streaming API,
letting a user plug into Twitter and access data.
•Limitations
• The API didn’t seamlessly handle some parameters needed to filter Twitter data (e.g. language).
• The classic Unicode/ASCII conversion problems are rife in the Twitter dataset.
• The Firehose API was deprecated while this project was being worked on. Without the ability to filter
the feed by language, and without Firehose access, I used the SwiftKey dataset as an alternative.
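One way to work around the language limitation is to filter after collection rather than in the API call. The sketch below assumes each collected tweet is stored as one JSON object per line with `lang` and `text` fields; the function names and the brand list are hypothetical and are not taken from the original project.

```python
import json


def keep_tweet(tweet, language="en", brands=("mcdonalds", "wendys")):
    """Return True if the tweet is in the wanted language and mentions a brand.

    Assumes the tweet dict carries "lang" and "text" keys.
    Brand matching is a plain substring test on lowercased text.
    """
    if tweet.get("lang") != language:
        return False
    text = tweet.get("text", "").lower()
    return any(brand in text for brand in brands)


def filter_stream_lines(lines, **kwargs):
    """Parse one JSON tweet per line and keep the ones that pass keep_tweet."""
    kept = []
    for line in lines:
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip malformed or truncated lines
        if isinstance(tweet, dict) and keep_tweet(tweet, **kwargs):
            kept.append(tweet)
    return kept
```

Filtering offline like this keeps the collector simple, at the cost of storing tweets that are later discarded.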
Tokenizing Dataset
•Implementation
• Tokenizing is simple. There are a few ways to do it, but the easiest is to split the data: first on the newline character (“\n”)
and then on spaces.
• The tokenization stage is also a good place to filter and format the tokens.
• At this stage I also deduplicate the data: if a line appears more than once, the repeats are removed, which
helps manage spammers.
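The tokenizing steps above can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the original code: it treats each line as one tweet, lowercases it, and strips ASCII punctuation (which also removes "#" and "@"); the name `tokenize_tweets` is hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize_tweets(raw_text):
    """Split raw text into tweets (one per line) and each tweet into tokens.

    Exact duplicate lines are dropped, keeping the first occurrence.
    """
    seen = set()
    tokenized = []
    for line in raw_text.split("\n"):
        line = line.strip()
        if not line or line in seen:
            continue  # skip blank lines and exact repeats (likely spam)
        seen.add(line)
        cleaned = line.lower().translate(_PUNCT_TABLE)
        tokens = cleaned.split()
        if tokens:
            tokenized.append(tokens)
    return tokenized
```

Deduplicating on the raw line, before any cleaning, only catches exact repeats; near-duplicates would need a fuzzier comparison.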
Stemming Tokens
•Value Add of Stemming
• Stemming was first proposed in the late 1960s by Julie Beth Lovins; Martin Porter later published the
algorithm that has become the de facto standard for stemming.
• Stemming strips suffixes from words so that only the root word remains (e.g. moved -> move).
• In the context of this analysis, it lets you compare the root words associated with these stores.
•Implementation
• The newer version of the algorithm, Porter2, is readily available as a Python package.
• For this analysis I stemmed both my core food dataset and the general firehose
corpus.
• For simplicity and kindness to my RAM, I stored the stemmed output as a CSV.
http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/dan-porters.pdf
TFIDF: Term Frequency-Inverse Document Frequency
•Value Add of TFIDF
• Put simply, TFIDF measures how often a word occurs relative to how common it is across a series of documents (tweets),
and provides a simple way to compare it to the occurrence of other words.
•Implementation
• TFIDF is fit to a larger set of firehose data; in this case the firehose data is broken apart into tweet
documents covering any and all topics a Twitter user might be interested in.
• After creating the TFIDF model from the firehose data, I apply it to the set
of data I have from the restaurants. The words with the highest scores are considered the “most
important” in the context of food and restaurants.
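The idea of fitting on a background corpus and then scoring the restaurant tweets can be sketched by hand. The code below is a standard-library stand-in, not the scikit-learn call used in the project; the function names, the smoothed IDF formula, and the treatment of unseen terms are assumptions made for illustration.

```python
import math
from collections import Counter


def fit_idf(background_docs):
    """Smoothed inverse document frequency from a background corpus.

    background_docs is a list of token lists (one list per tweet).
    Returns the IDF table and the IDF given to terms never seen in it.
    """
    n_docs = len(background_docs)
    doc_freq = Counter()
    for tokens in background_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    default_idf = math.log(1 + n_docs) + 1  # document frequency of zero
    return idf, default_idf


def top_terms(food_docs, idf, default_idf, k=3):
    """Score terms of the food corpus by term frequency times background IDF."""
    tf = Counter(token for tokens in food_docs for token in tokens)
    total = sum(tf.values())
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Words that are frequent in the restaurant tweets but rare in the background corpus rise to the top, while words such as "the" are pushed down because they appear in nearly every background document.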
TFIDF: Term Frequency-Inverse Document Frequency
•Packages
• TFIDF is simple and easy to implement in scikit-learn.
• I pass my string document objects to this function, create a tokenizer relevant to my text files,
and then run the function.
• The output is a dictionary of words and their TFIDF scores, which needs to be read into a list of tuples and sorted.
• I then create a filter based on the list sorted by TFIDF score and use it to remove all non-relevant terms
from the food list, which I will use as features.
Wordcount Matrix
•Implementation
1. First I create a dictionary object of all words (a defaultdict would work just as well).
2. I then create a set of all words and compare it to a separate list of restaurants.
3. For each restaurant in the list, I run a separate loop that increments the word counters stored for that restaurant.
4. Finally I save the word counts as a CSV that can be imported as a matrix.
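The four steps above might look roughly like the following. This is a sketch under assumptions: tweets are already tokenized, a tweet counts toward a restaurant when the restaurant's name appears as a token, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_count_matrix(tweets, restaurants, vocabulary):
    """Count vocabulary terms in the tweets that mention each restaurant.

    tweets is a list of token lists. Returns {restaurant: {term: count}}.
    """
    counts = {name: defaultdict(int) for name in restaurants}
    vocab = set(vocabulary)
    for tokens in tweets:
        mentioned = [name for name in restaurants if name in tokens]
        for name in mentioned:
            for token in tokens:
                if token in vocab:
                    counts[name][token] += 1
    return counts


def matrix_to_csv(counts, restaurants, vocabulary):
    """Render the counts as CSV text: one row per restaurant, one column per term."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["restaurant"] + list(vocabulary))
    for name in restaurants:
        writer.writerow([name] + [counts[name][term] for term in vocabulary])
    return buffer.getvalue()
```

Each row of the resulting matrix is one restaurant's term profile, which is the input the later scaling step compares.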
MDS/ NMDS / CA (PCA)
•What is MDS and why is it used for perceptual mapping?
• MDS uses matrix operations to compute the distances between elements and plots them in a low-dimensional space while
preserving those distances as closely as possible.
• Standard MDS is made to handle continuous variables; if you have ordinal or comparison data,
then a non-metric MDS solution is necessary. A non-metric MDS preserves how your data
elements rank relative to each other, rather than trying to solve for the total differences between them.
• MDS and PCA have entirely different goals and are studied separately. While
the goal of PCA is dimensionality reduction in support of factor analysis, the goal of MDS is to simplify the
visual inspection of elements and their relationships to other like elements.
• Another interesting use of this analysis is to find similarity between your measurements. For
instance, if you are using MDS with demographic data, you would probably see that minivan owners and
families have very similar vectors.
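To make the distance-preserving idea concrete, here is a minimal sketch of classical (metric) MDS written with NumPy. It is not the non-metric variant produced by scikit-learn or the R vegan package in the plots; the function names are hypothetical, and the input is assumed to be a matrix with one row per restaurant.

```python
import numpy as np


def pairwise_distances(rows):
    """Euclidean distance between every pair of rows."""
    x = np.asarray(rows, dtype=float)
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def classical_mds(distances, n_components=2):
    """Embed points in n_components dimensions from a distance matrix.

    Double-centers the squared distances and keeps the eigenvectors
    with the largest eigenvalues.
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)
```

When the input distances come from points that really live in two dimensions, the embedding reproduces them exactly up to rotation and reflection; with a high-dimensional word count matrix the two-dimensional map is only an approximation.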
MDS/ NMDS / CA (PCA)
•What is MDS and why is it used for perceptual mapping? (cont.)
• The easiest way to think about this is to take the concept of unidimensional scaling and apply it to a
multidimensional environment.
Scikit Learn NMDS Plot
R MDS Plot
R NMDS (Vegan Package)
PCA Bi-plot
Analysis of the R NMDS
•What we can determine with this analysis
1. Wendy’s and Burger King have favorable offerings for chicken
and bacon.
2. Taco Bell doesn’t have a notable salad offering. McDonald’s is also
far from that term.
3. The price of Chipotle, Pizza Hut, and McDonald’s is frequently
referenced.
4. Taco Bell owns the term “crunch”.
•What to do next
1. Pull in additional data to find the relative profitability of these
firms and align it with our terms. If any “blue ocean” space
is seen, that could be a potential business opportunity.
Pain Points and Lessons Learned
• ASCII and Unicode conversion issues are a constant pain. It’s much easier to be overaggressive with casting; also
make sure that all modules and classes specify the text encoding being used.
• For long calculations, it is best to use pickle to checkpoint the work done so that the processing
is saved off.
• Sometimes R is the right call, particularly when it comes to plotting.
• scikit-learn has almost all the functions needed, and it is easier to stay there than to hunt for other
best-of-breed packages.
• LDA for topic modeling would be a great next step to reduce dimensionality.
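The pickle checkpointing lesson can be sketched as a small helper. The name `checkpointed` and the temporary-file convention are assumptions for illustration, not the code used in the project.

```python
import os
import pickle


def checkpointed(path, compute):
    """Load a pickled result from path if it exists; otherwise compute and save it.

    compute is a zero-argument callable that performs the expensive step.
    """
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(result, handle, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename into place so an interrupted write does not leave a partial checkpoint.
    os.replace(tmp_path, path)
    return result
```

Wrapping each long stage (stemming, TFIDF fitting, counting) this way means a crash late in the pipeline only costs the stage that was running. Pickle files should only be loaded from sources you trust.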
Questions?

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Perceptual Mapping using Twitter Data

  • 1. Social Media Brand Positioning: Perceptual Mapping using Twitter Data David Gerson Gersondave@gmail.com
  • 2. Why do this? • Perceptual mapping is a known business standard that lets stakeholders, clients, and decision makers easily see and compare like and unlike elements.
  • 3. Problems • Even at their most numerical, perceptual maps are typically qualitative in nature. • Many are designed using a “scorecard” approach. • A scorecard is limited to only a few points and relies on a qualitative assessment to define the numeric measures. • Distances and positioning are defined using a “human element”.
  • 4. Perceptual Maps •Perceptual mapping is a diagrammatic technique used by asset marketers that attempts to visually display the perceptions of customers or potential customers. [wikipedia] •Perceptual maps enable you to find opportunities in the market for a new product or to identify potentially competitive products.
  • 5. Perceptual Maps (text repeated from slide 4) Attributed to http://npdbook.com/
  • 6. Twitter Extraction Tokenizing Stemming TFIDF Stopwords Word Count Matrix MDS Plotting The Current Process
  • 7. Twitter Extraction • Implementation • The Streaming API was left online for 1 week, pulling data on a target group of fast food companies. The Twitter data collected is used to generate our feature set. • A second set of data, collected without filters, serves as a control set that helps determine which words are key to food and which words are most important in the context of fast food.
  • 8. Twitter Extraction • Packages: • The Twython library provides an easily accessible API wrapper for the Twitter streaming API. • The Twython API allows a user to plug into Twitter and access data. • Limitations • The API didn’t seamlessly handle some parameters needed to filter Twitter data (e.g. language). • The classic Unicode/ASCII conversion problems are rife in the Twitter dataset. • The Firehose API was deprecated while this project was being worked on. Without the ability to parse the feed by language and without firehose access, I used the Swiftkey dataset as an alternative.
  • 9. Tokenizing Dataset • Implementation • Tokenizing is simple. There are a few ways to do it, but the easiest is to split the data: first by the newline character (“\n”) and then by whitespace. • The tokenization stage is also the ideal place to filter and format the tokens. • At this stage I also deduplicate the data: if a line appears more than once, the repeats are removed, which helps manage spammers.
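The tokenizing and deduplication steps on slide 9 can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function name `tokenize_tweets`, the punctuation set being stripped, and the lowercasing are all assumptions.

```python
def tokenize_tweets(raw_text):
    """Split raw text into tweets, drop repeated lines, and tokenize each tweet."""
    seen = set()        # lines already encountered (used for deduplication)
    tokenized = []
    for line in raw_text.split("\n"):      # first split: one tweet per line
        line = line.strip()
        if not line or line in seen:       # skip blanks and exact repeats
            continue
        seen.add(line)
        # second split: whitespace; then strip edge punctuation and lowercase
        tokens = [tok.strip(".,!?:;\"'()").lower() for tok in line.split()]
        tokenized.append([tok for tok in tokens if tok])
    return tokenized


# Hypothetical sample data: the second line is a repeat and is dropped.
sample = "Love the new Crunch wrap!\nLove the new Crunch wrap!\nBacon burger, please"
tweets = tokenize_tweets(sample)
print(tweets)
```

Only exact repeats are removed here; near-duplicate spam (for example the same text with a different link) would need a looser comparison.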
  • 10. Stemming Tokens • Value Add of Stemming • Stemming is a technique first proposed in the late 1960s by Julie Beth Lovins and later refined by Martin Porter, whose algorithm has become the de facto standard for stemming. • Stemming strips the endings from words so that only the root word remains (e.g. moved -> move). • In the context of this analysis, it lets you compare the root words associated with these stores. • Implementation • The newer version of the algorithm, Porter2, is readily available as a Python package. • For this analysis I stemmed both my core food dataset and the general firehose corpus. • For simplicity and kindness to my RAM, I stored the stemmed output as a CSV. http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/dan-porters.pdf
  • 11. TFIDF: Term Frequency, Inverse Document Frequency • Value Add of TFIDF • Put simply, TFIDF measures the relative occurrence of a word across a series of documents (tweets) and provides a simple way to compare it to the occurrence of other words. • Implementation • TFIDF is fit to a larger set of firehose data, broken apart into tweet documents covering any and all topics a Twitter user might be interested in. • After creating the TFIDF model, I apply the TFIDF object fit on the firehose data to the set of data I have from the restaurants. The words with the highest scores are considered the “most important” in the context of food and restaurants.
  • 12. TFIDF: Term Frequency, Inverse Document Frequency • Packages • TFIDF is simple and easy to implement in Scikit Learn. • I pass my string document objects to the function, create a tokenizer relevant to my text files, and run it. • The output is a dictionary of words and their TFIDF scores, which needs to be read into a list of tuples and sorted. • I then create a filter based on the list sorted by TFIDF score and use it to remove all non-relevant terms from the food list, which I will use as features.
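The idea on slides 11 and 12 (fit the inverse document frequencies on a general control corpus, then score the restaurant tweets against them) can be sketched without any dependencies. This is a plain-Python illustration rather than the Scikit Learn call the slides refer to; the function names, the smoothed IDF form, and the toy documents are assumptions.

```python
import math
from collections import Counter


def fit_idf(control_docs):
    """Learn a smoothed IDF per term from tokenized control documents."""
    n_docs = len(control_docs)
    doc_freq = Counter()
    for doc in control_docs:
        doc_freq.update(set(doc))          # count each term once per document
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    unseen_idf = math.log(1 + n_docs) + 1.0  # IDF for terms absent from the control set
    return idf, unseen_idf


def score_terms(food_docs, idf, unseen_idf):
    """Return (term, score) pairs sorted by descending TF-IDF score."""
    tf = Counter(tok for doc in food_docs for tok in doc)
    total = sum(tf.values())
    scores = {t: (c / total) * idf.get(t, unseen_idf) for t, c in tf.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpora (already tokenized).
control = [["the", "game", "was", "great"],
           ["the", "weather", "is", "great"],
           ["i", "love", "the", "movie"]]
food = [["the", "bacon", "burger", "was", "great"],
        ["love", "the", "bacon"]]

idf, unseen_idf = fit_idf(control)
ranked = score_terms(food, idf, unseen_idf)
keep = {term for term, _ in ranked[:2]}    # filter: keep the top-scoring terms
print(ranked[0][0], keep)
```

Terms that are common everywhere (such as "the") receive a low IDF from the control set, so food-specific words rise to the top of the sorted list; the cutoff of two terms is arbitrary and only for the toy data.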
  • 13. Wordcount Matrix • Implementation 1. First I create a dictionary object of all words (a defaultdict would work just as well). 2. I then create a set of all words and compare it to a separate list of restaurants. 3. For each restaurant in the list, I run a separate loop to increment the word counters stored for each element. 4. Finally I save the word counts as a CSV that can be imported as a matrix.
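One way the word count matrix from slide 13 might be built is shown below. It is a sketch under assumptions: restaurant names are taken to appear as single tokens in a tweet, and the function names, feature set, and sample tweets are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_wordcount_matrix(tweets, restaurants, features):
    """Count feature words in the tweets that mention each restaurant."""
    counts = {name: defaultdict(int) for name in restaurants}
    for tokens in tweets:
        present = set(tokens)
        for name in restaurants:
            if name in present:            # tweet mentions this restaurant
                for tok in tokens:
                    if tok in features:
                        counts[name][tok] += 1
    columns = sorted(features)             # fixed column order for the matrix
    rows = [[counts[name][col] for col in columns] for name in restaurants]
    return columns, rows


def matrix_to_csv(restaurants, columns, rows):
    """Serialize the matrix as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["restaurant"] + columns)
    for name, row in zip(restaurants, rows):
        writer.writerow([name] + row)
    return buf.getvalue()


# Hypothetical, already-tokenized tweets.
tweets = [["wendys", "bacon", "burger"],
          ["tacobell", "crunch", "wrap"],
          ["wendys", "chicken", "bacon"]]
restaurants = ["wendys", "tacobell"]
features = {"bacon", "chicken", "crunch"}

columns, rows = build_wordcount_matrix(tweets, restaurants, features)
csv_text = matrix_to_csv(restaurants, columns, rows)
print(csv_text)
```

Each row is one restaurant and each column one feature word, which is the shape an MDS routine expects once pairwise distances between rows are computed.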
  • 14. MDS / NMDS / CA (PCA) • What is MDS and why is it used for perceptual mapping? • MDS uses matrix operations to compute the distances between elements and plots them in fewer dimensions while preserving, as closely as possible, the distances between all elements. • Standard MDS is made to handle continuous variables. If you have ordinal or comparison data, then a non-metric MDS solution is necessary. A non-metric MDS positions your data elements by how they compare to each other, rather than trying to solve for the total differences between them. • MDS and PCA have entirely different goals and are studied separately. The goal of PCA is dimensionality reduction in support of factor analysis, while the goal of MDS is to simplify the visual inspection of elements and their relationships to other like elements. • This analysis can also be used to find similarity between your measurements. For instance, if you are using MDS with demographic data you would probably see that minivan owners and families have very similar vectors.
  • 15. MDS / NMDS / CA (PCA) • What is MDS and why is it used for perceptual mapping? (cont.) • The easiest way to think about this is to take the concept of unidimensional scaling and apply it to a multidimensional environment.
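To make the distance-preserving idea concrete, here is a small sketch of classical (metric) MDS using NumPy. It is not the non-metric MDS from Scikit Learn or R's vegan package that the plots on the following slides use; the function name `classical_mds` and the count values are made up for illustration.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points in n_components dimensions from a pairwise distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n        # double-centering matrix
    gram = -0.5 * centering @ (d ** 2) @ centering      # inner-product (Gram) matrix
    eigvals, eigvecs = np.linalg.eigh(gram)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]    # indices of the largest ones
    top_vals = np.clip(eigvals[order], 0.0, None)       # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical word counts: rows are restaurants, columns are feature words.
counts = np.array([[2.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 2.0, 0.0]])

# Pairwise Euclidean distances between the rows.
diff = counts[:, None, :] - counts[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

coords = classical_mds(dist)                            # one 2-D point per restaurant
recon = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
print(coords.shape)
```

With only three points the two-dimensional layout reproduces the original distances exactly; with many restaurants and features the layout becomes an approximation, which is where the choice between metric and non-metric MDS starts to matter.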
  • 16. Scikit Learn NMDS Plot
  • 17. R MDS Plot
  • 18. R NMDS (Vegan Package)
  • 19. PCA Bi-plot
  • 20. Analysis of the R NMDS • What we can determine with this analysis 1. Wendy’s and Burger King have favorable offerings for chicken and bacon. 2. Taco Bell doesn’t have a notable salad offering. McDonald’s is also far from that term. 3. The price of Chipotle, Pizza Hut, and McDonald’s is frequently referenced. 4. Taco Bell owns the term “crunch”. • What to do next 1. Pull in additional data to find the relative profitability of these firms and align them with our terms. If any “blue ocean” space is seen, that could be a potential business opportunity.
  • 21. Pain Points and Lessons Learned • ASCII and Unicode conversion issues are a constant pain. It’s much easier to be overaggressive with casting; also make sure that all modules and classes specify the type of text being used. • For long calculations it is best to use pickle to checkpoint the work done, so that completed processing is saved. • Sometimes R is the right call, particularly when it comes to plotting. • Scikit Learn has almost all the functions needed, and it is easier to stay there than to hunt for other best-of-breed packages. • LDA for topic modeling would be a great next step to reduce dimensionality.
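The pickle checkpointing mentioned on slide 21 could look something like the sketch below. The helper name `load_or_compute`, the file name, and the stand-in computation are hypothetical.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return the pickled result at path if present; otherwise compute and save it."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []  # records how many times the expensive step actually ran


def slow_wordcount():
    """Stand-in for a long-running step such as stemming or TFIDF scoring."""
    calls.append(1)
    return {"bacon": 3, "crunch": 1}


checkpoint = os.path.join(tempfile.mkdtemp(), "wordcount.pkl")
first = load_or_compute(checkpoint, slow_wordcount)    # computes and saves
second = load_or_compute(checkpoint, slow_wordcount)   # loads from disk
print(first == second, len(calls))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.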