Social Media Brand Positioning:
Perceptual Mapping using Twitter Data
David Gerson
Gersondave@gmail.com
Why do this?
•It’s a known business standard that lets stakeholders, clients, and
decision makers easily see and compare like and unlike elements.
Problems
•Even at their most numerical, perceptual maps are typically
qualitative in nature.
•Many are designed using a “scorecard” approach.
•If a scorecard is used, it is limited to only a few points and relies
on a qualitative assessment to define the numeric measures.
•Distances and positioning are defined by human judgment
(a “human element”).
Perceptual Maps
•Perceptual mapping is
a diagrammatic technique used by
asset marketers that attempts to visually
display the perceptions of customers or
potential customers. [wikipedia]
•Perceptual maps enable you to find
opportunities in the market for a new product
or to identify potentially competitive products.
Attributed to http://npdbook.com/
The Current Process
Twitter Extraction -> Tokenizing -> Stemming -> TFIDF -> Stopwords -> Word Count Matrix -> MDS Plotting
Twitter Extraction
•Implementation
• The Streaming API was left running for one week, pulling data for a target group of fast food companies.
The Twitter data collected is used to generate our feature set.
• A second set of data, collected without filters, serves as a control set that helps determine
which words are key to food and which words are most important in the context of fast food.
Twitter Extraction
•Packages:
• The Twython library provides an easily accessible API wrapper that can be used for the Twitter Streaming
API.
• Through Twython a user can plug into Twitter and access data.
•Limitations
• The API didn’t seamlessly handle some parameters needed to filter Twitter data (e.g. language).
• The classic Unicode/ASCII conversion problems are rife in the Twitter dataset.
• The Firehose API was deprecated while this project was being worked on. Without the ability to filter
the feed by language, and without Firehose access, I used the SwiftKey dataset instead.
Tokenizing Dataset
•Implementation
• Tokenizing is simple. There are a few ways to do it, but the easiest is to split the data: first on the
newline character (“\n”) and then on spaces.
• The tokenization stage is also the ideal place to filter and format the tokens.
• At this stage I also deduplicate the data: if a line appears more than once, the repeats are removed.
This helps manage spammers.
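The tokenizing and deduplication steps above can be sketched as follows. This is a minimal illustration, not the original project code: the function name `tokenize`, the lowercase formatting rule, and the sample tweets are all assumptions made for the example.

```python
import re


def tokenize(raw_text):
    """Split raw text into tweets (one per line), drop repeated lines,
    and return one list of formatted tokens per remaining tweet."""
    seen = set()
    tokenized = []
    for line in raw_text.split("\n"):
        line = line.strip()
        if not line or line in seen:
            continue  # skip blank lines and exact duplicates (likely spam)
        seen.add(line)
        # Format tokens: lowercase, keep only letters, digits, and apostrophes.
        tokens = [re.sub(r"[^a-z0-9']", "", t) for t in line.lower().split(" ")]
        tokenized.append([t for t in tokens if t])
    return tokenized


# Hypothetical sample: the second line is an exact repeat of the first.
sample = "Love the new Crunchwrap!\nLove the new Crunchwrap!\nBacon  burger, please"
print(tokenize(sample))
```

Deduplicating on the raw line before tokenizing keeps the check cheap; near-duplicate spam (for example the same text with a different link) would need a fuzzier comparison.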
Stemming Tokens
•Value Add of Stemming
• Stemming is a technique first proposed in the late 1960’s by Julie Beth Lovins and later refined by
Martin Porter, whose algorithm has become the de facto standard for stemming.
• Stemming strips the suffixes from words so that only the root word remains (e.g. moved -> move).
• In the context of this analysis, it lets you compare the root words used about these stores.
•Implementation
• The newer version of the algorithm, Porter2, is readily available as a Python package.
• For this analysis I stemmed both my core food dataset and the general firehose
corpus.
• For simplicity and kindness to my RAM, I stored the stemmed output as a CSV.
http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/dan-porters.pdf
TFIDF: Term Frequency-Inverse Document Frequency
•Value Add of TFIDF
• Put simply, TFIDF measures the relative occurrence of a word across a series of documents (tweets)
and provides a simple way to compare it to the occurrence of other words.
•Implementation
• TFIDF is fit to a larger set of firehose data. In this case the firehose data is broken apart into tweet
documents covering any and all topics a Twitter user might be interested in.
• After fitting the TFIDF model on the firehose data, I apply it to the set of data I have from the
restaurants. The words with the highest scores are considered the “most
important” in the context of food and restaurants.
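The idea of fitting on a background corpus and then scoring the restaurant tweets can be sketched with the standard library alone. This is an illustrative assumption, not the scikit-learn code used in the project: the function names, the smoothed IDF formula, and the toy documents are all made up for the example.

```python
import math
from collections import Counter


def fit_idf(background_docs):
    """Learn smoothed inverse document frequencies from tokenized background tweets."""
    n_docs = len(background_docs)
    doc_freq = Counter()
    for doc in background_docs:
        doc_freq.update(set(doc))  # count each term at most once per document
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    default_idf = math.log(1 + n_docs) + 1  # terms never seen in the background
    return idf, default_idf


def score_terms(target_docs, idf, default_idf):
    """Rank terms in the target tweets by term frequency times background IDF."""
    tf = Counter(token for doc in target_docs for token in doc)
    total = sum(tf.values())
    scores = {t: (c / total) * idf.get(t, default_idf) for t, c in tf.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: general tweets versus restaurant tweets.
background = [["the", "game", "tonight"], ["the", "weather"], ["love", "the", "bacon"]]
restaurants = [["the", "bacon", "burger"], ["bacon", "crunch"]]
idf, default_idf = fit_idf(background)
for term, score in score_terms(restaurants, idf, default_idf):
    print(term, round(score, 3))
```

Words that are common everywhere (such as "the") receive a low IDF from the background corpus, so frequent food-specific words rise to the top of the ranking.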
TFIDF: Term Frequency-Inverse Document Frequency
•Packages
• TFIDF is simple and easy to implement in scikit-learn.
• I pass my string document objects to the function, create a tokenizer suited to my text files,
and then run it.
• The output is a dictionary of words and their TFIDF scores, which needs to be read into a list of tuples and sorted.
• I then create a filter based on the list sorted by TFIDF score and use it to remove all non-relevant terms
from the food list, which I will use as features.
Wordcount Matrix
•Implementation
1. First I create a dictionary object of all words (a defaultdict would work just as well).
2. I then create a set of all words and compare that to a separate list of restaurants.
3. For each restaurant in the list, I run a separate loop that increments the word counters stored for that restaurant.
4. Finally I save the word counts as a CSV that can be imported as a matrix.
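The steps above might look roughly like the sketch below. The function names, the restaurant keys, and the tiny vocabulary are hypothetical. The CSV is written to an in-memory buffer here so the example runs anywhere, whereas the project saved it to disk.

```python
import csv
import io
from collections import defaultdict


def build_count_matrix(tweets_by_restaurant, vocabulary):
    """Return {restaurant: {word: count}}, counting only words in the kept vocabulary."""
    counts = {name: defaultdict(int) for name in tweets_by_restaurant}
    for name, tweets in tweets_by_restaurant.items():
        for tokens in tweets:
            for token in tokens:
                if token in vocabulary:
                    counts[name][token] += 1
    return counts


def to_csv(counts, vocabulary):
    """Serialize the counts as CSV text: one row per restaurant, one column per word."""
    columns = sorted(vocabulary)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["restaurant"] + columns)
    for name in sorted(counts):
        writer.writerow([name] + [counts[name].get(word, 0) for word in columns])
    return buffer.getvalue()


# Hypothetical tokenized tweets, keyed by restaurant.
tweets = {
    "tacobell": [["crunch", "taco"], ["crunch"]],
    "wendys": [["bacon", "chicken"]],
}
vocabulary = {"bacon", "chicken", "crunch"}
print(to_csv(build_count_matrix(tweets, vocabulary), vocabulary))
```

Each row of the resulting matrix is one restaurant's word-count vector, which is the input the later MDS step compares.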
MDS/ NMDS / CA (PCA)
•What is MDS and why is it used for perceptual mapping?
• MDS uses matrix operations to compute the distances between elements and plot them in fewer
dimensions while preserving the distances between all elements as closely as possible.
• Standard MDS is made to handle continuous variables. If you have ordinal or comparison data,
then a non-metric MDS solution is necessary. A non-metric MDS gives you results based on how your data
elements compare to each other, rather than trying to solve for the total differences between them.
• MDS and PCA have entirely different goals and are studied separately. While
the goal of PCA is dimensionality reduction in support of factor analysis, the goal of MDS is to simplify the
visual inspection of elements and their relationships to other like elements.
• This analysis can also be used to find similarity between your measurements. For
instance, if you are using MDS with demographic data, you would probably see that minivan owners and
families have very similar vectors.
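To make the distance-preserving idea concrete, here is a minimal sketch of classical (metric) MDS using NumPy. It is not the non-metric variant used for the final plots, and it is not the scikit-learn or R code from the project. The function name and the toy count matrix are assumptions for illustration.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points in n_components dimensions so that Euclidean distances
    between the embedded points approximate the given distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances to recover a Gram (inner product) matrix.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Hypothetical word-count rows for three restaurants (columns: bacon, chicken, crunch).
counts = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
distances = np.linalg.norm(counts[:, None, :] - counts[None, :, :], axis=-1)
coords = classical_mds(distances)
print(coords)  # one (x, y) position per restaurant, ready to plot
```

Restaurants whose word-count vectors are close end up close together on the plot, which is what makes the result readable as a perceptual map.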
MDS/ NMDS / CA (PCA)
•What is MDS and why is it used for perceptual mapping? (cont.)
• The easiest way to think about this is to take the concept of unidimensional scaling and apply it to a
multidimensional environment.
Scikit Learn NMDS Plot
R MDS Plot
R NMDS (Vegan Package)
PCA Bi-plot
Analysis of the R NMDS
•What we can determine with this analysis
1. Wendy’s and Burger King have favorable offerings for chicken
and bacon.
2. Taco Bell doesn’t have a notable salad offering. McDonald’s is also
far from that term.
3. The price of Chipotle, Pizza Hut, and McDonald’s is frequently
referenced.
4. Taco Bell owns the term “crunch”.
•What to do next
1. Pull in additional data to find the relative profitability of these
firms and align it with our terms. If any “blue ocean” space
is seen, that could be a potential business opportunity.
Pain Points and Lessons Learned
• ASCII and Unicode conversion issues are a constant pain. It’s much easier to be overaggressive with casting. Also
make sure that all modules and classes specify the type of text being used.
• For long calculations it is best to use pickle to checkpoint the work done, so that completed processing is
saved off.
• Sometimes R is the right call, particularly when it comes to plotting.
• scikit-learn has almost all the functions needed, and it is easier to stay there than to hunt for other
best-of-breed packages.
• LDA for topic modeling would be a great next step to reduce dimensionality.
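The pickle checkpointing lesson can be sketched as a small helper. The name `checkpointed`, the file name, and the stand-in computation are hypothetical; the pattern is simply "load the saved result if it exists, otherwise compute it and save it".

```python
import os
import pickle
import tempfile


def checkpointed(path, compute):
    """Return the pickled result stored at `path` if it exists;
    otherwise call `compute()`, pickle its result to `path`, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    with open(path, "wb") as handle:
        pickle.dump(result, handle)
    return result


def slow_word_count():
    # Stand-in for a long-running step such as stemming or counting.
    return {"bacon": 3, "crunch": 5}


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "word_counts.pkl")
    first = checkpointed(path, slow_word_count)   # computes and saves
    second = checkpointed(path, slow_word_count)  # loads from disk instead
    print(first == second)
```

Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.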
Questions?

More Related Content

What's hot

Aula 2 notícia, noticiabilidade
Aula 2   notícia, noticiabilidadeAula 2   notícia, noticiabilidade
Aula 2 notícia, noticiabilidadeEdelberto Behs
 
Apostilaredacaopublicitaria08032005170707 091201205511-phpapp02
Apostilaredacaopublicitaria08032005170707 091201205511-phpapp02Apostilaredacaopublicitaria08032005170707 091201205511-phpapp02
Apostilaredacaopublicitaria08032005170707 091201205511-phpapp02Blogotipos - Diário das Marcas
 
Revisão de jornalismo digital
Revisão de jornalismo digitalRevisão de jornalismo digital
Revisão de jornalismo digitalIuri Lammel
 
Pet português instrumental e e geraldino r cunha
Pet português instrumental e e geraldino r cunhaPet português instrumental e e geraldino r cunha
Pet português instrumental e e geraldino r cunhaMariaLusadeJesusRodo1
 
Introdução à Publicidade - Aula 06 - Criação
Introdução à Publicidade - Aula 06 - CriaçãoIntrodução à Publicidade - Aula 06 - Criação
Introdução à Publicidade - Aula 06 - CriaçãoThiago Ianatoni
 
Marketing e publicidade e propaganda
Marketing e publicidade e propagandaMarketing e publicidade e propaganda
Marketing e publicidade e propagandaCiro Gusatti
 
Repertório para redação
Repertório para redaçãoRepertório para redação
Repertório para redaçãolipexleal
 
Plano de Marketing Cetaphil - Galderma
Plano de Marketing Cetaphil - GaldermaPlano de Marketing Cetaphil - Galderma
Plano de Marketing Cetaphil - GaldermaNatália Viana
 
Livro Grandes Marcas Grandes Negócios
Livro Grandes Marcas Grandes NegóciosLivro Grandes Marcas Grandes Negócios
Livro Grandes Marcas Grandes NegóciosBeto Lima Branding
 
Convergência de mídias e narrativa transmídia
Convergência de mídias e narrativa transmídiaConvergência de mídias e narrativa transmídia
Convergência de mídias e narrativa transmídiaAlysson Lisboa
 
Analise swot McDonald´s
Analise swot McDonald´sAnalise swot McDonald´s
Analise swot McDonald´sXxdivisionxX69
 
Aula 2 - Funcoes de Linguagem.pdf
Aula 2 - Funcoes de Linguagem.pdfAula 2 - Funcoes de Linguagem.pdf
Aula 2 - Funcoes de Linguagem.pdfProfessoraCatia1
 
Características de Meios e Veículos de Comunicação
Características de Meios e Veículos de ComunicaçãoCaracterísticas de Meios e Veículos de Comunicação
Características de Meios e Veículos de Comunicaçãokalledonian
 
Aulas 3 e 4 práticas de negociação
Aulas 3 e 4   práticas de negociaçãoAulas 3 e 4   práticas de negociação
Aulas 3 e 4 práticas de negociaçãoMKTMAIS
 
Propaganda Institucional
Propaganda InstitucionalPropaganda Institucional
Propaganda InstitucionalDaiane Lins
 
Mobile: entendendo o comportamento do consumidor
Mobile: entendendo o comportamento do consumidorMobile: entendendo o comportamento do consumidor
Mobile: entendendo o comportamento do consumidorClaudio Alberto Hassan
 

What's hot (20)

Aula 2 notícia, noticiabilidade
Aula 2   notícia, noticiabilidadeAula 2   notícia, noticiabilidade
Aula 2 notícia, noticiabilidade
 
Apostilaredacaopublicitaria08032005170707 091201205511-phpapp02
Apostilaredacaopublicitaria08032005170707 091201205511-phpapp02Apostilaredacaopublicitaria08032005170707 091201205511-phpapp02
Apostilaredacaopublicitaria08032005170707 091201205511-phpapp02
 
Revisão de jornalismo digital
Revisão de jornalismo digitalRevisão de jornalismo digital
Revisão de jornalismo digital
 
Pet português instrumental e e geraldino r cunha
Pet português instrumental e e geraldino r cunhaPet português instrumental e e geraldino r cunha
Pet português instrumental e e geraldino r cunha
 
Introdução à Publicidade - Aula 06 - Criação
Introdução à Publicidade - Aula 06 - CriaçãoIntrodução à Publicidade - Aula 06 - Criação
Introdução à Publicidade - Aula 06 - Criação
 
Negociação
NegociaçãoNegociação
Negociação
 
Marketing e publicidade e propaganda
Marketing e publicidade e propagandaMarketing e publicidade e propaganda
Marketing e publicidade e propaganda
 
Repertório para redação
Repertório para redaçãoRepertório para redação
Repertório para redação
 
Plano de Marketing Cetaphil - Galderma
Plano de Marketing Cetaphil - GaldermaPlano de Marketing Cetaphil - Galderma
Plano de Marketing Cetaphil - Galderma
 
Livro Grandes Marcas Grandes Negócios
Livro Grandes Marcas Grandes NegóciosLivro Grandes Marcas Grandes Negócios
Livro Grandes Marcas Grandes Negócios
 
Groundswell
GroundswellGroundswell
Groundswell
 
O Processo de Comunicação
O Processo de ComunicaçãoO Processo de Comunicação
O Processo de Comunicação
 
Convergência de mídias e narrativa transmídia
Convergência de mídias e narrativa transmídiaConvergência de mídias e narrativa transmídia
Convergência de mídias e narrativa transmídia
 
Analise swot McDonald´s
Analise swot McDonald´sAnalise swot McDonald´s
Analise swot McDonald´s
 
Aula 2 - Funcoes de Linguagem.pdf
Aula 2 - Funcoes de Linguagem.pdfAula 2 - Funcoes de Linguagem.pdf
Aula 2 - Funcoes de Linguagem.pdf
 
Características de Meios e Veículos de Comunicação
Características de Meios e Veículos de ComunicaçãoCaracterísticas de Meios e Veículos de Comunicação
Características de Meios e Veículos de Comunicação
 
Aulas 3 e 4 práticas de negociação
Aulas 3 e 4   práticas de negociaçãoAulas 3 e 4   práticas de negociação
Aulas 3 e 4 práticas de negociação
 
Propaganda Institucional
Propaganda InstitucionalPropaganda Institucional
Propaganda Institucional
 
Publicidade e propaganda
Publicidade e propagandaPublicidade e propaganda
Publicidade e propaganda
 
Mobile: entendendo o comportamento do consumidor
Mobile: entendendo o comportamento do consumidorMobile: entendendo o comportamento do consumidor
Mobile: entendendo o comportamento do consumidor
 

Viewers also liked

Marketing Research - Perceptual Map
Marketing Research - Perceptual MapMarketing Research - Perceptual Map
Marketing Research - Perceptual MapMinha Hwang
 
Nexmo_CaseStudy_klm_DIGITAL
Nexmo_CaseStudy_klm_DIGITALNexmo_CaseStudy_klm_DIGITAL
Nexmo_CaseStudy_klm_DIGITALAmanda Francoeur
 
Advert deconstruction 1
Advert deconstruction 1Advert deconstruction 1
Advert deconstruction 1georgewyse
 
Lesson 4 Deconstruct An Ad
Lesson 4 Deconstruct An AdLesson 4 Deconstruct An Ad
Lesson 4 Deconstruct An AdRenee Hobbs
 
Marketing analytics i
Marketing analytics iMarketing analytics i
Marketing analytics iMimi Nguyen
 
SMS Can do What?
SMS Can do What?SMS Can do What?
SMS Can do What?Sam Machin
 
Powering End User Experiences with Communication APIs Nexmo, Alex Economon TA...
Powering End User Experiences with Communication APIs Nexmo, Alex Economon TA...Powering End User Experiences with Communication APIs Nexmo, Alex Economon TA...
Powering End User Experiences with Communication APIs Nexmo, Alex Economon TA...Alan Quayle
 
Block Project - THE REAL Finished Draft
Block Project - THE REAL Finished Draft Block Project - THE REAL Finished Draft
Block Project - THE REAL Finished Draft Abram Edgar
 
Kellogg Strategic Audit Version 1
Kellogg Strategic Audit   Version 1Kellogg Strategic Audit   Version 1
Kellogg Strategic Audit Version 1Luis Terron
 
Breakfast cereal industry final presentation
Breakfast cereal industry final presentationBreakfast cereal industry final presentation
Breakfast cereal industry final presentationDicky Cahanaya
 
Facebook Brand Analysis - Strategic Brand Management
Facebook Brand Analysis - Strategic Brand ManagementFacebook Brand Analysis - Strategic Brand Management
Facebook Brand Analysis - Strategic Brand ManagementNuwan Ireshinie
 
State of the Word 2011
State of the Word 2011State of the Word 2011
State of the Word 2011photomatt
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
 

Viewers also liked (14)

Marketing Research - Perceptual Map
Marketing Research - Perceptual MapMarketing Research - Perceptual Map
Marketing Research - Perceptual Map
 
Nexmo_CaseStudy_klm_DIGITAL
Nexmo_CaseStudy_klm_DIGITALNexmo_CaseStudy_klm_DIGITAL
Nexmo_CaseStudy_klm_DIGITAL
 
Advert deconstruction 1
Advert deconstruction 1Advert deconstruction 1
Advert deconstruction 1
 
Lesson 4 Deconstruct An Ad
Lesson 4 Deconstruct An AdLesson 4 Deconstruct An Ad
Lesson 4 Deconstruct An Ad
 
Marketing analytics i
Marketing analytics iMarketing analytics i
Marketing analytics i
 
SMS Can do What?
SMS Can do What?SMS Can do What?
SMS Can do What?
 
Powering End User Experiences with Communication APIs Nexmo, Alex Economon TA...
Powering End User Experiences with Communication APIs Nexmo, Alex Economon TA...Powering End User Experiences with Communication APIs Nexmo, Alex Economon TA...
Powering End User Experiences with Communication APIs Nexmo, Alex Economon TA...
 
perceptual mapping
perceptual mappingperceptual mapping
perceptual mapping
 
Block Project - THE REAL Finished Draft
Block Project - THE REAL Finished Draft Block Project - THE REAL Finished Draft
Block Project - THE REAL Finished Draft
 
Kellogg Strategic Audit Version 1
Kellogg Strategic Audit   Version 1Kellogg Strategic Audit   Version 1
Kellogg Strategic Audit Version 1
 
Breakfast cereal industry final presentation
Breakfast cereal industry final presentationBreakfast cereal industry final presentation
Breakfast cereal industry final presentation
 
Facebook Brand Analysis - Strategic Brand Management
Facebook Brand Analysis - Strategic Brand ManagementFacebook Brand Analysis - Strategic Brand Management
Facebook Brand Analysis - Strategic Brand Management
 
State of the Word 2011
State of the Word 2011State of the Word 2011
State of the Word 2011
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 

Similar to Perceptual Mapping using Twitter Data

Lessons learned from over 25 Data Virtualization implementations
Lessons learned from over 25 Data Virtualization implementationsLessons learned from over 25 Data Virtualization implementations
Lessons learned from over 25 Data Virtualization implementationsDenodo
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best PracticesNeo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best PracticesNeo4j
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 
Market Research Meets Big Data Analytics for Business Transformation
Market Research Meets Big Data Analytics  for Business Transformation Market Research Meets Big Data Analytics  for Business Transformation
Market Research Meets Big Data Analytics for Business Transformation Sally Sadosky
 
Doing Analytics Right - Building the Analytics Environment
Doing Analytics Right - Building the Analytics EnvironmentDoing Analytics Right - Building the Analytics Environment
Doing Analytics Right - Building the Analytics EnvironmentTasktop
 
Graphs for Recommendation Engines: Looking beyond Social, Retail, and Media
Graphs for Recommendation Engines: Looking beyond Social, Retail, and MediaGraphs for Recommendation Engines: Looking beyond Social, Retail, and Media
Graphs for Recommendation Engines: Looking beyond Social, Retail, and MediaNeo4j
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comSimon Hughes
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analyticsAnirudh
 
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Thanawalla
 
Barga Data Science lecture 1
Barga Data Science lecture 1Barga Data Science lecture 1
Barga Data Science lecture 1Roger Barga
 
RDBMS to Graph Webinar
RDBMS to Graph WebinarRDBMS to Graph Webinar
RDBMS to Graph WebinarNeo4j
 
DataONE Education Module 07: Metadata
DataONE Education Module 07: MetadataDataONE Education Module 07: Metadata
DataONE Education Module 07: MetadataDataONE
 
Technical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdfTechnical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdfShristi Shrestha
 
Estimating the Efficacy of Efficient Machine Learning Classifiers for Twitter...
Estimating the Efficacy of Efficient Machine Learning Classifiers for Twitter...Estimating the Efficacy of Efficient Machine Learning Classifiers for Twitter...
Estimating the Efficacy of Efficient Machine Learning Classifiers for Twitter...IRJET Journal
 
SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?Nicolas Georgeault
 
Kudler Dimensional Model Hands-On-Project Essay
Kudler Dimensional Model Hands-On-Project EssayKudler Dimensional Model Hands-On-Project Essay
Kudler Dimensional Model Hands-On-Project EssayMonica Carter
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxshumPanwar
 

Similar to Perceptual Mapping using Twitter Data (20)

Lessons learned from over 25 Data Virtualization implementations
Lessons learned from over 25 Data Virtualization implementationsLessons learned from over 25 Data Virtualization implementations
Lessons learned from over 25 Data Virtualization implementations
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best PracticesNeo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
 
Taming data lake - scalable metrics model
Taming data lake - scalable metrics modelTaming data lake - scalable metrics model
Taming data lake - scalable metrics model
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 
Market Research Meets Big Data Analytics for Business Transformation
Market Research Meets Big Data Analytics  for Business Transformation Market Research Meets Big Data Analytics  for Business Transformation
Market Research Meets Big Data Analytics for Business Transformation
 
Doing Analytics Right - Building the Analytics Environment
Doing Analytics Right - Building the Analytics EnvironmentDoing Analytics Right - Building the Analytics Environment
Doing Analytics Right - Building the Analytics Environment
 
Graphs for Recommendation Engines: Looking beyond Social, Retail, and Media
Graphs for Recommendation Engines: Looking beyond Social, Retail, and MediaGraphs for Recommendation Engines: Looking beyond Social, Retail, and Media
Graphs for Recommendation Engines: Looking beyond Social, Retail, and Media
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
 
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
 
Barga Data Science lecture 1
Barga Data Science lecture 1Barga Data Science lecture 1
Barga Data Science lecture 1
 
Data Mining
Data MiningData Mining
Data Mining
 
RDBMS to Graph Webinar
RDBMS to Graph WebinarRDBMS to Graph Webinar
RDBMS to Graph Webinar
 
DataONE Education Module 07: Metadata
DataONE Education Module 07: MetadataDataONE Education Module 07: Metadata
DataONE Education Module 07: Metadata
 
Technical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdfTechnical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdf
 
Estimating the Efficacy of Efficient Machine Learning Classifiers for Twitter...
Estimating the Efficacy of Efficient Machine Learning Classifiers for Twitter...Estimating the Efficacy of Efficient Machine Learning Classifiers for Twitter...
Estimating the Efficacy of Efficient Machine Learning Classifiers for Twitter...
 
SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?
 
Part1
Part1Part1
Part1
 
Kudler Dimensional Model Hands-On-Project Essay
Kudler Dimensional Model Hands-On-Project EssayKudler Dimensional Model Hands-On-Project Essay
Kudler Dimensional Model Hands-On-Project Essay
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptx
 

Recently uploaded

Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfTejal81
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingMAGNIntelligence
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1DianaGray10
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxSatishbabu Gunukula
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveIES VE
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechProduct School
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptxHansamali Gamage
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 

Perceptual Mapping using Twitter Data

  • 1. Social Media Brand Positioning: Perceptual Mapping using Twitter Data David Gerson Gersondave@gmail.com
  • 2. Why do this?
      • It’s a known business standard for enabling stakeholders, clients, and decision makers to easily see and compare like and unlike elements.
  • 3. Problems
      • Even at their most numerical, perceptual maps are typically qualitative in nature.
      • Many are designed using a “scorecard” approach.
      • If a scorecard is used, it is limited to only a few points and relies on a qualitative assessment to define the numeric measures.
      • Distances and positioning are defined using a “human element”.
  • 4. Perceptual Maps
      • Perceptual mapping is a diagrammatic technique used by asset marketers that attempts to visually display the perceptions of customers or potential customers. [wikipedia]
      • Perceptual maps enable you to find opportunities in the market for a new product or to identify potentially competitive products.
  • 5. Perceptual Maps (cont.) Attributed to http://npdbook.com/
  • 6. The Current Process: Twitter Extraction, Tokenizing, Stemming, TFIDF, Stopwords, Word Count Matrix, MDS Plotting
  • 7. Twitter Extraction
      • Implementation
          • The Streaming API was left online for one week, pulling data for a target group of fast food companies. The Twitter data collected is used to generate our feature set.
          • A second, unfiltered set of data is used as a control set that helps us determine which words are key to food in general and which words are most important in the context of fast food.
  • 8. Twitter Extraction
      • Packages
          • The Twython library provides an easily accessible wrapper for the Twitter Streaming API.
          • Twython lets a user plug into Twitter and access the data.
      • Limitations
          • The API didn’t seamlessly handle some parameters needed to filter Twitter data (e.g. language).
          • The classic Unicode/ASCII conversion problems are rife in the Twitter dataset.
          • The Firehose API was deprecated while this project was being worked on. Without the ability to filter the feed by language, and without Firehose access, I used the SwiftKey dataset instead.
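One way around the language-filter limitation is to filter on the client side after the tweets arrive. The sketch below is not Twython code and is not taken from the slides. It assumes each tweet has already been decoded into a dict carrying the `text` and `lang` fields used by Twitter's v1.1 tweet payloads. The helper name `keep_tweet` and the brand keywords are hypothetical.

```python
# Hypothetical brand keywords; the slides do not list the exact track terms.
BRANDS = ("wendys", "taco bell", "chipotle")


def keep_tweet(tweet, lang="en", brands=BRANDS):
    """Return True for tweets in the wanted language that mention a brand."""
    if tweet.get("lang") != lang:
        return False
    text = tweet.get("text", "").lower()
    return any(brand in text for brand in brands)


# Made-up sample payloads, reduced to the two fields the filter reads.
sample = [
    {"text": "Taco Bell crunch wrap for lunch", "lang": "en"},
    {"text": "Me encanta Taco Bell", "lang": "es"},
    {"text": "Nice weather today", "lang": "en"},
]
kept = [t["text"] for t in sample if keep_tweet(t)]
```

Filtering after download costs bandwidth, but it keeps the language rule in plain Python where it is easy to inspect and change.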
  • 9. Tokenizing Dataset
      • Implementation
          • Tokenizing is simple. There are a few ways to do it, but the easiest is to split the data: first on the newline character (“\n”) and then on spaces.
          • The tokenization stage is also the ideal place to filter and format the tokens.
          • At this stage I also deduplicate the data: if a line appears more than once, the repeats are removed, which helps manage spammers.
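The split, format, and deduplicate steps described above could look roughly like this. It is a minimal sketch using only the standard library. The function name `tokenize`, the lowercasing, and the character filter are assumptions, since the slides do not say which formatting rules were applied.

```python
import re


def tokenize(raw):
    """Split raw text into lines, drop repeated lines, then split into tokens."""
    seen = set()
    documents = []
    for line in raw.split("\n"):
        line = line.strip().lower()
        if not line or line in seen:
            continue  # skip blanks and repeats of an already seen line
        seen.add(line)
        # Keep only letters, digits and apostrophes inside each token.
        tokens = [re.sub(r"[^a-z0-9']", "", tok) for tok in line.split(" ")]
        documents.append([tok for tok in tokens if tok])
    return documents


raw = (
    "Love the new Wendy's bacon burger!\n"
    "Love the new Wendy's bacon burger!\n"
    "Taco Bell crunch"
)
docs = tokenize(raw)
```

Here the duplicated first tweet is kept once, so `docs` holds two token lists.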
  • 10. Stemming Tokens
      • Value Add of Stemming
          • Stemming was first proposed in the late 1960s by Julie Beth Lovins. Martin Porter’s later algorithm has become the de facto standard for stemming.
          • Stemming strips suffixes from words so that only the root word remains (e.g. moved -> move).
          • In the context of this analysis, it lets you compare the root words associated with these stores.
      • Implementation
          • The newer version of the algorithm, Porter2, is readily available as a Python package.
          • For this analysis I stemmed both my core food dataset and the general firehose corpus.
          • For simplicity and kindness to my RAM, I stored the stemmed output as a CSV.
      • Reference: http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/dan-porters.pdf
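As an illustration of Porter2 stemming, the sketch below uses NLTK's `SnowballStemmer("english")`, which implements the Porter2 (English Snowball) algorithm. The slides do not name the package that was actually used, so NLTK is an assumption here, and `stem_tokens` is a hypothetical helper.

```python
from nltk.stem.snowball import SnowballStemmer

# The "english" Snowball stemmer is the Porter2 algorithm.
stemmer = SnowballStemmer("english")


def stem_tokens(tokens):
    """Reduce each token to its stem so inflected forms are counted together."""
    return [stemmer.stem(token) for token in tokens]


stems = stem_tokens(["moved", "moving", "move"])
```

All three inputs collapse to the same stem, which is what lets later word counts treat them as one term.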
  • 11. TFIDF: Term Frequency Inverse Document Frequency
      • Value Add of TFIDF
          • Put simply, TFIDF measures the relative occurrence of a word across a series of documents (tweets) and provides a simple way to compare it to the occurrence of other words.
      • Implementation
          • TFIDF is fit to a larger set of firehose data. Here the firehose data is broken apart into tweet documents covering any and all topics a Twitter user might be interested in.
          • After creating the TFIDF model, I apply the TFIDF object fitted on the firehose data to the data I have from the restaurants. The words with the highest scores are considered the “most important” in the context of food and restaurants.
  • 12. TFIDF: Term Frequency Inverse Document Frequency
      • Packages
          • TFIDF is simple and easy to implement in scikit-learn.
          • I pass my string documents to the function, supply a tokenizer suited to my text files, and run it.
          • The output is a dictionary of words and their TFIDF scores, which needs to be read into tuples and sorted.
          • I then build a filter from the list sorted by TFIDF score and use it to remove all non-relevant terms from the food list that I will use as features.
  • 13. Wordcount Matrix
      • Implementation
          1. First I create a dictionary object of all words (a defaultdict would work just as well).
          2. I then create a set of all words and compare it to a separate list of restaurants.
          3. For each restaurant in the list, I run a separate loop that increments the word counters stored for that restaurant.
          4. Finally I save the word counts as a CSV that can be imported as a matrix.
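The four steps above can be sketched with the standard library as follows. The names `count_matrix`, `tweets`, and the sample vocabulary are hypothetical, and the in-memory buffer stands in for the CSV file the slides mention.

```python
import csv
import io
from collections import Counter, defaultdict


def count_matrix(tweets, vocabulary):
    """Build a header row plus one row of term counts per restaurant.

    tweets is an iterable of (restaurant, tokens) pairs.
    """
    counts = defaultdict(Counter)
    for restaurant, tokens in tweets:
        counts[restaurant].update(tok for tok in tokens if tok in vocabulary)
    columns = sorted(vocabulary)
    rows = [["restaurant"] + columns]
    for restaurant in sorted(counts):
        rows.append([restaurant] + [counts[restaurant][term] for term in columns])
    return rows


tweets = [
    ("tacobell", ["crunch", "taco", "crunch"]),
    ("wendys", ["bacon", "chicken"]),
    ("tacobell", ["chicken"]),
]
rows = count_matrix(tweets, {"bacon", "chicken", "crunch"})

buffer = io.StringIO()  # stand-in for a real file opened with newline=""
csv.writer(buffer).writerows(rows)
```

Each row after the header is one restaurant's vector of term counts, which is the shape the MDS step expects.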
  • 14. MDS / NMDS / CA (PCA)
      • What is MDS and why is it used for perceptual mapping?
          • MDS uses matrix operations to compute the distances between elements and plots them in fewer dimensions while preserving those distances as closely as possible.
          • Standard MDS is made to handle continuous variables. If you have ordinal or comparison data, then a non-metric MDS solution is necessary. A non-metric MDS positions your data elements by how they compare to each other, rather than trying to solve for the total differences between them.
          • MDS and PCA have entirely different goals and are studied separately. The goal of PCA is dimensionality reduction in support of factor analysis, while the goal of MDS is to simplify the visual inspection of elements and their relationships to other like elements.
          • This analysis can also be used to find similarity between your measurements. For instance, if you are using MDS with demographic data, you would probably see that minivan owners and families have very similar vectors.
  • 15. MDS / NMDS / CA (PCA)
      • What is MDS and why is it used for perceptual mapping? (cont.)
          • The easiest way to think about this is to use the concept of unidimensional scaling and apply it to a multidimensional environment.
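To make the distance-preserving idea concrete, here is a minimal sketch of classical (Torgerson) MDS written directly in NumPy. This is the metric variant, not the non-metric MDS from scikit-learn or the vegan package that the slides plot. It is swapped in only because it fits in a few lines. The count matrix and the function name `classical_mds` are made up for illustration.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points so Euclidean distances approximate the given distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to recover an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical word-count rows: one row per brand, one column per term.
counts = np.array([[5.0, 1.0, 0.0], [4.0, 2.0, 0.0], [0.0, 1.0, 6.0]])
diff = counts[:, None, :] - counts[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))  # pairwise Euclidean distances
coords = classical_mds(dist)             # one (x, y) position per brand
```

Plotting `coords` as a scatter with brand labels gives a basic perceptual map. Brands with similar term counts land close together.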
  • 16. Scikit Learn NMDS Plot
  • 17. R MDS Plot
  • 18. R NMDS (Vegan Package)
  • 19. PCA Bi-plot
  • 20. Analysis of the R NMDS
      • What we can determine with this analysis
          1. Wendy’s and Burger King have favorable offerings for chicken and bacon.
          2. Taco Bell doesn’t have a notable salad offering. McDonald’s also sits far from that term.
          3. The price of Chipotle, Pizza Hut, and McDonald’s is frequently referenced.
          4. Taco Bell owns the term “crunch”.
      • What to do next
          1. Pull in additional data to find the relative profitability of these firms and align them with our terms. If any “blue ocean” space is seen, that could be a potential business opportunity.
  • 21. Pain Points and Lessons Learned
      • ASCII and Unicode conversion issues are a constant pain. It’s much easier to be overaggressive with casting. Also make sure that all modules and classes specify the type of text being used.
      • For long calculations, it is best to use pickle to checkpoint the work done so that the processing is saved off.
      • Sometimes R is the right call, particularly when it comes to plotting.
      • scikit-learn has almost all the functions needed, and it is easier to stay there than to hunt for other best-of-breed packages.
      • LDA for topic modeling would be a great next step to reduce dimensionality.
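The pickle checkpointing tip might be applied along these lines. This is a minimal sketch using only the standard library. The helper `checkpointed`, the file name, and the placeholder `slow_step` are hypothetical.

```python
import os
import pickle
import tempfile


def checkpointed(path, compute):
    """Load a pickled result from path if it exists, else compute and save it."""
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    with open(path, "wb") as handle:
        pickle.dump(result, handle)
    return result


calls = []


def slow_step():
    calls.append(1)  # record each time the real computation runs
    return {"crunch": 2}


path = os.path.join(tempfile.mkdtemp(), "wordcount.pkl")
first = checkpointed(path, slow_step)   # computes and writes the checkpoint
second = checkpointed(path, slow_step)  # reads the checkpoint instead of recomputing
```

Only unpickle files you wrote yourself, since loading a pickle can execute arbitrary code.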