Repères Bayesia Consumer Segmentation Skim Conf08
1. Bayesian Networks: a new tool for consumer segmentation
Skim Conference – Barcelona – May 28th 2008
2. Summary
Introduction to consumer segmentations
A brief overview of Bayesian Networks
Computing a segmentation with Bayesian Networks
Conclusion
3. Introduction to consumer segmentations
Introduction to consumer segmentations
A brief overview of Bayesian Networks
Computing a segmentation with Bayesian Networks
Conclusion
4. Why a segmentation?
Valuable tool to understand a market
Homogeneous marketing targets
- people who behave the same way
- people who have homogeneous motivations / attitudes.
Groups of people to whom it is possible to speak the same language
Different marketing strategies
- Concepts
- Products
- Communication
- Advertising
Result: MORE EFFICIENT
5. A good segmentation - some important features
TECHNICAL QUALITY
- Homogeneous segments
- Clear differences between segments
- Stable…
AND OTHER VERY IMPORTANT ELEMENTS
- Easy to understand
- Operational / Actionable
- Fair representation of the real world
Process: Preparation stage, Statistical procedure, Interpretation / Analysis, Output
Only a part of the whole process. How important is it?
6. The marketer’s dream… and cruel reality
The marketer's dream
- Obvious groups!
- Any kind of computation should lead to the same results
Cruel reality
- More complicated
- Unlimited number of typologies
Procedure should guarantee a relevant clustering
7. Classical procedures
A factorial analysis followed by a clustering of the individuals
Canonical segmentation
[Diagram: ATTITUDES and BEHAVIOURS -> CANONICAL ANALYSIS -> Projection of the individuals on the factorial axes -> Clustering of the individuals]
Drawbacks: it is difficult to decide which declarative statements are attitudes and which are behaviours, and the procedure is time-consuming.
8. A brief overview of Bayesian Networks
Introduction to consumer segmentations
A brief overview of Bayesian Networks
Computing a segmentation with Bayesian Networks
Conclusion
9. Bayesian Networks
A computational tool to model uncertainty, based on both
- graph theory: readability, powerful communication tool
- probability theory: sound computations
Manual modelling through brainstorming
- Probabilistic expert systems
Induction by automatic learning
- Data analysis, data mining
Growing popularity
- Industry, Defense, Health, … and now Market Research
10. A complete framework for Data Mining
Parametric estimation
Use of the database to estimate the probabilities of a given structure
Robust Missing values processing
Expectation-Maximization (EM)
Structural EM
Structural learning
Unsupervised learning to discover all the direct probabilistic relations
Supervised learning to characterize a target variable
Variable clustering to induce “factors” made of highly connected variables
Probabilistic Structural Equations
and… Data Clustering to find groups of data sharing the same characteristics
11. Formalism: 2 distinctive parts
Structure
- Directed acyclic graphs
Parameters
- Probability distributions associated with each node
Example: anti-doping agency using two different tests to screen competitors
12. A reasoning engine 1/3
Sound evidence propagation on the entire network
Simulation
Diagnosis
And any combination of these 2 types of inference
13. A reasoning engine 2/3
Sound evidence propagation on the entire network
Simulation
Diagnosis
If a competitor is doped… there is a 99.5% chance that he is disqualified.
14. A reasoning engine 3/3
Sound evidence propagation on the entire network
Simulation
Diagnosis: thinking the other way round
If a competitor has been disqualified… there is a slight probability (8%) that he is nevertheless clean.
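The two directions of inference can be illustrated with Bayes' rule on a two-node toy model. This is only a sketch: it collapses the two tests into a single "disqualified" outcome, and the prior share of doped competitors and the false-positive rate are hypothetical values that the slides do not give. Only the 99.5% figure comes from the deck, so the result is not expected to reproduce the 8% shown above.

```python
def p_clean_given_disqualified(prior_doped, p_disq_given_doped, p_disq_given_clean):
    """Diagnosis: P(clean | disqualified) obtained with Bayes' rule."""
    prior_clean = 1.0 - prior_doped
    joint_doped = prior_doped * p_disq_given_doped   # P(doped, disqualified)
    joint_clean = prior_clean * p_disq_given_clean   # P(clean, disqualified)
    return joint_clean / (joint_doped + joint_clean)


# Simulation reads the conditional table directly: P(disqualified | doped) = 0.995 (from the slide).
# Diagnosis inverts it. The prior (0.10) and the false-positive rate (0.01) are assumptions.
posterior = p_clean_given_disqualified(
    prior_doped=0.10,
    p_disq_given_doped=0.995,
    p_disq_given_clean=0.01,
)
print(f"P(clean | disqualified) = {posterior:.3f}")
```

In a full network the same computation is carried out by evidence propagation over all nodes rather than by a closed-form expression.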
15. Segmentation with Bayesian Networks
Introduction to consumer segmentations
A brief overview of Bayesian Networks
Computing a segmentation with Bayesian Networks
Real case study: Segmentation of women as regards shopping and fashion
For confidentiality reasons, consumer statements and outputs have been modified.
Conclusion
16. 1st stage: segmentation induction
17. Unsupervised learning
Discovering relations between consumer statements
Usage and attitude survey conducted for a clothes retailer.
Sample: 1,065 women.
234 consumer statements: attitudes and behaviours towards fashion in general, retailers, brand image…
Heuristic search algorithm to find the best representation of the joint probability distribution.
Minimum Description Length (MDL) score to evaluate the quality of the network, based on fit to the data and compactness.
[Figure: induced network]
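The trade-off between fit and compactness can be made concrete with a textbook MDL-style score for a discrete network: the bits needed to encode the data given the structure, plus a penalty that grows with the number of free parameters. This is a sketch under assumptions, not the scoring function of the software used in the study; the function name, the data layout and the (log2 N)/2 penalty per parameter are illustrative choices.

```python
import math
from collections import Counter


def mdl_score(rows, parents):
    """MDL-style score (lower is better) of a discrete Bayesian network.

    rows    : list of dicts mapping variable name -> observed value
    parents : dict mapping each variable name -> tuple of its parent names
    """
    n = len(rows)
    total = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, value) and of parent configuration alone.
        joint = Counter((tuple(r[p] for p in pa), r[var]) for r in rows)
        marginal = Counter(tuple(r[p] for p in pa) for r in rows)
        # Fit: bits to encode the column given its parents (negative log-likelihood).
        data_bits = -sum(c * math.log2(c / marginal[cfg]) for (cfg, _), c in joint.items())
        # Compactness: free parameters = (arity - 1) * number of parent configurations.
        arity = len({r[var] for r in rows})
        n_configs = 1
        for p in pa:
            n_configs *= len({r[p] for r in rows})
        model_bits = 0.5 * math.log2(n) * (arity - 1) * n_configs
        total += data_bits + model_bits
    return total


# Toy data (hypothetical): statement B always agrees with statement A.
rows = [{"A": i % 2, "B": i % 2} for i in range(8)]
independent = mdl_score(rows, {"A": (), "B": ()})
linked = mdl_score(rows, {"A": (), "B": ("A",)})
print(independent, linked)
```

A heuristic search would propose arc additions, removals and reversals and keep those that lower this kind of score.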
18. Variables clustering and factor induction
Simplifying the information
Analysis of the network to discover groups of variables that are
strongly connected and that form a “concept”
Agglomerative hierarchical clustering algorithm based on the arcs’ Kullback-Leibler forces
(a non-linear, global measure of the contribution of each relation to the network).
For each cluster of variables
Creation of a latent variable summarizing the information.
42 factors computed
Example of factor 15: a latent variable summarizing originality.
Based on attitude statements (importance of being original, liking to differentiate oneself through clothes) and behaviours (buying brands X, Y and Z more often).
19. Factor clustering: overview of the procedure
Segmentation of the individuals based on the main factors
Introducing a new variable (consumer segments) which is the hidden
cause of the main factors.
Learning the probabilities with Expectation-Maximization (EM)
Score derived from MDL to assess the quality of the clustering
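The idea of a hidden "segment" variable that causes the observed factors corresponds to a latent class model, which EM can fit. The sketch below is a minimal version under simplifying assumptions: factors are binary (0/1), the initialisation is random with a fixed seed, and a small smoothing constant keeps probabilities away from 0 and 1. The function name and parameters are illustrative, not those of any particular tool.

```python
import math
import random


def em_latent_class(data, k, iters=50, seed=0):
    """Fit a k-class latent class model to rows of 0/1 values with EM.

    Returns (weights, theta, resp) where weights[c] = P(class c),
    theta[c][j] = P(x_j = 1 | class c) and resp[i][c] = P(class c | row i).
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each class for each row.
        resp = []
        for row in data:
            joint = [
                weights[c]
                * math.prod(theta[c][j] if x else 1.0 - theta[c][j] for j, x in enumerate(row))
                for c in range(k)
            ]
            z = sum(joint)
            resp.append([p / z for p in joint])
        # M-step: re-estimate class weights and conditional probabilities.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / n
            for j in range(d):
                ones = sum(r[c] * row[j] for r, row in zip(resp, data))
                theta[c][j] = (ones + 1e-3) / (mass + 2e-3)  # light smoothing
    return weights, theta, resp


# Toy data (hypothetical): two groups of respondents with opposite factor profiles.
data = [[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20
weights, theta, resp = em_latent_class(data, k=2)
print([round(w, 2) for w in weights])
```

Each row of `resp` gives the probability that a respondent belongs to each segment, which is what the purity measure discussed next is built on.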
20. Selecting the number of clusters
Pseudo random walk to find the best number of clusters
Example: find the best clustering with a random walk between 2 and 6 clusters, 20 iterations.
The best segmentation is the one that minimizes the score.
It is also possible to set the desired number of clusters.
It is possible to set the minimal purity of the clusters. The purity of a cluster is computed as the mean, over the points assigned to it, of their probability of belonging to it.
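Read this way, purity can be computed directly from the posterior probabilities produced by the clustering. The sketch below assumes that each point is assigned to its most probable cluster and that purity is the mean of that probability within each cluster; the function name and the input layout are illustrative.

```python
def cluster_purities(posteriors):
    """Return {cluster index: purity} from posteriors[i][c] = P(cluster c | point i)."""
    assigned = {}
    for probs in posteriors:
        best = max(range(len(probs)), key=probs.__getitem__)  # most probable cluster
        assigned.setdefault(best, []).append(probs[best])
    return {c: sum(ps) / len(ps) for c, ps in assigned.items()}


# Hypothetical posteriors for three respondents and two clusters.
purities = cluster_purities([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]])
print(purities)
```

A minimal-purity constraint then amounts to rejecting any candidate segmentation in which one of these values falls below the chosen threshold.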
21. 2nd stage: segmentation analysis
22. Supervised learning
Focusing on consumer clusters
Learning the relations between:
- the target variable = segmentation (consumer segments)
- the constitutive variables = consumer statements
23. Cluster Profile
Using the network to describe the consumer groups
Identification of the key variables and associated values
For each consumer group, we use the % of shared information to sort the variables
according to their importance in the characterisation of the group.
4 most contributing variables for cluster #5
Compared with the total sample, women of cluster #5:
- Buy brand X more often
- Are older (59 on average)
- Do not consider originality as important
- Do not like discovering new shops
Arrows symbolize the change in the probability distribution when observing cluster #5.
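The slides do not define the "% of shared information". One plausible reading, assumed here, is the mutual information between a variable and the cluster-membership indicator, expressed as a share of the indicator's entropy. The sketch below ranks variables on that basis; the variable names and data are hypothetical, and the indicator is assumed to take both values (otherwise its entropy is zero and the ratio is undefined).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a list of discrete observations."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def shared_information(xs, ys):
    """Mutual information I(X;Y) as a share of H(Y); assumes H(Y) > 0."""
    mi = entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
    return mi / entropy(ys)


def rank_variables(rows, in_cluster):
    """Sort variables by decreasing shared information with the cluster indicator."""
    scores = {v: shared_information([r[v] for r in rows], in_cluster) for v in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical respondents: buying brand X matches membership, liking new shops does not.
rows = [
    {"buys_brand_x": 1, "likes_new_shops": 0},
    {"buys_brand_x": 1, "likes_new_shops": 1},
    {"buys_brand_x": 0, "likes_new_shops": 0},
    {"buys_brand_x": 0, "likes_new_shops": 1},
]
in_cluster = [1, 1, 0, 0]
ranking = rank_variables(rows, in_cluster)
print(ranking)
```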
24. Generation of the cluster mapping
Map generation
The size of the cluster is proportional to its probability
The proximity of the clusters is a probabilistic proximity
The darkness of the blue is proportional to the purity of the cluster
(in this example all clusters have a purity > 95%)
25. Summarizing segmentation results
[Figure: cluster map. Axes: money devoted to clothes (from -- to ++) and age. Segment labels shown on the map: Functional above all, Fashion cheap before all, Fashionable originality, Neutral, Classical, Superstars, Classical upmarket, Young manager / executive women. Segment sizes shown: 18%, 10%, 20%, 18%, 20%, 8%, 14%.]
26. Going further: identifying a more compact target model
Markov procedure to select a subset of statements that determines which category consumers belong to
Selection of a subset of variables such that knowing their values makes the target independent of all the other variables
Subset of 11 variables
Overall prediction score = 68%
Useful for quickly recruiting consumer groups from the total population.
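The property described above (the target becomes independent of every other variable once the subset is known) is the defining property of a Markov blanket: the target's parents, its children, and the other parents of its children. The slides do not detail the "Markov procedure", so the sketch below only shows how such a blanket is read off a given directed acyclic graph; the graph and the node names are hypothetical.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set of parent nodes}."""
    children = {node for node, ps in parents.items() if target in ps}
    # Other parents of the target's children ("spouses").
    spouses = {p for child in children for p in parents[child]} - {target}
    return set(parents.get(target, ())) | children | spouses


# Hypothetical network learned around the segment variable.
graph = {
    "segment": set(),
    "s1": {"segment"},
    "s2": {"segment", "s3"},
    "s3": set(),
    "s4": {"s1"},
}
blanket = markov_blanket(graph, "segment")
print(sorted(blanket))
```

Statement `s4` is left out because it only depends on the segment through `s1`, which is why a short questionnaire built from the blanket can be enough to allocate a new respondent.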
27. Conclusion
Introduction to consumer segmentations
A brief overview of Bayesian Networks
Computing a segmentation with Bayesian Networks
Conclusion
28. Benefits
Our experience: a powerful tool
- Relevant typologies
- Easy to carry out
Modelling the consumer variables: good representation of reality
- Unsupervised modelling: no strong hypothesis
- Discovering interactions between variables (behaviours / attitudes)
- Use of qualitative / quantitative variables
Data clustering quality
- The minimum purity of the clusters can be set, which lets the marketer discover “niche” markets (usually less pure) or focus on mainstream groups.
Added value in the analysis of the clusters
- Easy ranking of the key variables for each consumer cluster
- Proximity mapping to summarize results
Development of robust models to identify consumer groups
- Useful when a recruitment is planned.
29. Some drawbacks. How to deal with them?
Modelling the consumer network and computing latent variables can take a long time when the number of variables is very large.
- 234 variables and 1,065 rows: 30-40 minutes
- To speed up the process, a simplified network can be learned, e.g. a maximum spanning tree, or the structural complexity parameter can be increased.
Continuous variables have to be discretized
- Results will depend on the quality of the discretization.
- K-Means can be used to adapt the discretization to the distribution of the data.
- The expertise of the user also helps.
- And most of the time in consumer research, variables are discrete!
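K-Means discretization of a single continuous variable can be sketched as follows: cluster the values in one dimension, then place the cut points halfway between adjacent cluster centres. This is a minimal illustration under assumptions (quantile-based initialisation, fixed number of bins, hypothetical age data), not the procedure of any specific tool.

```python
def kmeans_cut_points(values, k, iters=100):
    """Return k-1 cut points obtained from a 1-D k-means on `values`."""
    ordered = sorted(values)
    # Initialise the centres at evenly spaced quantiles of the data.
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Keep the previous centre when a group is empty.
        updated = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    centres.sort()
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


def discretize(value, cuts):
    """Index of the bin that `value` falls into, given sorted cut points."""
    return sum(value > c for c in cuts)


# Hypothetical ages forming three natural groups.
ages = [20, 21, 22, 40, 41, 42, 60, 61, 62]
cuts = kmeans_cut_points(ages, k=3)
print(cuts, [discretize(a, cuts) for a in ages])
```

Compared with equal-width bins, the cut points follow the gaps in the data, which is the sense in which the discretization adapts to the distribution.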
30. Perspectives
Flexibility: can be used far beyond usage and attitude surveys
Easy to carry out
Can be adapted to any type of data
Well designed to process large amounts of data
Example: segmentation of trains using the client’s internal data
- Travelers' data: 10 million individuals
- Train data (turnover, occupancy rate…): 15,000 trains
- Clustering of trains
In the future…
- typology of clients (turnover, potential…) to feed a business strategy
- segmentation of consumers based on utilities (CBC data)
31. Contact
Jouffe Lionel, Managing Director, jouffe@bayesia.com
Craignou Fabien, Data Mining Department Manager, fcr@reperes.net