RepèRes Bayesia Consumer Segmentation Skim Conf08

Bayesian Networks: a new tool for consumer segmentation
Skim Conference – Barcelona – May 28th 2008

Summary

Introduction to consumer segmentations

A brief overview of Bayesian Networks

Computing a segmentation with Bayesian Networks

Conclusion


Introduction to consumer segmentations


Why a segmentation?

Valuable tool to understand a market

Homogeneous marketing targets
- people who behave the same way
- people who have homogeneous motivations / attitudes.

Groups of people to whom it is possible to speak the same language

Different marketing strategies
- Concepts
- Products
- Communication
- Advertising

MORE EFFICIENT


A good segmentation - some important features

TECHNICAL QUALITY
- Homogeneous segments
- Clear differences between segments
- Stable…

AND OTHER VERY IMPORTANT ELEMENTS
- Easy to understand
- Operational / Actionable
- Fair representation of the real world

Steps: preparation stage, statistical procedure, interpretation / analysis, output.

The statistical procedure is only one part of the whole process.
How important is it?

The marketer’s dream… and cruel reality

The dream: obvious groups!
- Any kind of computation should lead to the same results

The reality: more complicated
- Unlimited number of typologies
- Procedure should guarantee a relevant clustering


Classical procedures

A factorial analysis followed by a clustering of the individuals

Canonical segmentation
- Attitudes and behaviours are fed into a canonical analysis
- Projection of the individuals on the factorial axes
- Clustering of the individuals

Drawbacks: it is difficult to decide which declarative statements are attitudes
and which are behaviours, and the procedure is time-consuming.


A brief overview of Bayesian Networks


Bayesian Networks

A computational tool to model uncertainty, based both on
- graph theory: readability, a powerful communication tool
- probability theory: sound computations

Manual modelling through brainstorming
- Probabilistic expert systems

Induction by automatic learning
- Data analysis, data mining

Growing popularity
- Industry, defense, health, … and now market research


A complete framework for Data Mining

Parametric estimation
- Use of the database to estimate the probabilities of a given structure

Robust processing of missing values
- Expectation-Maximization (EM)
- Structural EM

Structural learning
- Unsupervised learning to discover all the direct probabilistic relations
- Supervised learning to characterize a target variable
- Variable clustering to induce “factors” made of highly connected variables
- Probabilistic Structural Equations

and… Data Clustering to find groups of data sharing the same characteristics
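
As a rough illustration of parametric estimation, the sketch below counts co-occurrences in a small made-up database to estimate one conditional probability table for a fixed structure. The function name `estimate_cpt`, the variable names and the data are assumptions for illustration only, not the software used in the talk.

```python
from collections import Counter, defaultdict

def estimate_cpt(rows, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from a list of dict rows."""
    counts = defaultdict(Counter)
    for row in rows:
        config = tuple(row[p] for p in parents)   # observed parent configuration
        counts[config][row[child]] += 1
    return {
        config: {value: n / sum(ctr.values()) for value, n in ctr.items()}
        for config, ctr in counts.items()
    }

# Made-up survey rows (hypothetical variables): originality vs. buying brand X.
rows = (
    [{"likes_originality": "yes", "buys_brand_x": "yes"}] * 9
    + [{"likes_originality": "yes", "buys_brand_x": "no"}] * 1
    + [{"likes_originality": "no", "buys_brand_x": "yes"}] * 2
    + [{"likes_originality": "no", "buys_brand_x": "no"}] * 8
)
cpt = estimate_cpt(rows, child="buys_brand_x", parents=["likes_originality"])
print(cpt[("yes",)])   # distribution of buys_brand_x among respondents who like originality
```

Plain counting only works when every row is complete; with missing values the counts would have to be replaced by expected counts, which is the role of EM.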


Formalism: 2 distinctive parts

Structure
- Directed acyclic graph

Parameters
- Probability distributions associated with each node

Example: anti-doping agency using two different tests to screen competitors


A reasoning engine

Sound evidence propagation on the entire network
- Simulation
- Diagnosis
- And any combination of these 2 types of inference

Simulation
If a competitor is doped, there is a 99.5% chance that he is disqualified.

Diagnosis: thinking the other way round
If a competitor has been disqualified, there is a slight probability (8%)
that he is nevertheless clean.
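
The two directions of inference can be sketched on a toy version of the anti-doping network. All probabilities below, and the rule that one positive test is enough for disqualification, are hypothetical assumptions; they are not the parameters behind the 99.5% and 8% figures of the talk and will not reproduce them.

```python
from itertools import product

# Hypothetical parameters (NOT the values used in the talk).
P_DOPED = 0.10                        # prior probability that a competitor is doped
P_POSITIVE = {                        # P(test positive | doped?) for each test
    "test_a": {True: 0.90, False: 0.01},
    "test_b": {True: 0.80, False: 0.02},
}

def joint(doped, a, b):
    """P(doped, test_a=a, test_b=b), factorised along the graph doped -> tests."""
    p = P_DOPED if doped else 1.0 - P_DOPED
    for name, result in (("test_a", a), ("test_b", b)):
        p_pos = P_POSITIVE[name][doped]
        p *= p_pos if result else 1.0 - p_pos
    return p

def disqualified(a, b):
    """Assumed rule: disqualification as soon as one test is positive."""
    return a or b

def prob(event, given=lambda d, a, b: True):
    """P(event | given) by brute-force enumeration of the joint distribution."""
    num = den = 0.0
    for d, a, b in product([True, False], repeat=3):
        if given(d, a, b):
            p = joint(d, a, b)
            den += p
            if event(d, a, b):
                num += p
    return num / den

# Simulation: from cause to effect.
p_disq_given_doped = prob(lambda d, a, b: disqualified(a, b), lambda d, a, b: d)
# Diagnosis: from effect back to cause.
p_clean_given_disq = prob(lambda d, a, b: not d, lambda d, a, b: disqualified(a, b))
print(p_disq_given_doped, p_clean_given_disq)
```

Enumeration is only workable for a handful of nodes; a real engine propagates evidence through the graph instead of listing every configuration.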


Segmentation with Bayesian Networks




Real case study: Segmentation of women as regards shopping and fashion
For confidentiality reasons, consumer statements and outputs have been modified.



1st stage: segmentation induction


Unsupervised learning
Discovering relations between consumer statements

Usage and attitude survey conducted for a clothes retailer.

Sample = 1065 women.

234 consumer statements: attitudes and behaviours towards fashion in
general, retailers, brand image…

Heuristic search algorithm to find the best representation of the joint
probability distribution.

Minimum Description Length (MDL) score to evaluate the quality of the
network, based on fit to the data and compactness.

Figure: induced network
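
To make the trade-off between fit and compactness concrete, here is a minimal sketch of one textbook form of an MDL-style score for a discrete network: the bits needed to encode the data given the model, plus a penalty that grows with the number of free parameters. The exact score used by the software in the talk may differ (for instance, it may also encode the graph itself); `mdl_score` and the toy data are assumptions.

```python
import math
from collections import Counter

def mdl_score(rows, parents):
    """MDL-style score in bits (lower is better) of a discrete network.

    rows    : list of dicts mapping variable name -> observed value
    parents : dict mapping each variable to a tuple of its parent variables
    """
    n = len(rows)
    levels = {v: {r[v] for r in rows} for v in parents}
    data_bits = 0.0      # description length of the data given the model (fit)
    n_params = 0         # number of free parameters (compactness)
    for var, pars in parents.items():
        joint = Counter((tuple(r[p] for p in pars), r[var]) for r in rows)
        marginal = Counter(tuple(r[p] for p in pars) for r in rows)
        for (config, _), count in joint.items():
            data_bits -= count * math.log2(count / marginal[config])
        n_configs = math.prod(len(levels[p]) for p in pars)
        n_params += (len(levels[var]) - 1) * n_configs
    model_bits = 0.5 * math.log2(n) * n_params
    return data_bits + model_bits

# Toy data in which b always equals a: the arc a -> b should pay for itself.
rows = [{"a": i % 2, "b": i % 2} for i in range(100)]
independent = mdl_score(rows, {"a": (), "b": ()})
linked = mdl_score(rows, {"a": (), "b": ("a",)})
print(independent, linked)
```

A heuristic search would propose arc additions, removals and reversals and keep those that lower this score.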


Variable clustering and factor induction
Simplifying the information

Analysis of the network to discover groups of variables that are
strongly connected and that form a “concept”
- Agglomerative (ascending) hierarchical clustering algorithm based on the arcs’ Kullback-Leibler forces
(a non-linear, global measure of the contribution of the relation to the network).

For each cluster of variables
- Creation of a latent variable summarizing the information.

42 factors computed

Example of factor 15: a latent variable summarizing originality.
It is based on attitude statements (importance of being original, liking to
stand out through clothes) and behaviours (buying brands X, Y and Z
more often).
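
The grouping of strongly connected variables can be sketched as a bottom-up clustering over arc strengths. The version below uses average linkage and a stopping threshold, which are assumptions, and takes the forces as given numbers; the statement names and values are hypothetical stand-ins for Kullback-Leibler arc forces.

```python
def cluster_variables(forces, threshold):
    """Greedy bottom-up clustering of variables.

    forces    : dict mapping frozenset({u, v}) -> strength of the arc u-v
    threshold : stop merging when the strongest remaining link falls below it
    """
    variables = set().union(*forces)
    clusters = [frozenset([v]) for v in sorted(variables)]

    def link(c1, c2):
        # Average force over all pairs; pairs without an arc count as 0.
        total = sum(forces.get(frozenset({u, v}), 0.0) for u in c1 for v in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Hypothetical arc forces between consumer statements.
forces = {
    frozenset({"likes_to_stand_out", "originality_important"}): 0.40,
    frozenset({"originality_important", "buys_brand_x"}): 0.25,
    frozenset({"price_sensitive", "waits_for_sales"}): 0.35,
    frozenset({"buys_brand_x", "price_sensitive"}): 0.02,
}
factors = cluster_variables(forces, threshold=0.10)
print(factors)
```

Each resulting group would then be summarized by one latent variable, the "factor".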


Factor clustering: overview of the procedure
Segmentation of the individuals based on the main factors

Introducing a new variable (consumer segments), which is the hidden
cause of the main factors.

Learning the probabilities with Expectation-Maximization (EM).

Score derived from MDL to assess the quality of the clustering.
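
A hidden segment variable that is the common cause of the factors corresponds to a latent class model, which EM can fit. The sketch below is a bare-bones version under stated assumptions: discrete factors, a fixed number of clusters, random soft initialisation and a fixed number of iterations. `fit_latent_classes` and the toy data are hypothetical.

```python
import math
import random

def fit_latent_classes(rows, n_clusters, n_iter=50, seed=0):
    """Fit P(cluster) and P(factor value | cluster) by EM.

    rows : list of tuples of discrete factor values, one tuple per respondent
    Returns (prior, cond, post), where post[i][k] = P(cluster k | respondent i).
    """
    rng = random.Random(seed)
    n_vars = len(rows[0])
    levels = [sorted({r[j] for r in rows}) for j in range(n_vars)]
    # Start from a random soft assignment of each respondent.
    post = []
    for _ in rows:
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        post.append([x / sum(w) for x in w])

    for _ in range(n_iter):
        # M-step: re-estimate the parameters from the soft assignments.
        prior = [sum(p[k] for p in post) / len(rows) for k in range(n_clusters)]
        cond = []  # cond[k][j][value] = P(factor j = value | cluster k)
        for k in range(n_clusters):
            tables = []
            for j in range(n_vars):
                counts = {v: 1e-6 for v in levels[j]}   # tiny smoothing
                for r, p in zip(rows, post):
                    counts[r[j]] += p[k]
                total = sum(counts.values())
                tables.append({v: c / total for v, c in counts.items()})
            cond.append(tables)
        # E-step: posterior probability of each cluster for each respondent.
        post = []
        for r in rows:
            w = [prior[k] * math.prod(cond[k][j][r[j]] for j in range(n_vars))
                 for k in range(n_clusters)]
            post.append([x / sum(w) for x in w])
    return prior, cond, post

# Toy data: two obvious groups of respondents on three factors.
rows = [("high", "high", "high")] * 30 + [("low", "low", "low")] * 30
prior, cond, post = fit_latent_classes(rows, n_clusters=2)
print(prior)
```

Running this for several candidate numbers of clusters and scoring each result is one way to read the selection step described on the next slide.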


Selecting the number of clusters

Pseudo-random walk to find the best number of clusters
- Example: find the best clustering with a random walk between 2 and 6 clusters (20 iterations)

The best segmentation is the one that minimizes the score.

It is also possible to set the desired number of clusters.

It is possible to set the minimum purity of the clusters. The purity of a cluster is
computed as the mean, over the points of the cluster, of their probability of belonging to it.
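
One reading of the purity definition, assuming each respondent is assigned to her most probable cluster and purity averages that probability within each cluster, can be sketched as follows. `cluster_purity` and the posterior values are hypothetical.

```python
def cluster_purity(posteriors):
    """Return {cluster index: purity} from per-respondent posterior distributions."""
    sums, sizes = {}, {}
    for post in posteriors:
        k = max(range(len(post)), key=post.__getitem__)   # most probable cluster
        sums[k] = sums.get(k, 0.0) + post[k]
        sizes[k] = sizes.get(k, 0) + 1
    return {k: sums[k] / sizes[k] for k in sums}

# Made-up posteriors for three respondents and two clusters.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
purity = cluster_purity(posteriors)
print(purity)
```

A minimum purity constraint would then reject any solution in which one of these values falls below the chosen threshold.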


2nd stage: segmentation analysis


Supervised learning
Focusing on consumer clusters

Learning the relations between
- the target variable = segmentation (consumer segments)
- the constitutive variables = consumer statements


Cluster profile
Using the network to describe the consumer groups

Identification of the key variables and associated values
- For each consumer group, we use the % of shared information to sort the variables
according to their importance in the characterisation of the group.

4 most contributing variables for cluster #5. Compared with the total sample,
women of cluster #5:
- Buy brand X more often
- Are older (59 on average)
- Do not consider originality important
- Do not like discovering new shops

Arrows symbolize the change in the probability distribution
when observing cluster #5.
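
The ranking by shared information can be approximated as follows, assuming that "% of shared information" means the mutual information between a variable and membership of the cluster, divided by the entropy of that membership. This interpretation, the function names and the data are assumptions.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def shared_information(xs, ys):
    """Mutual information I(X;Y) as a fraction of the entropy of Y."""
    h_y = entropy(ys)
    if h_y == 0:
        return 0.0
    mutual_info = entropy(xs) + h_y - entropy(list(zip(xs, ys)))
    return mutual_info / h_y

def rank_variables(data, segments, cluster):
    """Sort variables by how much they tell us about membership of `cluster`."""
    member = [s == cluster for s in segments]
    scores = {name: shared_information(col, member) for name, col in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up answers of eight respondents and their segment numbers.
data = {
    "buys_brand_x": [1, 1, 1, 0, 0, 0, 0, 0],
    "likes_new_shops": [0, 1, 0, 1, 0, 1, 0, 1],
}
segments = [5, 5, 5, 1, 1, 2, 2, 3]
ranking = rank_variables(data, segments, cluster=5)
print(ranking)
```

Here the first variable coincides exactly with cluster membership, so it comes first; the second carries almost no information about it.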


Generation of the cluster mapping

Map generation

The size of the cluster is proportional to its probability

The proximity of the clusters is a probabilistic proximity

The darkness of the blue is proportional to the purity of the cluster
(in this example all clusters have a purity > 95%)


Summarizing segmentation results

Figure: map of the consumer segments.
Axes: money devoted to clothes (from -- to ++) and age.
Segment labels: Functional above all, Fashion cheap before all, Neutral,
Fashionable originality, Classical, Superstars, Classical upmarket,
Young manager / executive women.
Segment sizes shown on the map: 18%, 10%, 20%, 18%, 20%, 8%, 14%.


Going further: identifying a more compact target model

Markov procedure to select a subset of statements that determines
which category consumers belong to
- Selection of a subset of variables such that knowing their values makes the
target independent of all the other variables

Subset of 11 variables

Overall prediction score = 68%

Useful for quickly recruiting consumer groups from the
total population.
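
A set of variables that makes the target independent of all the others matches the definition of a Markov blanket: the parents of the target, its children, and the other parents of those children. Assuming this is what the "Markov procedure" computes, it can be read directly off the graph. The network below is hypothetical.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG given as node -> parents."""
    children = {node for node, ps in parents.items() if target in ps}
    spouses = {p for child in children for p in parents[child]}
    blanket = set(parents.get(target, ())) | children | spouses
    blanket.discard(target)
    return blanket

# Hypothetical network around the segment variable.
parents = {
    "age": (),
    "income": (),
    "segment": ("age",),
    "buys_brand_x": ("segment",),
    "originality": ("segment", "income"),
    "likes_new_shops": ("originality",),
}
blanket = markov_blanket(parents, "segment")
print(sorted(blanket))
```

In this toy graph `likes_new_shops` is left out: once `originality` is known, it adds nothing about the segment.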


Conclusion


Benefits

Our experience: a powerful tool
- Relevant typologies
- Easy to carry out

Modelling the consumer variables: good representation of reality
- Unsupervised modelling: no strong hypothesis
- Discovering interactions between variables (behaviours / attitudes)
- Use of qualitative / quantitative variables

Data clustering quality
- Possible to set the minimum purity of the clusters: enables the marketer to discover
“niche” markets (usually less pure) or focus on mainstream groups.

Added value in the analysis of the clusters
- Easy ranking of the key variables for each consumer cluster
- Proximity mapping to summarize results

Development of robust models to identify consumer groups
- Useful when consumers have to be recruited later on.


Some drawbacks. How to deal with them?

Modelling the consumer network and computing latent variables can
take a long time when the number of variables is very large.
- 234 variables and 1065 rows: 30-40 minutes
- To speed up the process, it is possible to learn a simplified network: e.g. a maximum
spanning tree, or an increased structural complexity parameter.

Continuous variables have to be discretized
- Results will depend on the quality of the discretization.
- Possible to use K-Means to adapt the discretization to the distribution of the data.
- Expertise of the user also helps.

And most of the time in consumer research, variables are discrete!
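
Adapting the discretization to the data with K-Means can be sketched in one dimension: cluster the values, then place the bin edges halfway between neighbouring cluster centres. The initialisation at evenly spaced quantiles, the function names and the ages are assumptions for illustration.

```python
def kmeans_cut_points(values, k, n_iter=100):
    """Return k-1 cut points placed halfway between 1-D k-means centres."""
    data = sorted(values)
    # Initialise the centres at evenly spaced quantiles of the data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centres[i]))
            groups[nearest].append(x)
        # Keep the old centre if a group happens to be empty.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        if new == centres:
            break
        centres = new
    centres.sort()
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]

def discretize(x, cut_points):
    """Return the index of the bin that x falls into."""
    return sum(x > c for c in cut_points)

# Made-up ages with three natural groups.
ages = [19, 21, 22, 24, 38, 40, 41, 43, 58, 59, 61, 63]
cuts = kmeans_cut_points(ages, k=3)
print(cuts, discretize(59, cuts))
```

Compared with equal-width bins, the edges follow the gaps in the data, which is the sense in which the discretization adapts to the distribution.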


Perspectives

Flexibility: can be used far beyond usage and attitude surveys
Easy to carry out

Can be adapted to any type of data
Well suited to processing large amounts of data

Example: segmentation of trains using the client’s internal data
- Travelers' data: 10 million individuals
- Train data (turnover, occupancy rate…): 15,000 trains
- Output: clustering of trains

In the future…
- typology of clients (turnover, potential…) to feed a business strategy
- segmentation of consumers based on utilities (CBC data)


Contact

Jouffe Lionel, Managing Director: jouffe@bayesia.com
Craignou Fabien, Data Mining Department Manager: fcr@reperes.net

