SlideShare a Scribd company logo
1 of 50
TIM BOCK PRESENTS
Examples in Q5 (beta)
If you have any questions, enter them into the Questions field.
Questions will be answered at the end. If we do not have time to get to
your question, we will email you.
We will email you a link to the slides and data.
Get a free one-month trial of Q from www.q-researchsoftware.com.
DIY Driver Analysis
Overview
• Objectives of (key) driver analysis
• Overview of techniques
• 13 assumptions that need to be checked when doing QA for driver analysis
(there are workarounds for most of these)
1: There are 15 or fewer predictors (if using Shapley)
2: The outcome variable is monotonically increasing
3: The outcome variable is numeric (if using Shapley)
4: The predictor variables are numeric or binary
5: People do not differ in their needs/wants (segmentation)
6: The causal model is plausible
7: There is no multicollinearity/correlations between predictors (if using GLMs)
8: There are no unexpected correlations between the predictors and the outcome variable
9: The signs of the importance scores are correct
10: The predictor variables have no missing values
11: There are no outliers/influential data points
12: There is no serial correlation (aka autocorrelation)
13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable) 2
The basic objective of (key) driver analysis
The basic objective: work out the relative importance of a series of predictor
variables in predicting an outcome variable. For example:
• NPS: comfort vs customer service vs price.
• Customer satisfaction: wait time vs staff friendliness vs comfort.
• Brand preference: modernity vs friendliness vs youthfulness.
What driver analysis is not: predictive analysis (e.g., predicting sales, customer
churn). Although, you can use driver analysis to make strategic predictions (e.g., if I
improve, say, fun, then preference will increase.)
3
Likelihood to
recommend
This brand is
fun
This brand is
exciting
This brand is
youthful
6 1 1 1
9 0 1 0
7 0 0 0
6 1 1 1
9 0 1 0
7 0 0 1
7 0 0 0
What the data looks like
This data shows 7
observations
1 outcome
variable
4
Predictor variables
(Typically there will be more than 3.)
Case study 1: Technology
Outcome variable(s) Predictor variable(s)
Likelihood to recommend:
• Apple
• Microsoft
• IBM
• Google
• Intel
• Hewlett-Packard
• Sony
• Dell
• Yahoo
• Nokia
• Samsung
• LG
• Panasonic
Brand associations:
• Fun
• Worth what you pay for
• Innovative
• Good customer service
• Stylish
• Easy-to-use
• High quality
• High performance
• Low prices
5
ID
Apple
Microsoft
IBM
Apple
Microsoft
IBM
Apple
Microsoft
IBM
1 6 9 7 1 0 0 1 1 0
2 8 7 7 1 0 0 1 0 0
3 0 9 8 0 1 0 0 0 0
4 0 0 0 0 0 0 0 0 0
This brand is
fun
This brand is
exciting
Likelihood to
recommend
ID Brand
Likelihood to
recommend
This brand is
fun
This brand is
exciting
1 Apple 6 1 1
1 Microsoft 9 0 1
1 IBM 7 0 0
2 Apple 6 1 1
2 Microsoft 9 0 1
2 IBM 7 0 0
3 Apple 6 1 1
3 Microsoft 9 0 1
3 IBM 7 0 0
4 Apple 6 1 1
4 Microsoft 9 0 1
4 IBM 7 0 0
The data (stacked)
From: one row per respondent
To: one row per brand per respondent
6
Tips for stacking
• Get an SPSS .SAV data file. If you do not have an SPSS file:
• Import your data the usual way
• Tools > Save Data as SPSS/CSV and Save as type: SPSS
• Re-import
• Tools > Stack SPSS .sav Data File
• Set the labels for the stacking variable (in Q: observation) in Value Attributes
• Delete any None of these data (e.g., brand associations where respondents were
able to select None of these
7
Case study 2: Cola brand attitude
Outcome
variable(s)
34 Predictor
variable(s)
If the brand was a person, what would
its personality be?
Hate/Dislike/Neither/
Like/Love/Don’t know:
• Coke Zero
• Coke
• Diet Coke
• Diet Pepsi
• Pepsi Max
• Pepsi
Brand associations:
• Beautiful
• Carefree
• Charming
• Confident
• Down-to-earth
• Feminine
• Fun
• Health-conscious
• Hip
• Honest
• Humorous
• Imaginative
• Individualistic
• Innocent
• Intelligent
• Masculine
• Older
• Open to new
experiences
• Outdoorsy
• Rebellious
• Reckless
• Reliable
• Sexy
• Sleepy
• Tough
• Traditional
• Trying to be cool
• Unconventional
• Up-to-date
• Upper-class
• Urban
• Weight-conscious
• Wholesome
• Youthful
8
9
Example output:
Importance scores
Key
drivers of
cola
preference
10
Example output:
Performance-
Importance Chart
(aka Quad Chart)
Example output: Correspondence Analysis with Importance
11
Overview
• Objectives of (key) driver analysis
• Overview of techniques
• 13 assumptions that need to be checked when doing QA for driver analysis
(there are workarounds for most of these)
1: There are 15 or fewer predictors (if using Shapley)
2: The outcome variable is monotonically increasing
3: The outcome variable is numeric (if using Shapley)
4: The predictor variables are numeric or binary
5: People do not differ in their needs/wants (segmentation)
6: The causal model is plausible
7: There is no multicollinearity/correlations between predictors (if using GLMs)
8: There are no unexpected correlations between the predictors and the outcome variable
9: The signs of the importance scores are correct
10: The predictor variables have no missing values
11: There are no outliers/influential data points
12: There is no serial correlation (aka autocorrelation)
13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable) 12
Standard “best practice”
recommendation for
driver analysis:
LMG
Lindeman, Merenda, Gold (1980)
=
Kruskal
Kruskal (1987)
=
Dominance Analysis
Budescu (1993)
=
Shapley / Shapley Value
Lipovetsky and Conklin(2001)
The average
improvement in R² that a
predictor makes across
all possible models (aka
“Shapley”)
14
Best practice:
Bespoke models
(e.g., Bayesian
multilevel model)
Bivariate metrics
E.g., Correlations,
Jaccard
Coefficients
Shapley,
Relative
Importance
Analysis
Much too hard Too hard Too Soft Just Right
GLMs
(e.g., linear
regression)
What makes bespoke models and GLMs too hard?
To estimate an OK bespoke model,
you need to have a few week, and
know lots of things, including:
• Joint interpretation of parameter
estimates, the predictor covariance
matrix, and the parameter
covariance matrix
• Conditional effects
• Multicollinearity
• Confounding (e.g., suppressor
effects)
• Estimation (ML, Bayesian)
• Specification of informative priors
• Specification of random effects
15
To understand importance in a GLM (e.g., linear
regression), you need to know quite a lot about:
• Joint interpretation of parameter estimates, the
predictor covariance matrix, and the parameter
covariance matrix
• Conditional effects
• Multicollinearity
• Confounding (e.g., suppressor effects)
Shapley and similar methods allow
us to be less careful when
interpreting results
16
Bespoke models
& GLMs
Proportional
Marginal Variance
Decomposition
Shapley
With coefficient adjustment
Lipovetsky and Conklin(2001)
Random Forest
(for importance analysis)
Kruskal’s Squared
partial correlation
Called Kruskal in Q
Relative Importance
Analysis
AKA Relative Weight: Johnson (2000)
Shapley
Shapley case study
• Open Initial.Q. This already contains the cola data.
• File > Data Sets > Add to Project > From File > Stacked Technology
• Create > Regression > Driver (Importance) Analysis > Shapley
• Dependent variable: Q3. Likelihood to recommend [Stacked Technology]
• Dependent variable: Q4 variables from Stacked Technology
• No when asked about confidence intervals (clicking Yes is OK as well)
• Note that High Quality is the most important, with a score of 18.2
• Right-click: Reference name: shapley
Everything I demonstrate in this webinar is described on a slide like this. The rest of them
are hidden in this deck, but you can get them if you download the slides. So, there is no
need to take detailed notes.
17
Instructions for the case studies
Overview
• Objectives of (key) driver analysis
• Overview of techniques
• 13 assumptions that need to be checked when doing QA for driver analysis
(there are workarounds for most of these)
1: There are 15 or fewer predictors (if using Shapley)
2: The outcome variable is monotonically increasing
3: The outcome variable is numeric (if using Shapley)
4: The predictor variables are numeric or binary
5: People do not differ in their needs/wants (segmentation)
6: The causal model is plausible
7: There is no multicollinearity/correlations between predictors (if using GLMs)
8: There are no unexpected correlations between the predictors and the outcome variable
9: The signs of the importance scores are correct
10: The predictor variables have no missing values
11: There are no outliers/influential data points
12: There is no serial correlation (aka autocorrelation)
13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable)18
1: There are 15 or fewer predictors (if using Shapley)
19
Options (ranked from best to worst) Comments
Relative Importance Analysis
Tends to give almost identical results
to Shapley regression, and much,
much, faster to compute.
Shapley using only a subset of models (e.g.,
all predictors, leave 1 out predictor, 2
predictors, 1 predictor)
There is no evidence that this
method produces accurate estimates
of Shapley importance.
This is likely to be less similar to a
proper Shapley than using Relative
Importance Analysis.
Dimension reduction (e.g., PCA) Hard to interpret.
Issue
Shapley Regression cannot be
computed with more 15
predictors in Q (it reverts to
Relative Importance Analysis)
If you run Shapley Regression in
the R package relimpo with
this many variables you get a
crash.
1: There are 15 or fewer predictors (if using Shapley)
• With the cola study, we have 34 variables, and that will take an infinite amount of time to
compute, so using Shapley is not an option and we have to use Relative Importance Analysis.
• We can use the technology data set, which only has 9 predictors, to explore how similar the
techniques are.
• Create > Regression > Linear Regression
• Reference name: relative.importance
• Select variables
• Output: Relative importance analysis
• Check Automatic Note that High Quality is again most important
• Right-click: Add R Output:
comparison = cbind(shapley = shapley[-10],
"Relative Importance" =
relative.importance$relative.importance$importance)
• Calculate
• Change shapley to shapley[-10]
• Calculate
• Right-click: Add R Output: correlation = cor(comparison)
• Increase number of decimal places. Note the correlation is 0.999
• Rename output: Correlation
• Insert > Charts > Visualization > Labeled Scatterplot,
• Table: comparison
• Automatic
20
Instructions for the case studies
2: The outcome variable is monotonically increasing
21
Options (not mutually exclusive) Comments
Set Don’t Knows to missing
Merge categories
• Do this when there are categories that
have ambiguous orderings (e.g., OK
and Good).
• The more categories you merge, the
less significant the results will be.
Recode the data in some meaningful way
(e.g., reverse the scale, Likelihood to
recommend, recoded as NPS)
The specific values tend to make little
difference, so using a recoding that is
easy to explain to stakeholders, such as
NPS, is often desirable.
Issue
All the standard driver analysis
algorithms assume that the
outcome variable contains
categories ordered from lowest
to highest, and which are
believed to be associated with
greater levels of preference.
Test
This is usually best checked by
creating a summary table.
2: The outcome variable is monotonically increasing
• Right-click: Add Table
• Blue drop-down: Likelihood to recommend [Stacked Technology.sav]
Other than the NET, which is ignored by analyses, the order is correct.
• Click on table of outcome variables in Technology. Note that there is no problem.
• Click on Brand preference Note that here we have a Don’t Know.
• Right-click on Don’t Know and press Remove. This sets it as missing values.
• Click on two grey outputs (just setting up a later analysis)
22
Instructions for the case studies
3: The outcome variable is numeric (if using Shapley)
23
Options (ranked from best to worst) Comments
Use limited dependent variable versions of
Relative Importance Analysis (e.g., Ordered
Logit)
• The less numeric the variable, the
better this option is.
• This approach is also preferable
because it can take non-linear
relationships into account
automatically.
Ignore the problem and use Shapley.
Where the variable is close to being
numeric, there is probably little lost
by this approach.
Issue
Shapley assumes that the
outcome variable is numeric
(theoretically, it can deal with
non-numeric outcome variables,
but for more than about 10 or so
variables, it is impractical).
3: Numeric outcome
• Select relative.importance
• Type: Ordered logit
• Select chart We can see that the correlation between the models is smaller than
before. This is because correctly modeling the data type has a bigger impact on
our results than the difference between Shapley and Relative importance analysis.
• Note that the changes are pretty trivial.
24
Instructions for the case studies
4: The predictor variables are numeric or binary
25
Options (not mutually exclusive) Comments
Set Don’t Knows to missing
This can be problematic as the variables as the
missing values may not be missing at random.
This is discussed later.
Merge categories
• Do this when there are categories that have
ambiguous orderings (e.g., OK and Good).
• The more categories you merge, the less
significant the results will be.
Recode the data in some meaningful way
(midpoint recoding)
Use a bespoke or Generalized Linear Model
(GLM), with dummy variables and/or splines,
computing importance as the difference
between the lowest and largest effect sizes for
each variable.
In theory this is the best approach to dealing
with non-numeric data, but it requires quite a
lot to get right and, when interpreting the data,
the sampling error of the categorical and spline
effects will make them hard to compare.
Issue
Both Shapley and
Relative Importance
Analysis assume that
the predictor
variables are
numeric or binary.
5: People do not differ in their needs/wants (segmentation)
26
Options (not mutually exclusive) Comments
Estimate an appropriate bespoke model
(e.g., latent class analysis) and then
estimate the driver analysis models
within each segment
In Q: In a non-stacked data file, set up the
data as an Experiment, and use Create >
Segment > Latent Class Analysis
Form segments by judgment, and
estimate separate relative importance
analyses for each segment.
Ignore the problem, interpreting results
as “average” effects
Rightly-or-wrongly, this is how 99.9%* of
all modelling is done.
* Made-up number
Issue
Traditional driver analysis
techniques assume that people
have the same needs/wants, and
apply these consistently from
situation to situation.
How to test
• Compare by brand
• Compare by other data
• Latent class analysis
5: People do not differ in their needs/wants (segmentation)
• Select the Shapley Importance for Q3… table and press Duplicate
• Brown drop-down: Brand [Stacked Technology.sav]
• Note that this table is showing difference in important scores by segment. It is not
showing difference in perceptions.
• The colors and arrows are telling us that there are differences by brand. Look at
Fun. This is significantly lower for Intel, HP, and Dell. But, really important in the
evaluation of Google.
• What do we do now? Let’s return to the previous slide and look at the options.
27
Instructions for the case studies
6: The causal model is plausible
28
Options (not mutually exclusive) Comments
Build a bespoke model This is usually too hard
Include all the relevant (non-outcome)
variables and cross your fingers (if you
have not collected the data, you cannot
magic it into existence)
Rightly-or-wrongly, this is how 99.9%* of
all modelling is done
* Made-up number
Issue
All driver analysis techniques
assume that the analysis is a
plausible explanation of the
causal relationship between the
predictor variables and the
outcome variable.
This assumption is never true.
How to test
Common sense. Four common
examples are shown on the next
slides.
Example causality problem: Omitted variable bias
If we fail to include a relevant predictor variable, and that variable is correlated with the
predictor variables that we do include, the estimates of importance will be wrong. If
your R-square is less than 0.9, you may have this problem (a typical R-square is closer to
0.2 than 0.9).
Predictor 1
Predictor 2
Predictor 3
Predictor 4
E.g., price
Outcome 1
Assumed
predictor
variables
Arrows denote the true causal relationship
29
Example causality problem:
Outcome variable included as a predictor
30
Predictor 1
Predictor 2
Predictor 3
Outcome 2
E.g., Satisfaction
Outcome 1
E.g., NPS
Assumed predictor variables
If we include a predictor variable that is really an outcome variable, the
estimates of importance will be wrong.
Arrows denote the true causal relationship
6: Outcome variable a predictor
• Click on relative.importance
• Worth what paid for is clearly an example of this problem, so should be deleted.
• Remove Worth what paid for from the model
• Click on Shapley Importance for Q3. Likelihood to recommend
• Click on Comparisons. Note
• The warning
• The R-square is back in the model.
• Change in the R code -10 to -9
31
Instructions for the case studies
Example causality problem: Backdoor path
32
Predictor 1
E.g., price perception
Predictor 2
E.g., quality
Predictor 3
E,g., packaging
Variable
E.g., Attitude
Outcome 1
E.g., NPS
Assumed
predictor
variables
If backdoor path exists from the predictors to the outcome variable, the
estimates of importance will be wrong (spurious).
Arrows denote the true causal relationship
Example causality problem: Functional form
33
Assumed functional form
If we have the wrong functional form (i.e., assumed equation), the
estimates of importance will be wrong.
Arrows denote the true causal relationship
Outcome = Predictor 1 + Predictor 2 + Predictor 3
Outcome = Predictor 1 × Predictor 2 + Predictor 3
True functional form
7: There is no multicollinearity/correlations between
predictors (if using GLMs, e.g., linear regression)
34
Options (ranked from best to worst) Comments
Take all the relevant theory into
account when interpreting the
results.
This requires a strong technical and
intuitive understanding of the
underlying maths. Even if you possess
that understanding, it is really difficult to
explain to clients (particularly if it is a
tracking study and they are seeing
results fluctuate from period-to-period)
Use Shapley or Relative Importance
Analysis.
These techniques are designed to
address this problem. They are not
perfect, but they are easier to interpret
than linear regression and other GLMs
when predictor variables are correlated.
Issue
The bigger the correlations between predictors,
the more difficult it is to accurately interpret
estimates from traditional GLMs (e.g., linear
regression)
Test
1. Inspect the Variance Inflation Factors (VIF)
or Generalized Variance Inflation Factors
(GVIF). Q automatically computes these and
warns you if they are high.
2. Inspect the coefficients. Do they make
sense?
3. Look at the correlations.
7: There is no multicollinearity
• Click on relative.importance
• Uncheck Automatic
• Change it back to Summary. Now it is showing an order logit regression.
• Type to Linear.
• Note that we have a negative effect for Stylish. That is, the model says that if we hold everything else
constant, an improvement in Stylish will make a reduction in likelihood to recommend. This doesn’t
seem to make sense.
• Create > Correlation > Correlation Matrix. Select
• Q3. Likelihood to recommend
• Q4.
• Note that there is a moderate correlation between Stylish and Likelihood to recommend. And, Stylish is
correlated with everything out.
• My intuition is that this is all some weird and uninteresting quirk in the data. But, in truth I don’t know.
• Conclusion: we should be using Shapley or Relative Importance Analysis. It is designed to remove such
headaches.
• Output: Relative Importance Analysis
35
Instructions for the case studies
8: There are no unexpected correlations between the
predictors and the outcome variable
36
Options (ranked from best to worst)
Investigate the data to make sense of the
unexpected relationships.
Remove problematic variables from the
analysis.
Issue
When people interpret
importance scores, they assume
that higher means better. This is
assumption is not always right.
Test
Correlate each predictor variable
with the outcome variable
8: There are no unexpected correlations
• Click back on the correlation matrix. As we discussed earlier, these all look fine.
• Now, let’s look at the cola correlations. I computed them before, but they will
need to re-run because we removed the Don’t Know category from the Outcome
Variable.
• Click on cola.correlations
• You can see we have some negative correlations in the first column, which shows
the correlations with the outcome variable.
37
Instructions for the case studies
9: The signs of the importance scores are correct
38
Recommendation
If all the effects should be
positive, select the Absolute
importance scores option.
Otherwise, manually change
the results when reporting.
Issue
The underlying Shapley and Relative Importance Analysis algorithms always
compute a positive importance scores.
However, the true effect of a predictor can be negative, resulting in people
misinterpreting the results.
Test
Compute a GLM (e.g., linear regression). Any negative coefficients warrant
investigation. For this reason, Q automatically does this and puts the signs of the
multiple regression coefficients onto the driver analysis outputs (both Shapley and
Relative Importance Analysis).
If the correlation is also negative, it means that the effect is negative. If positive, it
suggests that the multiple regression is picking up a non-interesting artefact.
9: The signs of the importance scores are correct
• Click on relative.importance
• Stylish is negative. As discussed, it is because of the multiple regression. This is
really just a reminder that we need to be careful. We have already looked at this,
and we know that the correlation is positive, so we can ignore this.
• Check Absolute importance scores
• Go to comparison, and replace shapley[-9] with abs(shapley[-9])
39
Instructions for the case studies
10: The predictor variables have no missing values
40
Options (ranked from best to worst) Comments
Create a bespoke model that appropriately
models the process(es) that cause the
values to be missing.
This is really hard!
Multiple imputation of missing values
If using Relative Importance Analysis, set Missing
Data to Multiple Imputation
Leave out observations with missing values
from the analysis (i.e., complete case
analysis)
This implicitly assumes that the data is Missing
Completely At Random (MCAR; i.e., other than
that some variables have more missing values than
others, there is no pattern of any kind in the
missing data).
Test this assumption using Automate > Browse
Online Library > Missing Data > Little’s MCAR Test
Issue
There are missing
values of predictor
variables (e.g.,
some attributes
were not collected
for some
respondents, or
there were “don’t
know” response)
11: There are no outliers/unusual data points
41
Options (ranked from best to worst) Comments
Inspect each unusual observation, and
understand if it is an error or not
Difficult/time consuming
Filter out all the unusual observations, and
check to see if the model has changed. If it
has changed, and the number of unusual
observations is small, use the new model.
Ignore the problem
This is, by far, the most common
approach.
Issue
A few outliers/unusual
observations can skew the
results of importance analysis.
Test
• Hat/influence scores
• Standardized residuals
• Cook’s distance
11: There are no outliers/unusual data points
• Click on 3 more (warnings)
• Note that we have a warning about Unusual Observations. Let’s
explore this further.
• Create > Regression > Diagnostic > Plot > Influence Index
• Uncheck Automatic (so that it does not update when we later
change the model)
• You can see the various unusual observations have been marked:
379, 382, 295, and so on
• Go to Notes tab and copy code
• Variables and Questions – Stacked Technology.sav
• Right-click on a row: Insert > Variable(s) > R Variable:
• !(1:4056 %in% c(295, 1744, 1749, 2764, 3420, 246,
2364, 2829, 2830, 4029, 295, 2380, 2764, 3049,
3050)
• Name: Unusual observations removed
• Check the F in the Tags column. Make the variable available as a
filter.
• Go back to Output: relative.importance
• Apply the filter to the table
42
Instructions for the case studies
12: There is no serial correlation (aka autocorrelation)
43
Options (ranked from best to worst) Comments
Create a bespoke model that addresses the
serial correlation (e.g., a random effects model
if the serial correlation is due to repeated
measures, or a time series model if it is
measures over time)
This is a lot of work.
Don’t report statistical test results (i.e., p-
values).
The importance scores will be OK. The
significance tests will be misleading to
an unknown extent.
Issue
The standard tests for the
significance of a predictor
assume that there is no serial
correlation/autocorrelation (a
particular type of pattern in the
residuals).
Whenever you stack data you
are highly likely to have this
problem.
Test
Regression > Diagnostic > Serial
Correlation (Durbin-Watson)
12: There is no serial correlation (aka auto…)
• Regression > Diagnostic > Serial Correlation (Durbin-Watson)
• These are significant. As mentioned, this is always the case.
44
Instructions for the case studies
13: The residuals have constant variance (i.e., no
heteroscedasticity in a model with a linear outcome variable)
45
Options (ranked from best to worst) Comments
Use a more appropriate model (e.g., ordered
logit)
This is not possible with Shapley.
This models make other,
hopefully less problematic,
assumptions (beyond the scope
of this webinar)
Use robust standard errors
This is not possible with Shapley.
In Q: check Robust standard
error
Issue
The standard tests for the significance
of a predictor in a linear model assume
that the variance of the residuals is
constant.
This is rarely the case in driver analysis,
as usually the data is from a bounded
scale (e.g., if it is a rating out of 10, it is
impossible for a value to be observed
that is greater than 10).
Test
Displayr automatically performs the
Breusch-Pagen Test Type = Linear
13: The residuals have constant variance
• Click on relative.importance. Note that there is a warning.
• Change the Type to Ordered Logit
46
Instructions for the case studies
Creating the donut chart (earlier in the presentation)
• The first step is to create a new table containing the cola importance scores. Add a new R
Output, with code:
ColaImportance = cola.importance$relative.importance$importance
# Removing "Performance: " from the labels
names(ColaImportance) = flipFormat::TidyLabels(names(ColaImportance))
ColaImportance
• Then, sort the cola importance scores, in a descending order:
• Add a new R Output, with code:
SortedColaImportance = sort(ColaImportance, decreasing = TRUE)
• Create > Charts > Visualization > Donut Chart
• Select SortedColaImportance as the Table
• Data value suffix: %
• Data label decimal places: 1
• Change the name in the report tree to Donut
• Check Automatic (at the top)
47
Instructions for the case studies
Creating the performance-importance chart
(earlier in the presentation)
• Right-click on donut in the Report tree and select Add Table
• In the blue drop-down menu, select Performance
• Click on Brand preference, at the top of the Report tree
• Add another table and select Brand [Stacked Cola Brand Associations.sav] in the blue drop-down menu
• Right-click on the 17% next to Diet Coke and select Create Filter
• Make sure that Apply filter to the current table is not selected, and press OK
• Click the Performance table, and apply the Diet Coke filter (using the Filter drop-down, at the bottom-left of the
screen)
• Right-click on Performance and select Reference Name and change this to performance
• Right-click on the Performance table in the Report tree and select Add R Output, entering the code
PerformanceImportance = cbind("Importance (%)" = ColaImportance,
"Diet Coke Brand Associations (%)" = table.Performance[-
nrow(table.Performance)])
• Create > Charts > Visualization > Labeled Scatterplot
• Set Table to PerformanceImportance
• Rename to Perfomance Importance Chart
• Press Calculate
48
Instructions for the case studies
Creating the correspondence analysis chart
(earlier in the presentation)
• Right-click on Performance Importance Chartin the Report tree and select Add
Table
• In the blue drop-down menu, select Performance
• In the brown drop-down select Brand [Stacked Cola Brand Associations.sav]
• Right-click on None of these and select Remove
• Create > Dimension Reduction > Correspondence Analysis of a Table
• Table: Performance by Brand
• Output: Bubble Chart
• Bubble sizes: ColaImportance
• Legend title: Importance
49
Instructions for the case studies
TIM BOCK PRESENTS
Q&A Session
Type questions into the Questions fields in GoToWebinar.
If we do not get to your question during the webinar, we will write back
via email.
We will email you a link to the slides and data.
Get a free one-month trial of Q from www.q-researchsoftware.com.

More Related Content

What's hot

R - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsR - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsJen Stirrup
 
Causal inference in practice
Causal inference in practiceCausal inference in practice
Causal inference in practiceAmit Sharma
 
Module 1.2 data preparation
Module 1.2  data preparationModule 1.2  data preparation
Module 1.2 data preparationSara Hooker
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreTuri, Inc.
 
2016 Symposium Poster - statistics - Final
2016 Symposium Poster - statistics - Final2016 Symposium Poster - statistics - Final
2016 Symposium Poster - statistics - FinalBrian Lin
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratorySara Hooker
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear RegressionSara Hooker
 
Recsys2021_slides_sato
Recsys2021_slides_satoRecsys2021_slides_sato
Recsys2021_slides_satoMasahiro Sato
 
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan HamedUplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan HamedRising Media Ltd.
 
Module 5: Decision Trees
Module 5: Decision TreesModule 5: Decision Trees
Module 5: Decision TreesSara Hooker
 
Predictability of popularity on online social media: Gaps between prediction ...
Predictability of popularity on online social media: Gaps between prediction ...Predictability of popularity on online social media: Gaps between prediction ...
Predictability of popularity on online social media: Gaps between prediction ...Amit Sharma
 
MLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model SelectionMLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model SelectionBigML, Inc
 
Causal data mining: Identifying causal effects at scale
Causal data mining: Identifying causal effects at scaleCausal data mining: Identifying causal effects at scale
Causal data mining: Identifying causal effects at scaleAmit Sharma
 
Statistical Approaches to Missing Data
Statistical Approaches to Missing DataStatistical Approaches to Missing Data
Statistical Approaches to Missing DataDataCards
 
Can We Automate Predictive Analytics
Can We Automate Predictive AnalyticsCan We Automate Predictive Analytics
Can We Automate Predictive Analyticsodsc
 
Evaluation and optimization of variables using response surface methodology
Evaluation and optimization of variables using response surface methodologyEvaluation and optimization of variables using response surface methodology
Evaluation and optimization of variables using response surface methodologyMohammed Abdullah Issa
 
Uplift Modeling Workshop
Uplift Modeling WorkshopUplift Modeling Workshop
Uplift Modeling Workshopodsc
 
Nss power point_machine_learning
Nss power point_machine_learningNss power point_machine_learning
Nss power point_machine_learningGauravsd2014
 
Doing Research with PLS_SEM using SmartPLS
Doing Research with PLS_SEM using SmartPLSDoing Research with PLS_SEM using SmartPLS
Doing Research with PLS_SEM using SmartPLSAwuni Emmanuel
 

What's hot (20)

R - what do the numbers mean? #RStats
R - what do the numbers mean? #RStatsR - what do the numbers mean? #RStats
R - what do the numbers mean? #RStats
 
Causal inference in practice
Causal inference in practiceCausal inference in practice
Causal inference in practice
 
Module 1.2 data preparation
Module 1.2  data preparationModule 1.2  data preparation
Module 1.2 data preparation
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
 
2016 Symposium Poster - statistics - Final
2016 Symposium Poster - statistics - Final2016 Symposium Poster - statistics - Final
2016 Symposium Poster - statistics - Final
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratory
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear Regression
 
Recsys2021_slides_sato
Recsys2021_slides_satoRecsys2021_slides_sato
Recsys2021_slides_sato
 
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan HamedUplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed
 
Introduction to Modeling
Introduction to ModelingIntroduction to Modeling
Introduction to Modeling
 
Module 5: Decision Trees
Module 5: Decision TreesModule 5: Decision Trees
Module 5: Decision Trees
 
Predictability of popularity on online social media: Gaps between prediction ...
Predictability of popularity on online social media: Gaps between prediction ...Predictability of popularity on online social media: Gaps between prediction ...
Predictability of popularity on online social media: Gaps between prediction ...
 
MLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model SelectionMLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model Selection
 
Causal data mining: Identifying causal effects at scale
Causal data mining: Identifying causal effects at scaleCausal data mining: Identifying causal effects at scale
Causal data mining: Identifying causal effects at scale
 
Statistical Approaches to Missing Data
Statistical Approaches to Missing DataStatistical Approaches to Missing Data
Statistical Approaches to Missing Data
 
Can We Automate Predictive Analytics
Can We Automate Predictive AnalyticsCan We Automate Predictive Analytics
Can We Automate Predictive Analytics
 
Evaluation and optimization of variables using response surface methodology
Evaluation and optimization of variables using response surface methodologyEvaluation and optimization of variables using response surface methodology
Evaluation and optimization of variables using response surface methodology
 
Uplift Modeling Workshop
Uplift Modeling WorkshopUplift Modeling Workshop
Uplift Modeling Workshop
 
Nss power point_machine_learning
Nss power point_machine_learningNss power point_machine_learning
Nss power point_machine_learning
 
Doing Research with PLS_SEM using SmartPLS
Doing Research with PLS_SEM using SmartPLSDoing Research with PLS_SEM using SmartPLS
Doing Research with PLS_SEM using SmartPLS
 

Viewers also liked

Geometrics rev 1
Geometrics rev 1Geometrics rev 1
Geometrics rev 1rohitcarlos
 
RepèRes Bayesia Consumer Segmentation Skim Conf08
RepèRes Bayesia   Consumer Segmentation   Skim Conf08RepèRes Bayesia   Consumer Segmentation   Skim Conf08
RepèRes Bayesia Consumer Segmentation Skim Conf08François Abiven
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
IBM Presents the Notes Domino Roadmap and a Deep Dive into Feature Pack 8
IBM Presents the Notes Domino Roadmap and a Deep Dive into Feature Pack 8IBM Presents the Notes Domino Roadmap and a Deep Dive into Feature Pack 8
IBM Presents the Notes Domino Roadmap and a Deep Dive into Feature Pack 8Teamstudio
 
Correspondence analysis(step by step)
Correspondence analysis(step by step)Correspondence analysis(step by step)
Correspondence analysis(step by step)Nguyen Van Chuc
 
Global SAP Support from Invenio
Global SAP Support from InvenioGlobal SAP Support from Invenio
Global SAP Support from InvenioinvenioLSI
 
Self-Fund Your Digital Transformation with the Power of SAP Ariba Solutions a...
Self-Fund Your Digital Transformation with the Power of SAP Ariba Solutions a...Self-Fund Your Digital Transformation with the Power of SAP Ariba Solutions a...
Self-Fund Your Digital Transformation with the Power of SAP Ariba Solutions a...SAP Ariba
 
SAP Ariba Procurement: The Buying Process
SAP Ariba Procurement: The Buying ProcessSAP Ariba Procurement: The Buying Process
SAP Ariba Procurement: The Buying ProcessSAP Ariba
 
Preparing for Deployment of SAP Ariba Solutions at Your Company
Preparing for Deployment of SAP Ariba Solutions at Your Company	Preparing for Deployment of SAP Ariba Solutions at Your Company
Preparing for Deployment of SAP Ariba Solutions at Your Company SAP Ariba
 
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...Amazon Web Services
 
How to Scale Your Architecture and DevOps Practices for Big Data Applications
How to Scale Your Architecture and DevOps Practices for Big Data ApplicationsHow to Scale Your Architecture and DevOps Practices for Big Data Applications
How to Scale Your Architecture and DevOps Practices for Big Data ApplicationsAmazon Web Services
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
Building a Development Workflow for Serverless Applications - March 2017 AWS ...
Building a Development Workflow for Serverless Applications - March 2017 AWS ...Building a Development Workflow for Serverless Applications - March 2017 AWS ...
Building a Development Workflow for Serverless Applications - March 2017 AWS ...Amazon Web Services
 
Omnichannel Attribution Modelling
Omnichannel Attribution ModellingOmnichannel Attribution Modelling
Omnichannel Attribution ModellingMetriplica
 
Deep Dive on Amazon Cognito - March 2017 AWS Online Tech Talks
Deep Dive on Amazon Cognito - March 2017 AWS Online Tech TalksDeep Dive on Amazon Cognito - March 2017 AWS Online Tech Talks
Deep Dive on Amazon Cognito - March 2017 AWS Online Tech TalksAmazon Web Services
 
Five Reasons To Skip SAP Suite on HANA and Go Directly to SAP S/4HANA
Five Reasons To Skip SAP Suite on HANA and Go Directly to SAP S/4HANAFive Reasons To Skip SAP Suite on HANA and Go Directly to SAP S/4HANA
Five Reasons To Skip SAP Suite on HANA and Go Directly to SAP S/4HANASAP Technology
 

Viewers also liked (20)

Geometrics rev 1
Geometrics rev 1Geometrics rev 1
Geometrics rev 1
 
RepèRes Bayesia Consumer Segmentation Skim Conf08
RepèRes Bayesia   Consumer Segmentation   Skim Conf08RepèRes Bayesia   Consumer Segmentation   Skim Conf08
RepèRes Bayesia Consumer Segmentation Skim Conf08
 
SAP Content Management Solution Brief
SAP Content Management Solution Brief SAP Content Management Solution Brief
SAP Content Management Solution Brief
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
SAP HANA and SAP Vora
SAP HANA and SAP VoraSAP HANA and SAP Vora
SAP HANA and SAP Vora
 
IBM Presents the Notes Domino Roadmap and a Deep Dive into Feature Pack 8
IBM Presents the Notes Domino Roadmap and a Deep Dive into Feature Pack 8IBM Presents the Notes Domino Roadmap and a Deep Dive into Feature Pack 8
IBM Presents the Notes Domino Roadmap and a Deep Dive into Feature Pack 8
 
Correspondence analysis(step by step)
Correspondence analysis(step by step)Correspondence analysis(step by step)
Correspondence analysis(step by step)
 
Global SAP Support from Invenio
Global SAP Support from InvenioGlobal SAP Support from Invenio
Global SAP Support from Invenio
 
Self-Fund Your Digital Transformation with the Power of SAP Ariba Solutions a...
Self-Fund Your Digital Transformation with the Power of SAP Ariba Solutions a...Self-Fund Your Digital Transformation with the Power of SAP Ariba Solutions a...
Self-Fund Your Digital Transformation with the Power of SAP Ariba Solutions a...
 
SAP Ariba Procurement: The Buying Process
SAP Ariba Procurement: The Buying ProcessSAP Ariba Procurement: The Buying Process
SAP Ariba Procurement: The Buying Process
 
Preparing for Deployment of SAP Ariba Solutions at Your Company
Preparing for Deployment of SAP Ariba Solutions at Your Company	Preparing for Deployment of SAP Ariba Solutions at Your Company
Preparing for Deployment of SAP Ariba Solutions at Your Company
 
Introduction to AWS X-Ray
Introduction to AWS X-RayIntroduction to AWS X-Ray
Introduction to AWS X-Ray
 
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
 
How to Scale Your Architecture and DevOps Practices for Big Data Applications
How to Scale Your Architecture and DevOps Practices for Big Data ApplicationsHow to Scale Your Architecture and DevOps Practices for Big Data Applications
How to Scale Your Architecture and DevOps Practices for Big Data Applications
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Building a Development Workflow for Serverless Applications - March 2017 AWS ...
Building a Development Workflow for Serverless Applications - March 2017 AWS ...Building a Development Workflow for Serverless Applications - March 2017 AWS ...
Building a Development Workflow for Serverless Applications - March 2017 AWS ...
 
Incident Coordination Workshop
Incident Coordination WorkshopIncident Coordination Workshop
Incident Coordination Workshop
 
Omnichannel Attribution Modelling
Omnichannel Attribution ModellingOmnichannel Attribution Modelling
Omnichannel Attribution Modelling
 
Deep Dive on Amazon Cognito - March 2017 AWS Online Tech Talks
Deep Dive on Amazon Cognito - March 2017 AWS Online Tech TalksDeep Dive on Amazon Cognito - March 2017 AWS Online Tech Talks
Deep Dive on Amazon Cognito - March 2017 AWS Online Tech Talks
 
Five Reasons To Skip SAP Suite on HANA and Go Directly to SAP S/4HANA
Five Reasons To Skip SAP Suite on HANA and Go Directly to SAP S/4HANAFive Reasons To Skip SAP Suite on HANA and Go Directly to SAP S/4HANA
Five Reasons To Skip SAP Suite on HANA and Go Directly to SAP S/4HANA
 

Similar to DIY Driver Analysis Webinar slides

q method research (1).pptx
q method research (1).pptxq method research (1).pptx
q method research (1).pptxChia Barzinje
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Regression Analysis of NBA Points Final
Regression Analysis of NBA Points  FinalRegression Analysis of NBA Points  Final
Regression Analysis of NBA Points FinalJohn Michael Croft
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining ProcessMarc Berman
 
Data Samples & Data AnalysesNYU SCPSDataba
Data Samples & Data AnalysesNYU  SCPSDatabaData Samples & Data AnalysesNYU  SCPSDataba
Data Samples & Data AnalysesNYU SCPSDatabaOllieShoresna
 
bookrecommendations-230615063942-3b1016c9 (1).pdf
bookrecommendations-230615063942-3b1016c9 (1).pdfbookrecommendations-230615063942-3b1016c9 (1).pdf
bookrecommendations-230615063942-3b1016c9 (1).pdf13DikshaDatir
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceAmit Sharma
 
ML Drift - How to find issues before they become problems
ML Drift - How to find issues before they become problemsML Drift - How to find issues before they become problems
ML Drift - How to find issues before they become problemsAmy Hodler
 
Predictive Analytics in Practice
Predictive Analytics in PracticePredictive Analytics in Practice
Predictive Analytics in PracticeHobsons
 
Regression Analysis of SAT Scores Final
Regression Analysis of SAT Scores FinalRegression Analysis of SAT Scores Final
Regression Analysis of SAT Scores FinalJohn Michael Croft
 
Scale your Testing and Quality with Automation Engineering and ML - Carlos Ki...
Scale your Testing and Quality with Automation Engineering and ML - Carlos Ki...Scale your Testing and Quality with Automation Engineering and ML - Carlos Ki...
Scale your Testing and Quality with Automation Engineering and ML - Carlos Ki...QA or the Highway
 
2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071CS, NcState
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti AZoha Qureshi
 
Factor analysis using SPSS
Factor analysis using SPSSFactor analysis using SPSS
Factor analysis using SPSSRemas Mohamed
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-stepsShesha R
 
Driver Analysis and Product Optimization with Bayesian Networks
Driver Analysis and Product Optimization with Bayesian NetworksDriver Analysis and Product Optimization with Bayesian Networks
Driver Analysis and Product Optimization with Bayesian NetworksBayesia USA
 
Empirical Evaluation of Active Learning in Recommender Systems
Empirical Evaluation of Active Learning in Recommender SystemsEmpirical Evaluation of Active Learning in Recommender Systems
Empirical Evaluation of Active Learning in Recommender SystemsUniversity of Bergen
 

Similar to DIY Driver Analysis Webinar slides (20)

q method research (1).pptx
q method research (1).pptxq method research (1).pptx
q method research (1).pptx
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Regression Analysis of NBA Points Final
Regression Analysis of NBA Points  FinalRegression Analysis of NBA Points  Final
Regression Analysis of NBA Points Final
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
Data Samples & Data AnalysesNYU SCPSDataba
Data Samples & Data AnalysesNYU  SCPSDatabaData Samples & Data AnalysesNYU  SCPSDataba
Data Samples & Data AnalysesNYU SCPSDataba
 
bookrecommendations-230615063942-3b1016c9 (1).pdf
bookrecommendations-230615063942-3b1016c9 (1).pdfbookrecommendations-230615063942-3b1016c9 (1).pdf
bookrecommendations-230615063942-3b1016c9 (1).pdf
 
Book Recommendations.pptx
Book Recommendations.pptxBook Recommendations.pptx
Book Recommendations.pptx
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
ML Drift - How to find issues before they become problems
ML Drift - How to find issues before they become problemsML Drift - How to find issues before they become problems
ML Drift - How to find issues before they become problems
 
Predictive Analytics in Practice
Predictive Analytics in PracticePredictive Analytics in Practice
Predictive Analytics in Practice
 
7INNOVA Lesson 5 slides for T26
7INNOVA Lesson 5 slides for T267INNOVA Lesson 5 slides for T26
7INNOVA Lesson 5 slides for T26
 
Regression Analysis of SAT Scores Final
Regression Analysis of SAT Scores FinalRegression Analysis of SAT Scores Final
Regression Analysis of SAT Scores Final
 
Scale your Testing and Quality with Automation Engineering and ML - Carlos Ki...
Scale your Testing and Quality with Automation Engineering and ML - Carlos Ki...Scale your Testing and Quality with Automation Engineering and ML - Carlos Ki...
Scale your Testing and Quality with Automation Engineering and ML - Carlos Ki...
 
2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti A
 
Factor analysis using SPSS
Factor analysis using SPSSFactor analysis using SPSS
Factor analysis using SPSS
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
 
Driver Analysis and Product Optimization with Bayesian Networks
Driver Analysis and Product Optimization with Bayesian NetworksDriver Analysis and Product Optimization with Bayesian Networks
Driver Analysis and Product Optimization with Bayesian Networks
 
Decisions
DecisionsDecisions
Decisions
 
Empirical Evaluation of Active Learning in Recommender Systems
Empirical Evaluation of Active Learning in Recommender SystemsEmpirical Evaluation of Active Learning in Recommender Systems
Empirical Evaluation of Active Learning in Recommender Systems
 

Recently uploaded

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 

Recently uploaded (20)

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 

DIY Driver Analysis Webinar slides

  • 1. TIM BOCK PRESENTS Examples in Q5 (beta) If you have any questions, enter them into the Questions field. Questions will be answered at the end. If we do not have time to get to your question, we will email you. We will email you a link to the slides and data. Get a free one-month trial of Q from www.q-researchsoftware.com. DIY Driver Analysis
  • 2. Overview • Objectives of (key) driver analysis • Overview of techniques • 13 assumptions that need to be checked when doing QA for driver analysis (there are workarounds for most of these) 1: There are 15 or fewer predictors (if using Shapley) 2: The outcome variable is monotonically increasing 3: The outcome variable is numeric (if using Shapley) 4: The predictor variables are numeric or binary 5: People do not differ in their needs/wants (segmentation) 6: The causal model is plausible 7: There is no multicollinearity/correlations between predictors (if using GLMs) 8: There are no unexpected correlations between the predictors and the outcome variable 9: The signs of the importance scores are correct 10: The predictor variables have no missing values 11: There are no outliers/influential data points 12: There is no serial correlation (aka autocorrelation) 13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable) 2
  • 3. The basic objective of (key) driver analysis The basic objective: work out the relative importance of a series of predictor variables in predicting an outcome variable. For example: • NPS: comfort vs customer service vs price. • Customer satisfaction: wait time vs staff friendliness vs comfort. • Brand preference: modernity vs friendliness vs youthfulness. What driver analysis is not: predictive analysis (e.g., predicting sales, customer churn). Although, you can use driver analysis to make strategic predictions (e.g., if I improve, say, fun, then preference will increase.) 3
  • 4. Likelihood to recommend This brand is fun This brand is exciting This brand is youthful 6 1 1 1 9 0 1 0 7 0 0 0 6 1 1 1 9 0 1 0 7 0 0 1 7 0 0 0 What the data looks like This data shows 7 observations 1 outcome variable 4 Predictor variables (Typically there will be more than 3.)
  • 5. Case study 1: Technology Outcome variable(s) Predictor variable(s) Likelihood to recommend: • Apple • Microsoft • IBM • Google • Intel • Hewlett-Packard • Sony • Dell • Yahoo • Nokia • Samsung • LG • Panasonic Brand associations: • Fun • Worth what you pay for • Innovative • Good customer service • Stylish • Easy-to-use • High quality • High performance • Low prices 5
  • 6. ID Apple Microsoft IBM Apple Microsoft IBM Apple Microsoft IBM 1 6 9 7 1 0 0 1 1 0 2 8 7 7 1 0 0 1 0 0 3 0 9 8 0 1 0 0 0 0 4 0 0 0 0 0 0 0 0 0 This brand is fun This brand is exciting Likelihood to recommend ID Brand Likelihood to recommend This brand is fun This brand is exciting 1 Apple 6 1 1 1 Microsoft 9 0 1 1 IBM 7 0 0 2 Apple 6 1 1 2 Microsoft 9 0 1 2 IBM 7 0 0 3 Apple 6 1 1 3 Microsoft 9 0 1 3 IBM 7 0 0 4 Apple 6 1 1 4 Microsoft 9 0 1 4 IBM 7 0 0 The data (stacked) From: one row per respondent To: one row per brand per respondent 6
  • 7. Tips for stacking • Get an SPSS .SAV data file. If you do not have an SPSS file: • Import your data the usual way • Tools > Save Data as SPSS/CSV and Save as type: SPSS • Re-import • Tools > Stack SPSS .sav Data File • Set the labels for the stacking variable (in Q: observation) in Value Attributes • Delete any None of these data (e.g., brand associations where respondents were able to select None of these 7
  • 8. Case study 2: Cola brand attitude Outcome variable(s) 34 Predictor variable(s) If the brand was a person, what would its personality be? Hate/Dislike/Neither/ Like/Love/Don’t know: • Coke Zero • Coke • Diet Coke • Diet Pepsi • Pepsi Max • Pepsi Brand associations: • Beautiful • Carefree • Charming • Confident • Down-to-earth • Feminine • Fun • Health-conscious • Hip • Honest • Humorous • Imaginative • Individualistic • Innocent • Intelligent • Masculine • Older • Open to new experiences • Outdoorsy • Rebellious • Reckless • Reliable • Sexy • Sleepy • Tough • Traditional • Trying to be cool • Unconventional • Up-to-date • Upper-class • Urban • Weight-conscious • Wholesome • Youthful 8
  • 11. Example output: Correspondence Analysis with Importance 11
  • 12. Overview • Objectives of (key) driver analysis • Overview of techniques • 13 assumptions that need to be checked when doing QA for driver analysis (there are workarounds for most of these) 1: There are 15 or fewer predictors (if using Shapley) 2: The outcome variable is monotonically increasing 3: The outcome variable is numeric (if using Shapley) 4: The predictor variables are numeric or binary 5: People do not differ in their needs/wants (segmentation) 6: The causal model is plausible 7: There is no multicollinearity/correlations between predictors (if using GLMs) 8: There are no unexpected correlations between the predictors and the outcome variable 9: The signs of the importance scores are correct 10: The predictor variables have no missing values 11: There are no outliers/influential data points 12: There is no serial correlation (aka autocorrelation) 13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable) 12
  • 13. Standard “best practice” recommendation for driver analysis: LMG Lindeman, Merenda, Gold (1980) = Kruskal Kruskal (1987) = Dominance Analysis Budescu (1993) = Shapley / Shapley Value Lipovetsky and Conklin(2001) The average improvement in R² that a predictor makes across all possible models (aka “Shapley”)
  • 14. 14 Best practice: Bespoke models (e.g., Bayesian multilevel model) Bivariate metrics E.g., Correlations, Jaccard Coefficients Shapley, Relative Importance Analysis Much too hard Too hard Too Soft Just Right GLMs (e.g., linear regression)
  • 15. What makes bespoke models and GLMs too hard? To estimate an OK bespoke model, you need to have a few week, and know lots of things, including: • Joint interpretation of parameter estimates, the predictor covariance matrix, and the parameter covariance matrix • Conditional effects • Multicollinearity • Confounding (e.g., suppressor effects) • Estimation (ML, Bayesian) • Specification of informative priors • Specification of random effects 15 To understand importance in a GLM (e.g., linear regression), you need to know quite a lot about: • Joint interpretation of parameter estimates, the predictor covariance matrix, and the parameter covariance matrix • Conditional effects • Multicollinearity • Confounding (e.g., suppressor effects) Shapley and similar methods allow us to be less careful when interpreting results
  • 16. 16 Bespoke models & GLMs Proportional Marginal Variance Decomposition Shapley With coefficient adjustment Lipovetsky and Conklin(2001) Random Forest (for importance analysis) Kruskal’s Squared partial correlation Called Kruskal in Q Relative Importance Analysis AKA Relative Weight: Johnson (2000) Shapley
  • 17. Shapley case study • Open Initial.Q. This already contains the cola data. • File > Data Sets > Add to Project > From File > Stacked Technology • Create > Regression > Driver (Importance) Analysis > Shapley • Dependent variable: Q3. Likelihood to recommend [Stacked Technology] • Dependent variable: Q4 variables from Stacked Technology • No when asked about confidence intervals (clicking Yes is OK as well) • Note that High Quality is the most important, with a score of 18.2 • Right-click: Reference name: shapley Everything I demonstrate in this webinar is described on a slide like this. The rest of them are hidden in this deck, but you can get them if you download the slides. So, there is no need to take detailed notes. 17 Instructions for the case studies
  • 18. Overview • Objectives of (key) driver analysis • Overview of techniques • 13 assumptions that need to be checked when doing QA for driver analysis (there are workarounds for most of these) 1: There are 15 or fewer predictors (if using Shapley) 2: The outcome variable is monotonically increasing 3: The outcome variable is numeric (if using Shapley) 4: The predictor variables are numeric or binary 5: People do not differ in their needs/wants (segmentation) 6: The causal model is plausible 7: There is no multicollinearity/correlations between predictors (if using GLMs) 8: There are no unexpected correlations between the predictors and the outcome variable 9: The signs of the importance scores are correct 10: The predictor variables have no missing values 11: There are no outliers/influential data points 12: There is no serial correlation (aka autocorrelation) 13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable)18
  • 19. 1: There are 15 or fewer predictors (if using Shapley) 19 Options (ranked from best to worst) Comments Relative Importance Analysis Tends to give almost identical results to Shapley regression, and much, much, faster to compute. Shapley using only a subset of models (e.g., all predictors, leave 1 out predictor, 2 predictors, 1 predictor) There is no evidence that this method produces accurate estimates of Shapley importance. This is likely to be less similar to a proper Shapley than using Relative Importance Analysis. Dimension reduction (e.g., PCA) Hard to interpret. Issue Shapley Regression cannot be computed with more 15 predictors in Q (it reverts to Relative Importance Analysis) If you run Shapley Regression in the R package relimpo with this many variables you get a crash.
  • 20. 1: There are 15 or fewer predictors (if using Shapley) • With the cola study, we have 34 variables, and that will take an infinite amount of time to compute, so using Shapley is not an option and we have to use Relative Importance Analysis. • We can use the technology data set, which only has 9 predictors, to explore how similar the techniques are. • Create > Regression > Linear Regression • Reference name: relative.importance • Select variables • Output: Relative importance analysis • Check Automatic Note that High Quality is again most important • Right-click: Add R Output: comparison = cbind(shapley = shapley[-10], "Relative Importance" = relative.importance$relative.importance$importance) • Calculate • Change shapley to shapley[-10] • Calculate • Right-click: Add R Output: correlation = cor(comparison) • Increase number of decimal places. Note the correlation is 0.999 • Rename output: Correlation • Insert > Charts > Visualization > Labeled Scatterplot, • Table: comparison • Automatic 20 Instructions for the case studies
  • 21. 2: The outcome variable is monotonically increasing 21 Options (not mutually exclusive) Comments Set Don’t Knows to missing Merge categories • Do this when there are categories that have ambiguous orderings (e.g., OK and Good). • The more categories you merge, the less significant the results will be. Recode the data in some meaningful way (e.g., reverse the scale, Likelihood to recommend, recoded as NPS) The specific values tend to make little difference, so using a recoding that is easy to explain to stakeholders, such as NPS, is often desirable. Issue All the standard driver analysis algorithms assume that the outcome variable contains categories ordered from lowest to highest, and which are believed to be associated with greater levels of preference. Test This is usually best checked by creating a summary table.
  • 22. 2: The outcome variable is monotonically increasing • Right-click: Add Table • Blue drop-down: Likelihood to recommend [Stacked Technology.sav] Other than the NET, which is ignored by analyses, the order is correct. • Click on table of outcome variables in Technology. Note that there is no problem. • Click on Brand preference Note that here we have a Don’t Know. • Right-click on Don’t Know and press Remove. This sets it as missing values. • Click on two grey outputs (just setting up a later analysis) 22 Instructions for the case studies
  • 23. 3: The outcome variable is numeric (if using Shapley) 23 Options (ranked from best to worst) Comments Use limited dependent variable versions of Relative Importance Analysis (e.g., Ordered Logit) • The less numeric the variable, the better this option is. • This approach is also preferable because it can take non-linear relationships into account automatically. Ignore the problem and use Shapley. Where the variable is close to being numeric, there is probably little lost by this approach. Issue Shapley assumes that the outcome variable is numeric (theoretically, it can deal with non-numeric outcome variables, but for more than about 10 or so variables, it is impractical).
  • 24. 3: Numeric outcome • Select relative.importance • Type: Ordered logit • Select chart We can see that the correlation between the models is smaller than before. This is because correctly modeling the data type has a bigger impact on our results than the difference between Shapley and Relative importance analysis. • Note that the changes are pretty trivial. 24 Instructions for the case studies
  • 25. 4: The predictor variables are numeric or binary 25 Options (not mutually exclusive) Comments Set Don’t Knows to missing This can be problematic as the variables as the missing values may not be missing at random. This is discussed later. Merge categories • Do this when there are categories that have ambiguous orderings (e.g., OK and Good). • The more categories you merge, the less significant the results will be. Recode the data in some meaningful way (midpoint recoding) Use a bespoke or Generalized Linear Model (GLM), with dummy variables and/or splines, computing importance as the difference between the lowest and largest effect sizes for each variable. In theory this is the best approach to dealing with non-numeric data, but it requires quite a lot to get right and, when interpreting the data, the sampling error of the categorical and spline effects will make them hard to compare. Issue Both Shapley and Relative Importance Analysis assume that the predictor variables are numeric or binary.
  • 26. 5: People do not differ in their needs/wants (segmentation) 26 Options (not mutually exclusive) Comments Estimate an appropriate bespoke model (e.g., latent class analysis) and then estimate the driver analysis models within each segment In Q: In a non-stacked data file, set up the data as an Experiment, and use Create > Segment > Latent Class Analysis Form segments by judgment, and estimate separate relative importance analyses for each segment. Ignore the problem, interpreting results as “average” effects Rightly-or-wrongly, this is how 99.9%* of all modelling is done. * Made-up number Issue Traditional driver analysis techniques assume that people have the same needs/wants, and apply these consistently from situation to situation. How to test • Compare by brand • Compare by other data • Latent class analysis
  • 27. 5: People do not differ in their needs/wants (segmentation) • Select the Shapley Importance for Q3… table and press Duplicate • Brown drop-down: Brand [Stacked Technology.sav] • Note that this table is showing difference in important scores by segment. It is not showing difference in perceptions. • The colors and arrows are telling us that there are differences by brand. Look at Fun. This is significantly lower for Intel, HP, and Dell. But, really important in the evaluation of Google. • What do we do now? Let’s return to the previous slide and look at the options. 27 Instructions for the case studies
  • 28. 6: The causal model is plausible 28 Options (not mutually exclusive) Comments Build a bespoke model This is usually too hard Include all the relevant (non-outcome) variables and cross your fingers (if you have not collected the data, you cannot magic it into existence) Rightly-or-wrongly, this is how 99.9%* of all modelling is done * Made-up number Issue All driver analysis techniques assume that the analysis is a plausible explanation of the causal relationship between the predictor variables and the outcome variable. This assumption is never true. How to test Common sense. Four common examples are shown on the next slides.
  • 29. Example causality problem: Omitted variable bias If we fail to include a relevant predictor variable, and that variable is correlated with the predictor variables that we do include, the estimates of importance will be wrong. If your R-square is less than 0.9, you may have this problem (a typical R-square is closer to 0.2 than 0.9). Predictor 1 Predictor 2 Predictor 3 Predictor 4 E.g., price Outcome 1 Assumed predictor variables Arrows denote the true causal relationship 29
  • 30. Example causality problem: Outcome variable included as a predictor 30 Predictor 1 Predictor 2 Predictor 3 Outcome 2 E.g., Satisfaction Outcome 1 E.g., NPS Assumed predictor variables If we include a predictor variable that is really an outcome variable, the estimates of importance will be wrong. Arrows denote the true causal relationship
  • 31. 6: Outcome variable a predictor • Click on relative.importance • Worth what paid for is clearly an example of this problem, so should be deleted. • Remove Worth what paid for from the model • Click on Shapley Importance for Q3. Likelihood to recommend • Click on Comparisons. Note • The warning • The R-square is back in the model. • Change in the R code -10 to -9 31 Instructions for the case studies
  • 32. Example causality problem: Backdoor path 32 Predictor 1 E.g., price perception Predictor 2 E.g., quality Predictor 3 E,g., packaging Variable E.g., Attitude Outcome 1 E.g., NPS Assumed predictor variables If backdoor path exists from the predictors to the outcome variable, the estimates of importance will be wrong (spurious). Arrows denote the true causal relationship
  • 33. Example causality problem: Functional form 33 Assumed functional form If we have the wrong functional form (i.e., assumed equation), the estimates of importance will be wrong. Arrows denote the true causal relationship Outcome = Predictor 1 + Predictor 2 + Predictor 3 Outcome = Predictor 1 × Predictor 2 + Predictor 3 True functional form
  • 34. 7: There is no multicollinearity/correlations between predictors (if using GLMs, e.g., linear regression) 34 Options (ranked from best to worst) Comments Take all the relevant theory into account when interpreting the results. This requires a strong technical and intuitive understanding of the underlying maths. Even if you possess that understanding, it is really difficult to explain to clients (particularly if it is a tracking study and they are seeing results fluctuate from period-to-period) Use Shapley or Relative Importance Analysis. These techniques are designed to address this problem. They are not perfect, but they are easier to interpret than linear regression and other GLMs when predictor variables are correlated. Issue The bigger the correlations between predictors, the more difficult it is to accurately interpret estimates from traditional GLMs (e.g., linear regression) Test 1. Inspect the Variance Inflation Factors (VIF) or Generalized Variance Inflation Factors (GVIF). Q automatically computes these and warns you if they are high. 2. Inspect the coefficients. Do they make sense? 3. Look at the correlations.
  • 35. 7: There is no multicollinearity • Click on relative.importance • Uncheck Automatic • Change it back to Summary. Now it is showing an order logit regression. • Type to Linear. • Note that we have a negative effect for Stylish. That is, the model says that if we hold everything else constant, an improvement in Stylish will make a reduction in likelihood to recommend. This doesn’t seem to make sense. • Create > Correlation > Correlation Matrix. Select • Q3. Likelihood to recommend • Q4. • Note that there is a moderate correlation between Stylish and Likelihood to recommend. And, Stylish is correlated with everything out. • My intuition is that this is all some weird and uninteresting quirk in the data. But, in truth I don’t know. • Conclusion: we should be using Shapley or Relative Importance Analysis. It is designed to remove such headaches. • Output: Relative Importance Analysis 35 Instructions for the case studies
  • 36. 8: There are no unexpected correlations between the predictors and the outcome variable 36 Options (ranked from best to worst) Investigate the data to make sense of the unexpected relationships. Remove problematic variables from the analysis. Issue When people interpret importance scores, they assume that higher means better. This is assumption is not always right. Test Correlate each predictor variable with the outcome variable
  • 37. 8: There are no unexpected correlations • Click back on the correlation matrix. As we discussed earlier, these all look fine. • Now, let’s look at the cola correlations. I computed them before, but they will need to re-run because we removed the Don’t Know category from the Outcome Variable. • Click on cola.correlations • You can see we have some negative correlations in the first column, which shows the correlations with the outcome variable. 37 Instructions for the case studies
  • 38. 9: The signs of the importance scores are correct 38 Recommendation If all the effects should be positive, select the Absolute importance scores option. Otherwise, manually change the results when reporting. Issue The underlying Shapley and Relative Importance Analysis algorithms always compute a positive importance scores. However, the true effect of a predictor can be negative, resulting in people misinterpreting the results. Test Compute a GLM (e.g., linear regression). Any negative coefficients warrant investigation. For this reason, Q automatically does this and puts the signs of the multiple regression coefficients onto the driver analysis outputs (both Shapley and Relative Importance Analysis). If the correlation is also negative, it means that the effect is negative. If positive, it suggests that the multiple regression is picking up a non-interesting artefact.
  • 39. 9: The signs of the importance scores are correct • Click on relative.importance • Stylish is negative. As discussed, it is because of the multiple regression. This is really just a reminder that we need to be careful. We have already looked at this, and we know that the correlation is positive, so we can ignore this. • Check Absolute importance scores • Go to comparison, and replace shapley[-9] with abs(shapley[-9]) 39 Instructions for the case studies
  • 40. 10: The predictor variables have no missing values 40 Options (ranked from best to worst) Comments Create a bespoke model that appropriately models the process(es) that cause the values to be missing. This is really hard! Multiple imputation of missing values If using Relative Importance Analysis, set Missing Data to Multiple Imputation Leave out observations with missing values from the analysis (i.e., complete case analysis) This implicitly assumes that the data is Missing Completely At Random (MCAR; i.e., other than that some variables have more missing values than others, there is no pattern of any kind in the missing data). Test this assumption using Automate > Browse Online Library > Missing Data > Little’s MCAR Test Issue There are missing values of predictor variables (e.g., some attributes were not collected for some respondents, or there were “don’t know” response)
  • 41. 11: There are no outliers/unusual data points 41 Options (ranked from best to worst) Comments Inspect each unusual observation, and understand if it is an error or not Difficult/time consuming Filter out all the unusual observations, and check to see if the model has changed. If it has changed, and the number of unusual observations is small, use the new model. Ignore the problem This is, by far, the most common approach. Issue A few outliers/unusual observations can skew the results of importance analysis. Test • Hat/influence scores • Standardized residuals • Cook’s distance
  • 42. 11: There are no outliers/unusual data points • Click on 3 more (warnings) • Note that we have a warning about Unusual Observations. Let’s explore this further. • Create > Regression > Diagnostic > Plot > Influence Index • Uncheck Automatic (so that it does not update when we later change the model) • You can see the various unusual observations have been marked: 379, 382, 295, and so on • Go to Notes tab and copy code • Variables and Questions – Stacked Technology.sav • Right-click on a row: Insert > Variable(s) > R Variable: • !(1:4056 %in% c(295, 1744, 1749, 2764, 3420, 246, 2364, 2829, 2830, 4029, 295, 2380, 2764, 3049, 3050) • Name: Unusual observations removed • Check the F in the Tags column. Make the variable available as a filter. • Go back to Output: relative.importance • Apply the filter to the table 42 Instructions for the case studies
  • 43. 12: There is no serial correlation (aka autocorrelation) 43 Options (ranked from best to worst) Comments Create a bespoke model that addresses the serial correlation (e.g., a random effects model if the serial correlation is due to repeated measures, or a time series model if it is measures over time) This is a lot of work. Don’t report statistical test results (i.e., p- values). The importance scores will be OK. The significance tests will be misleading to an unknown extent. Issue The standard tests for the significance of a predictor assume that there is no serial correlation/autocorrelation (a particular type of pattern in the residuals). Whenever you stack data you are highly likely to have this problem. Test Regression > Diagnostic > Serial Correlation (Durbin-Watson)
  • 44. 12: There is no serial correlation (aka auto…) • Regression > Diagnostic > Serial Correlation (Durbin-Watson) • These are significant. As mentioned, this is always the case. 44 Instructions for the case studies
  • 45. 13: The residuals have constant variance (i.e., no heteroscedasticity in a model with a linear outcome variable) 45 Options (ranked from best to worst) Comments Use a more appropriate model (e.g., ordered logit) This is not possible with Shapley. This models make other, hopefully less problematic, assumptions (beyond the scope of this webinar) Use robust standard errors This is not possible with Shapley. In Q: check Robust standard error Issue The standard tests for the significance of a predictor in a linear model assume that the variance of the residuals is constant. This is rarely the case in driver analysis, as usually the data is from a bounded scale (e.g., if it is a rating out of 10, it is impossible for a value to be observed that is greater than 10). Test Displayr automatically performs the Breusch-Pagen Test Type = Linear
  • 46. 13: The residuals have constant variance • Click on relative.importance. Note that there is a warning. • Change the Type to Ordered Logit 46 Instructions for the case studies
  • 47. Creating the donut chart (earlier in the presentation) • The first step is to create a new table containing the cola importance scores. Add a new R Output, with code: ColaImportance = cola.importance$relative.importance$importance # Removing "Performance: " from the labels names(ColaImportance) = flipFormat::TidyLabels(names(ColaImportance)) ColaImportance • Then, sort the cola importance scores, in a descending order: • Add a new R Output, with code: SortedColaImportance = sort(ColaImportance, decreasing = TRUE) • Create > Charts > Visualization > Donut Chart • Select SortedColaImportance as the Table • Data value suffix: % • Data label decimal places: 1 • Change the name in the report tree to Donut • Check Automatic (at the top) 47 Instructions for the case studies
  • 48. Creating the performance-importance chart (earlier in the presentation) • Right-click on donut in the Report tree and select Add Table • In the blue drop-down menu, select Performance • Click on Brand preference, at the top of the Report tree • Add another table and select Brand [Stacked Cola Brand Associations.sav] in the blue drop-down menu • Right-click on the 17% next to Diet Coke and select Create Filter • Make sure that Apply filter to the current table is not selected, and press OK • Click the Performance table, and apply the Diet Coke filter (using the Filter drop-down, at the bottom-left of the screen) • Right-click on Performance and select Reference Name and change this to performance • Right-click on the Performance table in the Report tree and select Add R Output, entering the code PerformanceImportance = cbind("Importance (%)" = ColaImportance, "Diet Coke Brand Associations (%)" = table.Performance[- nrow(table.Performance)]) • Create > Charts > Visualization > Labeled Scatterplot • Set Table to PerformanceImportance • Rename to Perfomance Importance Chart • Press Calculate 48 Instructions for the case studies
  • 49. Creating the correspondence analysis chart (earlier in the presentation) • Right-click on Performance Importance Chartin the Report tree and select Add Table • In the blue drop-down menu, select Performance • In the brown drop-down select Brand [Stacked Cola Brand Associations.sav] • Right-click on None of these and select Remove • Create > Dimension Reduction > Correspondence Analysis of a Table • Table: Performance by Brand • Output: Bubble Chart • Bubble sizes: ColaImportance • Legend title: Importance 49 Instructions for the case studies
  • 50. TIM BOCK PRESENTS Q&A Session Type questions into the Questions fields in GoToWebinar. If we do not get to your question during the webinar, we will write back via email. We will email you a link to the slides and data. Get a free one-month trial of Q from www.q-researchsoftware.com.