Writing surveys that work

About this talk
What this talk is about:

• The main concepts in surveys and questionnaires

• Some “best practices” and general principles

• There’s no way we can cover everything (not even a Ph.D. covers everything)

What this talk isn’t about:

• Statistical methods

• Sampling theory

• Scholarly literature

What you should get from this talk:

• The ability to constructively critique questionnaires

• The perspective needed to do better survey research

What is a survey?
1. Census != Surveys
Census: an entire population
Survey: a sample representing a population
2. Surveys != Questionnaire
Surveys: highly structured process of measuring
self-reported attitudes, opinions, beliefs, habits,
behaviors of a population via a sample
Questionnaire: instrument used in surveys that
is distributed to the sample

Survey issues
Sampling
Who is the population? Is it possible to use the
whole population? If not, how am I sampling? Is
my method representative?
Design
Are all respondents getting the same survey? Or
do I have multiple conditions?
Analysis
What are the data going to look like? Should I
use counts or proportions? Are my results
statistically significant?

Survey issues
A talk for another time, perhaps!

Questionnaire issues
Question wording
Have I written this question using unambiguous
language? Will every word be understood the
same way by every respondent?
Response methods
What options should I give to the respondent?
Should I use scales? Agree-disagree? Open-ended
questions? Should I include no opinion/neutral?
Question ordering
Does it matter which order I put my questions or
response options?

The Construct
Constructs are theoretical variables that you
can’t measure directly
• Examples: user satisfaction, attitude toward the mission

The questionnaire is the instrument used to
measure constructs through observed
variables
• Examples: Likert scales, feeling thermometers

Always consider the following: is my construct valid?
Am I asking respondents questions that are
accurately measuring this construct?

The Construct
Some things to think about your construct:
• What’s the polarity? Does it have valence?
• How would I describe its continuum?
• What’s the dimensionality?

Questionnaires: Wording
If respondents don’t understand your question in the exact same way and
can’t respond equally easily, you will get measurement error.

“Which of the following changes to
Firefox would have the most impact
on your experience?”
Vocabulary ambiguity


“Did you know that Mozilla is a mission-
driven organization to make the
Internet a better place?”
Double-barreled


“Would you say that mobile Firefox
is better than any other mobile
browser available on the
market?”
Lack of balance


“How strongly do you agree or
disagree that Mozilla is a positive
force for Internet privacy?”
Prone to cognitive bias


“Rank these 20 features in order of
most useful to least.”
Prone to satisficing

Questionnaires: Responses

1. Make it as easy as possible for every
respondent to respond!

2. The response options should map as
closely to the construct’s continuum as
possible.


“Can I use a rating scale?”

Unipolar measure = 5pt scale (e.g.
“Not at all -> All the time”)

Bipolar measure = 7pt scale, with
neutral point (e.g. “Strongly agree ->
Strongly disagree”)


“Should I enumerate my options or
fully-label them?”

Fully-labeled, non-enumerated options
for scales have been shown to be the
most reliable.
Remember, one respondent’s “3”
might not be the same as another’s!
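
A minimal sketch of how the two scale types above could be written down as fully labeled, non-enumerated options. The `Scale` class and the intermediate labels are assumptions made for illustration; only the endpoint labels come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Scale:
    """Hypothetical container for a fully labeled rating scale."""
    labels: Tuple[str, ...]  # every point has a verbal label; no numbers are shown
    bipolar: bool

    def validate(self) -> "Scale":
        # Guideline from the talk: 5 points if unipolar, 7 points if bipolar.
        expected = 7 if self.bipolar else 5
        if len(self.labels) != expected:
            raise ValueError(f"expected {expected} labels, got {len(self.labels)}")
        if len(set(self.labels)) != len(self.labels):
            raise ValueError("labels must be distinct")
        return self

    @property
    def neutral(self) -> Optional[str]:
        # Only a bipolar scale has a neutral midpoint.
        return self.labels[len(self.labels) // 2] if self.bipolar else None


# Intermediate labels below are illustrative assumptions.
FREQUENCY = Scale(
    ("Not at all", "Rarely", "Sometimes", "Often", "All the time"),
    bipolar=False,
).validate()

AGREEMENT = Scale(
    ("Strongly agree", "Agree", "Somewhat agree", "Neither agree nor disagree",
     "Somewhat disagree", "Disagree", "Strongly disagree"),
    bipolar=True,
).validate()
```

Storing labels rather than numbers keeps the "3" out of the respondent's view; any numeric coding can be applied later at analysis time.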

“Should I include ‘don’t know’ / ‘no
opinion’ / neutral points?”

Pro: You may get more accurate
responses from low knowledge
respondents (or ones without
opinions)

Con: You may see increased
satisficing

“Can I use ranking?”

Only with a few items, and only if you think all respondents
will be able to clearly distinguish between all options.

What if most respondents don’t care about almost all of
your options?

What if they can’t choose the third most important item
among three options that seem equally important?

Most importantly, how are you going to do your analysis?

“Can I use agree-disagree?”

Think about the eventual distribution of responses
to these questions; it is almost always easier to
agree than to disagree with statements.

It is harder to evaluate from a negative frame than
a positive, so flipping the valence of a question
might not help.

There are, however, exceptions.


“Should I ask for specific quantities?”

Humans are not very accurate at anything
quantitatively specific.

Stick to intervals and natural
frequencies (1/10, not 10%) as much
as possible.
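
As a small illustration of the natural-frequency advice, the sketch below rewrites a proportion as an "x in y" phrase. The function name `natural_frequency` and the cap of 20 on the denominator are assumptions chosen for readability, not part of the talk.

```python
from fractions import Fraction


def natural_frequency(proportion: float, max_denominator: int = 20) -> str:
    """Phrase a proportion between 0 and 1 as a natural frequency."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    # limit_denominator finds the closest fraction with a small denominator.
    frac = Fraction(proportion).limit_denominator(max_denominator)
    return f"{frac.numerator} in {frac.denominator}"


for p in (0.10, 0.25, 0.5):
    print(p, "->", natural_frequency(p))
```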


“What kind of options should I use for
habitual or behavioral questions?”

Humans are also bad at remembering
their previous habits or behaviors.

Use average time periods, e.g. “In an
average day/week/month…”

“When should I use open-ended questions?”
They are great for exploratory research, but not for confirmatory research.
They are also useful if you don’t want to bias your respondents
towards choosing options that they hadn’t considered before seeing them.

“How many open-ended questions can I use?”
Thoughtful, deliberative responses are extremely taxing
cognitively.
If you want a good response rate, never make them
mandatory.
If you must, use them sparingly. No more than 1-3, and try not
to put them together.

Questionnaires: Ordering
Why should I care about the order of questions or
responses?

Questions might have spillover influence on future responses:
The answer to question x might affect responses to question x + 1…n.
This is why demographic questions tend to be put at the end of questionnaires.

Response option ordering might skew your distribution:
People tend to focus more on earlier or later options, and spend less time
evaluating middle options (primacy or recency effects).

One way to protect against ordering effects: randomization
Blocks of questions: randomize between blocks and/or within blocks
Response options: ranking, list ordering, polarity
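
A rough sketch of the randomization ideas above, assuming a questionnaire stored as a list of blocks of question ids (a hypothetical structure). Questions are shuffled within blocks, the blocks themselves are shuffled, and a fixed tail (for example, demographics) stays at the end. For ordered scales, only the polarity is flipped, since scrambling the points would break the continuum.

```python
import random


def randomize_questionnaire(blocks, rng, fixed_tail=(),
                            shuffle_blocks=True, shuffle_within=True):
    """Return one respondent's question order; the input is left untouched."""
    ordered = [list(block) for block in blocks]
    if shuffle_within:
        for block in ordered:
            rng.shuffle(block)   # randomize within each block
    if shuffle_blocks:
        rng.shuffle(ordered)     # randomize between blocks
    return [q for block in ordered for q in block] + list(fixed_tail)


def randomize_scale_polarity(options, rng):
    """Show an ordered scale either forwards or reversed, never scrambled."""
    options = list(options)
    return options if rng.random() < 0.5 else options[::-1]


# Hypothetical question ids, for illustration only.
blocks = [["sat_1", "sat_2"], ["rec_1"], ["like_1", "like_2", "like_3"]]
rng = random.Random(0)  # seeded only so the example is repeatable
order = randomize_questionnaire(blocks, rng, fixed_tail=["age", "country"])
scale = randomize_scale_polarity(["Not at all", "Sometimes", "All the time"], rng)
print(order)
print(scale)
```

In a real study, the order shown to each respondent would also need to be recorded, so that ordering effects can be checked in the analysis.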

A few examples
Now we’re going to walk through some examples to show how question
phrasing and response options can influence your conclusions.

Consider a classic example: how satisfied are users with a product?

A reasonable first approach: why don’t we just ask users how satisfied
they are with the product?

A few examples
Not bad!

But what does satisfied mean? Do respondents have a set of features
that they evaluate a product on? Does “satisfied” mean that the
product is doing a better job on delivering those features than
otherwise? Are people carefully considering each of these features
when they evaluate a product for their level of satisfaction?

Maybe we should just ask about likelihood to recommend the
product. After all, if they’re satisfied with it, they’re probably more
likely to recommend it to other people they encounter who are in the
market for a product like ours.

A few examples
Now we’re getting some interesting differences; more users say they’re
willing to recommend this product than are satisfied.

At this point, we could do some interesting comparisons; who are these
users who endorse one answer to the first question and then a different
position on the second question?
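
One way to make this comparison concrete is a simple cross-tabulation of the two questions. The sketch below uses a handful of invented respondents purely to show the mechanics; the field names and answer labels are assumptions, not data from the talk.

```python
from collections import Counter


def crosstab(responses, q_a, q_b):
    """Count respondents for each (answer to q_a, answer to q_b) pair."""
    return Counter((r[q_a], r[q_b]) for r in responses)


def mismatched(responses, q_a, q_b, positive_a, positive_b):
    """Ids of respondents who are positive on one question but not the other."""
    return [r["id"] for r in responses
            if (r[q_a] in positive_a) != (r[q_b] in positive_b)]


# Invented toy records, for illustration only.
responses = [
    {"id": 1, "satisfied": "Very satisfied", "recommend": "Very likely"},
    {"id": 2, "satisfied": "Not satisfied", "recommend": "Very likely"},
    {"id": 3, "satisfied": "Very satisfied", "recommend": "Not likely"},
    {"id": 4, "satisfied": "Not satisfied", "recommend": "Not likely"},
]

table = crosstab(responses, "satisfied", "recommend")
odd_ones = mismatched(responses, "satisfied", "recommend",
                      positive_a={"Very satisfied"}, positive_b={"Very likely"})
print(table)
print(odd_ones)
```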

But remember, we’re still trying to get to this idea of satisfaction. Clearly,
there’s a bit of difference between satisfaction and willingness to
recommend.

What if we just ask about likability? After all, both satisfaction and
willingness to recommend presuppose that you generally like the
product.

A few examples
Likability shows a different distribution of responses than the other two
questions! From this response, we see that more users report that they “like
[the product] a great deal” than they report their satisfaction or their
willingness to recommend.

From these three questions, we can get to a much better understanding of
how attitudes towards the product can influence willingness to recommend it
to others.

Let’s compare this to a well-known, widely used question for measuring
customer satisfaction.

A few examples
This is an 11-pt, partially labeled, unipolar scale with a neutral point. Can you
list all the problems with this approach?

A common way that people use this type of question: subtract the proportion
of respondents who indicate 6 or less from the proportion of respondents who
indicate 9 or up (apologies that the 7+ responses are lumped together in
gray).
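
The subtraction described above can be sketched as follows. The function name `net_score` and the two sets of ratings are invented for illustration. The example was chosen so that two very different response distributions collapse to the same score.

```python
def net_score(ratings):
    """Proportion answering 9 or up minus proportion answering 6 or less.

    `ratings` is an iterable of integers on the 0-10 scale.
    """
    ratings = list(ratings)
    if not ratings:
        raise ValueError("no ratings given")
    n = len(ratings)
    high = sum(1 for r in ratings if r >= 9) / n
    low = sum(1 for r in ratings if r <= 6) / n
    return high - low


# Invented ratings: group A is polarized, group B is lukewarm.
group_a = [10, 10, 9, 9, 0, 0, 0, 0, 8, 8]
group_b = [9, 9, 9, 9, 6, 6, 6, 6, 7, 7]

print(net_score(group_a), net_score(group_b))
```

Because every answer of 6 or less counts the same, a respondent who answers 0 and one who answers 6 are indistinguishable in the final number.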

Note how the distribution of responses to this question does not allow you the
kind of insight that you would develop from the previous three responses.
Look at how all responses below “neutral” are lumped together.

Note how a single question would not capture the differences between
willingness to recommend, satisfaction, and likability.

Best Practices
1. Always write down your research goal. You should write it down in 2-3
sentences so that a stranger can understand it.

2. Verify that you can’t achieve your research goal through behavioral
measures.

3. Try to make your research questions as clear as possible. This makes it easier
to write your questionnaire to directly address your questions.

4. Work with at least one other person in creating your questionnaire.
5. Pretest your survey with naïve respondents.
6. Always think about the distribution of responses!
7. Don’t put too much emphasis on statistical significance. Remember, you
can make anything significant with enough respondents.

8. Most importantly, it’s questionnaire design, not engineering. These aren’t
rules, but guidelines to get better results!
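
To illustrate point 7, here is a minimal sketch of a two-sided, two-proportion z-test with equal group sizes, written with the standard library only. The proportions (51% vs. 50%) and the sample sizes are made-up values. The same one-point difference moves from clearly non-significant to overwhelmingly significant as the number of respondents grows.

```python
import math


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for a difference in proportions (equal group sizes)."""
    pooled = (p1 + p2) / 2  # pooled proportion when both groups have the same n
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Same tiny difference, increasingly large (hypothetical) samples.
p_values = {n: two_proportion_p_value(0.51, 0.50, n)
            for n in (100, 10_000, 1_000_000)}

for n, p in p_values.items():
    print(f"n per group = {n:>9,}: p = {p:.3g}")
```

Whether a one-point difference matters is a question about effect size and the research goal, not about the p-value.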

Contact Rebecca
LDAP: rweiss@mozilla.com
IRC: rweiss
