Validity and reliability of questionnaires

VALIDITY AND RELIABILITY OF QUESTIONNAIRES
Dr. R. VENKITACHALAM

CONTENTS
 Introduction
 Steps in questionnaire designing
 Validity
 Concept of validity
 Types of validity
 Steps in questionnaire validation
 Reliability
 Types and measurement of reliability
 Conclusion
 References

INTRODUCTION
 Questionnaire: Important method of data collection used
extensively
 Advantages of questionnaire
 Less expensive
 Offers greater anonymity
 Disadvantages
 Application is limited
 Response rate is low
 Opportunity to clarify issues is lacking

 Ideal requisites of a questionnaire:
 Should be clear and easy to understand
 Layout is easy to read and pleasant to eye
 Sequence of questions easy to follow
 Should be developed in an interactive style
 Sensitive questions must be worded exactly
 NOTE: In this seminar, the terms research instrument, measuring
instrument, scale and test all refer to the questionnaire, and item
refers to each question in a questionnaire

Steps in questionnaire designing

The concept of validity
 Validity is the ability of an instrument to measure what it is intended to
measure.
 Degree to which the researcher has measured what he has set out to
measure (Smith, 1991)
 Are we measuring what we think we are measuring? (Kerlinger, 1973)
 Extent to which an empirical measure adequately reflects the real
meaning of the concept under consideration (Babbie, 1989)

Why validity ?
 Validity is assessed mainly to answer the following questions:
 Is the research investigation providing answers to the research
questions for which it was undertaken?
 If so, is it providing these answers using appropriate methods and
procedures?

Questions to ponder
 Who decides whether an instrument is valid?
 Investigator
 Readers of report
 Experts in the field
 How is validity established?
 Logic
 Statistical tests

Logical thinking
 Justification of each question in relation to objective of study
 Easy if questions relate to tangible matters
 Difficult in situations where we are measuring attitude,
effectiveness of a program, satisfaction etc
 Not everybody’s logic matches, and there is no statistical backing

Statistical procedures
 By calculating correlation coefficients between
questions and outcome variables

Types of validity
 Content validity
 Face validity
 Criterion-related validity
 Concurrent
 Predictive
 Construct validity

CONTENT VALIDITY
 Uses logical reasoning and hence easy to apply
 Extent to which a measuring instrument covers a
representative sample of the domain of the aspects measured
 Whether items and questions cover the full range of the
issues or problem being measured

FACE VALIDITY
 The extent to which a measuring instrument appears valid
on its surface
 Each question or item on the research instrument must have a
logical link with the objective

Face validity is not content validity. Why?
 Face validity
 Simply addresses whether a measuring instrument looks
valid
 Not validity in the technical sense, because it refers not
to what is actually being measured but to what the
instrument appears to measure
 It has more to do with rapport and public relations than
with actual validity

Other aspects of content validity
 Coverage of issue should be balanced
 Each aspect should have similar and adequate representation
in questions

Problems associated with content validity
 Based on subjective logic; no definitive conclusion can be
drawn or consensus reached
 Extent to which questions reflect the objectives of the study
may differ. If wordings changed or question substituted,
magnitude of link changes

CRITERION VALIDITY
 The extent to which a measuring instrument accurately
predicts behaviour or ability in a given area.
 The external standard against which the instrument is
compared is called the ‘criterion’
 It is of two types:
 Predictive validity
 Concurrent validity

Predictive validity
 If the test is used to predict future performance
 Eg: Entrance exam: performance on the test correlates
with later performance in professional college
 Eg: Written driving test
 Eg: measurement of sugar exposure for caries development

Concurrent validity
 If the test is used to estimate present performance or a person’s
ability at the present time, without attempting to predict future
outcomes
 Eg: Professional college exam
 Eg: driving test, pilot test
 Eg: measurement of DMFT for caries experience

Problems in criterion validity
 Cannot be used in all circumstances
 Especially in the social sciences, where some conditions do not
have a relevant criterion
 Eg: for measuring self-esteem, no criterion can be applied

CONSTRUCT VALIDITY
 Most important type of validity
 Assesses the extent to which a measuring instrument
accurately measures a theoretical construct it is designed to
measure
 Measured by correlating performance on the test with
performance on a test for which construct validity has been
determined
 Eg: a new index for measuring caries can be validated by
comparing its values with a standard index (like DMFT)

 Another method is to show that scores on the new test differ
across people with different levels of the outcome being
measured
 Eg: Establishing the validity of a new caries index by
applying it to different stages of dental caries and calculating
its accuracy

Summary of Validity
 CONTENT
 What it measures: Whether the test covers a representative sample of the domains to be measured
 How it is accomplished: Ask experts to assess the test to establish that the items are representative of the outcome
 CRITERION (CONCURRENT)
 What it measures: The ability of the test to estimate present performance
 How it is accomplished: Correlate performance on the test with a concurrent behaviour
 CRITERION (PREDICTIVE)
 What it measures: The ability of the test to predict future performance
 How it is accomplished: Correlate performance on the test with a behaviour in future
 CONSTRUCT
 What it measures: The extent to which the instrument measures a theoretical construct
 How it is accomplished: Correlate performance on the instrument with performance on an established instrument

Steps in questionnaire validation

FACE VALIDITY
 Evaluate in terms of:
 Readability
 Layout and style
 Clarity of wording
 Feasibility

CONTENT VALIDITY
 Two phases
 Researcher: Conceptualization and domain analysis
 Experts: Enhancement of content of questionnaire (seven or more experts)
 Steps
 Specify the full domain of content that is relevant to the issue
 Sample specific areas from this domain
 Put items/questions in a form that is testable

How do experts evaluate validity
 Method 1: Average Congruency Percentage (ACP)
[Popham, 1978]
 Each expert computes the percentage of questions that he or
she deems relevant
 Take the average across all experts
 If the value is 90% or higher, the questionnaire is considered valid
 Eg: 2 experts (Expert 1: 100%, Expert 2: 80%)
 Then ACP = 90%
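
The ACP calculation above can be sketched in a few lines of Python. This is only an illustration: the function name and the choice of a 10-item questionnaire are assumptions made for the example, since the slide gives only the two percentages.

```python
def average_congruency_percentage(expert_judgements):
    """Return the ACP.

    expert_judgements: one list per expert, holding True for each item
    that the expert judged relevant and False otherwise.
    """
    percentages = [100.0 * sum(judged) / len(judged) for judged in expert_judgements]
    return sum(percentages) / len(percentages)


# Hypothetical 10-item questionnaire reproducing the slide's percentages:
# expert 1 finds all 10 items relevant (100%), expert 2 finds 8 of 10 (80%).
expert_1 = [True] * 10
expert_2 = [True] * 8 + [False] * 2

acp = average_congruency_percentage([expert_1, expert_2])
print(acp)  # 90.0
```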

 Method 2: Content validity index [Martuza 1977]
 Content validity Index for individual items (I-CVI)
 Content Validity Index for the scale (S-CVI)

I-CVI
 A panel of content experts (minimum 3, maximum 10) is asked
to rate the relevance of each question on a 4-point Likert
scale
 1= not relevant
 2= somewhat relevant
 3= relevant
 4= very relevant
 Then for each question, number of experts giving 3 or 4
score is counted (3,4 – relevant; 1,2 – nonrelevant)
 Proportion is calculated
 Eg: If 4/5 experts give score 3 or 4: I-CVI = 0.80
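
As a minimal sketch of the I-CVI steps above (the function name and the individual ratings are illustrative assumptions; only the 4-out-of-5 result comes from the slide):

```python
def item_cvi(ratings):
    """I-CVI for one item.

    ratings: the 1-4 relevance scores given to the item, one per expert.
    Scores of 3 or 4 count as relevant, 1 or 2 as non-relevant.
    """
    relevant = sum(1 for score in ratings if score >= 3)
    return relevant / len(ratings)


# Hypothetical ratings from five experts; four of them give a 3 or 4.
i_cvi = item_cvi([4, 3, 3, 4, 2])
print(i_cvi)  # 0.8
```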

Criticisms of I-CVI
 Collapses experts’ multipoint assessments into two categories
(relevant and non-relevant)
 Does not give inference on comprehensiveness of whole
questionnaire
 Problem of chance agreement. To overcome that, Lynn
proposed
 Five or fewer experts: all must agree (I-CVI = 1.0)
 Six or more: (I-CVI should not be less than 0.78)
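
Lynn's cut-offs, as stated above, can be written as a small check. The function name is an assumption made for this sketch.

```python
def meets_lynn_criterion(i_cvi, n_experts):
    """Apply the cut-offs quoted on the slide.

    Five or fewer experts: all must agree (I-CVI = 1.0).
    Six or more experts: I-CVI should not be less than 0.78.
    """
    if n_experts <= 5:
        return i_cvi == 1.0
    return i_cvi >= 0.78


print(meets_lynn_criterion(4 / 5, 5))  # False: with 5 experts all must agree
print(meets_lynn_criterion(5 / 6, 6))  # True: about 0.83 is not below 0.78
```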

S-CVI
 Content validity of the scale as a whole, summarized from the
experts’ item ratings
 Two approaches:
 S-CVI/UA (universal agreement): the proportion of items rated 3 or 4 by all the content experts
 S-CVI/Ave (average): the average of the I-CVI values across all items

 Which would be an effective measure here ??
 S-CVI/UA or S-CVI/Ave

 Which to follow?
 Report both I-CVI and S-CVI values rather than using
CVI as an unqualified acronym
 Report the range of I-CVI values
 S-CVI/UA is the more stringent measure, but universal
agreement becomes difficult to reach when many experts
are validating; in such situations S-CVI/Ave is used
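
The two S-CVI approaches and the I-CVI range can be computed together. The sketch below assumes S-CVI/Ave is taken as the mean of the I-CVI values; the function name and the ratings matrix are hypothetical and not taken from the slides.

```python
def scale_cvi(ratings_by_item):
    """Return (S-CVI/UA, S-CVI/Ave, lowest I-CVI, highest I-CVI).

    ratings_by_item: one list of 1-4 expert scores per item.
    """
    i_cvis = [
        sum(1 for score in item if score >= 3) / len(item)
        for item in ratings_by_item
    ]
    # Universal agreement: share of items that every expert rated 3 or 4.
    s_cvi_ua = sum(1 for value in i_cvis if value == 1.0) / len(i_cvis)
    # Average: mean of the item-level indices.
    s_cvi_ave = sum(i_cvis) / len(i_cvis)
    return s_cvi_ua, s_cvi_ave, min(i_cvis), max(i_cvis)


# Hypothetical ratings: 4 items, each rated by 3 experts.
ratings = [
    [4, 4, 3],  # all three experts say relevant
    [4, 3, 2],  # one expert says non-relevant
    [3, 4, 4],
    [4, 4, 4],
]

ua, ave, lowest, highest = scale_cvi(ratings)
print(f"S-CVI/UA = {ua:.2f}, S-CVI/Ave = {ave:.2f}, I-CVI range = {lowest:.2f} to {highest:.2f}")
```

In this made-up example a single dissenting rating pulls S-CVI/UA down to 0.75 while S-CVI/Ave stays above 0.90, which illustrates why the universal agreement approach is the more stringent of the two.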

CONSTRUCT VALIDITY
 Method: Factor analysis
 To examine empirically the interrelationship among items and to
identify clusters of items that share sufficient variation to justify
their existence as a factor or construct to be measured by the
instrument
 Various items are gathered into common factors
 Common factors are synthesized into fewer factors and then
relation between each item and factor is measured
 Unrelated items are eliminated

RELIABILITY
 Definition: It is the ability of an instrument to produce
reproducible results
 Each time it is used, similar scores should be obtained
 A questionnaire is said to be reliable if we get same/similar
answers repeatedly
 Though it cannot be calculated exactly, it can be measured
by estimating correlation coefficients

Reliability measured in aspects of:
 STABILITY
 Done to ensure that the same results are obtained when the instrument is used on two or more consecutive occasions
 Test-retest method is used
 INTERNAL CONSISTENCY
 To ensure all subparts of an instrument measure the same characteristic (homogeneity)
 Split-half method
 EQUIVALENCE
 Used when two observers study a single phenomenon simultaneously
 Inter-rater reliability

Test-Retest reliability (for stability)
 Test administered twice to the same participant at different
times
 Used for things that are stable over time
 Easy and straight-forward approach
 Useful for questionnaires, checklist, rating scales etc
 Disadvantages
 Practice effect (mainly for tests)
 Too short intervals in between (effect of memory)
 Some traits may change with time

Statistical calculation
 Administration of instrument to a sample on two
different occasions
 Scores compared by calculating the Pearson
correlation coefficient

Correlation coefficient
 Measures the degree of relationship between two sets of
scores
 Can range from -1 to +1
 0 indicates absence of any relationships
Correlation coefficient    Strength of relationship
±0.70 to ±1.00             Strong
±0.30 to ±0.69             Moderate
±0.00 to ±0.29             None to weak
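
A test-retest coefficient can be sketched as follows, using the Pearson formula and the strength bands from the table above. The function names and the two sets of scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


def strength(r):
    """Label a coefficient using the bands in the table above."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.3:
        return "moderate"
    return "none to weak"


# Hypothetical total scores of five respondents on two occasions.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 12, 14, 19, 21]

r = pearson_r(first_administration, second_administration)
print(f"r = {r:.2f} ({strength(r)})")
```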

Split halves reliability (homogeneity)
 Split the contents of the questionnaire into two equivalent
halves; either odd/even number or first/second half
 Correlate scores of one half with scores of the other
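 Formula:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
(x, y = scores on the two halves; $\bar{x}$, $\bar{y}$ = their means)
 But this r is only for half the test, so to check the reliability of
the entire test, use the formula
$$R' = \frac{2r}{1 + r}$$
 (r = coefficient of split half, R’ = coefficient of entire
test)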
 Cronbach’s alpha:
 Another method of calculation using the formula:
$$R = \frac{k}{k - 1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_y^2}\right)$$
k = total number of items in list
$\sigma_i^2$ = variance of individual items
$\sigma_y^2$ = variance of total test scores
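
Both internal-consistency calculations above can be tried on a small data set. The sketch below assumes an odd/even split and sample variances; the function names and the response matrix are hypothetical.

```python
from math import sqrt
from statistics import variance  # sample variance (n - 1 denominator)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


def split_half_reliability(responses):
    """Return (r for the half test, R' for the entire test)."""
    odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, ...
    even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, ...
    r = pearson_r(odd_half, even_half)
    return r, 2 * r / (1 + r)


def cronbach_alpha(responses):
    """k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents (rows) x 4 items (columns), scored 1-5.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

half_r, full_r = split_half_reliability(responses)
alpha = cronbach_alpha(responses)
print(f"split-half r = {half_r:.2f}, corrected R' = {full_r:.2f}, alpha = {alpha:.2f}")
```

For any positive r below 1, the corrected coefficient R' is larger than r, reflecting that the full test is longer than either half.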

Inter-rater reliability (Equivalence)
 Used when a single event is measured simultaneously and
independently by two or more trained observers
$$R = \frac{\text{Number of agreements}}{\text{Number of agreements} + \text{Number of disagreements}}$$
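
The agreement ratio above, as a minimal sketch. The function name and the two raters' codes are hypothetical.

```python
def interrater_reliability(rater_a, rater_b):
    """Agreements divided by (agreements + disagreements) for two raters."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same observations")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    disagreements = len(rater_a) - agreements
    return agreements / (agreements + disagreements)


# Hypothetical codes given by two observers to the same five observations.
rater_a = ["caries", "sound", "caries", "sound", "sound"]
rater_b = ["caries", "sound", "sound", "sound", "sound"]

agreement = interrater_reliability(rater_a, rater_b)
print(agreement)  # 0.8
```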

Summary of Reliability
 TEST-RETEST
 What it measures: Stability over time
 How it is accomplished: Administer the same test to the same people at two different times
 SPLIT-HALF
 What it measures: Equivalency of items
 How it is accomplished: Correlate performance for a group of people on two equivalent halves of the same test
 INTER-RATER
 What it measures: Agreement between raters
 How it is accomplished: Have multiple researchers use the same instrument and determine the percentage of agreement between them

Conclusion
 Validated questionnaire
 One which has undergone a validation procedure to
show that it accurately measures what it aims to,
regardless of who responds, when they respond, and to
whom they respond (or when self-administered), and whose
reliability has also been examined, thereby:
 Reducing bias and ambiguities
 Giving better quality of data and credible information

In a nutshell . . . .
A questionnaire can be reliable but invalid . . .
But a valid questionnaire is always reliable . . .

Acknowledgements
 Dr. Joe Joseph
 Dr. Chandrashekar

References
 Linda Del Greco, Walop W, Richard H McCarthy. Questionnaire
development: 2. Validity and Reliability. CMAJ. 1987;136:699–700.
 Sushil S, Verma N. Questionnaire validation made easy. Eur J Sci Res.
2010;46(2):172–8.
 Polit DF, Cheryl Tatano Beck. The Content Validity Index: Are You Sure
You Know What’s Being Reported? Critique and Recommendations. Res
Nurs Health. 2006;29:489–97.
 Reliability and Validity Module 6. Cengage Learning; 2010.

 Rama B Radhakrishna. Tips for Developing and Testing
Questionnaires/Instruments. J Ext. 2007;35(1):710–4.
 06Article04.pdf [Internet]. [cited 2015 Apr 7]. Available from:
http://www.uk.sagepub.com/salkind2study/articles/06Article04.pdf
 pta_6871_6791004_64131.pdf [Internet]. [cited 2015 Apr 7]. Available
from:
http://cfd.ntunhs.edu.tw/ezfiles/6/1006/attach/33/pta_6871_6791004_64131.pdf
 Questionnaire designing and validation [Internet]. [cited 2015 Apr 7].
Available from:
http://www.jpma.org.pk/full_article_text.php?article_id=3414

 Suresh K Sharma. Nursing Research and Statistics. 1st ed. New Delhi:
Elsevier Saunders;
 Edward G, Richard Zeller. Reliability and Validity Assessment. New Delhi:
SAGE publication; 1979.
 Ranjit Kumar. Research Methodology - A step by step guide for beginners.
3rd ed. New Delhi: SAGE publication; 2012.
 Articles from Dr. Joe
