1
DATA PREPARATION
AND
PROCESSING
2
DATA PREPARATION
• Once data are collected, the process of analysis
begins.
• First, however, the data have to be translated into an
appropriate form.
• This process is known as data preparation.
3
STEPS IN DATA PREPARATION
• Validate data
• Questionnaire checking
• Edit acceptable questionnaires
• Code the questionnaires
• Keypunch the data
• Clean the data set
• Statistically adjust the data
• Store the data set for analysis
• Analyse data
4
VALIDATION
• Validity exists when the data actually measure
what they are supposed to measure. If they fail
to, they are misleading and should not be
accepted.
• One of the most serious concerns is errors in
survey data.
• When secondary data are involved, they may
be outdated or irrelevant.
• With primary data, too, this review is
important.
5
QUESTIONNAIRE CHECKING
• A questionnaire returned from the field may be
unacceptable for several reasons.
–Parts of the questionnaire may be
incomplete, with inadequate answers or no
responses to specific questions.
–The pattern of responses may indicate that
the respondent did not understand or follow
the instructions.
–The responses show little variance.
–One or more pages are missing.
6
QUESTIONNAIRE CHECKING
–The questionnaire is answered by someone
who does not qualify for participation.
–Fictitious interviews
–Inconsistencies
–Illegible responses
–Yea- or nay-saying patterns
–Middle-of-the-road patterns
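Some of these checks (little variance, yea- or nay-saying, middle-of-the-road patterns) can be screened automatically before a person reviews the questionnaire. The following is a minimal Python sketch, not a standard procedure: the function name, the 1 to 7 scale, and the variance threshold are all assumptions, and treating "every answer at the top of the scale" as yea-saying is a simplification.

```python
from statistics import pvariance

def flag_suspect_patterns(ratings, low=1, high=7, min_variance=0.25):
    """Return the reasons a respondent's ratings look suspect (may be empty)."""
    midpoint = (low + high) / 2
    reasons = []
    # Almost identical answers across all items.
    if pvariance(ratings) < min_variance:
        reasons.append("little variance")
    # Every item answered at one end of the scale, or exactly in the middle.
    if all(r == high for r in ratings):
        reasons.append("yea-saying")
    if all(r == low for r in ratings):
        reasons.append("nay-saying")
    if all(r == midpoint for r in ratings):
        reasons.append("middle-of-the-road")
    return reasons

# Hypothetical respondents, each with five ratings on a 1-7 scale.
respondents = {
    "A": [2, 5, 3, 6, 4],
    "B": [7, 7, 7, 7, 7],
    "C": [4, 4, 4, 4, 4],
    "D": [4, 4, 5, 4, 4],
}
for respondent_id, ratings in respondents.items():
    print(respondent_id, flag_suspect_patterns(ratings))
```

A flag is only a prompt for the editor to look at the questionnaire; it does not by itself justify discarding a respondent.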
7
EDITING
• The next phase of data preparation involves
editing the raw data.
• Three basic approaches:
- Go back to the respondents for clarification
- Infer from other responses
- Discard the response altogether
8
Treatment of Unsatisfactory Responses
• Return to the Field
• Assign Missing Values
- Substitute a Neutral Value
- Casewise Deletion
- Pairwise Deletion
• Discard Unsatisfactory Respondents
9
Treatment of Unsatisfactory Responses:
- Returning to the Field – The
questionnaires with unsatisfactory responses
may be returned to the field, where the
interviewers recontact the respondents.
- Assigning Missing Values – If returning the
questionnaires to the field is not feasible, the
editor may assign missing values to
unsatisfactory responses.
- Discarding Unsatisfactory Respondents –
In this approach, the respondents with
unsatisfactory responses are simply discarded.
10
CODING
• Data entry refers to the creation of a
computer file that holds the raw data taken
from all of the questionnaires deemed suitable
for analysis.
• Coding means assigning a code, usually a
number, to each possible response to each
question. The code includes an indication of
the column position (field) and data record it
will occupy.
11
CODING
• Fixed field codes, which mean that the
number of records for each respondent is the
same and the same data appear in the same
column(s) for all respondents, are highly
desirable.
–If possible, standard codes should be used
for missing data. Coding of structured
questions is relatively simple, since the
response options are predetermined.
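With fixed field codes, every record can be read with the same column positions. The sketch below illustrates the idea in Python; the layout, the variable names, the sample records, and the use of 9 as a standard missing-data code are all hypothetical choices made for this example.

```python
# Hypothetical fixed-field layout: (variable name, start column, end column),
# using 1-based, inclusive column positions.
LAYOUT = [
    ("id", 1, 2),
    ("preference", 3, 3),
    ("quality", 4, 4),
    ("quantity", 5, 5),
    ("value", 6, 6),
    ("service", 7, 7),
    ("income", 8, 8),
]
MISSING_CODE = 9  # assumed standard code for a missing response

def parse_record(line):
    """Slice one fixed-width record into a dict of integer codes."""
    record = {}
    for name, start, end in LAYOUT:
        field = line[start - 1:end].strip()
        # A blank field is stored as the standard missing-data code.
        record[name] = int(field) if field else MISSING_CODE
    return record

# Three hypothetical records; the third has a blank in the quantity column.
records = [" 1223136", " 2656572", "1023 225"]
for line in records:
    print(parse_record(line))
```

Because the same data sit in the same columns for every respondent, one layout table is enough to read the whole file.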
12
CODING
–In questions that permit a large number of
responses, each possible response option
should be assigned a separate column.
–Guidelines for coding unstructured questions:
– Category codes should be mutually exclusive and
collectively exhaustive.
– Only a few (10% or less) of the responses should fall into
the “other” category.
– Category codes should be assigned for critical issues even
if no one has mentioned them.
– Data should be coded to retain as much detail as possible.
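The guidelines for unstructured questions can be checked mechanically once a coding scheme exists. The sketch below is only an illustration: the categories, the keyword matching, the answers, and the code 9 for "other" are hypothetical, and real coding of open-ended answers is normally done by trained coders rather than by keyword.

```python
from collections import Counter

# Hypothetical category codes for an open-ended question.
CATEGORY_CODES = {"price": 1, "taste": 2, "service": 3, "location": 4}
OTHER_CODE = 9  # assumed code for the "other" category

def code_response(text):
    """Assign exactly one code per answer: first keyword match, else 'other'."""
    lowered = text.lower()
    for keyword, code in CATEGORY_CODES.items():
        if keyword in lowered:
            return code
    return OTHER_CODE

def other_share(codes):
    """Fraction of responses that fell into the 'other' category."""
    return Counter(codes)[OTHER_CODE] / len(codes)

answers = [
    "The price is fair", "Great taste", "Friendly service", "Good location",
    "Low price", "Taste of the sauces", "Fast service", "Price",
    "Convenient location", "I like the music",
]
codes = [code_response(a) for a in answers]
print(codes)
print(other_share(codes) <= 0.10)
```

Returning a single code per answer keeps the categories mutually exclusive, and the fallback to "other" keeps them collectively exhaustive; the final check corresponds to the 10% guideline.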
13
CODING
• Principles for establishing categories for
coding:
- Convenient number of categories
- Similar responses within categories
- Differences of responses between categories
- Mutually exclusive categories
- Exhaustive categories
- Avoid open-ended class intervals
- Class interval of the same width
- Midpoints of class intervals
14
CODE BOOK
• A codebook contains coding instructions and
the necessary information about variables in
the data set. A codebook generally contains
the following information:
- column number
- record number
- variable number
- variable name
- question number
- instructions for coding
15
CODE BOOK
• Thus, a data codebook identifies all of the
variable names and code numbers associated
with each possible response to each question
that makes up the data set.
16
Restaurant Preference
ID   PREFER.   QUALITY   QUANTITY   VALUE   SERVICE   INCOME
1    2         2         3          1       3         6
2    6         5         6          5       7         2
3    4         4         3          4       5         3
4    1         2         1          1       2         5
5    7         6         6          5       4         1
6    5         4         4          5       4         3
7    2         2         3          2       3         5
8    3         3         4          2       3         4
9    7         6         7          6       5         2
10   2         3         2          2       2         5
11   2         3         2          1       3         6
12   6         6         6          6       7         2
13   4         4         3          3       4         3
14   1         1         3          1       2         4
15   7         7         5          5       4         2
16   5         5         4          5       5         3
17   2         3         1          2       3         4
18   4         4         3          3       3         3
19   7         5         5          7       5         5
20   3         2         2          3       3         3
17
A Codebook Excerpt
Column  Variable  Variable    Question  Coding
Number  Number    Name        Number    Instructions
1       1         ID                    1 to 20 as coded
2       2         Preference  1         Input the number circled.
                                        1 = Weak Preference
                                        7 = Strong Preference
3       3         Quality     2         Input the number circled.
                                        1 = Poor
                                        7 = Excellent
4       4         Quantity    3         Input the number circled.
                                        1 = Poor
                                        7 = Excellent
18
A Codebook Excerpt
Column  Variable  Variable    Question  Coding
Number  Number    Name        Number    Instructions
5       5         Value       4         Input the number circled.
                                        1 = Poor
                                        7 = Excellent
6       6         Service     5         Input the number circled.
                                        1 = Poor
                                        7 = Excellent
7       7         Income      6         Input the number circled.
                                        1 = Less than $20,000
                                        2 = $20,000 to $34,999
                                        3 = $35,000 to $49,999
                                        4 = $50,000 to $74,999
                                        5 = $75,000 to $99,999
                                        6 = $100,000 or more
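A codebook like this excerpt maps naturally onto a lookup structure. The following Python sketch transcribes two of the variables; the dictionary layout and the decode helper are assumptions made for illustration, and the last income bracket is read as "$100,000 or more".

```python
# Two codebook entries transcribed from the excerpt; the dict layout is an assumption.
CODEBOOK = {
    "preference": {
        "column": 2,
        "question": 1,
        "labels": {1: "Weak Preference", 7: "Strong Preference"},
    },
    "income": {
        "column": 7,
        "question": 6,
        "labels": {
            1: "Less than $20,000",
            2: "$20,000 to $34,999",
            3: "$35,000 to $49,999",
            4: "$50,000 to $74,999",
            5: "$75,000 to $99,999",
            6: "$100,000 or more",
        },
    },
}

def decode(variable, code):
    """Return the label for a code, or the code itself when it has no label
    (rating scales label only their endpoints)."""
    return CODEBOOK[variable]["labels"].get(code, code)

print(decode("income", 6))
print(decode("preference", 4))
```

Keeping column positions, question numbers, and value labels in one place means the same structure can drive both data entry and later reporting.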
19
SPSS Variable View of the Data of Table
20
Keypunch the data / Data transcription
• Transcribing data is the process of
transferring the coded data from the
questionnaire or coding sheets onto
disks or magnetic tapes or directly into
computers by keypunching.
21
Keypunch the data / Data transcription
• Raw Data
• Transcription:
- CATI / CAPI
- Keypunching via CRT Terminal
- Optical Scanning
- Mark Sense Forms
- Computerized Sensory Analysis
• Verification: Correct Keypunching Errors
• Storage:
- Disks
- Magnetic Tapes
- Computer Memory
• Transcribed Data
22
Data Cleaning
• Consistency Checks
- Consistency checks identify data that are out of
range, logically inconsistent, or have extreme
values.
- Computer packages like SPSS, SAS, EXCEL and
MINITAB can be programmed to identify out-of-
range values for each variable and print out the
respondent code, variable code, variable name,
record number, column number, and out-of-range
value.
- Extreme values should be closely examined.
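An out-of-range check of the kind described above can be sketched in a few lines of Python. This is an illustration rather than what any of the named packages do internally; the variable names, valid ranges, and records are hypothetical.

```python
# Assumed valid ranges per variable: (minimum, maximum).
VALID_RANGES = {"preference": (1, 7), "quality": (1, 7), "income": (1, 6)}

def out_of_range(records):
    """Return (respondent id, variable name, offending value) for each violation."""
    problems = []
    for record in records:
        for name, (lowest, highest) in VALID_RANGES.items():
            value = record[name]
            if not lowest <= value <= highest:
                problems.append((record["id"], name, value))
    return problems

# Hypothetical records with two deliberate errors.
records = [
    {"id": 1, "preference": 2, "quality": 2, "income": 6},
    {"id": 2, "preference": 8, "quality": 5, "income": 2},  # 8 is outside 1-7
    {"id": 3, "preference": 4, "quality": 4, "income": 0},  # 0 is outside 1-6
]
for problem in out_of_range(records):
    print(problem)
```

Each reported tuple identifies the respondent and variable, so the editor can go back to the original questionnaire and correct the entry.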
23
Data Cleaning
• Treatment of Missing Responses
• Substitute a Neutral Value – A neutral value, typically the
mean response to the variable, is substituted for the missing
responses.
• Substitute an Imputed Response – The respondents' pattern
of responses to other questions is used to impute or
calculate a suitable response to the missing questions.
• Casewise Deletion – Cases, or respondents, with any
missing responses are discarded from the analysis.
• Pairwise Deletion – Instead of discarding all cases with
any missing values, the researcher uses only the cases or
respondents with complete responses for each calculation.
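The difference between these treatments is easiest to see on a small data set. The Python sketch below uses hypothetical ratings, with None marking a missing response; the function names are made up for this example, and imputation from other answers is left out because it depends on the model chosen.

```python
from statistics import mean

# Hypothetical ratings; None marks a missing response.
data = [
    {"quality": 4, "value": 5},
    {"quality": None, "value": 3},
    {"quality": 6, "value": None},
    {"quality": 2, "value": 4},
]

def neutral_value(rows, variable):
    """Replace missing values of one variable with the mean of its observed values."""
    fill = mean(r[variable] for r in rows if r[variable] is not None)
    return [fill if r[variable] is None else r[variable] for r in rows]

def casewise(rows):
    """Keep only respondents with no missing responses at all."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, variables):
    """Keep respondents who are complete on the variables used in one calculation."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

print(neutral_value(data, "quality"))
print(len(casewise(data)))
print(len(pairwise(data, ["value"])))
```

Casewise deletion leaves fewer respondents than pairwise deletion here, because pairwise deletion drops a respondent only from the calculations that need the missing variable.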
24
Statistically Adjusting the Data
• Weighting
• In weighting, each case or respondent in the
database is assigned a weight to reflect its
importance relative to other cases or respondents.
• Weighting is most widely used to make the sample
data more representative of a target population on
specific characteristics.
• Yet another use of weighting is to adjust the
sample so that greater importance is attached to
respondents with certain characteristics.
25
Statistically Adjusting the Data
Use of Weighting for Representativeness
Years of              Sample      Population
Education             Percentage  Percentage  Weight
Elementary School
  0 to 7 years        2.49        4.23        1.70
  8 years             1.26        2.19        1.74
High School
  1 to 3 years        6.39        8.65        1.35
  4 years             25.39       29.24       1.15
College
  1 to 3 years        22.33       29.42       1.32
  4 years             15.02       12.01       0.80
  5 to 6 years        14.94       7.36        0.49
  7 years or more     12.18       6.90        0.57
Totals                100.00      100.00
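The weights in this table are consistent with dividing each group's population percentage by its sample percentage. The Python sketch below reproduces them under that reading; the group labels and variable names are chosen for this example only.

```python
# (sample percentage, population percentage) per education group, from the table.
GROUPS = {
    "Elementary, 0 to 7 years": (2.49, 4.23),
    "Elementary, 8 years": (1.26, 2.19),
    "High School, 1 to 3 years": (6.39, 8.65),
    "High School, 4 years": (25.39, 29.24),
    "College, 1 to 3 years": (22.33, 29.42),
    "College, 4 years": (15.02, 12.01),
    "College, 5 to 6 years": (14.94, 7.36),
    "College, 7 years or more": (12.18, 6.90),
}

# Weight = population share / sample share, rounded to two decimals as in the table.
weights = {
    group: round(population / sample, 2)
    for group, (sample, population) in GROUPS.items()
}

for group, weight in weights.items():
    print(f"{group}: {weight:.2f}")
```

Groups that are under-represented in the sample get weights above 1, and over-represented groups get weights below 1.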
26
Statistically Adjusting the Data
• Variable Respecification
• Variable respecification involves the transformation of
data to create new variables or modify existing
variables.
• For example, the researcher may create new variables that
are composites of several other variables.
• Dummy variables are used for respecifying categorical
variables. The general rule is that to respecify a
categorical variable with K categories, K-1 dummy
variables are needed.
27
Statistically Adjusting the Data
Product Usage   Original Variable   Dummy Variable Code
Category        Code                X1   X2   X3
Nonusers        1                   1    0    0
Light users     2                   0    1    0
Medium users    3                   0    0    1
Heavy users     4                   0    0    0
Note that X1 = 1 for nonusers and 0 for all others. Likewise, X2 =
1 for light users and 0 for all others, and X3 = 1 for medium users
and 0 for all others. In analyzing the data, X1, X2, and X3 are
used to represent all user/nonuser groups.
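The K-1 rule in the table can be written as a short function. This Python sketch follows the table's coding, with the last category (heavy users) as the reference group coded all zeros; the function name is hypothetical.

```python
CATEGORIES = ["Nonusers", "Light users", "Medium users", "Heavy users"]

def dummy_code(category, categories=CATEGORIES):
    """Return K-1 indicator values (X1, X2, ...); the last category is the
    reference group and is coded as all zeros."""
    return [1 if category == c else 0 for c in categories[:-1]]

for category in CATEGORIES:
    print(category, dummy_code(category))
```

Only three indicators are needed for four categories, because a respondent with X1 = X2 = X3 = 0 must be a heavy user.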
28
Statistically Adjusting the Data
• Scale Transformation and Standardization:
- Scale transformation involves a manipulation of scale
values to ensure comparability with other scales or
otherwise make the data suitable for analysis.
- A more common transformation procedure is
standardization. Standardized scores, Zi, may be
obtained as:
$Z_i = (X_i - \bar{X}) / s_X$
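Standardization subtracts the mean from each value and divides by the standard deviation. The Python sketch below assumes the sample standard deviation (n - 1 in the denominator); the slides do not say which version is intended, and the scores are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: each value minus the mean, divided by the standard deviation."""
    x_bar = mean(values)
    s_x = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return [(x - x_bar) / s_x for x in values]

scores = [2, 4, 4, 6]  # hypothetical ratings
z_scores = standardize(scores)
print([round(z, 3) for z in z_scores])
```

After standardization the scores have a mean of 0 and a standard deviation of 1, which is what makes variables measured on different scales comparable.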
29
A Classification of Univariate Techniques
• Metric Data
  - One Sample: t test, Z test
  - Two or More Samples (Independent): Two-Group test, Z test, One-Way ANOVA
  - Two or More Samples (Related): Paired t test
• Non-numeric Data
  - One Sample: Frequency, Chi-Square, K-S, Runs, Binomial
  - Two or More Samples (Independent): Chi-Square, Mann-Whitney, Median, K-S, K-W ANOVA
  - Two or More Samples (Related): Sign, Wilcoxon, McNemar, Chi-Square
30
A Classification of Multivariate Techniques
• Dependence Technique
  - One Dependent Variable: Cross-Tabulation, Analysis of Variance and Covariance, Multiple Regression, Conjoint Analysis
  - More Than One Dependent Variable: Multivariate Analysis of Variance and Covariance, Canonical Correlation, Multiple Discriminant Analysis
• Interdependence Technique
  - Variable Interdependence: Factor Analysis
  - Interobject Similarity: Cluster Analysis, Multidimensional Scaling


  • 2. DATA PREPARATION
      • Once data are collected, the process of analysis begins.
      • First, however, the data must be translated into an appropriate form.
      • This process is known as data preparation.
  • 3. STEPS IN DATA PREPARATION
      • Validate data
      • Questionnaire checking
      • Edit acceptable questionnaires
      • Code the questionnaires
      • Keypunch the data
      • Clean the data set
      • Statistically adjust the data
      • Store the data set for analysis
      • Analyse data
  • 4. VALIDATION
      • Validity exists when the data actually measure what they are supposed to measure. If they fail to, they are misleading and should not be accepted.
      • One of the most serious concerns is errors in survey data.
      • When secondary data are involved, they may be out of date or irrelevant.
      • With primary data, this review is just as important.
  • 5. QUESTIONNAIRE CHECKING
      • A questionnaire returned from the field may be unacceptable for several reasons.
          – Parts of the questionnaire may be incomplete: inadequate answers, or no responses to specific questions.
          – The pattern of responses may indicate that the respondent did not understand or follow the instructions.
          – The responses show little variance.
          – One or more pages are missing.
  • 6. QUESTIONNAIRE CHECKING
          – The questionnaire is answered by someone who does not qualify for participation.
          – Fictitious interviews
          – Inconsistencies
          – Illegible responses
          – Yea- or nay-saying patterns
          – Middle-of-the-road patterns
  • 7. EDITING
      • The next phase of data preparation involves editing the raw data.
      • Three basic approaches:
          – Go back to the respondents for clarification
          – Infer from other responses
          – Discard the response altogether
  • 8. Treatment of Unsatisfactory Responses (diagram)
      • Return to the Field
      • Discard Unsatisfactory Respondents
      • Assign Missing Values
          – Substitute a Neutral Value
          – Casewise Deletion
          – Pairwise Deletion
  • 9. Treatment of Unsatisfactory Responses
          – Returning to the Field: the questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents.
          – Assigning Missing Values: if returning the questionnaires to the field is not feasible, the editor may assign missing values to unsatisfactory responses.
          – Discarding Unsatisfactory Respondents: in this approach, the respondents with unsatisfactory responses are simply discarded.
  • 10. CODING
      • Data entry refers to the creation of a computer file that holds the raw data taken from all of the questionnaires deemed suitable for analysis.
      • Coding means assigning a code, usually a number, to each possible response to each question. The code includes an indication of the column position (field) and the data record it will occupy.
  • 11. CODING
      • Fixed field codes are highly desirable: the number of records for each respondent is the same, and the same data appear in the same column(s) for all respondents.
          – If possible, standard codes should be used for missing data.
          – Coding of structured questions is relatively simple, since the response options are predetermined.
  • 12. CODING
          – In questions that permit a large number of responses, each possible response option should be assigned a separate column.
          – Guidelines for coding unstructured questions:
              – Category codes should be mutually exclusive and collectively exhaustive.
              – Only a few (10% or less) of the responses should fall into the “other” category.
              – Category codes should be assigned for critical issues even if no one has mentioned them.
              – Data should be coded to retain as much detail as possible.
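The guidelines for unstructured questions can be checked mechanically once responses are coded. The sketch below is only an illustration: the keyword map, the code numbers, the catch-all code 9, and the sample answers are all hypothetical, and a real study would normally rely on trained human coders rather than keyword matching.

```python
from collections import Counter

# Hypothetical category codes for an open-ended question such as
# "Why do you prefer this restaurant?" (not taken from the slides).
CATEGORY_CODES = {"food": 1, "price": 2, "service": 3}
OTHER_CODE = 9  # hypothetical catch-all "other" category


def code_response(text):
    """Return the code of the first keyword found, else OTHER_CODE.

    Taking the first match keeps the codes mutually exclusive;
    the catch-all keeps them collectively exhaustive.
    """
    lowered = text.lower()
    for keyword, code in CATEGORY_CODES.items():
        if keyword in lowered:
            return code
    return OTHER_CODE


def other_share(codes):
    """Fraction of coded responses that fell into the catch-all category."""
    return Counter(codes)[OTHER_CODE] / len(codes)


responses = [
    "Great food",
    "Low price",
    "Friendly service",
    "Good food and price",
    "Close to home",
]
codes = [code_response(r) for r in responses]
needs_new_category = other_share(codes) > 0.10  # the 10% guideline
```

Here one of five answers lands in "other", so the share exceeds the 10% guideline, which would suggest adding a category (for example, location) and recoding.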
  • 13. CODING
      • Principles for establishing categories for coding:
          – Convenient number of categories
          – Similar responses within categories
          – Differences of responses between categories
          – Mutually exclusive categories
          – Exhaustive categories
          – Avoid open-ended class intervals
          – Class intervals of the same width
          – Midpoints of class intervals
  • 14. CODE BOOK
      • A codebook contains coding instructions and the necessary information about variables in the data set. A codebook generally contains the following information:
          – column number
          – record number
          – variable number
          – variable name
          – question number
          – instructions for coding
  • 15. CODE BOOK
      • Thus, a data codebook identifies all of the variable names and the code numbers associated with each possible response to each question in the data set.
  • 16. Restaurant Preference

      ID   PREFER.   QUALITY   QUANTITY   VALUE   SERVICE   INCOME
       1      2         2         3         1        3         6
       2      6         5         6         5        7         2
       3      4         4         3         4        5         3
       4      1         2         1         1        2         5
       5      7         6         6         5        4         1
       6      5         4         4         5        4         3
       7      2         2         3         2        3         5
       8      3         3         4         2        3         4
       9      7         6         7         6        5         2
      10      2         3         2         2        2         5
      11      2         3         2         1        3         6
      12      6         6         6         6        7         2
      13      4         4         3         3        4         3
      14      1         1         3         1        2         4
      15      7         7         5         5        4         2
      16      5         5         4         5        5         3
      17      2         3         1         2        3         4
      18      4         4         3         3        3         3
      19      7         5         5         7        5         5
      20      3         2         2         3        3         3
  • 17. A Codebook Excerpt

      Column   Variable   Variable     Question   Coding
      Number   Number     Name         Number     Instructions
      1        1          ID           -          1 to 20 as coded
      2        2          Preference   1          Input the number circled. 1 = Weak Preference, 7 = Strong Preference
      3        3          Quality      2          Input the number circled. 1 = Poor, 7 = Excellent
      4        4          Quantity     3          Input the number circled. 1 = Poor, 7 = Excellent
  • 18. A Codebook Excerpt

      Column   Variable   Variable     Question   Coding
      Number   Number     Name         Number     Instructions
      5        5          Value        4          Input the number circled. 1 = Poor, 7 = Excellent
      6        6          Service      5          Input the number circled. 1 = Poor, 7 = Excellent
      7        7          Income       6          Input the number circled.
                                                  1 = Less than $20,000
                                                  2 = $20,000 to $34,999
                                                  3 = $35,000 to $49,999
                                                  4 = $50,000 to $74,999
                                                  5 = $75,000 to $99,999
                                                  6 = $100,000 or more
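One way to make the codebook excerpt usable by a program is to hold it as a lookup structure. The following is a minimal sketch, not a standard format: the dictionary layout and the `decode_record` helper are assumptions made for illustration, and the top income bracket is read as "$100,000 or more".

```python
# Codebook excerpt as a dictionary keyed by variable name.
# The layout (column / question / labels) is an assumed structure.
CODEBOOK = {
    "ID":         {"column": 1, "question": None},
    "PREFERENCE": {"column": 2, "question": 1},
    "QUALITY":    {"column": 3, "question": 2},
    "QUANTITY":   {"column": 4, "question": 3},
    "VALUE":      {"column": 5, "question": 4},
    "SERVICE":    {"column": 6, "question": 5},
    "INCOME": {
        "column": 7,
        "question": 6,
        "labels": {
            1: "Less than $20,000",
            2: "$20,000 to $34,999",
            3: "$35,000 to $49,999",
            4: "$50,000 to $74,999",
            5: "$75,000 to $99,999",
            6: "$100,000 or more",
        },
    },
}


def decode_record(fields):
    """Map a fixed-field record (values in column order) to {variable: value}."""
    return {name: fields[spec["column"] - 1] for name, spec in CODEBOOK.items()}


# Respondent 1 from the restaurant preference data.
record = decode_record([1, 2, 2, 3, 1, 3, 6])
income_label = CODEBOOK["INCOME"]["labels"][record["INCOME"]]
```

Because every respondent's data sit in the same columns, a single column-to-variable map is enough to decode the whole file.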
  • 19. SPSS Variable View of the Data of Table
  • 20. Keypunch the data / Data transcription
      • Transcribing data is the process of transferring the coded data from the questionnaires or coding sheets onto disks or magnetic tapes, or directly into computers, by keypunching.
  • 21. Keypunch the data / Data transcription (flow diagram)
      • Raw Data
      • Transcription methods: CATI / CAPI; Keypunching via CRT Terminal; Optical Scanning; Mark Sense Forms; Computerized Sensory Analysis
      • Verification: Correct Keypunching Errors
      • Storage: Computer Memory; Disks; Magnetic Tapes
      • Transcribed Data
  • 22. Data Cleaning
      • Consistency Checks
          – Consistency checks identify data that are out of range, logically inconsistent, or have extreme values.
          – Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to identify out-of-range values for each variable and print out the respondent code, variable code, variable name, record number, column number, and out-of-range value.
          – Extreme values should be closely examined.
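The out-of-range check described above can be sketched in a few lines. This is an illustration only: the three records are invented to contain errors, and `VALID_RANGES` and `out_of_range` are hypothetical names. The ranges follow the 1 to 7 rating scales and the six income brackets of the codebook excerpt.

```python
# Valid (low, high) code ranges per variable, taken from the codebook excerpt.
VALID_RANGES = {"PREFERENCE": (1, 7), "QUALITY": (1, 7), "INCOME": (1, 6)}


def out_of_range(records):
    """Return (respondent ID, variable name, value) for each out-of-range code."""
    problems = []
    for rec in records:
        for var, (low, high) in VALID_RANGES.items():
            value = rec.get(var)
            # Missing values (None) are left for the missing-response step.
            if value is not None and not low <= value <= high:
                problems.append((rec["ID"], var, value))
    return problems


# Hypothetical records with two deliberate keypunching errors.
records = [
    {"ID": 1, "PREFERENCE": 2, "QUALITY": 2, "INCOME": 6},
    {"ID": 2, "PREFERENCE": 9, "QUALITY": 5, "INCOME": 2},  # 9 is outside 1..7
    {"ID": 3, "PREFERENCE": 4, "QUALITY": 4, "INCOME": 0},  # 0 is outside 1..6
]
problems = out_of_range(records)
```

Each flagged tuple identifies the respondent and variable, so the original questionnaire can be pulled and the value corrected.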
  • 23. Data Cleaning
      • Treatment of Missing Responses
          – Substitute a Neutral Value: a neutral value, typically the mean response to the variable, is substituted for the missing responses.
          – Substitute an Imputed Response: the respondent's pattern of responses to other questions is used to impute or calculate a suitable response to the unanswered questions.
          – In casewise deletion, cases, or respondents, with any missing responses are discarded from the analysis.
          – In pairwise deletion, instead of discarding all cases with any missing values, the researcher uses only the cases or respondents with complete responses for each calculation.
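The difference between these treatments is easiest to see on a small example. The sketch below uses invented ratings with `None` marking a missing response; the function names are hypothetical and imputation from other answers is left out.

```python
from statistics import mean

# Hypothetical ratings; None marks a missing response.
data = [
    {"QUALITY": 2,    "VALUE": 1,    "SERVICE": 3},
    {"QUALITY": 5,    "VALUE": None, "SERVICE": 7},
    {"QUALITY": None, "VALUE": 4,    "SERVICE": 5},
    {"QUALITY": 5,    "VALUE": 7,    "SERVICE": 6},
]


def substitute_mean(rows, var):
    """Neutral value: replace missing responses with the mean of the observed ones."""
    fill = mean(r[var] for r in rows if r[var] is not None)
    return [fill if r[var] is None else r[var] for r in rows]


def casewise(rows):
    """Casewise deletion: keep only respondents with no missing response at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, var_a, var_b):
    """Pairwise deletion: keep respondents complete on the two variables in use."""
    return [
        (r[var_a], r[var_b])
        for r in rows
        if r[var_a] is not None and r[var_b] is not None
    ]


quality_filled = substitute_mean(data, "QUALITY")
complete_cases = casewise(data)
quality_service = pairwise(data, "QUALITY", "SERVICE")
```

Casewise deletion keeps two of the four respondents, while a QUALITY by SERVICE calculation under pairwise deletion keeps three, which is why pairwise deletion loses less data but bases each statistic on a different sample size.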
  • 24. Statistically Adjusting the Data
      • Weighting
          – In weighting, each case or respondent in the database is assigned a weight to reflect its importance relative to other cases or respondents.
          – Weighting is most widely used to make the sample data more representative of a target population on specific characteristics.
          – Yet another use of weighting is to adjust the sample so that greater importance is attached to respondents with certain characteristics.
  • 25. Statistically Adjusting the Data: Use of Weighting for Representativeness

      Years of              Sample        Population
      Education             Percentage    Percentage    Weight
      Elementary School
        0 to 7 years           2.49          4.23        1.70
        8 years                1.26          2.19        1.74
      High School
        1 to 3 years           6.39          8.65        1.35
        4 years               25.39         29.24        1.15
      College
        1 to 3 years          22.33         29.42        1.32
        4 years               15.02         12.01        0.80
        5 to 6 years          14.94          7.36        0.49
        7 years or more       12.18          6.90        0.57
      Totals                 100.00        100.00
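The weights in the table are consistent with dividing each group's population percentage by its sample percentage. The short sketch below reproduces them under that assumption; the dictionary keys are shortened labels chosen here for readability.

```python
# Sample and population percentages by years of education (from the table).
sample_pct = {
    "Elementary, 0 to 7 years": 2.49,
    "Elementary, 8 years": 1.26,
    "High school, 1 to 3 years": 6.39,
    "High school, 4 years": 25.39,
    "College, 1 to 3 years": 22.33,
    "College, 4 years": 15.02,
    "College, 5 to 6 years": 14.94,
    "College, 7 years or more": 12.18,
}
population_pct = {
    "Elementary, 0 to 7 years": 4.23,
    "Elementary, 8 years": 2.19,
    "High school, 1 to 3 years": 8.65,
    "High school, 4 years": 29.24,
    "College, 1 to 3 years": 29.42,
    "College, 4 years": 12.01,
    "College, 5 to 6 years": 7.36,
    "College, 7 years or more": 6.90,
}

# Weight = population share / sample share, rounded to two decimals.
weights = {
    group: round(population_pct[group] / sample_pct[group], 2)
    for group in sample_pct
}
```

A weight above 1 means the group is under-represented in the sample (for example, respondents with 0 to 7 years of schooling), and a weight below 1 means it is over-represented (for example, respondents with 5 to 6 years of college).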
  • 26. Statistically Adjusting the Data
      • Variable Respecification
          – Variable respecification involves the transformation of data to create new variables or modify existing variables.
          – For example, the researcher may create new variables that are composites of several other variables.
          – Dummy variables are used for respecifying categorical variables. The general rule is that to respecify a categorical variable with K categories, K-1 dummy variables are needed.
  • 27. Statistically Adjusting the Data

      Product Usage    Original         Dummy Variable Code
      Category         Variable Code    X1    X2    X3
      Nonusers              1            1     0     0
      Light users           2            0     1     0
      Medium users          3            0     0     1
      Heavy users           4            0     0     0

      Note that X1 = 1 for nonusers and 0 for all others. Likewise, X2 = 1 for light users and 0 for all others, and X3 = 1 for medium users and 0 for all others. In analyzing the data, X1, X2, and X3 are used to represent all user/nonuser groups.
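The K-1 rule from the table can be expressed as a small function. This is a sketch with a hypothetical helper name, `dummy_code`; it treats the last category in the list (heavy users) as the reference group that receives all zeros, matching the table.

```python
# The K = 4 product usage categories, in the order of their original codes.
CATEGORIES = ["Nonusers", "Light users", "Medium users", "Heavy users"]


def dummy_code(category, categories=CATEGORIES):
    """Return the K-1 dummy variables (X1, X2, X3) for one category.

    The last category is the reference group and is coded as all zeros.
    """
    index = categories.index(category)
    return tuple(1 if index == j else 0 for j in range(len(categories) - 1))


coded = {category: dummy_code(category) for category in CATEGORIES}
```

Only three dummies are needed because a respondent who is 0 on X1, X2, and X3 must be a heavy user; a fourth dummy would be redundant.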
  • 28. Statistically Adjusting the Data
      • Scale Transformation and Standardization
          – Scale transformation involves a manipulation of scale values to ensure comparability with other scales or otherwise make the data suitable for analysis.
          – A more common transformation procedure is standardization. Standardized scores, $Z_i$, may be obtained as:

                $Z_i = (X_i - \bar{X}) / s_X$

            where $\bar{X}$ is the mean and $s_X$ the standard deviation of $X$.
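Standardization can be sketched with Python's standard library. The scores below are invented, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the slide does not say which form of the standard deviation is intended.

```python
from statistics import mean, stdev


def standardize(values):
    """Return Z scores: (X_i - mean of X) / standard deviation of X."""
    x_bar = mean(values)
    s_x = stdev(values)  # sample standard deviation (n - 1), an assumption
    return [(x - x_bar) / s_x for x in values]


scores = [2, 4, 4, 6]  # hypothetical ratings on a 1 to 7 scale
z_scores = standardize(scores)
```

After standardization the scores have mean 0 and standard deviation 1, so variables measured on different scales can be compared directly.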
  • 29. A Classification of Univariate Techniques (diagram)
      • Metric Data
          – One Sample: t test, Z test
          – Two or More Samples
              – Independent: Two-group t test, Z test, One-Way ANOVA
              – Related: Paired t test
      • Non-numeric Data
          – One Sample: Frequency, Chi-Square, K-S, Runs, Binomial
          – Two or More Samples
              – Independent: Chi-Square, Mann-Whitney, Median, K-S, K-W ANOVA
              – Related: Sign, Wilcoxon, McNemar, Chi-Square
  • 30. A Classification of Multivariate Techniques (diagram)
      • Dependence Technique
          – One Dependent Variable: Cross-Tabulation, Analysis of Variance and Covariance, Multiple Regression, Conjoint Analysis
          – More Than One Dependent Variable: Multivariate Analysis of Variance and Covariance, Canonical Correlation, Multiple Discriminant Analysis
      • Interdependence Technique
          – Variable Interdependence: Factor Analysis
          – Interobject Similarity: Cluster Analysis, Multidimensional Scaling