Introduction to Data Analysis
•Data Measurement
•Measurement of the data is the first step in the process that ultimately
guides the final analysis.
•Consideration of sampling, controls, errors (random and systematic)
and the required precision all influence the final analysis.
•Validation: Instruments and methods used to measure the data must
be validated for accuracy.
•Precision and accuracy…Determination of error
•Social vs. Physical Sciences
1
Introduction to Data Analysis
•Types of data
•Univariate/Multivariate
•Univariate: When we use one variable to describe a person, place,
or thing. (e.g. Heights of individuals)
•Multivariate: When we use two or more variables to measure a
person, place or thing. Variables may or may not be dependent on
each other.
(Bivariate e.g. name and marks, Multivariate: name, caste and marks)
•Cross-sectional data/Time-ordered data (business, social sciences)
•Cross-Sectional: Measurements taken at one time period
•E.g. (caste and per capita incomes)
•Time-Ordered: Measurements taken over time in chronological
sequence. e.g. years and per capita income
The type of data will dictate (in part) the appropriate data-analysis method.
2
•Measurement Scales
•Nominal or Categorical Scale (e.g. fair, brown and black)
•Classification of people, places, or things into categories (e.g. age
ranges, colors, etc.).
•Classifications must be mutually exclusive (every element should
belong to one category with no ambiguity).
•Weakest of the four scales. No category is greater than or less
(better or worse) than the others. They are just different.
•Ordinal or Ranking Scale
•Classification of people, places, or things into a ranking such that
the data is arranged into a meaningful order (e.g. poor, fair, good,
excellent).
•Qualitative classification only
Introduction to Data Analysis
3
Introduction to Data Analysis
•Measurement Scales (business, social sciences)
•Interval Scale
•Data classified by ranking.
•Quantitative classification (time, temperature, etc).
•Zero point of scale is arbitrary (differences are meaningful).
•Ratio Scale
•Data classified as the ratio of two numbers.
•Quantitative classification (height, weight, distance, etc).
•Zero point of scale is real
•(data can be added, subtracted, multiplied, and divided).
4
Univariate Analysis/Descriptive Statistics
• Descriptive Statistics
– The Range
– Min/Max
– Average
– Median
– Mode
– Variance
– Standard Deviation
– Histograms and Normal Distributions
5
Univariate Analysis/Histograms
• Distributions
– Descriptive statistics are easier to interpret when
graphically illustrated.
– However, charting each data element can lead to very
busy and confusing charts that do not help interpret the
data.
– Grouping the data elements into categories and
charting the frequency within these categories yields a
graphical illustration of how the data is distributed
throughout its range.
6
Univariate Analysis/Histograms
[Figure: column chart of the 20 individual data values; x-axis: X-axis labels (1 to 20); y-axis: Data Values (0 to 120)]
With just a few columns this chart is difficult to interpret. It tells you very little
about the data set. Even finding the Min and Max can be difficult.
The data can be presented such that more statistical parameters can be
estimated from the chart (average, standard deviation).
7
Univariate Analysis/Histograms
• Frequency Table
– The first step is to decide on the categories and group
the data appropriately.
(45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74,
78, 81, 85, 87, 100)
Category Labels    Frequency
0-50               3
51-60              2
61-70              6
71-80              5
81-90              3
>90                1
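As a rough illustration, the grouping step could be sketched in Python as follows. The bin boundaries mirror the category labels on the slide; the names `bins`, `category`, and `frequency_table` are illustrative choices, not part of the original deck.

```python
from collections import Counter

# The 20 scores listed on the slide.
scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

# Category labels with their inclusive upper bounds (assumed from the labels).
bins = [("0-50", 50), ("51-60", 60), ("61-70", 70),
        ("71-80", 80), ("81-90", 90), (">90", float("inf"))]

def category(score):
    """Return the label of the first bin whose upper bound covers the score."""
    for label, upper in bins:
        if score <= upper:
            return label

# Count how many scores fall in each category, keeping the slide's order.
counts = Counter(category(s) for s in scores)
frequency_table = [(label, counts[label]) for label, _ in bins]

for label, freq in frequency_table:
    print(f"{label:>6}  {freq}")
```

Plotting `frequency_table` as a column chart gives the histogram discussed on the next slide.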
8
Univariate Analysis/Histograms
• Histogram
– A histogram is simply a column chart of the frequency
table.
Category Labels    Frequency
0-50               3
51-60              2
61-70              6
71-80              5
81-90              3
>90                1
[Figure: histogram of the frequency table; x-axis: Scores (0-50, 51-60, 61-70, 71-80, 81-90, >90); y-axis: Frequency (0 to 7)]
9
Univariate Analysis/Histograms
• Histogram
[Figure: the same histogram of Scores against Frequency, annotated with the Average (68.6), the Median (68), the Mode (74), and the -1SD and +1SD marks]
10
Univariate Analysis/Normal Distributions
• Distributions that can be described mathematically
as Gaussian are also called Normal
• The Bell curve
– Symmetrical
– Mean ≈ Median
[Figure: normal (bell) curve with the Mean, Median, and Mode marked]
11
Univariate Analysis/Skewed Distributions
• When data are skewed, the
mean and SD can be
misleading
• Skewness
sk = 3(mean − median)/SD
If |sk| > 1 then the distribution is
non-symmetrical
• Negatively skewed
– Mean<Median
– Sk is negative
• Positively Skewed
– Mean>Median
– Sk is positive
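The skewness formula above can be tried on the 20 scores used in the histogram slides. This is a minimal sketch that assumes the population standard deviation is the intended SD; the function name `pearson_skewness` is an illustrative label for the formula.

```python
import statistics

# The 20 scores from the frequency-table slide.
scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

def pearson_skewness(values):
    """sk = 3 * (mean - median) / SD, using the population SD (an assumption)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    return 3 * (mean - median) / sd

sk = pearson_skewness(scores)
print(round(sk, 2))  # about 0.13: slightly positive, well inside |sk| <= 1
```

With a mean of 68.6 and a median of 68, the coefficient is small and positive, so by the rule above the scores are not markedly skewed.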
[Figures: two skewed distribution curves, one negatively skewed and one positively skewed]
12
Central Limit Theorem
• Regardless of the shape of a distribution, the distribution
of the sample mean based on samples of size N
approaches a normal curve as N increases.
– N (the size of each sample) must be smaller than the population being sampled
N=10
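A small simulation can make the theorem concrete. The sketch below assumes an exponential population (strongly right-skewed) and 2,000 repeated samples per sample size; both choices, and the helper name `sample_means`, are illustrative and not taken from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n, trials=2000):
    """Means of `trials` samples of size n drawn from an exponential population."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

# For each sample size, record the SD of the sample means and the
# mean-minus-median gap (a gap near zero suggests a symmetric shape).
results = {}
for n in (1, 10, 50):
    means = sample_means(n)
    sd = statistics.stdev(means)
    gap = statistics.mean(means) - statistics.median(means)
    results[n] = (sd, gap)
    print(f"N={n:>2}  SD of means={sd:.3f}  mean-median gap={gap:.3f}")
```

As N grows, the gap between the mean and the median of the sample means shrinks toward zero and their spread narrows, which is the behaviour the theorem describes.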
13
Univariate Analysis/Descriptive Statistics
• The Range
– Difference between minimum and maximum
values in a data set
– Larger range usually (but not always)
indicates a large spread or deviation in the
values of the data set.
(73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87,
74, 65, 74, 50, 85, 45, 63, 100)
14
Univariate Analysis/Descriptive Statistics
• The Average (Mean)
– Sum of all values divided by the number of values in the data set.
– One measure of central location in the data set.
Average = $\frac{1}{N}\sum_{i=1}^{N} m_i$
Average = (73+66+69+67+49+60+81+71+78+62+53+87+74+65+74
+50+85+45+63+100)/20 = 68.6
Excel function: AVERAGE()
15
Univariate Analysis/Descriptive Statistics
[Figure: two number lines from 0 to 10, each with the average marked at 4.8]
The data may or
may not be
symmetrical around
its average value
16
Univariate Analysis/Descriptive Statistics
• The Median
– The middle value in a sorted data set. Half the values
are greater and half are less than the median.
– Another measure of central location in the data set.
(45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74,
78, 81, 85, 87, 100)
Median: 68
(1, 2, 4, 7, 8, 9, 9)
Median: 7
– Excel function: MEDIAN()
17
Univariate Analysis/Descriptive Statistics
• The Median
– May or may not be close to the mean.
– The mean and median together are used to describe
the skewness of a distribution.
[Figure: number line from 0 to 10 with 6.25 marked]
18
Univariate Analysis/Descriptive Statistics
• The Mode
– Most frequently occurring value.
– Another measure of central location in the data set.
– (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74,
74, 78, 81, 85, 87, 100)
– Mode: 74
– Generally not all that meaningful unless a large
percentage of the values are the same number.
19
Univariate Analysis/Descriptive Statistics
• Variance
– One measure of dispersion (deviation from the mean) of a data
set. The larger the variance, the greater is the average deviation
of each datum from the average value.
Variance = $\frac{1}{N}\sum_{i=1}^{N}(m_i - \bar{m})^2$
where $\bar{m}$ = Average value of the data set
Variance = [(45 – 68.6)² + (49 – 68.6)² + (50 – 68.6)² + (53 – 68.6)² + …]/20 = 181
Excel Functions: VARP() (divides by N, as above), VAR() (divides by N – 1)
20
Univariate Analysis/Descriptive Statistics
• Standard Deviation
– Square root of the variance. Can be thought of as the
average deviation from the mean of a data set.
– The magnitude of the number is more in line with the
values in the data set.
Standard Deviation = ([(45 – 68.6)² + (49 – 68.6)² + (50 – 68.6)² + (53 – 68.6)² + …]/20)^(1/2) = 13.5
Excel Functions: STDEVP(), STDEV()
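For readers working outside Excel, the descriptive statistics on the preceding slides can be reproduced with Python's standard `statistics` module. This is a sketch only; the dictionary name `summary` and its keys are illustrative, and the comments pair each call with the Excel function that uses the same divisor.

```python
import statistics

# The 20 values from the range slide, in their original order.
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87,
          74, 65, 74, 50, 85, 45, 63, 100]

summary = {
    "min": min(scores),
    "max": max(scores),
    "range": max(scores) - min(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "variance_population": statistics.pvariance(scores),  # divides by N, like VARP()
    "variance_sample": statistics.variance(scores),       # divides by N - 1, like VAR()
    "sd_population": statistics.pstdev(scores),           # like STDEVP()
    "sd_sample": statistics.stdev(scores),                # like STDEV()
}

for name, value in summary.items():
    print(f"{name:>20}: {value:.2f}")
```

The population versions match the hand calculations on the slides (variance about 181, standard deviation about 13.5); the sample versions come out slightly larger because they divide by N – 1.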
21
Bivariate Analysis
Cross-tabulation and chi-square
22
So far the statistical methods we
have used only permit us to:
• Look at the frequency in which certain
numbers or categories occur.
• Look at measures of central tendency such
as means, modes, and medians for one
variable.
• Look at measures of dispersion such as
standard deviation and z scores for one
interval or ratio level variable.
23
Bivariate analysis allows us to:
• Look at associations/relationships between
two variables.
• Look at measures of the strength of the
relationship between two variables.
• Test hypotheses about relationships between
two nominal or ordinal level variables.
24
For example, what does this table tell us about
opinions on welfare by gender?
Support cutting welfare
benefits for immigrants    Male    Female
Yes                        15      5
No                         10      20
Total                      25      25
25
Are frequencies sufficient to
allow us to make comparisons
about groups?
What other information do we
need?
26
Is this table more helpful?
Benefits for Immigrants    Male         Female
Yes                        15 (60%)     5 (20%)
No                         10 (40%)     20 (80%)
Total                      25 (100%)    25 (100%)
27
How would you write a sentence
or two to describe what is in this
table?
28
Rules for cross-tabulation
• Calculate either column or row percents.
• Each percentage is the frequency in a cell
of a table divided by the total
frequency in that column or
row, for example 20/25 = 80.0%
• All percentages in a column or row should
total 100%.
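The percentage rule can be sketched in a few lines of Python using the welfare-by-gender counts from the earlier table. The nested-dictionary layout and the function name `column_percents` are assumptions made for illustration.

```python
# Observed counts: one inner dictionary per column (group) of the table.
counts = {
    "Male":   {"Yes": 15, "No": 10},
    "Female": {"Yes": 5,  "No": 20},
}

def column_percents(table):
    """Divide each cell by its column total and express it as a percentage."""
    percents = {}
    for column, cells in table.items():
        total = sum(cells.values())
        percents[column] = {row: 100 * n / total for row, n in cells.items()}
    return percents

percents = column_percents(counts)
for column, cells in percents.items():
    print(column, {row: f"{p:.1f}%" for row, p in cells.items()})
```

Each column sums to 100%, which is a quick check that the percentages were computed in the intended direction.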
29
Let’s look at another example –
social work degrees by gender
Social Work Degree    Male           Female
BA                    20 (33.3%)     20 (    %)
MSW                   30 (    )      70 (70.0%)
Ph.D.                 10 (16.7%)     10 (10.0%)
Total                 60 (100.0%)    100 (100.0%)
30
Questions:
What group had the largest percentage of
Ph.Ds?
What are the ways in which you could
find the missing numbers?
Is it obvious why you would use
percentages to make comparisons among
two or more groups?
31
In the following table, were people with drug problems,
alcohol problems, or a combination of both most likely
to be referred for individual treatment?
Services                Alcohol      Drugs        Both
Individual Treatment    10 (25%)     30 (60%)     5 (50%)
Group Treatment         10 (25%)     10 (20%)     2 (20%)
AA                      20 (50%)     10 (20%)     3 (30%)
Total                   40 (100%)    50 (100%)    10 (100%)
32
Use the same table to answer the
following question:
How much more likely are
people with alcohol problems
alone to be referred to AA than
people with drug problems or a
combination of drug and alcohol
problems?
33
We use cross-tabulation when:
• We want to look at relationships among two
or three variables.
• We want a descriptive statistical measure to
tell us whether differences among groups
are large enough to indicate some sort of
relationship among variables.
34
Cross-tabs are not sufficient to:
• Tell us the strength or actual size of the relationships
among two or three variables.
• Test a hypothesis about the relationship between two or
three variables.
• Tell us the direction of the relationship among two or more
variables.
• Look at relationships between one nominal or ordinal
variable and one ratio or interval variable unless the range
of possible values for the ratio or interval variable is small.
What do you think a table with a large number of ratio
values would look like?
35
We can use cross-tabs to visually
assess whether independent and
dependent variables might be
related. In addition, we also use
cross-tabs to find out if
demographic variables such as
gender and ethnicity are related
to the second variable.
36
For example, gender may
determine if someone votes
Democratic or Republican or if
income is high, medium, or low.
Ethnicity might be related to
where someone lives or attitudes
about whether undocumented
workers should receive driver’s
licenses.
37
Because we use tables in these ways, we can
set up some decision rules about how to use
tables.
• Independent variables should be column variables.
• If you are not looking at independent and
dependent variable relationships, use the variable
that can logically be said to influence the other as
your column variable.
• Using this rule, always calculate column
percentages rather than row percentages.
• Use the column percentages to interpret your
results.
38
For example,
• If we were looking at the relationship between gender and
income, gender would be the column variable and income
would be the row variable. Logically gender can determine
income. Income does not determine your gender.
• If we were looking at the relationship between ethnicity
and location of a person’s home, ethnicity would be the
column variable.
• However, if we were looking at the relationship between
gender and ethnicity, one does not influence the other.
Either variable could be the column variable.
39
SPSS will allow you to choose a
column variable and row variable
and whether or not your table
will include column or row
percents.
40
You must use an additional statistic, chi-
square, if you want to:
• Test a hypothesis about two variables.
• Look at the strength of the relationship between an
independent and dependent variable.
• Determine whether the relationship between the
two variables is large enough to rule out random
chance or sampling error as reasons that there
appears to be a relationship between the two
variables.
41
Chi-square is simply an extension of a
cross-tabulation that gives you more
information about the relationship.
However, it provides no information
about the direction of the relationship
(positive or negative) between the two
variables.
42
Let’s use the following table to
test a hypothesis:
                         Education
Income                   High    Low    Total
High (Above $40,000)     40             50
Low ($39,999 or less)                   50
Total                    50      50     100
43
I have not filled in all of the information
because we need to talk about two concepts
before we start calculations:
• Degrees of Freedom: In any table, there are
a limited number of choices for the values
in each cell.
• Marginals: Total frequencies in columns
and rows.
44
Let’s look at the number of choices
we have in the previous table:
                         Education
Income                   High    Low    Total
High (Above $40,000)     40             50
Low ($39,999 or less)                   50
Total                    50      50     100
45
So the table becomes:
                         Education
Income                   High    Low    Total
High (Above $40,000)     40      10     50
Low ($39,999 or less)    10      40     50
Total                    50      50     100
46
The rules for determining degrees of freedom
in cross-tabulations or contingency tables:
• In any two-by-two table (two columns, two
rows, excluding marginals) DF = 1.
• For all other tables, calculate DF as:
(c -1 ) * (r-1) where c = columns and r =
rows.
( So for a table with 3 columns and 4 rows,
DF = ____. )
47
Importance of Degrees of Freedom
• You will see degrees of freedom on your SPSS
print out.
• Most types of inferential statistics use DF in
calculations.
• In chi-square, we need to know DF if we are
calculating chi-square by hand. You must use the
value of the chi-square and DF to determine if the
chi-square value is large enough to be statistically
significant (consult chi-square table in most
statistics books).
48
Hypothesis Testing
• Goal: Make statement(s) regarding unknown population
parameter values based on sample data
• Elements of a hypothesis test:
– Null hypothesis - Statement regarding the value(s) of unknown
parameter(s). Typically will imply no association between
explanatory and response variables in our applications (will
always contain an equality)
– Alternative hypothesis - Statement contradictory to the null
hypothesis (will always contain an inequality)
– Test statistic - Quantity based on sample data and null
hypothesis used to test between null and alternative hypotheses
– Rejection region - Values of the test statistic for which we
reject the null in favor of the alternative hypothesis
49
Hypothesis Testing
              Test Result
True State    H0 True (do not reject)    H0 False (reject)
H0 True       Correct Decision           Type I Error
H0 False      Type II Error              Correct Decision
α = P(Type I Error)    β = P(Type II Error)
• Goal: Keep α, β reasonably small
50
Example - Efficacy Test for New drug
• Drug company has new drug, wishes to compare it
with current standard treatment
• Federal regulators tell company that they must
demonstrate that new drug is better than current
treatment to receive approval
• Firm runs clinical trial where some patients
receive new drug, and others receive standard
treatment
• Numeric response of therapeutic effect is obtained
(higher scores are better).
• Parameter of interest: µNew - µStd
51
Example - Efficacy Test for New drug
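• Null hypothesis - New drug is no better than standard treatment
$H_0: \mu_{New} - \mu_{Std} \le 0 \quad (\mu_{New} - \mu_{Std} = 0)$
• Alternative hypothesis - New drug is better than standard treatment
$H_A: \mu_{New} - \mu_{Std} > 0$
• Experimental (Sample) data:
$\bar{y}_{New} \quad \bar{y}_{Std}$
$s_{New} \quad s_{Std}$
$n_{New} \quad n_{Std}$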
52
Sampling Distribution of Difference in Means
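• In large samples, the difference in two sample means is
approximately normally distributed (N denotes a Normal
distribution, given by its mean and SD):
$\bar{Y}_1 - \bar{Y}_2 \sim N\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$
• Under the null hypothesis, µ1-µ2=0 and:
$Z = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$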
53
Example - Efficacy Test for New drug
• Type I error - Concluding that the new drug is better than the
standard (HA) when in fact it is no better (H0). Ineffective drug is
deemed better.
– Traditionally α = P(Type I error) = 0.05
• Type II error - Failing to conclude that the new drug is better (HA)
when in fact it is. Effective drug is deemed to be no better.
– Traditionally a clinically important difference (∆) is assigned
and sample sizes chosen so that:
β = P(Type II error | µ1-µ2 = ∆) ≤ 0.20
54
Elements of a Hypothesis Test
• Test Statistic - Difference between the Sample means,
scaled to number of standard deviations (standard errors)
from the null difference of 0 for the Population means:
T.S.: $z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
• Rejection Region - Set of values of the test statistic that are consistent
with HA, such that the probability it falls in this region when H0 is true is α
(we will always set α=0.05)
R.R.: $z_{obs} \ge z_{\alpha}$, where $\alpha = 0.05 \Rightarrow z_{\alpha} = 1.645$
55
P-value (aka Observed Significance Level)
• P-value - Measure of the strength of evidence the sample
data provides against the null hypothesis:
P(Evidence This strong or stronger against H0 | H0 is true)
P-val: $P(Z \ge z_{obs})$
56
Large-Sample Test H0:µ1-µ2=0 vs HA:µ1-µ2>0
• H0: µ1-µ2 = 0 (No difference in population means)
• HA: µ1-µ2 > 0 (Population Mean 1 > Pop Mean 2)
• T.S. [Test Statistic]: $z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
• R.R. [Rejection Region]: $z_{obs} \ge z_{\alpha}$
• P-value [probability value]: $P(Z \ge z_{obs})$
• Conclusion - Reject H0 if test statistic falls in rejection region, or
equivalently the P-value is ≤ α
57
Example - Botox for Cervical Dystonia
• Patients - Individuals suffering from cervical dystonia
• Response - Tsui score of severity of cervical dystonia
(higher scores are more severe) at week 8 of Tx
• Research (alternative) hypothesis - Botox A decreases
mean Tsui score more than placebo
• Groups - Placebo (Group 1) and Botox A (Group 2)
• Experimental (Sample) Results:
$\bar{y}_1 = 10.1 \quad s_1 = 3.6 \quad n_1 = 33$
$\bar{y}_2 = 7.7 \quad s_2 = 3.4 \quad n_2 = 35$
Source: Wissel, et al (2001)
58
Example - Botox for Cervical Dystonia
Test whether Botox A produces lower mean Tsui scores than
placebo (α = 0.05)
• $H_0: \mu_1 - \mu_2 = 0$
• $H_A: \mu_1 - \mu_2 > 0$
• T.S.: $z_{obs} = \frac{10.1 - 7.7}{\sqrt{\frac{(3.6)^2}{33} + \frac{(3.4)^2}{35}}} = \frac{2.4}{0.85} = 2.82$
• R.R.: $z_{obs} \ge z_{\alpha} = z_{.05} = 1.645$
• P-val: $P(Z \ge 2.82) = .0024$
Conclusion: Botox A produces lower mean Tsui scores than placebo
(since 2.82 > 1.645 and P-value < 0.05)
If there were truly no difference between the groups, a result
this extreme would occur by chance only 0.24% of the time. Hence
Botox is better.
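The Botox calculation can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for the normal tail area; the function name `large_sample_z` is an illustrative label, and the inputs are the sample results quoted from Wissel, et al (2001).

```python
from math import sqrt
from statistics import NormalDist

def large_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z_obs for H0: mu1 - mu2 = 0, using the sample SDs in the standard error."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Group 1 = placebo, group 2 = Botox A (week-8 Tsui scores).
z_obs = large_sample_z(10.1, 3.6, 33, 7.7, 3.4, 35)

alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper 5% point, about 1.645
p_value = 1 - NormalDist().cdf(z_obs)       # P(Z >= z_obs)
reject_h0 = z_obs >= z_alpha

print(f"z_obs = {z_obs:.2f}, z_alpha = {z_alpha:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Both routes to the decision agree: the statistic falls in the rejection region, and the P-value is below α.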
59
2-Sided Tests
• Many studies don’t assume a direction with respect to the
difference µ1-µ2
• H0: µ1-µ2 = 0 HA: µ1-µ2 ≠ 0
• Test statistic is the same as before
• Decision Rule:
– Conclude µ1-µ2> 0 if zobs ≥ zα/2 (α=0.05 ⇒ zα/2=1.96)
– Conclude µ1-µ2< 0 if zobs ≤ -zα/2 (α=0.05 ⇒ -zα/2= -1.96)
– Do not reject µ1-µ2 = 0 if -zα/2 < zobs < zα/2
• P-value: 2P(Z≥ |zobs|)
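The two-sided decision rule can be written as a small function. This is a sketch under the large-sample normal approximation used throughout these slides; the function name `two_sided_test` and the returned wording are illustrative.

```python
from statistics import NormalDist

def two_sided_test(z_obs, alpha=0.05):
    """Apply the two-sided decision rule and return (decision, p_value)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # 1.96 when alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))    # 2 * P(Z >= |z_obs|)
    if z_obs >= z_crit:
        decision = "conclude mu1 - mu2 > 0"
    elif z_obs <= -z_crit:
        decision = "conclude mu1 - mu2 < 0"
    else:
        decision = "do not reject mu1 - mu2 = 0"
    return decision, p_value

# 2.82 is the Botox statistic; -2.82 and 1.0 are made-up values for contrast.
for z in (2.82, -2.82, 1.0):
    decision, p = two_sided_test(z)
    print(f"z_obs = {z:+.2f}: {decision} (p = {p:.4f})")
```

Note that for the same statistic the two-sided P-value is twice the one-sided value, so a result can be significant one-sided yet fall short two-sided.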
60
Power of a Test
• Power - Probability a test rejects H0(depends on µ1- µ2)
– H0 True: Power = P(Type I error) = α
– H0 False: Power = 1-P(Type II error) = 1-β
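• Example:
• H0: µ1- µ2 = 0 HA: µ1- µ2 > 0
• $\sigma_1^2 = \sigma_2^2 = 25 \quad n_1 = n_2 = 25$
• Decision Rule: Reject H0 (at α=0.05 significance level) if:
$z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{2}} \ge 1.645 \Rightarrow \bar{y}_1 - \bar{y}_2 \ge 2.326$
(since $\sqrt{2} = 1.414$ and 1.414 * 1.645 = 2.326)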
61
Power of a Test
• Now suppose in reality that µ1-µ2 = 3.0 (HA is true)
• Power now refers to the probability we (correctly)
reject the null hypothesis. Note that the sampling
distribution of the difference in sample means is
approximately normal, with mean 3.0 and standard
deviation (standard error) 1.414.
• Decision Rule (from last slide): Conclude population
means differ if the sample mean for group 1 is at least
2.326 higher than the sample mean for group 2
• Power for this case can be computed as:
$P(\bar{Y}_1 - \bar{Y}_2 \ge 2.326)$, where $\bar{Y}_1 - \bar{Y}_2 \sim N(3.0,\ \sqrt{2.0} = 1.414)$
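The power for this case can be evaluated directly from the normal distribution. The sketch below follows the slide's numbers (variances of 25, 25 observations per group, true difference 3.0); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma_sq = 25   # common population variance
n = 25          # observations per group

# Standard error of the difference in sample means: sqrt(25/25 + 25/25) = 1.414.
standard_error = sqrt(sigma_sq / n + sigma_sq / n)

# Reject H0 when the observed difference is at least 1.645 * 1.414 = 2.326.
cutoff = NormalDist().inv_cdf(0.95) * standard_error

# Under HA the difference in sample means is N(3.0, 1.414).
true_difference = 3.0
power = 1 - NormalDist(mu=true_difference, sigma=standard_error).cdf(cutoff)

print(f"cutoff = {cutoff:.3f}, power = {power:.3f}")  # power is about 0.68
```

So with these settings the test would detect a true difference of 3.0 roughly two times out of three.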
62
Power of a Test
• All else being equal:
• As sample sizes increase, power increases
• As population variances decrease, power
increases
• As the true mean difference increases, power
increases
63
Power of a Test
[Figure: two sampling distributions, labeled Distribution (H0) and Distribution (HA)]
64
Power of a Test
Power Curves for group sample sizes of 25,50,75,100 and varying true
values µ1-µ2 with σ1=σ2=5.
• For given µ1-µ2 , power increases with sample size
• For given sample size, power increases with µ1-µ2
65
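The values behind such power curves can be tabulated with a short function. This is a sketch for the one-sided large-sample test with σ1=σ2=5 and equal group sizes; the grid of true differences (0 to 3) and the name `power_one_sided` are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(true_diff, n_per_group, sigma=5.0, alpha=0.05):
    """Power of the one-sided z test of H0: mu1 - mu2 = 0 vs HA: mu1 - mu2 > 0."""
    standard_error = sqrt(2 * sigma**2 / n_per_group)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Under the true difference, z_obs is N(true_diff / SE, 1); reject if z_obs >= z_alpha.
    return 1 - NormalDist().cdf(z_alpha - true_diff / standard_error)

differences = (0, 1, 2, 3)   # hypothetical grid of true values of mu1 - mu2
print("n    " + "  ".join(f"diff={d}" for d in differences))
for n in (25, 50, 75, 100):
    row = [power_one_sided(d, n) for d in differences]
    print(f"{n:<4} " + "  ".join(f"{p:6.2f}" for p in row))
```

Reading across a row shows power rising with the true difference; reading down a column shows it rising with sample size. At a true difference of 0 the "power" equals α, as stated on the earlier slide.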
Steps in testing a hypothesis:
• State the research hypothesis
• State the null hypothesis
• Choose a level of statistical significance
(alpha level)
• Select and compute the test statistic
• Make a decision regarding whether to
accept or reject the null hypothesis.
66
Calculating Chi-Square
• Formula is $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all cells of the table
Where O is the observed value in a cell
E is the expected value in the same
cell we would see if there was no
association
67
First steps
Alternative hypothesis is: There is a relationship
between income level and education for
respondents in a survey of BA students.
Null hypothesis is: There is no relationship between
income level and education for respondents in a
survey of BA students
Significance level (alpha) set at 0.05
68
Rules for determining whether the chi-square
statistic and probability are large enough to verify a
relationship.
• For hand calculations, use the degree(s) of
freedom and the significance level you set to check
the Chi-square table found in most statistics
books. For the chi-square to be statistically
significant, it must be the same size or larger than
the number in the table.
• On an SPSS print out, the p. or significance value
must be the same size or smaller than your
significance level.
69
The formula for expected values is
E = (R * C) / N, where R is the row total, C is the column total, and N is the grand total
                         Education
Income                   High    Low    Total
High (Above $40,000)     25      25     50
Low ($39,999 or less)    25      25     50
Total                    50      50     100
70
Go back to our first table
                         Education
Income                   High    Low    Total
High (Above $40,000)     40      10     50
Low ($39,999 or less)    10      40     50
Total                    50      50     100
71
Chi-square calculation is
Cell      Expected Values       Chi-square
Cell 1    50 * 50/100 = 25      (40 − 25)² / 25 = 9
Cell 2    50 * 50/100 = 25      (10 − 25)² / 25 = 9
Cell 3    50 * 50/100 = 25      (10 − 25)² / 25 = 9
Cell 4    50 * 50/100 = 25      (40 − 25)² / 25 = 9
Total                           36
At 0.05 with df = 1, chi-square must be larger
than 3.84 to be statistically significant
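The hand calculation generalises to any table of counts. The following sketch computes the expected values from the marginals and sums the cell contributions; the function name `chi_square` and the list-of-rows layout are illustrative choices rather than anything prescribed by the slides.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table given as rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under no association: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, df

# Income (rows: high, low) by education (columns: high, low).
income_by_education = [[40, 10],
                       [10, 40]]
statistic, df = chi_square(income_by_education)
print(f"chi-square = {statistic:.2f}, df = {df}")  # chi-square = 36.00, df = 1
```

Because 36 is far above the critical value of 3.84, the null hypothesis of no relationship would be rejected at the 0.05 level.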
72
Chi-Square Table
73
Let’s calculate another chi-square- service
receipt by location of residence
Service    Urban    Rural    Total
Yes        20       40       60
No         30       10       40
Total      50       50       100
74
For this table,
• DF = 1
• Alternative hypothesis:
Receiving service is associated with
location of residence.
Null hypothesis:
There is no association between receiving
service and location of residence.
75
Calculations for chi-square are
Cell      Expected Values       Chi-square
Cell 1    50 * 60/100 = 30      (20 − 30)² / 30 = 3.33
Cell 2    50 * 40/100 = 20      (30 − 20)² / 20 = 5.00
Cell 3    50 * 60/100 = 30      (40 − 30)² / 30 = 3.33
Cell 4    50 * 40/100 = 20      (10 − 20)² / 20 = 5.00
Total                           16.67
At 1 DF and the 0.01 level, chi-square must be greater than 6.64.
Do we accept or reject the null hypothesis?
76
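When no printed chi-square table is at hand, the P-value for a table with 1 degree of freedom can be computed directly. The sketch relies on the fact that a chi-square variable with 1 DF is the square of a standard normal variable; it does not apply to tables with more degrees of freedom, and the function name is an illustrative choice.

```python
from math import erfc, sqrt

def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square value with 1 degree of freedom.

    With 1 DF, chi-square = Z squared for a standard normal Z, so
    P(chi2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return erfc(sqrt(statistic / 2))

# Sanity check against the critical values quoted on the slides.
print(f"{chi_square_p_value_df1(3.84):.3f}")   # close to 0.05
print(f"{chi_square_p_value_df1(6.64):.3f}")   # close to 0.01

# Service receipt by location of residence.
p_value = chi_square_p_value_df1(16.67)
reject_null = p_value <= 0.01
print(f"p-value = {p_value:.6f}, reject null: {reject_null}")
```

This mirrors the rule for SPSS output given earlier: the null hypothesis is rejected when the P-value is at or below the chosen significance level.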
Running chi-square in SPSS
• Select descriptive statistics
• Select cross-tabulation
• Highlight your independent variable and click on the arrow.
• Highlight your dependent variable and click on the arrow.
• Select Cells
• Choose column percents
• Click continue
• Select statistics
• Select chi-square
• Click continue
• Click ok
77

More Related Content

What's hot

F test and ANOVA
F test and ANOVAF test and ANOVA
F test and ANOVAParag Shah
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionDrZahid Khan
 
Application of Univariate, Bi-variate and Multivariate analysis Pooja k shetty
Application of Univariate, Bi-variate and Multivariate analysis Pooja k shettyApplication of Univariate, Bi-variate and Multivariate analysis Pooja k shetty
Application of Univariate, Bi-variate and Multivariate analysis Pooja k shettySundar B N
 
Data screening
Data screeningData screening
Data screening緯鈞 沈
 
Chi square test final
Chi square test finalChi square test final
Chi square test finalHar Jindal
 
Stat 3203 -sampling errors and non-sampling errors
Stat 3203 -sampling errors  and non-sampling errorsStat 3203 -sampling errors  and non-sampling errors
Stat 3203 -sampling errors and non-sampling errorsKhulna University
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with PythonDavis David
 
Types of data and graphical representation
Types of data and graphical representationTypes of data and graphical representation
Types of data and graphical representationReena Titoria
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examplesGaurav Kamboj
 
Estimation and hypothesis testing 1 (graduate statistics2)
Estimation and hypothesis testing 1 (graduate statistics2)Estimation and hypothesis testing 1 (graduate statistics2)
Estimation and hypothesis testing 1 (graduate statistics2)Harve Abella
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysisSumit Das
 
How to choose a right statistical test
How to choose a right statistical testHow to choose a right statistical test
How to choose a right statistical testKhalid Mahmood
 
Data Analysis with SPSS PPT.pdf
Data Analysis with SPSS PPT.pdfData Analysis with SPSS PPT.pdf
Data Analysis with SPSS PPT.pdfThanavathi C
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionsaba khan
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsSarfraz Ahmad
 

What's hot (20)

F test and ANOVA
F test and ANOVAF test and ANOVA
F test and ANOVA
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Ordinal Logistic Regression
Ordinal Logistic RegressionOrdinal Logistic Regression
Ordinal Logistic Regression
 
Application of Univariate, Bi-variate and Multivariate analysis Pooja k shetty
Application of Univariate, Bi-variate and Multivariate analysis Pooja k shettyApplication of Univariate, Bi-variate and Multivariate analysis Pooja k shetty
Application of Univariate, Bi-variate and Multivariate analysis Pooja k shetty
 
Normality tests
Normality testsNormality tests
Normality tests
 
Data screening
Data screeningData screening
Data screening
 
Chi square test final
Chi square test finalChi square test final
Chi square test final
 
Stat 3203 -sampling errors and non-sampling errors
Stat 3203 -sampling errors  and non-sampling errorsStat 3203 -sampling errors  and non-sampling errors
Stat 3203 -sampling errors and non-sampling errors
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
 
Types of data and graphical representation
Types of data and graphical representationTypes of data and graphical representation
Types of data and graphical representation
 
Statistical software
Statistical softwareStatistical software
Statistical software
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examples
 
Measures of central tendency and dispersion
Measures of central tendency and dispersionMeasures of central tendency and dispersion
Measures of central tendency and dispersion
 
Estimation and hypothesis testing 1 (graduate statistics2)
Estimation and hypothesis testing 1 (graduate statistics2)Estimation and hypothesis testing 1 (graduate statistics2)
Estimation and hypothesis testing 1 (graduate statistics2)
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysis
 
How to choose a right statistical test
How to choose a right statistical testHow to choose a right statistical test
How to choose a right statistical test
 
Data Analysis with SPSS PPT.pdf
Data Analysis with SPSS PPT.pdfData Analysis with SPSS PPT.pdf
Data Analysis with SPSS PPT.pdf
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 

Viewers also liked

Viewers also liked (8)

Bivariate analysis
Bivariate analysisBivariate analysis
Bivariate analysis
 
Malhotra15
Malhotra15Malhotra15
Malhotra15
 
Freq distribution
Freq distributionFreq distribution
Freq distribution
 
Bivariate
BivariateBivariate
Bivariate
 
Bivariate
BivariateBivariate
Bivariate
 
Multivariate Analysis Techniques
Multivariate Analysis TechniquesMultivariate Analysis Techniques
Multivariate Analysis Techniques
 
Multivariate Analysis An Overview
Multivariate Analysis An OverviewMultivariate Analysis An Overview
Multivariate Analysis An Overview
 
Chi square test
Chi square testChi square test
Chi square test
 

Similar to Univariate, bivariate analysis, hypothesis testing, chi square

Class1.ppt
Class1.pptClass1.ppt
Class1.pptGautam G
 
Introduction to Statistics - Basics of Data - Class 1
Introduction to Statistics - Basics of Data - Class 1Introduction to Statistics - Basics of Data - Class 1
Introduction to Statistics - Basics of Data - Class 1RajnishSingh367990
 
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICS
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICSSTATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICS
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICSnagamani651296
 
Engineering Statistics
Engineering Statistics Engineering Statistics
Engineering Statistics Bahzad5
 
2. chapter ii(analyz)
2. chapter ii(analyz)2. chapter ii(analyz)
2. chapter ii(analyz)Chhom Karath
 
Basics of statistics by Arup Nama Das
Basics of statistics by Arup Nama DasBasics of statistics by Arup Nama Das
Basics of statistics by Arup Nama DasArup8
 
Biostatistics CH Lecture Pack
Biostatistics CH Lecture PackBiostatistics CH Lecture Pack
Biostatistics CH Lecture PackShaun Cochrane
 
Spss basic Dr Marwa Zalat
Spss basic Dr Marwa ZalatSpss basic Dr Marwa Zalat
Spss basic Dr Marwa ZalatMarwa Zalat
 
EXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSISEXPLORATORY DATA ANALYSIS
EXPLORATORY DATA ANALYSISBabasID2
 
presentation
presentationpresentation
presentationPwalmiki
 
Student’s presentation
Student’s presentationStudent’s presentation
Student’s presentationPwalmiki
 
Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematicshktripathy
 
Biostatistics mean median mode unit 1.pptx
Biostatistics mean median mode unit 1.pptxBiostatistics mean median mode unit 1.pptx
Biostatistics mean median mode unit 1.pptxSailajaReddyGunnam
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mininghktripathy
 
STATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxSTATISTICAL PROCEDURES (Discriptive Statistics).pptx
STATISTICAL PROCEDURES (Discriptive Statistics).pptxMuhammadNafees42
 

Univariate, bivariate analysis, hypothesis testing, chi square

  • 1. Introduction to Data Analysis •Data Measurement •Measurement of the data is the first step in the process that ultimately guides the final analysis. •Consideration of sampling, controls, errors (random and systematic) and the required precision all influence the final analysis. •Validation: Instruments and methods used to measure the data must be validated for accuracy. •Precision and accuracy…Determination of error •Social vs. Physical Sciences 1
  • 2. Introduction to Data Analysis •Types of data •Univariate/Multivariate •Univariate: When we use one variable to describe a person, place, or thing. (e.g. Heights of individuals) •Multivariate: When we use two or more variables to measure a person, place or thing. Variables may or may not be dependent on each other. (Bivariate e.g. name and marks, Multivariate: name, caste and marks) •Cross-sectional data/Time-ordered data (business, social sciences) •Cross-Sectional: Measurements taken at one time period •E.g. (caste and per capita incomes) •Time-Ordered: Measurements taken over time in chronological sequence. e.g. years and per capita income The type of data will dictate (in part) the appropriate data-analysis method. 2
  • 3. Introduction to Data Analysis •Measurement Scales •Nominal or Categorical Scale (e.g. fair, brown and black) •Classification of people, places, or things into categories (e.g. age ranges, colors, etc.). •Classifications must be mutually exclusive (every element should belong to one category with no ambiguity). •Weakest of the four scales. No category is greater than or less (better or worse) than the others. They are just different. •Ordinal or Ranking Scale •Classification of people, places, or things into a ranking such that the data is arranged into a meaningful order (e.g. poor, fair, good, excellent). •Qualitative classification only
  • 4. Introduction to Data Analysis •Measurement Scales (business, social sciences) •Interval Scale •Data classified by ranking. •Quantitative classification (time, temperature, etc). •Zero point of scale is arbitrary (differences are meaningful). •Ratio Scale •Data classified as the ratio of two numbers. •Quantitative classification (height, weight, distance, etc). •Zero point of scale is real •(data can be added, subtracted, multiplied, and divided). 4
  • 5. Univariate Analysis/Descriptive Statistics • Descriptive Statistics – The Range – Min/Max – Average – Median – Mode – Variance – Standard Deviation – Histograms and Normal Distributions 5
  • 6. Univariate Analysis/Histograms • Distributions – Descriptive statistics are easier to interpret when graphically illustrated. – However, charting each data element can lead to very busy and confusing charts that do not help interpret the data. – Grouping the data elements into categories and charting the frequency within these categories yields a graphical illustration of how the data is distributed throughout its range. 6
  • 7. Univariate Analysis/Histograms [Figure: column chart of 20 individual data values; x-axis: labels 1 to 20; y-axis: Data Values, 0 to 120] With just a few columns this chart is difficult to interpret. It tells you very little about the data set. Even finding the Min and Max can be difficult. The data can be presented such that more statistical parameters can be estimated from the chart (average, standard deviation).
  • 8. Univariate Analysis/Histograms • Frequency Table – The first step is to decide on the categories and group the data appropriately. (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100)
      Category Labels   Frequency
      0-50              3
      51-60             2
      61-70             6
      71-80             5
      81-90             3
      >90               1
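The grouping step can be sketched in Python. This is only an illustration and not part of the original slides: the `bins` list and the `frequency_table` helper are names assumed here, with the category boundaries copied from the frequency table.

```python
# Scores from the slide (already sorted).
scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

# (label, lower bound, upper bound), both bounds inclusive.
# The boundaries mirror the slide's categories; ">90" is left open-ended.
bins = [
    ("0-50", 0, 50),
    ("51-60", 51, 60),
    ("61-70", 61, 70),
    ("71-80", 71, 80),
    ("81-90", 81, 90),
    (">90", 91, float("inf")),
]

def frequency_table(values, categories):
    """Count how many values fall into each (label, low, high) category."""
    return {
        label: sum(1 for v in values if low <= v <= high)
        for label, low, high in categories
    }

table = frequency_table(scores, bins)
for label, count in table.items():
    # A text histogram: one '#' per observation in the category.
    print(f"{label:>6} {count:2d} {'#' * count}")
```

Changing the boundaries in `bins` changes the shape of the resulting histogram, which is why choosing the categories is treated as the first step.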
  • 9. Univariate Analysis/Histograms • Histogram – A histogram is simply a column chart of the frequency table. [Figure: column chart of Frequency (y-axis, 0 to 7) against Scores (x-axis); category frequencies: 0-50: 3, 51-60: 2, 61-70: 6, 71-80: 5, 81-90: 3, >90: 1]
  • 10. Univariate Analysis/Histograms • Histogram [Figure: the same histogram (x-axis: Scores; y-axis: Frequency), annotated with the Average (68.6), Median (68), Mode (74), and the -1SD and +1SD positions]
  • 11. Univariate Analysis/Normal Distributions • Distributions that can be described mathematically as Gaussian are also called Normal • The Bell curve – Symmetrical – Mean ≈ Median [Figure: bell curve with the Mean, Median, and Mode marked at the peak]
  • 12. Univariate Analysis/Skewed Distributions • When data are skewed, the mean and SD can be misleading • Skewness: sk = 3(mean − median)/SD. If |sk| > 1, the distribution is non-symmetrical • Negatively skewed – Mean < Median – sk is negative • Positively skewed – Mean > Median – sk is positive [Figures: two skewed density curves]
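As a hedged illustration of the skewness formula, the sketch below applies it to the 20 scores used elsewhere in the deck. The choice of the population SD (dividing by N) is an assumption made here; the slide does not say which SD it intends.

```python
import statistics

scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.pstdev(scores)  # population SD (assumed; divides by N)

# Skewness as given on the slide: sk = 3(mean - median)/SD
sk = 3 * (mean - median) / sd
print(f"mean={mean:.1f} median={median:.1f} SD={sd:.2f} sk={sk:.3f}")
```

For this data set the mean sits only slightly above the median, so sk is small and positive, well inside the slide's threshold of magnitude 1.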
  • 13. Central Limit Theorem • Regardless of the shape of a distribution, the distribution of the sample mean based on samples of size N approaches a normal curve as N increases. – N must be smaller than the full data set from which the samples are drawn. [Figure: distribution of sample means for N = 10]
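A small simulation can make the theorem concrete. Everything in this sketch is an assumption chosen for illustration (the exponential population, the sample sizes, the number of draws, and the helper names); it is not taken from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# A deliberately non-normal, right-skewed population (exponential, mean 1).
population = [random.expovariate(1.0) for _ in range(20_000)]

def sample_means(pop, n, draws=2_000):
    """Return the means of `draws` random samples of size n taken from pop."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(draws)]

def pearson_skew(values):
    """Skewness in the 3(mean - median)/SD form."""
    return 3 * (statistics.fmean(values) - statistics.median(values)) / statistics.pstdev(values)

spread, skew = {}, {}
for n in (2, 10, 50):
    means = sample_means(population, n)
    spread[n] = statistics.stdev(means)  # spread of the sample means
    skew[n] = pearson_skew(means)        # asymmetry of the sample means
    print(f"N={n:2d}  SD of sample means={spread[n]:.3f}  skewness={skew[n]:.3f}")
```

As N grows, the sample means cluster more tightly and their distribution becomes more symmetric, even though the underlying population is strongly skewed.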
  • 14. Univariate Analysis/Descriptive Statistics • The Range – Difference between minimum and maximum values in a data set – Larger range usually (but not always) indicates a large spread or deviation in the values of the data set. (73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100) 14
  • 15. Univariate Analysis/Descriptive Statistics • The Average (Mean) – Sum of all values divided by the number of values in the data set. – One measure of central location in the data set. Average = $\bar{m} = \frac{1}{N}\sum_{i=1}^{N} m_i$ Average = (73+66+69+67+49+60+81+71+78+62+53+87+74+65+74+50+85+45+63+100)/20 = 68.6 Excel function: AVERAGE()
  • 16. Univariate Analysis/Descriptive Statistics [Figure: two number lines from 0 to 10 with ticks at 2.5 and 7.5, each marking an average of 4.8] The data may or may not be symmetrical around its average value
  • 17. Univariate Analysis/Descriptive Statistics • The Median – The middle value in a sorted data set. Half the values are greater and half are less than the median. – Another measure of central location in the data set. – Even number of values: (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Median: 68, the average of the two middle values 67 and 69. – Odd number of values: (1, 2, 4, 7, 8, 9, 9) Median: 7, the single middle value. – Excel function: MEDIAN()
  • 18. Univariate Analysis/Descriptive Statistics • The Median – May or may not be close to the mean. – The mean and median together are used to define the skewness of a distribution. [Figure: number line from 0 to 10 with ticks at 2.5 and 7.5, marking a value of 6.25]
  • 19. Univariate Analysis/Descriptive Statistics • The Mode – Most frequently occurring value. – Another measure of central location in the data set. – (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) – Mode: 74 – Generally not all that meaningful unless a large percentage of the values are the same number.
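The range and the three measures of central location can be checked with Python's standard `statistics` module. This is a minimal sketch rather than the slides' own method (the slides use Excel); the variable names are chosen here for illustration.

```python
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

data_range = max(scores) - min(scores)  # maximum minus minimum
mean = statistics.mean(scores)          # Excel: AVERAGE()
median = statistics.median(scores)      # Excel: MEDIAN(); averages the two middle values when N is even
mode = statistics.mode(scores)          # most frequently occurring value

# The odd-length example from the median slide has a single middle value.
odd_median = statistics.median([1, 2, 4, 7, 8, 9, 9])

print(data_range, mean, median, mode, odd_median)
```

The data do not need to be sorted first; `statistics.median` sorts internally.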
  • 20. Univariate Analysis/Descriptive Statistics • Variance – One measure of dispersion (deviation from the mean) of a data set. The larger the variance, the greater is the average deviation of each datum from the average value. Variance = $\frac{1}{N}\sum_{i=1}^{N}(m_i - \bar{m})^2$, where $\bar{m}$ = average value of the data set. Variance = [(45 – 68.6)^2 + (49 – 68.6)^2 + (50 – 68.6)^2 + (53 – 68.6)^2 + …]/20 = 181 Excel Functions: VARP(), VAR() (VARP() divides by N, as here; VAR() divides by N – 1)
  • 21. Univariate Analysis/Descriptive Statistics • Standard Deviation – Square root of the variance. Can be thought of as the average deviation from the mean of a data set. – The magnitude of the number is more in line with the values in the data set. Standard Deviation = ([(45 – 68.6)^2 + (49 – 68.6)^2 + (50 – 68.6)^2 + (53 – 68.6)^2 + …]/20)^(1/2) = 13.5 Excel Functions: STDEVP(), STDEV()
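The variance and standard deviation figures can be reproduced as follows. This is a sketch using Python's `statistics` module instead of the Excel functions named on the slides; the pairing of each Python call with an Excel function in the comments reflects whether it divides by N or by N – 1.

```python
import statistics

scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

pop_var = statistics.pvariance(scores)   # divides by N, like VARP()
samp_var = statistics.variance(scores)   # divides by N - 1, like VAR()
pop_sd = statistics.pstdev(scores)       # square root of pop_var, like STDEVP()
samp_sd = statistics.stdev(scores)       # square root of samp_var, like STDEV()

print(f"population: variance={pop_var:.2f}, SD={pop_sd:.2f}")
print(f"sample:     variance={samp_var:.2f}, SD={samp_sd:.2f}")
```

The slide's worked values (181 and 13.5) correspond to the population versions; the sample versions are slightly larger because they divide by 19 rather than 20.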
  • 23. So far the statistical methods we have used only permit us to: • Look at the frequency in which certain numbers or categories occur. • Look at measures of central tendency such as means, modes, and medians for one variable. • Look at measures of dispersion such as standard deviation and z scores for one interval or ratio level variable. 23
  • 24. Bivariate analysis allows us to: • Look at associations/relationships between two variables. • Look at measures of the strength of the relationship between two variables. • Test hypotheses about relationships between two nominal or ordinal level variables.
  • 25. For example, what does this table tell us about opinions on welfare by gender?
      Support cutting welfare benefits for immigrants   Male   Female
      Yes                                               15     5
      No                                                10     20
      Total                                             25     25
  • 26. Are frequencies sufficient to allow us to make comparisons about groups? What other information do we need? 26
  • 27. Is this table more helpful?
      Benefits for Immigrants   Male        Female
      Yes                       15 (60%)    5 (20%)
      No                        10 (40%)    20 (80%)
      Total                     25 (100%)   25 (100%)
  • 28. How would you write a sentence or two to describe what is in this table? 28
  • 29. Rules for cross-tabulation • Calculate either column or row percents. • Calculations are the number of frequencies in a cell of a table divided by the total number of frequencies in that column or row, for example 20/25 = 80.0% • All percentages in a column or row should total 100%. 29
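The column-percentage rule can be sketched in a few lines of Python, using the welfare-opinion counts from the earlier slides. The `column_percents` helper and the dictionary layout are assumptions made for this illustration; the slides themselves use SPSS.

```python
# Counts from the welfare-opinion table: one inner dict per column (gender).
counts = {
    "Male":   {"Yes": 15, "No": 10},
    "Female": {"Yes": 5,  "No": 20},
}

def column_percents(table):
    """Divide each cell by its column total and express it as a percentage."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * n / total for row, n in cells.items()}
    return result

percents = column_percents(counts)
for column, cells in percents.items():
    for row, pct in cells.items():
        print(f"{column:<6} {row:<3} {pct:5.1f}%")
```

Each column's percentages sum to 100, which is a quick check that the right totals were used as denominators.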
  • 30. Let’s look at another example – social work degrees by gender
      Social Work Degree   Male          Female
      BA                   20 (33.3%)    20 (    %)
      MSW                  30 (    )     70 (70.0%)
      Ph.D.                10 (16.7%)    10 (10.0%)
      Total                60 (100.0%)   100 (100.0%)
  • 31. Questions: What group had the largest percentage of Ph.Ds? What are the ways in which you could find the missing numbers? Is it obvious why you would use percentages to make comparisons among two or more groups? 31
  • 32. In the following table, were people with drug problems, alcohol problems, or a combination of both most likely to be referred for individual treatment?
      Services               Alcohol     Drugs       Both
      Individual Treatment   10 (25%)    30 (60%)    5 (50%)
      Group Treatment        10 (25%)    10 (20%)    2 (20%)
      AA                     20 (50%)    10 (20%)    3 (30%)
      Total                  40 (100%)   50 (100%)   10 (100%)
  • 33. Use the same table to answer the following question: How much more likely are people with alcohol problems alone to be referred to AA than people with drug problems or a combination of drug and alcohol problems? 33
  • 34. We use cross-tabulation when: • We want to look at relationships among two or three variables. • We want a descriptive statistical measure to tell us whether differences among groups are large enough to indicate some sort of relationship among variables. 34
  • 35. Cross-tabs are not sufficient to: • Tell us the strength or actual size of the relationships among two or three variables. • Test a hypothesis about the relationship between two or three variables. • Tell us the direction of the relationship among two or more variables. • Look at relationships between one nominal or ordinal variable and one ratio or interval variable unless the range of possible values for the ratio or interval variable is small. What do you think a table with a large number of ratio values would look like?
  • 36. We can use cross-tabs to visually assess whether independent and dependent variables might be related. In addition, we also use cross-tabs to find out if demographic variables such as gender and ethnicity are related to the second variable. 36
  • 37. For example, gender may determine if someone votes Democratic or Republican or if income is high, medium, or low. Ethnicity might be related to where someone lives or attitudes about whether undocumented workers should receive driver’s licenses. 37
  • 38. Because we use tables in these ways, we can set up some decision rules about how to use tables. • Independent variables should be column variables. • If you are not looking at independent and dependent variable relationships, use the variable that can logically be said to influence the other as your column variable. • Using this rule, always calculate column percentages rather than row percentages. • Use the column percentages to interpret your results. 38
  • 39. For example, • If we were looking at the relationship between gender and income, gender would be the column variable and income would be the row variable. Logically gender can determine income. Income does not determine your gender. • If we were looking at the relationship between ethnicity and location of a person’s home, ethnicity would be the column variable. • However, if we were looking at the relationship between gender and ethnicity, one does not influence the other. Either variable could be the column variable. 39
  • 40. SPSS will allow you to choose a column variable and row variable and whether or not your table will include column or row percents. 40
  • 41. You must use an additional statistic, chi-square, if you want to: • Test a hypothesis about two variables. • Look at the strength of the relationship between an independent and dependent variable. • Determine whether the relationship between the two variables is large enough to rule out random chance or sampling error as reasons that there appears to be a relationship between the two variables.
  • 42. Chi-square is simply an extension of a cross-tabulation that gives you more information about the relationship. However, it provides no information about the direction of the relationship (positive or negative) between the two variables. 42
  • 43. Let’s use the following table to test a hypothesis:
                              Education
      Income                  High   Low   Total
      High (Above $40,000)    40     __    50
      Low ($39,999 or less)   __     __    50
      Total                   50     50    100
  • 44. I have not filled in all of the information because we need to talk about two concepts before we start calculations: • Degrees of Freedom: In any table, there are a limited number of choices for the values in each cell. • Marginals: Total frequencies in columns and rows. 44
  • 45. Let’s look at the number of choices we have in the previous table:
                              Education
      Income                  High   Low   Total
      High (Above $40,000)    40     __    50
      Low ($39,999 or less)   __     __    50
      Total                   50     50    100
  • 46. So the table becomes:
                              Education
      Income                  High   Low   Total
      High (Above $40,000)    40     10    50
      Low ($39,999 or less)   10     40    50
      Total                   50     50    100
  • 47. The rules for determining degrees of freedom in cross-tabulations or contingency tables: • In any two-by-two table (two columns, two rows, excluding marginals) DF = 1. • For all other tables, calculate DF as: (c - 1) * (r - 1) where c = columns and r = rows. (So for a table with 3 columns and 4 rows, DF = ____.)
  • 48. Importance of Degrees of Freedom • You will see degrees of freedom on your SPSS print out. • Most types of inferential statistics use DF in calculations. • In chi-square, we need to know DF if we are calculating chi-square by hand. You must use the value of the chi-square and DF to determine if the chi-square value is large enough to be statistically significant (consult chi-square table in most statistics books). 48
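To show how the chi-square value and DF fit together, here is a hedged sketch of a hand calculation for the completed education-by-income table. The slides do not give the formula; the standard Pearson form, the sum of (observed − expected)² / expected over all cells, is assumed here, with no continuity correction. The `chi_square` helper is a name invented for this example, and 3.841 is the tabulated critical value for α = 0.05 and DF = 1.

```python
# Observed counts (rows: income High/Low; columns: education High/Low).
observed = [[40, 10],
            [10, 40]]

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1) * (c - 1)
    return stat, df

stat, df = chi_square(observed)
CRITICAL_05_DF1 = 3.841  # chi-square table value for alpha = 0.05, DF = 1

print(f"chi-square = {stat:.1f}, DF = {df}")
print("significant at 0.05" if stat > CRITICAL_05_DF1 else "not significant at 0.05")
```

With every expected count equal to 25, each cell contributes 9, so the statistic far exceeds the critical value and the relationship would be judged statistically significant.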
  • 49. Hypothesis Testing • Goal: Make statement(s) regarding unknown population parameter values based on sample data • Elements of a hypothesis test: – Null hypothesis - Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality) – Alternative hypothesis - Statement contradictory to the null hypothesis (will always contain an inequality) – Test statistic - Quantity based on sample data and null hypothesis used to test between null and alternative hypotheses – Rejection region - Values of the test statistic for which we reject the null in favor of the alternative hypothesis 49
  • 50. Hypothesis Testing
      True State   Test Result: H0 True   Test Result: H0 False
      H0 True      Correct Decision       Type I Error
      H0 False     Type II Error          Correct Decision
    $\alpha = P(\text{Type I Error})$, $\beta = P(\text{Type II Error})$ • Goal: Keep α, β reasonably small
  • 51. Example - Efficacy Test for New drug • Drug company has new drug, wishes to compare it with current standard treatment • Federal regulators tell company that they must demonstrate that new drug is better than current treatment to receive approval • Firm runs clinical trial where some patients receive new drug, and others receive standard treatment • Numeric response of therapeutic effect is obtained (higher scores are better). • Parameter of interest: µNew - µStd 51
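  • 52. Example - Efficacy Test for New drug • Null hypothesis - New drug is no better than standard treatment: $H_0: \mu_{New} - \mu_{Std} \le 0 \quad (\mu_{New} - \mu_{Std} = 0)$ • Alternative hypothesis - New drug is better than standard treatment: $H_A: \mu_{New} - \mu_{Std} > 0$ • Experimental (Sample) data: sample means $\bar{y}_{New}, \bar{y}_{Std}$; sample standard deviations $s_{New}, s_{Std}$; sample sizes $n_{New}, n_{Std}$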
  • 53. Sampling Distribution of Difference in Means • In large samples, the difference in two sample means is approximately normally distributed (N = Normal distribution, with a mean and SD): $\bar{Y}_1 - \bar{Y}_2 \sim N\left(\mu_1 - \mu_2,\ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$ • Under the null hypothesis, µ1-µ2=0 and: $Z = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$
  • 54. Example - Efficacy Test for New drug • Type I error - Concluding that the new drug is better than the standard (HA) when in fact it is no better (H0). Ineffective drug is deemed better. – Traditionally α = P(Type I error) = 0.05 • Type II error - Failing to conclude that the new drug is better (HA) when in fact it is. Effective drug is deemed to be no better. – Traditionally a clinically important difference (∆) is assigned and sample sizes chosen so that: β = P(Type II error | µ1-µ2 = ∆) ≤ 0.20 54
  • 55. Elements of a Hypothesis Test • Test Statistic - Difference between the Sample means, scaled to number of standard deviations (standard errors) from the null difference of 0 for the Population means: T.S.: $z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ • Rejection Region - Set of values of the test statistic that are consistent with HA, such that the probability it falls in this region when H0 is true is α (we will always set α=0.05) R.R.: $z_{obs} \ge z_{\alpha}$, where $\alpha = 0.05 \Rightarrow z_{\alpha} = 1.645$
  • 56. P-value (aka Observed Significance Level) • P-value - Measure of the strength of evidence the sample data provides against the null hypothesis: P(Evidence this strong or stronger against H0 | H0 is true) P-val: $p = P(Z \ge z_{obs})$
  • 57. Large-Sample Test H0:µ1-µ2=0 vs HA:µ1-µ2>0 • H0: µ1-µ2 = 0 (No difference in population means) • HA: µ1-µ2 > 0 (Population Mean 1 > Pop Mean 2) • T.S. [Test Statistic]: $z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ • R.R. [Rejection Region]: $z_{obs} \ge z_{\alpha}$ • P-value [probability value]: $P(Z \ge z_{obs})$ • Conclusion - Reject H0 if test statistic falls in rejection region, or equivalently the P-value is ≤ α
  • 58. Example - Botox for Cervical Dystonia • Patients - Individuals suffering from cervical dystonia • Response - Tsui score of severity of cervical dystonia (higher scores are more severe) at week 8 of Tx • Research (alternative) hypothesis - Botox A decreases mean Tsui score more than placebo • Groups - Placebo (Group 1) and Botox A (Group 2) • Experimental (Sample) Results: $\bar{y}_1 = 10.1,\ s_1 = 3.6,\ n_1 = 33$; $\bar{y}_2 = 7.7,\ s_2 = 3.4,\ n_2 = 35$ Source: Wissel, et al (2001)
  • 59. Example - Botox for Cervical Dystonia • Test whether Botox A produces lower mean Tsui scores than placebo (α = 0.05) • H0: µ1-µ2 = 0 • HA: µ1-µ2 > 0 • T.S.: $z_{obs} = \frac{10.1 - 7.7}{\sqrt{\frac{(3.6)^2}{33} + \frac{(3.4)^2}{35}}} = \frac{2.4}{0.85} = 2.82$ • R.R.: $z_{obs} \ge z_{.05} = 1.645$ • P-val: $P(Z \ge 2.82) = .0024$ • Conclusion: Botox A produces lower mean Tsui scores than placebo (since 2.82 > 1.645 and P-value < 0.05). If the two population means were truly equal, a difference this large or larger would occur only about 0.24% of the time, so the data support Botox A over placebo.
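The arithmetic on this slide can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `two_sample_z` is an illustrative choice, not something from the slides, and the inputs are the sample results quoted from Wissel, et al (2001).

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(ybar1, s1, n1, ybar2, s2, n2):
    """Large-sample z statistic for H0: mu1 - mu2 = 0 (hypothetical helper)."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return (ybar1 - ybar2) / standard_error

# Placebo is Group 1, Botox A is Group 2 (values from the slide).
z_obs = two_sample_z(10.1, 3.6, 33, 7.7, 3.4, 35)

# One-sided test at alpha = 0.05: reject H0 when z_obs >= z_alpha.
z_alpha = NormalDist().inv_cdf(1 - 0.05)
p_value = 1 - NormalDist().cdf(z_obs)

print(f"z_obs = {z_obs:.2f}, z_alpha = {z_alpha:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if z_obs >= z_alpha else "Do not reject H0")
```

Running it should give values that agree with the slide to the stated rounding (z of about 2.82, a critical value of about 1.645, and a P-value of about 0.0024).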
  • 60. 2-Sided Tests • Many studies don’t assume a direction with respect to the difference µ1-µ2 • H0: µ1-µ2 = 0 HA: µ1-µ2 ≠ 0 • Test statistic is the same as before • Decision Rule: – Conclude µ1-µ2 > 0 if zobs ≥ zα/2 (α=0.05 ⇒ zα/2 = 1.96) – Conclude µ1-µ2 < 0 if zobs ≤ -zα/2 (α=0.05 ⇒ -zα/2 = -1.96) – Do not reject µ1-µ2 = 0 if -zα/2 < zobs < zα/2 • P-value: 2P(Z ≥ |zobs|)
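The two-sided decision rule can be sketched as a small function. This is an illustration only: the name `two_sided_decision` and the returned strings are assumptions made here, and the lower-tail branch uses zobs ≤ -zα/2, which is the mirror image of the upper-tail rule.

```python
from statistics import NormalDist

def two_sided_decision(z_obs, alpha=0.05):
    """Return (decision, p_value) for H0: mu1 - mu2 = 0 vs HA: mu1 - mu2 != 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    if z_obs >= z_crit:
        decision = "conclude mu1 - mu2 > 0"
    elif z_obs <= -z_crit:
        decision = "conclude mu1 - mu2 < 0"
    else:
        decision = "do not reject mu1 - mu2 = 0"
    return decision, p_value

for z in (2.82, -2.5, 1.0):
    decision, p = two_sided_decision(z)
    print(f"z_obs = {z:+.2f}: {decision} (p-value = {p:.4f})")
```

Note that the two-sided P-value is twice the one-sided tail area, so the same z statistic is weaker evidence when no direction is assumed in advance.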
  • 61. Power of a Test • Power - Probability a test rejects H0 (depends on µ1-µ2) – H0 True: Power = P(Type I error) = α – H0 False: Power = 1-P(Type II error) = 1-β • Example: • H0: µ1-µ2 = 0 HA: µ1-µ2 > 0 • $\sigma_1^2 = \sigma_2^2 = 25$, $n_1 = n_2 = 25$ • Decision Rule: Reject H0 (at α=0.05 significance level) if: $z_{obs} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{2}} \ge 1.645 \Rightarrow \bar{y}_1 - \bar{y}_2 \ge 2.326$ (since $\sqrt{2} = 1.414$ and 1.414 × 1.645 = 2.326)
  • 62. Power of a Test • Now suppose in reality that µ1-µ2 = 3.0 (HA is true) • Power now refers to the probability we (correctly) reject the null hypothesis. Note that the sampling distribution of the difference in sample means is approximately normal, with mean 3.0 and standard deviation (standard error) 1.414. • Decision Rule (from last slide): Conclude population means differ if the sample mean for group 1 is at least 2.326 higher than the sample mean for group 2 • Power for this case can be computed as: $P(\bar{Y}_1 - \bar{Y}_2 \ge 2.326)$, where $\bar{Y}_1 - \bar{Y}_2 \sim N(3, \sqrt{2.0} = 1.414)$
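The slide sets up the power calculation without carrying out the arithmetic. A minimal sketch of that last step, using the values from the example (σ1² = σ2² = 25, n1 = n2 = 25, true difference 3.0) and only the Python standard library; the variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

sigma_sq, n = 25.0, 25
true_diff = 3.0

# Standard error of the difference in sample means: sqrt(25/25 + 25/25) = 1.414
standard_error = sqrt(sigma_sq / n + sigma_sq / n)

# Reject H0 when the observed difference is at least 1.645 standard errors above 0.
cutoff = NormalDist().inv_cdf(0.95) * standard_error

# Under HA the difference in sample means is approximately N(3.0, 1.414).
power = 1 - NormalDist(mu=true_diff, sigma=standard_error).cdf(cutoff)

print(f"cutoff = {cutoff:.3f}, power = {power:.3f}")
```

Under these assumptions the power comes out to roughly 0.68, so with 25 subjects per group the test would miss a true difference of 3.0 about a third of the time.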
  • 63. Power of a Test • All else being equal: • As sample sizes increase, power increases • As population variances decrease, power increases • As the true mean difference increases, power increases
  • 64. Power of a Test • [Figure: Distribution (H0) and Distribution (HA)]
  • 65. Power of a Test Power Curves for group sample sizes of 25,50,75,100 and varying true values µ1-µ2 with σ1=σ2=5. • For given µ1-µ2 , power increases with sample size • For given sample size, power increases with µ1-µ2
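The points on such power curves can be tabulated numerically. The sketch below assumes the same one-sided z test at α = 0.05 with equal group sizes and σ1 = σ2 = 5; the function name `power_one_sided` and the grid of true differences are choices made here for illustration, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(n, true_diff, sigma=5.0, alpha=0.05):
    """Power of the one-sided z test with n subjects per group and common sigma."""
    standard_error = sqrt(2 * sigma**2 / n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # z_obs is approximately N(true_diff / standard_error, 1); reject when z_obs >= z_alpha.
    return 1 - NormalDist().cdf(z_alpha - true_diff / standard_error)

true_diffs = (0.0, 1.0, 2.0, 3.0)
for n in (25, 50, 75, 100):
    row = ", ".join(f"{power_one_sided(n, d):.3f}" for d in true_diffs)
    print(f"n = {n:3d}: {row}")
```

Each printed row is one curve sampled at four true differences. At a true difference of 0 the power equals α, and it rises both along a row (larger µ1-µ2) and down a column (larger sample size).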
  • 66. Steps in testing a hypothesis: • State the research hypothesis • State the null hypothesis • Choose a level of statistical significance (alpha level) • Select and compute the test statistic • Make a decision regarding whether to reject or fail to reject the null hypothesis.
  • 67. Calculating Chi-Square • Formula is $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all cells of the table • Where O is the observed value in a cell • E is the expected value in the same cell that we would see if there was no association
  • 68. First steps • Alternative hypothesis is: There is a relationship between income level and education for respondents in a survey of BA students. • Null hypothesis is: There is no relationship between income level and education for respondents in a survey of BA students. • Significance level set at 0.05
  • 69. Rules for determining whether the chi-square statistic and probability are large enough to verify a relationship. • For hand calculations, use the degree(s) of freedom and the significance level you set to check the chi-square table found in most statistics books. For the chi-square to be statistically significant, it must be the same size or larger than the number in the table. • On an SPSS printout, the p or significance value must be the same size or smaller than your significance level.
  • 70. The formula for expected values is E = (R × C) / N, where R is the row total, C is the column total, and N is the grand total • Expected values:
                                 Education
      Income                     High    Low    Total
      High (Above $40,000)       25      25     50
      Low ($39,999 or less)      25      25     50
      Total                      50      50     100
  • 71. Go back to our first table
                                 Education
      Income                     High    Low    Total
      High (Above $40,000)       40      10     50
      Low ($39,999 or less)      10      40     50
      Total                      50      50     100
  • 72. Chi-square calculation is
      Cell      Expected Values        Chi-square
      Cell 1    50 × 50/100 = 25       (40-25)²/25 = 9
      Cell 2    50 × 50/100 = 25       (10-25)²/25 = 9
      Cell 3    50 × 50/100 = 25       (10-25)²/25 = 9
      Cell 4    50 × 50/100 = 25       (40-25)²/25 = 9
      Total                            36
    • At the 0.05 significance level with df = 1, chi-square must be larger than 3.84 to be statistically significant. Here 36 > 3.84, so the relationship between income and education is statistically significant.
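The hand calculation generalizes to any table of counts. A minimal sketch in plain Python, with a function name (`chi_square`) and a list-of-rows table layout chosen here for illustration:

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if there were no association: E = R * C / N
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Income (rows: High, Low) by Education (columns: High, Low), counts from the slides.
statistic, df = chi_square([[40, 10], [10, 40]])
print(f"chi-square = {statistic:.2f}, df = {df}")
print("significant at 0.05" if statistic > 3.84 else "not significant at 0.05")
```

The critical value 3.84 is hard-coded from the slide and applies only to df = 1 at the 0.05 level; other table sizes need the matching entry from a chi-square table.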
  • 74. Let’s calculate another chi-square: service receipt by location of residence
      Service    Urban    Rural    Total
      Yes        20       40       60
      No         30       10       40
      Total      50       50       100
  • 75. For this table, • DF = 1 • Alternative hypothesis: Receiving service is associated with location of residence. Null hypothesis: There is no association between receiving service and location of residence.
  • 76. Calculations for chi-square are
      Cell      Expected Values        Chi-square
      Cell 1    50 × 60/100 = 30       (20-30)²/30 = 3.33
      Cell 2    50 × 40/100 = 20       (30-20)²/20 = 5.00
      Cell 3    50 × 60/100 = 30       (40-30)²/30 = 3.33
      Cell 4    50 × 40/100 = 20       (10-20)²/20 = 5.00
      Total                            16.67
    • With 1 DF at the 0.01 significance level, chi-square must be greater than 6.64. Do we reject or fail to reject the null hypothesis?
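Instead of looking up a table value, the P-value can be computed directly when df = 1, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below relies on that identity, so it is valid only for 2 × 2 tables; the helper name `p_value_df1` is an assumption made here.

```python
from math import erfc, sqrt

def p_value_df1(chi_sq):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # P(chi-square >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))
    return erfc(sqrt(chi_sq / 2))

# Service receipt (rows: Yes, No) by residence (columns: Urban, Rural).
observed = [[20, 40], [30, 10]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(2)
    for j in range(2)
)
p_value = p_value_df1(chi_sq)

print(f"chi-square = {chi_sq:.2f}, p-value = {p_value:.6f}")
print("reject H0 at 0.01" if p_value <= 0.01 else "fail to reject H0 at 0.01")
```

Comparing the P-value with 0.01 is equivalent to comparing the statistic with the tabled critical value, which is the same rule SPSS output is read by.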
  • 77. Running chi-square in SPSS • Select descriptive statistics • Select cross-tabulation • Highlight your independent variable and click on the arrow. • Highlight your dependent variable and click on the arrow. • Select Cells • Choose column percents • Click continue • Select statistics • Select chi-square • Click continue • Click ok