2. Definition of Statistics
• Statistical analysis involves the process of collecting and
analyzing data and then summarizing the data into a numerical
form.
• Statistics is the study of the collection, analysis, interpretation,
presentation, and organization of data.
• A collection of methods for planning experiments, obtaining
data, and then organizing, summarizing, presenting, analyzing,
interpreting, and drawing conclusions based on the data.
3. Characteristics
• Aggregate of facts
• Affected to a marked extent by multiplicity of causes
• Numerically expressed
• Estimated according to a reasonable standard of
accuracy
• Collected in systematic order
• Collected for a predetermined purpose
4. Need for the study of statistics
• Business
• Mathematics
• Accounting
• Economics
• Banking
• Management & administration
• Astronomy
6. Data
• Observations (such as measurements,
genders, survey responses) that have been
collected.
• Information in raw or unorganized form (such
as letters, numbers, or symbols) that refers
to, or represents, conditions, ideas, or objects.
7. Data collection
• Data collection is a systematic approach to gathering
information from a variety of sources to get a
complete and accurate picture of an area of interest.
Methods of collection:
• Primary data
• Internal records: a kind of secondary data that
is not published, such as a bank's customer
records
• Secondary data
8. Classification of data
• Geography: area, country, state, region
• Polity: ideology – socialistic, capitalist, sovereign
• Demography: age, religion, gender, height, weight,
any individual characteristics
• Income: low income, mid income, high income
• Quantitative data
• Qualitative data: preference, likability, and other
characteristics pertaining to an individual
• Chronological: ordered by time, in ascending or descending
order
9. Mathematics
• The study of the measurement, properties, and relationships
of quantities and sets, using numbers and symbols.
Mathematical statements are typically expressed as equations.
• Mathematics is
a group of related sciences, including algebra, geometry,
and calculus, concerned with the
study of number, quantity, shape, and space and their
interrelationships by using a specialized notation.
10. Branches of mathematics
• Foundations: formulation & analysis of language
• Algebra: study of one or more variables
• Arithmetic
• Analysis
• Geometry: concerned with the axiomatic study of polygons,
conic sections, spheres, polyhedra, and related geometric
objects in two and three dimensions.
• Applied mathematics: numerical methods
and computer science, which seeks concrete solutions,
sometimes approximate, to explicit mathematical problems
11. Central tendency
• Central tendency is a central or typical value for a
probability distribution.
Objectives:
• To get one single value that describes the
characteristics of the entire data
• To facilitate comparison
Computation :
• Mean
• Median
• Mode
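The three measures listed above can be sketched with Python's standard `statistics` module. The `scores` values are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # arithmetic average: sum / count
median_value = statistics.median(scores)  # middle value of the sorted data
mode_value = statistics.mode(scores)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

With an even number of observations, the median is taken as the average of the two middle values.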
12. Dispersion
• A statistical term describing the size of the range of
values expected for a particular variable.
• Index of dispersion, a normalized measure of the
dispersion of a probability distribution
• Price dispersion, a variation in prices across sellers of
the same item
• Wage dispersion, the amount of variation in wages
encountered in an economy
13. Objectives
• To determine the reliability of an average
• To serve as a basis for the control of variability
• To compare two or more series
• To facilitate the use of other statistical measures
14. Measures of dispersion include:
• Range
• Interquartile range
• Variance
• Mean deviation
• Standard deviation
15. Range
• Range of a set of data is the difference between the
largest and smallest values.
• Difference between highest value and lowest value
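A minimal sketch of the range calculation, using made-up observations (an assumption for illustration only):

```python
# Hypothetical observations, chosen only for illustration.
observations = [12, 7, 3, 18, 9]

# Range = largest value minus smallest value.
data_range = max(observations) - min(observations)

print(data_range)
```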
16. Interquartile range
A measure of statistical dispersion, equal to the
difference between the upper and lower quartiles:
• the first quartile (Q1)
• and the third quartile (Q3)
• If the actual values of the first or third quartile differ
substantially from the values calculated for a normal
distribution with the same mean and standard deviation, the
population P is not normally distributed.
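The quartiles and their difference can be sketched with `statistics.quantiles` from the Python standard library (available from Python 3.8). The data values are hypothetical. Different quartile conventions exist, so other tools may report slightly different cut points for the same data.

```python
import statistics

# Hypothetical observations, chosen only for illustration.
observations = [2, 3, 3, 5, 7, 10, 12, 15]

# With n=4, quantiles() returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(observations, n=4)

# Interquartile range: spread of the middle half of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```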
17. Variance
• Variance measures how far a set of numbers is spread out. A
variance of zero indicates that all the values are identical.
Variance is always non-negative: a small variance indicates
that the data points tend to be very close to the mean (expected
value) and hence to each other, while a high variance indicates
that the data points are very spread out around the mean and
from each other.
• The variance is a numerical value used to indicate how widely
individuals in a group vary. If individual observations vary
greatly from the group mean, the variance is big; and vice
versa.
Variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2$
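The formula above divides by N, which makes it the population variance. A minimal sketch, assuming a small made-up data set, computes it by hand and compares it with `statistics.pvariance`:

```python
import statistics

# Hypothetical data set, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Average of the squared deviations from the mean (divides by N).
variance = sum((x - mean) ** 2 for x in values) / n

# The standard library's population variance should agree.
library_variance = statistics.pvariance(values)

print(variance, library_variance)
```

Note that `statistics.variance` divides by N - 1 instead, which is the usual choice when the data are a sample rather than the whole population.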
18. Standard deviation
• The square root of the arithmetic mean of the squares
of the deviations of the values from their mean.
Standard deviation is denoted by the small Greek letter σ
(read as sigma). It is also called the
root mean square deviation.
• Put another way, standard deviation is the square root
of the quotient obtained by dividing the sum of the
squared differences between each observation and the
mean by the number of observations in the sample or
population.
19. • A standard deviation close to 0 indicates that the data
points tend to be very close to the mean (also called
the expected value) of the set, while a high standard
deviation indicates that the data points are spread out
over a wider range of values.
• Square root of the mean of the squared deviations from the mean.
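A minimal sketch of the definition, assuming a made-up data set and the population form (dividing by the number of observations):

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Square root of the mean of the squared deviations from the mean.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

# Identical values have no spread, so their standard deviation is zero.
identical = [6, 6, 6, 6]
std_dev_identical = statistics.pstdev(identical)

print(std_dev, std_dev_identical)
```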
20. Skewness
• Skewness is a measure of symmetry, or more precisely, the
lack of symmetry. A distribution, or data set, is symmetric if it
looks the same to the left and right of the center point.
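$g_1 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3 / N}{s^3}$, where $\bar{Y}$ is the mean, $s$ is the standard deviation, and $N$ is the number of data points.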
• A symmetrical distribution has a skewness of zero.
• An asymmetrical distribution with a long tail to the
right (higher values) has a positive skew.
• An asymmetrical distribution with a long tail to the left
(lower values) has a negative skew.
• Skewness is unitless.
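The skewness formula and the sign rules above can be checked with a short sketch. The `skewness` helper and the three data sets are hypothetical, and the standard deviation is assumed to use N in the denominator.

```python
def skewness(values):
    """Skewness g1: third central moment divided by s cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n  # s squared, dividing by N
    m3 = sum((y - mean) ** 3 for y in values) / n  # third central moment
    return m3 / m2 ** 1.5                          # s cubed equals m2 ** 1.5


# Hypothetical data sets, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 10]     # long tail toward higher values
left_tailed = [-10, -4, -3, -2, -1]  # long tail toward lower values

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
```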
21. Kurtosis
• Kurtosis is a measure of whether the data are peaked or flat
relative to a normal distribution. That is, data sets with high
kurtosis tend to have a distinct peak near the mean, decline
rather rapidly, and have heavy tails. Data sets with low
kurtosis tend to have a flat top near the mean rather than a
sharp peak. A uniform distribution would be the extreme case.
$\text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4}$
Kurtosis is a descriptor of the shape of a probability distribution
and, just as for skewness, there are different ways of quantifying
it for a theoretical distribution and corresponding ways of
estimating it from a sample from a population.
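A minimal sketch of the kurtosis formula, under the same assumption that the standard deviation divides by N. The `kurtosis` helper and both data sets are hypothetical. Under this formula a normal distribution has a kurtosis of 3, so values well below 3 suggest a flatter shape and values above 3 suggest heavier tails.

```python
def kurtosis(values):
    """Kurtosis: fourth central moment divided by s to the fourth power."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n  # s squared, dividing by N
    m4 = sum((y - mean) ** 4 for y in values) / n  # fourth central moment
    return m4 / m2 ** 2                            # s**4 equals m2 ** 2


# Hypothetical data sets, chosen only for illustration.
flat = [1, 2, 3, 4, 5]                        # evenly spread, no tails
heavy_tailed = [3, 3, 3, 3, 3, 3, 3, 3, -7, 13]  # peak at 3 plus two outliers

print(kurtosis(flat), kurtosis(heavy_tailed))
```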
22. Correlation
• A measure of the linear correlation (dependence)
between two variables X and Y, giving a value
between +1 and −1 inclusive, where 1 is total
positive correlation, 0 is no correlation, and −1 is
total negative correlation. It is widely used in the
sciences as a measure of the degree of linear
dependence between two variables.
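The measure described above matches the Pearson correlation coefficient. The sketch below computes it by hand under that assumption. The `pearson_r` helper and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation between two equally long sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of paired deviations from the means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical paired observations, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # rises exactly in step with x
y_falling = [10, 8, 6, 4, 2]  # falls exactly in step with x

print(pearson_r(x, y_rising), pearson_r(x, y_falling))
```

The helper fails with a division by zero if either variable is constant, because the correlation is undefined in that case.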
23. Sampling
A sample is drawn from a population, which is a predefined
set of potential respondents in a geographical area.
The most common sampling element in
marketing research is the human respondent, who
could be:
- A consumer
- A potential consumer
- A dealer or retailer
- A person exposed to an advertisement
24. Types of sampling
• Probabilistic sampling: In a probabilistic sampling
technique each sampling unit (household or
individual) has a known probability of being
included in the sample.
1. Simple Random Sampling.
2. Stratified random sampling
3. Cluster sampling
4. Systematic sampling
5. Multistage or Combination sampling.
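Three of the techniques listed above can be sketched with Python's `random` module. The population of 100 unit IDs, the sample size, and the two strata are assumptions made for illustration. Cluster and multistage sampling are not shown.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical sampling frame of 100 unit IDs
sample_size = 10

# Simple random sampling: every unit has the same chance of selection.
simple_random = rng.sample(population, sample_size)

# Systematic sampling: pick a random start, then take every k-th unit.
step = len(population) // sample_size
start = rng.randrange(step)
systematic = population[start::step]

# Stratified random sampling: split the frame into strata,
# then draw a simple random sample inside each stratum.
strata = {"first_half": population[:50], "second_half": population[50:]}
stratified = []
for units in strata.values():
    stratified.extend(rng.sample(units, sample_size // len(strata)))

print(simple_random, systematic, stratified, sep="\n")
```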
25. Non-probabilistic sampling:
• Quota sampling (a fixed number)
• Judgment sampling
• Convenience sampling
• Snowball sampling
26. • An enumeration is a complete, ordered listing of all the items
in a collection.
• Methods of enumeration :
• Explicit complete enumeration: Full enumeration of all
possible alternatives and comparison of all of them to pick the
best solution.
• Implicit complete enumeration: Parts of the solution space
that are definitely sub-optimal are excluded. This reduces
complexity because only the most promising solutions have to
be considered. For implicit complete enumeration, methods
like Branch & Bound, limited enumeration and dynamic
optimization can be used.
• Incomplete enumeration: Selecting alternatives by only
looking at parts of the solution space by applying certain
heuristics. This provides approximate solutions, but not
necessarily optimal ones.
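The difference between explicit complete enumeration and incomplete (heuristic) enumeration can be sketched on a tiny selection problem. The problem itself (choosing items under a weight limit), the item data, and both helper functions are assumptions made for illustration. Implicit enumeration methods such as Branch & Bound are not shown.

```python
from itertools import combinations

# Hypothetical items as (name, weight, value), with a hypothetical capacity.
items = [("A", 10, 60), ("B", 20, 100), ("C", 30, 120)]
capacity = 50


def explicit_enumeration(items, capacity):
    """Check every subset and keep the best one that fits."""
    best_value, best_subset = 0, ()
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            weight = sum(w for _, w, _ in subset)
            value = sum(v for _, _, v in subset)
            if weight <= capacity and value > best_value:
                best_value, best_subset = value, subset
    return best_value, [name for name, _, _ in best_subset]


def greedy_heuristic(items, capacity):
    """Look at only part of the solution space: take items by value per weight."""
    chosen, total_value, remaining = [], 0, capacity
    by_ratio = sorted(items, key=lambda item: item[2] / item[1], reverse=True)
    for name, weight, value in by_ratio:
        if weight <= remaining:
            chosen.append(name)
            total_value += value
            remaining -= weight
    return total_value, chosen


print(explicit_enumeration(items, capacity))
print(greedy_heuristic(items, capacity))
```

On this data the heuristic returns a feasible but lower-valued selection than full enumeration, which illustrates why heuristic results are approximate rather than guaranteed optimal.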
27. Sampling error
• Sampling error is incurred when the statistical
characteristics of a population are estimated from a
subset, or sample, of that population. Since the
sample does not include all members of the
population, statistics on the sample, such as means
and quantiles, generally differ from parameters on the
entire population.
• Sampling bias
• Random sampling
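A minimal sketch of sampling error for the mean, assuming a made-up population of the integers 1 to 100 and a simple random sample of 10 units. The `sampling_error` helper is hypothetical.

```python
import random
import statistics


def sampling_error(population, sample):
    """Difference between the sample mean and the population mean."""
    return statistics.mean(sample) - statistics.mean(population)


rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population
sample = rng.sample(population, 10)

error = sampling_error(population, sample)

print(statistics.mean(population), statistics.mean(sample), error)
```

If the sample contains every member of the population, the error is zero, which is why sampling error arises only when a subset is used.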
28. Time series
• A Time series is a sequence of data points, typically
consisting of successive measurements made over a time
interval.
• Time series are used in statistics, signal
processing, pattern recognition, econometrics,
mathematical finance, weather forecasting, intelligent
transport and trajectory forecasting, earthquake
prediction, control engineering, astronomy, communications
engineering, and largely in any domain of
applied science and engineering which
involves temporal measurements.
29. • Time series analysis comprises methods for analyzing time
series data in order to extract meaningful statistics
and other characteristics of the data.
Components of Time series:
• Secular trends
• Seasonal variation
• Cyclical variation
• Irregular variation
30. Components of time series
• Secular trend: Time series data may show an upward or
downward trend over a period of years. This may be due to
factors like population growth, technological
progress, or large-scale shifts in consumer demand. The trend
results from the long-term effect of socio-economic
and political factors, and it shows the growth or
decline in a time series over a long period.
• Seasonal variation: Seasonal variations are short-term
fluctuations in a time series that occur periodically within a year
and repeat year after year. The major factors
responsible for this repetitive pattern
are weather conditions and the customs of people. For example, more
woolen clothes are sold in winter than in summer.
31. • Cyclical variations: Cyclical variations are recurrent upward
or downward movements in a time series whose period
is greater than a year. These variations are not
as regular as seasonal variations. Cycles vary
in length and size. The ups and downs in
business activity are the effects of cyclical variation.
• Irregular variation: Irregular variations are fluctuations in a
time series that are short in duration, erratic in nature, and
follow no regular pattern of occurrence. They are
sudden changes that are unlikely
to be repeated. These variations are also referred to as residual
variations since, by definition, they represent what is left in
a time series after trend, cyclical, and seasonal variations are removed.
Irregular fluctuations result from
unforeseen events like floods, earthquakes, wars, and famines.
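One common way to separate a trend from a seasonal pattern is a centered moving average. The slides do not name a method, so the additive model, the synthetic series, the 3-period season, and the `centered_moving_average` helper below are all assumptions made for illustration. The series has no cyclical or irregular component.

```python
def centered_moving_average(series, window):
    """Average each point with its neighbours; window must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    return [
        sum(series[i - half:i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]


# Synthetic series: linear upward trend plus a repeating 3-period pattern.
periods = 12
trend = [10 + 2 * t for t in range(periods)]
seasonal_pattern = [1, -3, 2]  # sums to zero over one full season
series = [trend[t] + seasonal_pattern[t % 3] for t in range(periods)]

# Averaging over one full season cancels the seasonal pattern.
estimated_trend = centered_moving_average(series, window=3)

# Subtracting the estimated trend leaves the seasonal component.
# estimated_trend[0] corresponds to t = 1, hence the index shift.
detrended = [series[t] - estimated_trend[t - 1] for t in range(1, periods - 1)]

print(estimated_trend)
print(detrended)
```

The first and last points have no estimate because a centered window cannot be formed there.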