“LAKSHYA: NET-JRF & KSET”
STUDENT’S ACADEMIC DEVELOPMENT
PROGRAMME
AT
SRI H. D. DEVEGOWDA G.F.G.COLLEGE,
PADUVALAHIPPE
Business Statistics in
NET-JRF/KSET
Presented by
Sundar B. N.
Assistant Professor
I am the wisest man alive, for I
know one thing, and that is that I
know nothing.
- Socrates, in Plato's Apology
Business Statistics and Research Methods
 Measures of central tendency
 Measures of dispersion
 Measures of Skewness
 Correlation and regression of two variables
 Probability: Approaches to probability; Bayes’ theorem
 Probability distributions: Binomial, Poisson and normal distributions
 Research: Concept and types; Research designs
 Data: Collection and classification of data
 Sampling and estimation: Concepts; Methods of sampling - probability and
nonprobability methods; Sampling distribution; Central limit theorem; Standard
error; Statistical estimation
 Hypothesis testing: z-test; t-test; ANOVA; Chi-square test; Mann-Whitney test
(U-test); Kruskal-Wallis test (H-test); Rank correlation test
 Report writing
STATISTICS
Measures of central tendency
Measures of dispersion
Measures of Skewness
Correlation and regression of two variables
Probability: Approaches to probability; Bayes’ theorem
Probability distributions: Binomial, Poisson and normal
distributions
Hypothesis testing: z-test; t-test; ANOVA; Chi-square test;
Mann-Whitney test (U-test); Kruskal-Wallis test (H-test);
Rank correlation test
Report writing
Meaning of Research
Research is an investigative process of finding a
reliable solution to a problem through the systematic
selection, collection, analysis and interpretation of
data relating to the problem.
Analysis of Data
Data analysis is the process of systematically
applying statistical and logical techniques to
describe and illustrate, condense and recap, and
evaluate data.
Technically speaking, processing implies editing,
coding, classification and tabulation of collected
data so that they are amenable to analysis.
Statistics
The word 'Statistics' seems to have been derived from the Latin word 'Status', the
Italian word 'Statista' or the German word 'Statistik'.
But according to the observations of John Graunt (1620-1674), the word
'Statistics' is of Italian origin, derived from the word 'Stato'; a
'statista' means a person who deals with the affairs of the state.
That is, initially kings, monarchs or governments used it to collect
information related to the population, agricultural land, wealth, etc. of the
state. Their aim was simply to get an idea of the manpower of the
state, the force needed for the purpose of a war and the taxes to be imposed to
meet the financial needs of the state. So it indicates that initially it was used by
kings, monarchs or governments for the administrative requirements of the
state. That is why its origin lies in statecraft (the art of managing state
affairs).
On the basis of evidence from papyrus manuscripts and ancient monuments in
pharaonic temples, it is assumed that the first census in the world was carried out
in Egypt in 3050 BC. Yet China's census data from around 2000 BC are considered
the oldest surviving census data in the world.
Statistics in India
In the 3rd century BC, the "Arthashastra" was written by Kautilya, one of the greatest geniuses of
political administration. In it, he described the details related to the conduct of population,
agriculture and economic censuses. An efficient system of collecting official and administrative statistics
was in use during the reign of Chandragupta Maurya (324-300 BC) under the guidance of Kautilya.
Many things like the taxation policy of the state, governance and administration, public finance, the duties of
a king, etc. were also discussed in this celebrated Arthashastra.
Another evidence that statistics was in use during Emperor Akbar's reign (1556-1605) is the
"Ain-i-Akbari" written by Abul Fazl, one of the nine gems of Akbar. Raja Todar Mal, Akbar's finance
minister and another of the nine gems, kept very good records of land and
revenue and developed a very systematic revenue collection system in Akbar's kingdom by
using his expertise and the recorded data. The revenue collection system developed by Raja Todar Mal
was so systematic that it became a model for the later Mughals and, subsequently, for the British.
The British Government, after the transfer of power from the East India Company, started a publication
entitled 'Statistical Abstract of British India' as a regular annual feature in 1868, in which all the useful
statistical information related to the local administrations of all the British Provinces was provided. In
between, some census reports appeared based on particular areas, but not at the national
level. The first attempt to get detailed information on the whole population of India was made between
1867 and 1872. The first decennial census was undertaken on 17th February 1881 by W.W. Plowden,
the first Census Commissioner of India. Since then, a census has been carried out every 10 years
in India. The 2011 census was the 15th census in India.
The credit for establishing Statistics as a discipline in India goes to Prasanta Chandra Mahalanobis (P.C.
Mahalanobis). He was a professor of physics at Presidency College, Kolkata. During his study at
Cambridge he got the chance to go through the work of Karl Pearson and R. A. Fisher. Continuing his
interest in Statistics, he established a statistical laboratory at Presidency College, Kolkata. On 17
December 1931, this statistical laboratory was given the name Indian Statistical Institute (ISI).
The first postgraduate course in Statistics was started by Kolkata University in 1941, while the first
undergraduate course in Statistics was started by Presidency College, Kolkata.
DEFINITION OF STATISTICS
“Statistics is the science of counting.” – A.L. Bowley
“Statistics is the science of averages.” – A.L. Bowley
Statistics is “the science of the measurement of the social organism, regarded as a whole, in all
its manifestations.” – A.L. Bowley
“Statistics are the numerical statements of facts in any department of enquiry placed in relation to
each other.” – A.L. Bowley
“By statistics we mean quantitative data affected to a marked extent by multiplicity of causes.” –
Yule and Kendall
“Science of estimates and probabilities.” – Boddington
“The method of judging collective natural or social phenomena from the results obtained by the
analysis of an enumeration or collection of estimates.” – W.I. King
“Statistics is the science which deals with collection, classification and tabulation of numerical
facts as the basis for explanation, description and comparison of phenomena.” – Lovitt
“The science which deals with the collection, tabulation, analysis and interpretation of numerical
data.” – Croxton and Cowden
From the above definitions, statistics may be comprehended as:
“Statistics is a branch of science which deals with the collection, classification,
tabulation, analysis and interpretation of data.”
DATA
Data play the role of raw material for any statistical
investigation, and may be defined in a single sentence as:
“The values of different objects collected in a survey, or the
recorded values of an experiment over a time period, taken
together constitute what we call data in Statistics.”
Each value in the data is known as an observation. Statistical
data may be classified as follows, based on the characteristic,
the nature of the characteristic, the level of measurement, the
time component and the way of obtaining the data:
Types of Data
Based on the characteristic
 Qualitative Data
 Quantitative Data
Based on nature of the characteristic
 Discrete data
 Continuous data
Based on the level of measurement
 Nominal Data
 Ordinal Data
 Interval Data
 Ratio Data
Based on the Time Component
 Time Series data
 Cross Sectional data
Based on the ways of obtaining the data
 Primary Data
 Secondary Data
Quantitative Data
As the name quantitative itself suggests, it is related to quantity. In fact, data are
said to be quantitative if a numerical quantity (which exactly measures the
characteristic under study) is associated with each observation.
Generally, interval or ratio scales are used as the scale of measurement in the case of
quantitative data. Data based on characteristics such as weight, height, age, length,
area, volume, money, temperature, humidity, size, etc. are generally quantitative.
For example,
(i) Weights in kilograms (say) of the students of a class.
(ii) Heights in centimetres (say) of the candidates appearing in a direct recruitment of
the Indian army organised by a particular cantonment.
(iii) Ages of the females at the time of marriages celebrated over a period of a week in
Delhi.
(iv) Lengths (in cm) of different tables in a showroom of furniture.
Qualitative Data
As the name qualitative itself suggests, it is related to the quality of an object/thing. It is
obvious that quality cannot be measured numerically in exact terms. Thus, if the
characteristic/attribute under study is such that it is measured only on the basis of presence or
absence, then the data thus obtained are known as qualitative data.
Generally, nominal and ordinal scales are used as the scale of measurement in the case of qualitative
data. Data based on characteristics such as gender, marital status, qualification, colour,
religion, satisfaction, types of trees, beauty, honesty, etc. are generally qualitative.
For example,
i. If the characteristic under study is gender, then objects can be divided into two categories:
male and female.
ii. If the characteristic under study is marital status, then objects can be divided into four
categories: married, unmarried, divorced, widowed.
iii. If the characteristic under study is qualification, say 'matriculation', then objects can be
divided into two categories: 'matriculation passed' and 'not passed'.
iv. If the characteristic under study is colour, then objects can be divided into a number of
categories: Violet, Indigo, Blue, Green, Yellow, Orange and Red.
Discrete Data
If the nature of the characteristic under study is such that the values of observations are
at most countable between two certain limits, then the corresponding data are known as
discrete data.
For example,
(i) The number of books on the shelf of an almirah in a library forms discrete data,
because the number of books may be 0 or 1 or 2 or 3, ... but cannot take
fractional values such as 0.8, 1.32, 1.53245, etc.
(ii) If there are 30 students in a class, then the number of students present in a lecture
forms discrete data, because the number of present students may be 1 or 2 or 3 or 4
or ... or 30, but cannot take any real value between 0
and 30 such as 1.8675, 22.56, 29.95, etc.
(iii) The number of children in a family in a locality forms discrete data, because the number of
children in a family may be 0 or 1 or 2 or 3 or 4 or ..., but cannot
take values such as 2.3, 3.75, etc.
(iv) The number of mistakes on a particular page of a book: obviously the number of
mistakes may be 0 or 1 or 2 or 3, ... but cannot be 6.74, 3.9832, etc.
Continuous Data
Data are said to be continuous if the measurement of the observations of
a characteristic under study may be any real value between two
certain limits.
For example,
(i) Data obtained by measuring the heights of the students of a class of,
say, 30 students form continuous data, because if the minimum and
maximum heights are 152 cm and 175 cm, then the heights of the students
may take any possible value between 152 cm and 175 cm; for
example, 152.2375 cm, 160.31326... cm, etc.
(ii) Data obtained by measuring the weights of the students of a class also
form continuous data, because the weights of students may be 48.25796...
kg, 50.275 kg, 42.314314314... kg, etc.
Time Series Data
Collection of data is done to serve a purpose in hand. The purpose may have its connection with time,
geographical location or both. If the purpose of data collection has its connection with time, then the data
are known as time series data. That is, in time series data, time is one of the main variables, and the data,
usually collected at regular intervals of time, show how the characteristic(s) under study change over
time.
For example, quarterly profit of a company for last eight quarters, yearly production of a crop in India for last
six years, yearly expenditure of a family on different items for last five years, weekly rate of inflation for
last ten weeks, etc. all form time series data.
If the purpose of the data collection has its connection with geographical location then it is known as Spatial
Data. For example,
(i) Price of petrol in Delhi, Haryana, Punjab, Chandigarh at a particular time.
(ii) Number of runs scored by a batsman in different matches in a one day series in different stadiums.
If the purpose of the data collection has its connection with both time and geographical location then it is
known as Spatio-Temporal Data.
For example, data related to the population of different states in India in 2001 and 2011 form Spatio-Temporal
Data.
In time series data, spatial data and spatio-temporal data, the concept of frequency has no
significance, and hence such data are known as non-frequency data.
For instance, in the example discussed for time series data, an expenditure of Rs 40000 on food in 2006 is
important in itself; its frequency, say 3 (repeated three times), does not make any sense here.
Now consider the marks of 40 students in a class out of 10 (say). Here we note that more than one
student may score the same marks in the test. Suppose out of 40 students, 5 score 10 out of 10;
that means the mark 10 has frequency 5. This type of data, where frequency is meaningful, is known as
frequency data.
Cross Sectional Data
Sometimes we are interested in knowing how a characteristic (such
as income, expenditure, population, votes in an election, etc.) under
study at one point in time is distributed over different subjects (such
as families, countries, political parties, etc.). This type of data, which
is collected at one point in time, is known as cross sectional data.
For example, annual income of different families of a locality, survey of
consumer’s expenditure conducted by a research scholar, opinion
polls conducted by an agency, salaries of all employees of an institute,
etc.
Primary Data
Data which are collected by an investigator, agency or institution for a specific purpose,
where those people are the first to use the data, are called primary data. That is, these data are
originally collected by them and they are the first to use them.
For example, suppose a research scholar wants to know the mean age of the students of M.Sc.
Chemistry of a particular university, and he collects the data on the age of each student
of M.Sc. Chemistry of that university by contacting each student personally. The
data so obtained are an example of primary data for that research
scholar.
There are a number of methods of collection of primary data depending upon many factors such
as geographical area of the field, money available, time period, accuracy needed, literacy of
the respondents/informants, etc.
Here we will discuss only following commonly used methods.
(1) Direct Personal Investigation Method
(2) Telephone Method
(3) Indirect Oral Interviews Method
(4) Local Correspondents Method
(5) Mailed Questionnaires Method
(6) Schedules Method
Let us discuss these methods one by one with some examples, merits and demerits.
SECONDARY DATA
Discussion in the previous section shows that the collection of primary data requires a lot of time, money,
manpower, etc. But sometimes some or all of these resources are not sufficient for the collection of
primary data, and in some situations it may not be feasible to collect primary data easily. To overcome
these difficulties, there is another way of obtaining data, known as secondary data. The data
obtained/gathered by an investigator, agency or institution from a source which already exists are
called secondary data. That is, these data were originally collected by some investigator, agency or
institution, have been used by them at least once, and are now going to be used at least a second
time. Data already existing in different sources may be in published or unpublished form, so sources of
secondary data can broadly be classified under the following two heads.
(1) Published Sources
When an institution or organisation publishes its own collected data (primary data) in public domain either in
printed form or in electronic form then these data are said to be secondary data in published form and the
source where these data are available is known as published source of the secondary data of the
corresponding institution or organisation. Some of the published sources of secondary data are given
below:
 International Publications
 Government Publications in India
 Published Reports of Commissions and Committees
 Research Publications
 Reports of Trade and Industry Associations
 Published Printed Sources
 Published Electronic Sources
SECONDARY DATA (2)
(2) Unpublished Sources - Information collected in terms of
data, or data observed through one's own experience, by an
individual or an organisation, which is in unpublished
form, is known as an unpublished source of secondary data.
Examples include:
(i) Records and statistics maintained by different institutions
or organisations, whether government or non-government
(ii) Unpublished project works, field works or other
research-related works submitted by students to their
corresponding institutes
(iii) Records of the Central Bureau of Investigation
(iv) Personal diaries, etc.
MEASUREMENT SCALES
The two words "counting" and "measurement" are used very frequently by everybody. For
example, if you want to know the number of pages in a notebook, you can easily
count them; and if you want to know the height of a man, you can easily measure
it. But, in Statistics, the act of counting and measurement is divided into 4 levels of
measurement scales, known as the nominal, ordinal, interval and ratio scales.
(1) Nominal Scale
In Latin, 'Nomen' means name; the word nominal comes from this Latin word.
Under the nominal scale we divide the objects under study into
two or more categories by giving them unique names. The classification of objects
into two or more categories is done in such a way that:
(a) Each object falls in a unique category, i.e. it either belongs to a category or it
does not. Mathematically, we may use the symbols "=" and "≠" according as an
object falls in a category or not.
(b) The number of categories must be sufficient to include all objects, i.e. there should
be no scope for even a single object to be left out of all the categories. That is, in
statistical language, the categories must be mutually exclusive and exhaustive.
Generally the nominal scale is used when we want to categorise the data based on
characteristics such as gender, race, region, religion, etc.
(2) Ordinal Scale
We have seen that order does not make any sense in the nominal scale. As the name ordinal itself
suggests, other than the names or codes given to the different categories, this scale also provides
an order among the categories. That is, we can place the objects in a series based on the orders or
ranks given using the ordinal scale. But here we cannot find the actual difference between two categories.
Generally the ordinal scale is used when we want to measure attitude scores towards the level of liking,
satisfaction, preference, etc. Different designations in an institute can also be measured using the
ordinal scale. For example:
Suppose a school boy is asked to list three ice-cream flavours according to his preference,
and he lists them in the following order:
Vanilla > Strawberry > Tutti-frutti
This indicates that he likes vanilla more than strawberry, and strawberry more than
tutti-frutti. But the actual difference between his liking for vanilla and strawberry
cannot be measured.
In the sixth pay commission, teachers of colleges and universities are designated as Assistant Professor,
Associate Professor and Professor. The rank of Professor is higher than that of Associate Professor,
and the designation of Associate Professor is higher than that of Assistant Professor. But you cannot find the
actual difference between Professor and Associate Professor, or Professor and Assistant Professor,
or Associate Professor and Assistant Professor. This is because one teacher in a designation might
have served a certain number of years and done good quality research work, while another
teacher in the same designation might have served fewer years and done
unsatisfactory research work. So the actual difference between one designation and another
cannot be found; one teacher may be very near to the next higher designation and another may be far away from it.
(3) Interval Scale
If I = [4, 9] then the length of this interval is 9 − 4 = 5, i.e. the difference between 4 and 9 is 5;
we can find the difference between any two points of the interval. For example, the
difference between 7 and 7.3 is 0.3. Thus we see that the property of difference holds in the case
of intervals. Similarly, the third level of measurement, the interval scale, possesses the
property of difference, which is not satisfied by the nominal and ordinal scales.
The nominal scale gives only names to the different categories; the ordinal scale, moving one step
further, also provides the concept of order between the categories; and the interval scale,
moving one step ahead of the ordinal scale, also provides the difference
between any two categories.
The interval scale is used when we want to measure years/historical time/calendar time,
temperature (except on the Kelvin scale), sea level, marks in tests with
negative marking, etc. Mathematically, this scale permits + and − in addition to >, <,
= and ≠.
Let us consider some examples:
The measurement of the time of a historical event comes under the interval scale because there is
no fixed origin of time (i.e. no absolute '0' year). The '0' year differs from calendar to calendar and
from society/country to society/country; e.g. the Hindu, Muslim and Hebrew calendars have
different origins of time, i.e. a '0' year is not uniquely defined. In Indian history also, we find
BC (Before Christ).
(4) Ratio Scale
The ratio scale is the highest level of measurement: the nominal scale gives only names
to the different categories; the ordinal scale provides order between categories in
addition to names; the interval scale provides differences between categories in
addition to names and order; but the ratio scale, in addition to names, order and
difference, also provides a natural zero (absolute zero). On the ratio
scale, values of a characteristic cannot be negative.
The ratio scale is used when we want to measure temperature in kelvin, weight, height,
length, age, mass, time, plane angle, etc. The ratio scale permits × and ÷ in addition
to +, −, >, <, = and ≠. But be careful never to take '0' in the denominator while
finding ratios.
For example, 4/0 is meaningless.
Let us consider some examples:
Measurement of temperature on the Kelvin scale comes under the ratio scale because it has an
absolute zero, equivalent to −273.15 °C. This fixed origin allows
us to make statements like "50 K is five times as hot
as 10 K".
Both the heights (in cm) and ages (in days) of the students of M.Sc. Statistics of a particular
university satisfy all the requirements of a ratio scale, because height and age
cannot be negative (i.e. both have an absolute zero).
Permissible Statistical Tools in Measurement Scales
NOMINAL SCALE
 Permissible tools: mode, chi-square test and run test
 Logic/Reason: here counting is the only permissible operation
ORDINAL SCALE
 Permissible tools: median and all positional averages (quartiles,
deciles, percentiles), Spearman's rank correlation
 Logic/Reason: here, other than counting, an order relation (less
than or greater than) also exists
INTERVAL SCALE
 Permissible tools: mean, S.D., t-test, F-test, ANOVA, simple,
multiple and moment correlations, regression
 Logic/Reason: here counting, order and difference operations hold
RATIO SCALE
 Permissible tools: geometric mean (G.M.), harmonic mean (H.M.),
coefficient of variation
 Logic/Reason: here counting, order, difference and natural zero
all exist
Types of Data Analysis
1. Descriptive Statistics - provide an overview of the attributes
of a data set. These include measures of central tendency
(frequencies, histograms, mean, median and mode) and dispersion
(range, variance and standard deviation).
2. Inferential Statistics - provide measures of how well your
data support your hypothesis and whether your data are generalizable
beyond what was tested (significance tests).
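As a minimal sketch (on hypothetical sample values), the descriptive measures listed above can be computed with Python's standard `statistics` module:

```python
# Descriptive statistics sketch: central tendency and dispersion
# computed on a small hypothetical data set.
import statistics

data = [54, 56, 70, 45, 50, 56]  # hypothetical sample values

mean = statistics.mean(data)             # central tendency
median = statistics.median(data)
mode = statistics.mode(data)
data_range = max(data) - min(data)       # dispersion
variance = statistics.pvariance(data)    # population variance
std_dev = statistics.pstdev(data)        # population standard deviation

print(mean, median, mode, data_range, std_dev)
```

Inferential statistics (significance tests) then use such sample measures to judge whether conclusions generalize beyond the data collected.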
Types of Data Analysis
Descriptive
Measures of central tendency
Measures of dispersion
Measures of Skewness
Correlation and regression of
two variables
Inferential
Parametric tests-
Hypothesis testing: z-test; t-test;
ANOVA(1 Way);
Chi–square test;
Non-Parametric tests-
Mann-Whitney test (U-test);
Kruskal-Wallis test (H-test);
Rank correlation test
Measures of central tendency - According to Professor Bowley, averages are
"statistical constants which enable us to comprehend in a single effort the significance
of the whole". They throw light on how the values are concentrated in the central part
of the distribution. For this reason they are called measures
of central tendency: an average is a single value which is considered the most
representative for a given set of data. Measures of central tendency show the tendency
of data to cluster around some central value.
Significance of the Measure of Central Tendency
The following are two main reasons for studying an average:
1. To get a single representative
Measures of central tendency enable us to get a single value from the mass of data,
which also provides an idea about the entire data set. For example, it is impossible to remember
the height measurements of all students in a class, but if the average height is
obtained, we get a single value that represents the entire class.
2. To facilitate comparison
Measures of central tendency enable us to compare two or more populations
by reducing the mass of data to one single figure. The comparison can be made
either at the same point in time or over a period of time. For example, if a subject has been
taught in more than two classes, then by obtaining the average marks of those classes
a comparison can be made.
Properties of a Good Average
1. It should be simple to understand. Since we use measures of central tendency to simplify the
complexity of the data, an average should be easy to understand, otherwise its use is bound to be very
limited.
2. It should be easy to calculate. An average should not only be easy to understand but also simple
to compute, so that it can be used as widely as possible.
3. It should be rigidly defined. A measure of central tendency should be defined properly so that it has an
appropriate interpretation. It should also have an algebraic formula, so that if different people compute the
average from the same figures, they get the same answer.
4. It should be amenable to algebraic manipulation. If there are two sets of data and the individual
information is available for both sets, then one should be able to find the information regarding the
combined set as well.
5. It should be least affected by sampling fluctuations. We should prefer a tool which has sampling stability.
In other words, if we select 10 different groups of observations from the same population and compute the
average of each group, we should expect to get approximately the same values, with small
differences due to sampling fluctuation only.
6. It should be based on all the observations. If any measure of central tendency is used to analyse the data, it
is desirable that each and every observation is used in its calculation.
7. It should be possible to calculate it even for open-end class intervals.
A measure of central tendency should be calculable for data with open-end classes.
8. It should not be affected by extremely small or extremely large observations.
It is assumed that each and every observation influences the value of the average. If one or two very small or
very large observations affect the average unduly, i.e. increase or decrease its value largely, then the
average cannot be considered a good average.
Different Measures of central tendency
1) Arithmetic Mean
2) Weighted Mean
3) Geometric Mean
4) Harmonic Mean
5) Median
6) Mode
Partition Values
1) Quartiles
2) Deciles
3) Percentiles
Arithmetic Mean Arithmetic mean (also called mean) is defined as the
sum of all the observations divided by the number of observations.
Arithmetic mean fulfils most of the properties of a good average except
the last two. It is particularly useful when we are dealing with a sample
as it is least affected by sampling fluctuations. It is the most popular
average and should always be our first choice unless there is a strong
reason for not using it.
Example: Calculate the mean of the weights of five students: 54, 56, 70, 45, 50 (in kg).
The sum of the given values is 275, so the mean is 275/5 = 55.
Therefore, the average weight of the students is 55 kg.
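The worked example above can be reproduced directly in Python:

```python
# Arithmetic mean = sum of all observations / number of observations,
# using the slide's example data (weights of five students in kg).
weights = [54, 56, 70, 45, 50]

mean = sum(weights) / len(weights)
print(mean)  # 275 / 5 = 55.0
```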
Merits
1. It utilizes all the observations;
2. It is rigidly defined;
3. It is easy to understand and
compute; and
4. It can be used for further
mathematical treatments.
Demerits
1. It is badly affected by extremely
small or extremely large values;
2. It cannot be calculated for open
end class intervals; and
3. It is generally not preferred for
highly skewed distributions.
WEIGHTED MEAN
Weight here refers to the importance of a value in a distribution. A
simple logic is that a number is as important in the distribution as the
number of times it appears. So, the frequency of a number can also be
its weight. But there may be other situations where we have to
determine the weight based on some other reasons.
For example, the frequency of a score (the number of innings in which it was
made) may be taken as its weight. Alternatively, when calculating the weighted
mean of the scores of several innings of a player, we may take the strength of
the opponent (as judged by the proportion of matches lost by a team against
that opponent) as the corresponding weight: the higher the proportion, the
stronger the opponent and hence the greater the weight.
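A minimal sketch of the weighted mean, where each value contributes in proportion to its weight. The scores and opponent-strength weights below are hypothetical:

```python
# Weighted mean: sum(value * weight) / sum(weights).
scores = [50, 100, 200]    # hypothetical runs scored in three innings
weights = [0.2, 0.5, 0.8]  # hypothetical opponent-strength weights

weighted_mean = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
print(weighted_mean)
```

Note that when every weight is 1 (or the weights are the frequencies of the values), this reduces to the ordinary arithmetic mean.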
MEDIAN
The median is that value of the variable which divides the whole distribution into two equal parts. Here, it may be
noted that the data should be arranged in ascending or descending order of magnitude. When the number of
observations is odd, the median is the middle value of the data. For an even number of observations, there
are two middle values, so we take the arithmetic mean of these two middle values. The numbers of
observations below and above the median are the same. The median is not affected by extremely large or extremely
small values (as it corresponds to the middle value), nor is it affected by open-end class intervals. In
such situations, it is preferable to the mean. It is also useful when the distribution is skewed
(asymmetric).
Example: Find the median of the following observations: 6, 4, 3, 7, 8.
First we arrange the given data in ascending order: 3, 4, 6, 7, 8.
Since the number of observations (5) is odd, the median is the middle value, i.e. 6.
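The rule above (middle value for odd n, mean of the two middle values for even n) can be sketched in Python:

```python
# Median: sort the data, then take the middle value (odd n) or the
# arithmetic mean of the two middle values (even n).
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([6, 4, 3, 7, 8]))  # odd n: middle value is 6
print(median([6, 4, 3, 7]))     # even n: mean of 4 and 6 is 5.0
```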
Merits
1. It is rigidly defined;
2. It is easy to understand and compute;
3. It is not affected by extremely small or extremely
large values; and
4. It can be calculated even for open end classes (like
“less than 10” or “50 and above”).
Demerits
1. In case of even number of observations we get
only an estimate of the median by taking the
mean of the two middle values. We don’t get its
exact value;
2. It does not utilize all the observations. The median
of 1, 2, 3 is 2. If the observation 3 is replaced by
any number higher than or equal to 2 and if the
number 1 is replaced by any number lower than
or equal to 2, the median value will be
unaffected. This means 1 and 3 are not being
utilized;
3. It is not amenable to algebraic treatment; and
4. It is affected by sampling fluctuations.
MODE
The most frequent observation in the distribution is known as the mode. In other words,
the mode is that observation in a distribution which has the maximum frequency. For
example, when we say that the average size of shoes sold in a shop is 7 it is the modal
size which is sold most frequently.
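The shoe-size example can be sketched as follows (the sales figures are hypothetical, chosen so that size 7 occurs most often):

```python
import statistics

shoe_sizes = [7, 6, 7, 8, 7, 9, 6, 7]
print(statistics.mode(shoe_sizes))       # 7, the most frequently sold size

# a distribution can have more than one mode (a demerit noted below)
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```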
Merits
1. Mode is the easiest average to
understand and also easy to calculate;
2. It is not affected by extreme values;
3. It can be calculated for open end
classes;
4. Provided the modal class, the
pre-modal class and the post-modal
class are of equal width, the mode
can be calculated even if the
other classes are of unequal width
Demerits
1. It is not rigidly defined. A
distribution can have more than one
mode;
2. It is not utilizing all the observations;
3. It is not amenable to algebraic
treatment; and
4. It is greatly affected by sampling
fluctuations.
Relationship between Mean, Median
and Mode
For a symmetrical distribution the mean, median and
mode coincide. But if the distribution is moderately
asymmetrical, there is an empirical relationship
between them. The relationship is
Mean – Mode = 3 (Mean – Median)
Mode = 3 Median – 2 Mean
Note: Using this formula, we can calculate
mean/median/mode if other two of them are known.
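The empirical relationship can be applied directly; the mean and median values below are hypothetical:

```python
def empirical_mode(mean, median):
    """Mode = 3*Median - 2*Mean (holds for moderately skewed distributions)."""
    return 3 * median - 2 * mean

print(empirical_mode(mean=30.0, median=28.0))  # 24.0
```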
GEOMETRIC MEAN
The geometric mean (GM) of n observations is defined as the n-th
root of the product of the n observations. It is useful for averaging ratios or proportions. It
is the ideal average for calculating index numbers (index numbers are economic barometers
which reflect the change in prices or commodity consumption in the current period with
respect to some base period taken as standard). It fails to give the correct average if an
observation is zero or negative.
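A minimal sketch of the geometric mean, using hypothetical price relatives of the kind averaged in index numbers:

```python
import statistics

# hypothetical price relatives (current price / base price) for three commodities
ratios = [1.10, 1.20, 0.95]
print(statistics.geometric_mean(ratios))  # cube root of the product of the three ratios

print(statistics.geometric_mean([2, 8]))  # sqrt(2 * 8) = 4
```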
Merits
1. It is rigidly defined;
2. It utilizes all the observations;
3. It is amenable to algebraic treatment
(the reader should verify that if GM1
and GM2 are the geometric means of two
series, Series 1 of size n and Series 2
of size m respectively, then the geometric
mean of the combined series is given
by
log GM = (n log GM1 + m log GM2) / (n + m));
4. It gives more weight to small items;
and
5. It is not affected greatly by sampling
fluctuations.
Demerits
1. Difficult to understand and calculate;
and
2. It becomes imaginary for an odd
number of negative observations and
becomes zero or undefined if a single
observation is zero.
HARMONIC MEAN
HM is defined as the value obtained when the number of values in the
data set is divided by the sum of their reciprocals.
Equivalently, the harmonic mean (HM) is the reciprocal (inverse) of the
arithmetic mean of the reciprocals of the observations of a set.
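A short sketch of the harmonic mean, using the classic average-speed situation (the speeds are hypothetical):

```python
import statistics

# average speed over two equal distances: 40 km/h out, 60 km/h back
speeds = [40, 60]
hm = statistics.harmonic_mean(speeds)  # n divided by the sum of reciprocals
print(hm)  # 48.0, not the arithmetic mean 50
```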
Merits
1. It is rigidly defined;
2. It utilizes all the
observations;
3. It is amenable to algebraic
treatment; and
4. It gives greater importance
to small items.
Demerits
1. Difficult to understand and
compute.
PARTITION VALUES
Partition values are those values of variable which divide the distribution into a certain number of equal parts.
Here it may be noted that the data should be arranged in ascending or descending order of magnitude.
Commonly used partition values are quartiles, deciles and percentiles. For example, quartiles divide the data
into four equal parts. Similarly, deciles and percentiles divide the distribution into ten and hundred equal
parts, respectively.
Quartiles
Quartiles divide the whole distribution into four equal parts. There are three quartiles: the 1st quartile,
denoted by Q1, the 2nd quartile, denoted by Q2, and the 3rd quartile, Q3, which divide the whole
data into four parts. One-fourth of the data lies below Q1, one-half below Q2
and three-fourths below Q3. Here, it may be noted that the data should be
arranged in ascending or descending order of magnitude.
Deciles
Deciles divide the whole distribution into ten equal parts. There are nine deciles. D1, D2, ..., D9 are
known as the 1st decile, 2nd decile, ..., 9th decile respectively, and the ith decile has (iN/10)
observations below it. Here, it may be noted that the data should be arranged in ascending or
descending order of magnitude.
Percentiles
Percentiles divide the whole distribution into 100 equal parts. There are ninety-nine percentiles. P1,
P2, ..., P99 are known as the 1st percentile, 2nd percentile, ..., 99th percentile, and the ith percentile
has (iN/100) observations below it. Here, it may be noted that the data should be arranged in
ascending or descending order of magnitude.
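All three partition values can be obtained with one function; the data set below is hypothetical:

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

quartiles = statistics.quantiles(data, n=4)      # [Q1, Q2, Q3]
deciles = statistics.quantiles(data, n=10)       # D1 ... D9
percentiles = statistics.quantiles(data, n=100)  # P1 ... P99
print(quartiles)
```

Note that `quantiles` sorts the data internally, consistent with the requirement that the data be arranged in order of magnitude.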
MEASURES OF DISPERSION
Different measures of central tendency give a value around which the data are concentrated. But they
give no idea about the nature of scatter or spread. For example, the observations 10, 30 and
50 have mean 30, while the observations 28, 30, 32 also have mean 30. Both the distributions
are spread around 30, but the variability among units is greater in the first
than in the second. In other words, there is greater variability or dispersion in the first set of
observations in comparison to the second. A measure of dispersion is calculated to get an idea about
the variability in the data.
According to Spiegel, the degree to which numerical data tend to spread about an average value
is called the variation or dispersion of data. Actually, there are two basic kinds of a measure
of dispersion (i) Absolute measures and (ii) Relative measures. The absolute measures of
dispersion are used to measure the variability of a given data expressed in the same unit,
while the relative measures are used to compare the variability of two or more sets of
observations. Following are the different measures of dispersion:
1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation and Variance
Properties of Good Measure of
Dispersion
The properties of a good measure of dispersion are similar to the
properties of a good measure of average. So, a good measure of
dispersion should possess the following properties:
1. It should be simple to understand;
2. It should be easy to compute;
3. It should be rigidly defined;
4. It should be based on each and every observation of the data;
5. It should be amenable to further algebraic treatment;
6. It should have sampling stability; and
7. It should not be unduly affected by extreme observations
RANGE
Range is the simplest measure of dispersion. It is defined as the difference between the
maximum value of the variable and the minimum value of the variable in the
distribution. Its merit lies in its simplicity. The demerit is that it is a crude measure
because it is using only the maximum and the minimum observations of variable.
However, it still finds applications in Order Statistics and Statistical Quality
Control.
R=X max-X min
where, X max : Maximum value of variable and
X min : Minimum value of variable
Find the range of the distribution 6, 8, 2, 10, 15, 5, 1, 13.
For the given distribution, the maximum value of variable is 15 and the minimum value
of variable is 1. Hence range = 15 -1 = 14.
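The range computation from the example is a one-liner:

```python
data = [6, 8, 2, 10, 15, 5, 1, 13]
r = max(data) - min(data)  # R = X max - X min
print(r)  # 14
```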
Merits of Range
1. It is the simplest to understand;
2. It can be visually obtained since one can detect the largest and the smallest
observations easily and can take the difference without involving much calculations;
and
3. Though it is crude, it has useful applications in areas like order statistics and statistical quality control.
QUARTILE DEVIATION
As you have already studied about quartile that Q1 and Q3 are the first
quartile and the third quartile respectively. (Q3 – Q1) gives the inter
quartile range. The semi inter quartile range which is also known as
Quartile Deviation (QD) is given by
Quartile Deviation (QD) = (Q3 – Q1) / 2
The relative measure of Q.D., known as the Coefficient of Q.D., is defined as
Coefficient of Q.D. = (Q3 – Q1) / (Q3 + Q1)
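Both measures can be sketched as follows; the data are hypothetical:

```python
import statistics

data = [20, 28, 40, 12, 30, 15, 50]
q1, _, q3 = statistics.quantiles(data, n=4)

qd = (q3 - q1) / 2                 # semi-interquartile range
coeff_qd = (q3 - q1) / (q3 + q1)   # unit-free relative measure
print(qd, coeff_qd)
```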
MEAN DEVIATION
Mean deviation is defined as the average of the absolute values of deviations
from any arbitrary value, viz. mean, median, mode, etc. It is often suggested to
calculate it from the median because the mean deviation is least when measured from the
median.
The deviation of an observation xi from the assumed value A is (xi – A).
Therefore, the mean deviation can be defined as
MD = Σ |xi – A| / n
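A sketch of the mean deviation about the median (the recommended choice), using the data from the median example:

```python
import statistics

data = [3, 4, 6, 7, 8]
med = statistics.median(data)                       # 6
md = sum(abs(x - med) for x in data) / len(data)    # average absolute deviation
print(md)  # 1.6
```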
Merits of Mean Deviation
1. It utilizes all the observations;
2. It is easy to understand and calculate; and
3. It is not much affected by extreme values.
Demerits of Mean Deviation
1. Negative deviations are straightaway made positive;
2. It is not amenable to algebraic treatment; and
3. It can not be calculated for open end classes
VARIANCE
In the previous section, we have seen that while calculating the mean deviation, negative deviations are
straightaway made positive. To overcome this drawback we move towards the next measure of dispersion
called variance. Variance is the average of the squares of the deviations of the values from the mean. Taking the
square of each deviation is a better technique to get rid of negative deviations.
Variance is defined as
σ² = Σ (xi – x̄)² / n
and for a frequency distribution the formula is
σ² = Σ fi (xi – x̄)² / N, where N = Σ fi.
It should be noted that the sum of squares of deviations is least when the deviations are measured from the mean.
This means Σ (xi – A)² is least when A = Mean.
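The definition and the least-squares property can both be checked; the data are from the dispersion example above:

```python
import statistics

data = [10, 30, 50]
print(statistics.pvariance(data))  # population variance, sum((xi - mean)**2) / n

# the sum of squared deviations is least when measured about the mean
mean = statistics.mean(data)
sse = lambda a: sum((x - a) ** 2 for x in data)
print(sse(mean) <= sse(mean + 5) and sse(mean) <= sse(mean - 5))  # True
```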
Merits of Variance
1. It is rigidly defined;
2. It utilizes all the observations;
3. Amenable to algebraic treatment;
4. Squaring is a better technique to get rid of negative deviations; and
5. It is the most popular measure of dispersion.
Demerits of Variance
1. In cases where the mean is not a suitable average, variance may not be the appropriate measure of
dispersion, e.g. when open end classes are present. In such cases quartile deviation may be used;
2. Although easy to understand, its calculation may require a calculator or a computer; and
3. Its unit is the square of the unit of the variable, due to which the magnitude of dispersion is harder to
judge than with standard deviation.
Standard Deviation
Standard deviation (SD) is defined as the positive square root of variance. The formula is
SD = √[ Σ (xi – x̄)² / n ]
Merits of Standard Deviation
1. It is rigidly defined;
2. It utilizes all the observations;
3. It is amenable to algebraic treatment;
4. Squaring is a better technique to get rid of negative deviations; and
5. It is the most popular measure of dispersion.
Demerits of Standard Deviation
1. In cases where mean is not a suitable average, standard deviation may not be the appropriate
measure of dispersion like when open end classes are present. In such cases quartile deviation
may be used;
2. It is not unit free; and
3. Although it is easy to understand, its calculation may require a calculator or a computer.
SKEWNESS
We have talked about average and dispersion. They give the location and scale of the
distribution.
In addition to measures of central tendency and dispersion, we also need to have an idea about
the shape of the distribution. Measure of Skewness gives the direction and the magnitude of the
lack of symmetry whereas the kurtosis gives the idea of flatness.
Lack of symmetry is called skewness for a frequency distribution. If the distribution is not
symmetric, the frequencies will not be uniformly distributed about the centre of the distribution.
CONCEPT OF SKEWNESS
Skewness means lack of symmetry. In mathematics, a figure is called symmetric if there exists a
point in it through which, if a perpendicular is drawn on the X-axis, it divides the figure into two
congruent parts, i.e. parts identical in all respects, or mirror images of each other that can be
superimposed. In Statistics, a distribution is called symmetric if mean, median and mode
coincide. Otherwise, the distribution is asymmetric. If the right tail is longer, we get a
positively skewed distribution, for which mean > median > mode, while if the left tail is longer, we
get a negatively skewed distribution, for which mean < median < mode.
The example of the Symmetrical curve, Positive skewed curve and Negative skewed curve are
given in the next slide
VARIOUS MEASURES OF SKEWNESS
Measures of skewness help us to know to what degree and in which direction (positive or
negative) the frequency distribution has a departure from symmetry. Although positive or
negative skewness can be detected graphically depending on whether the right tail or the left
tail is longer but, we don’t get idea of the magnitude. Besides, borderline cases between
symmetry and asymmetry may be difficult to detect graphically. Hence some statistical
measures are required to find the magnitude of lack of symmetry.
A good measure of skewness should possess three criteria:
1. It should be a unit free number so that the shapes of different distributions, so far as symmetry
is concerned, can be compared even if the units of the underlying variables are different;
2. If the distribution is symmetric, the value of the measure should be zero. Similarly, the
measure should give positive or negative values according as the distribution has positive or
negative skewness respectively; and
3. As we move from extreme negative skewness to extreme positive skewness, the value of the
measure should vary accordingly.
Measures of skewness can be both absolute as well as relative. Since in a symmetrical
distribution mean, median and mode are identical, the more the mean moves away from the
mode, the larger the asymmetry or skewness. An absolute measure of skewness cannot be
used for purposes of comparison because the same amount of skewness has different
meanings in a distribution with small variation and in one with large variation.
Absolute Measures of Skewness
Following are the absolute measures of skewness:
1. Skewness (Sk) = Mean – Median
2. Skewness (Sk) = Mean – Mode
3. Skewness (Sk) = (Q3 - Q2) - (Q2 - Q1)
For comparing two series, we do not calculate these absolute measures; we
calculate the relative measures, which are called coefficients of
skewness. Coefficients of skewness are pure numbers, independent of
units of measurement.
Relative Measures of Skewness
In order to make a valid comparison between the skewness of two or more
distributions, we have to eliminate the disturbing influence of
variation. Such elimination can be done by dividing the absolute
skewness by the standard deviation.
The following are the important methods of measuring relative
skewness:
 Karl Pearson’s coefficient of skewness
Sk(P) = (Mean – Mode) / SD
or, using the empirical relation when the mode is ill-defined,
Sk(P) = 3 (Mean – Median) / SD
 Bowley’s coefficient of skewness
Sk(B) = (Q3 + Q1 – 2 Median) / (Q3 – Q1)
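Both relative measures can be sketched in a few lines; the sample data below are hypothetical and chosen to be positively skewed:

```python
import statistics

def pearson_skewness(data):
    """Karl Pearson's coefficient using the empirical relation: 3*(Mean - Median)/SD."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.pstdev(data)

def bowley_skewness(data):
    """Bowley's coefficient: (Q3 + Q1 - 2*Median) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

data = [2, 3, 3, 4, 5, 7, 12]   # right tail is longer, so both measures are positive
print(pearson_skewness(data), bowley_skewness(data))
```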
CORRELATION CONCEPT
In many practical applications, we might come across the situation where observations are
available on two or more variables. The following examples will illustrate the situations clearly:
1. Heights and weights of persons of a certain group;
2. Sales revenue and advertising expenditure in business; and
3. Time spent on study and marks obtained by students in exam.
If data are available for two variables, say X and Y, we have a bivariate distribution.
Let us consider the example of sales revenue and expenditure on advertising in business. A
natural question arises in mind that is there any connection between sales revenue and
expenditure on advertising? Does sales revenue increase or decrease as expenditure on
advertising increases or decreases?
If we see the example of time spent on study and marks obtained by students, a natural question
appears whether marks increase or decrease as time spent on study increase or decrease.
In all these situations, we try to find out relation between two variables and correlation answers
the question, if there is any relationship between one variable and another.
When two variables are related in such a way that change in the value of one variable
affects the value of another variable, then variables are said to be correlated or there is
correlation between these two variables.
TYPES OF CORRELATION
1. Positive Correlation
Correlation between two variables is said to be positive if the values of the variables
deviate in the same direction i.e. if the values of one variable increase (or decrease)
then
the values of other variable also increase (or decrease).
Some examples of positive correlation are correlation between
1. Heights and weights of group of persons;
2. Household income and expenditure;
3. Amount of rainfall and yield of crops; and
4. Expenditure on advertising and sales revenue
In the last example, it is observed that as the expenditure on advertising increases, sales
revenue also increases. Thus, the change is in the same direction. Hence the
correlation is positive.
In the remaining three examples, the value of the second variable usually increases (or
decreases) as the value of the first variable increases (or decreases).
2. Negative Correlation
Correlation between two variables is said to be negative if the values of variables
deviate in opposite direction i.e. if the values of one variable increase (or decrease) then
the values of other variable decrease (or increase).
Some examples of negative correlations are correlation between
1. Volume and pressure of perfect gas;
2. Price and demand of goods;
3. Literacy and poverty in a country; and
4. Time spent on watching TV and marks obtained by students in examination.
In the first example pressure decreases as the volume increases or pressure increases as
the volume decreases. Thus the change is in opposite direction.
Therefore, the correlation between volume and pressure is negative.
In the remaining three examples also, the values of the second variable change in the opposite
direction of the change in the values of the first variable.
SCATTER DIAGRAM
Scatter diagram is a statistical tool for detecting possible correlation between a
dependent variable and an independent variable. A scatter diagram does not tell the exact
relationship between two variables, but it indicates whether they are correlated or not.
Let (Xi, Yi), i = 1, 2, ..., n be the bivariate distribution. If the values of the dependent variable Y
are plotted against corresponding values of the independent variable X in the XY plane, such
diagram of dots is called scatter diagram or dot diagram. It is to be noted that scatter diagram
is not suitable for large number of observations.
Interpretation from Scatter Diagram
If dots are in the shape of a line and line rises from left bottom to the right top (Fig.1), then
correlation is said to be perfect positive.
If dots in the scatter diagram are in the shape of a line and line moves from left top to
right bottom (Fig. 2), then correlation is perfect negative.
If dots show some trend and trend is upward rising from left bottom to right top (Fig.3)
correlation is positive.
If dots show some trend and trend is downward from left top to the right bottom (Fig.4) correlation is said to be negative.
If dots of scatter diagram do not show any trend (Fig. 5) there is no correlation between the variables.
COEFFICIENT OF CORRELATION
Scatter diagram tells us whether variables are correlated or not. But it does not indicate the extent
to which they are correlated. The coefficient of correlation gives an exact idea of the extent to
which they are correlated.
If X and Y are two random variables then the correlation coefficient between X and Y is denoted by
r and defined as
r = Cov(X, Y) / (σX σY)
Coefficient of correlation measures the intensity or degree of linear relationship between two
variables. It was given by the British biometrician Karl Pearson (1857-1936).
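The definition r = Cov(X, Y) / (σX σY) can be computed directly; the x and y values below are hypothetical (think of advertising expenditure and sales revenue):

```python
from math import sqrt

x = [1, 2, 3, 4, 5]   # e.g. advertising expenditure
y = [2, 4, 5, 4, 5]   # e.g. sales revenue
n = len(x)

mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n   # Cov(X, Y)
sx = sqrt(sum((a - mx) ** 2 for a in x) / n)               # sigma_X
sy = sqrt(sum((b - my) ** 2 for b in y) / n)               # sigma_Y

r = cov / (sx * sy)
print(r)   # between -1 and +1; positive here, as y tends to rise with x
```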
Assumptions for Correlation Coefficient
1. Assumption of Linearity: The variables used to compute the correlation coefficient must be linearly
related. You can check the linearity of the variables through a scatter diagram.
2. Assumption of Normality: Both variables under study should follow a normal distribution. They
should not be skewed in either the positive or the negative direction.
3. Assumption of Cause and Effect Relationship: There should be a cause and effect relationship
between both variables, for example, heights and weights of children, demand and supply
of goods, etc. When there is no cause and effect relationship between variables then the
correlation coefficient should be zero. If it is non-zero then the correlation is termed chance
correlation or spurious correlation.
For example, correlation coefficient between:
1. Weight and income of a person over periods of time; and
2. Rainfall and literacy in a state over periods of time.
LINEAR REGRESSION
Prediction or estimation is one of the major problems in most human activities. Predictions of
future production of any crop, consumption, the price of any good, sales, income,
profit, etc. are very important in the business world. Similarly, predictions of population,
consumption of agricultural product, rainfall, revenue, etc. have great importance to the
government of any country for effective planning.
If two variables are correlated significantly, then it is possible to predict or estimate the values of
one variable from the other. This leads us to very important concept of regression analysis. In
fact, regression analysis is a statistical technique which is used to investigate the relationship
between variables. The effect of a price increase on demand, the effect of a change in the money
supply on the inflation rate, and the effect of a change in advertising expenditure on sales and
profit in business are examples where investigators or researchers try to construct cause
and effect relationships. To handle these types of situations, investigators collect data on
variables of interest and apply regression methods to estimate the quantitative effect of the
causal variables upon the variable that they influence.
Regression analysis describes how the independent variable(s) is (are) related to the dependent
variable i.e. regression analysis measures the average relationship between independent
variables and dependent variable. The literal meaning of regression is “stepping back towards
the average”, which was used by the British biometrician Sir Francis Galton (1822-1911)
regarding the heights of parents and their offspring.
Regression analysis is a mathematical measure of the average relationship between two or more
variables.
Types of variables in regression analysis
Independent variable: The variable which is used for prediction is called the
independent variable. It is also known as the regressor or predictor or explanatory
variable.
Dependent variable: The variable whose value is predicted by the independent
variable is called the dependent variable. It is also known as the regressed or explained
variable.
If scatter diagram shows some relationship between independent variable X and
dependent variable Y, then the scatter diagram will be more or less concentrated
round a curve, which may be called the curve of regression.
When the curve is a straight line, it is known as line of regression and the regression is
said to be linear regression.
If the relationship between dependent and independent variables is not a straight line
but curve of any other type then regression is known as nonlinear regression.
Regression can also be classified according to number of variables being used. If only
two variables are being used this is considered as simple regression whereas the
involvement of more than two variables in regression is categorized as multiple
regression.
Formula of Linear Regression
The regression line of y on x is
y – ȳ = byx (x – x̄), where byx = r (σy / σx)
and the regression line of x on y is
x – x̄ = bxy (y – ȳ), where bxy = r (σx / σy)
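A minimal sketch of fitting the regression line of y on x and using it for prediction; the data are the same hypothetical pairs as in the correlation example:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)

b_yx = sxy / sxx   # regression coefficient of y on x

# line of regression of y on x: y - my = b_yx * (x - mx)
predict_y = lambda xv: my + b_yx * (xv - mx)
print(b_yx, predict_y(6))   # predicted y for a new x value
```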
DISTINCTION BETWEEN CORRELATION AND
REGRESSION
Both correlation and regression have important role in relationship study but
there are some distinctions between them which can be described as follow:
(i) Correlation studies the linear relationship between two variables while
regression analysis is a mathematical measure of the average relationship
between two or more variables.
(ii) Correlation has limited application because it gives the strength of linear
relationship while the purpose of regression is to "predict" the value of the
dependent variable for the given values of one or more independent
variables.
(iii) Correlation makes no distinction between independent and dependent
variables while linear regression does it, i.e. correlation does not consider
the concept of dependent and independent variables while in regression
analysis one variable is considered as dependent variable and other(s) is/are
as independent variable(s).
CONCEPT OF HYPOTHESIS TESTING
In our day-to-day life, we see different commercial advertisements on television, in
newspapers, magazines, etc., such as
(i) The refrigerator of certain brand saves up to 20% electric bill,
(ii)The motorcycle of certain brand gives 60 km/liter mileage,
(iii)A detergent of certain brand produces the cleanest wash,
(iv)Ninety nine out of hundred dentists recommend brand A toothpaste for their
patients to save the teeth against cavity, etc.
Now, the question may arise in our mind “can such types of claims be verified
statistically?” Fortunately, in many cases the answer is “yes”.
The technique of testing such types of claims or statements or assumptions is known as
testing of hypothesis. The truth or falsity of a claim or statement is never known
unless we examine the entire population. But practically this is not possible in most
situations, so we take a random sample from the population under study and use the
information contained in this sample to take the decision whether a claim is true or
false.
CONCEPT OF HYPOTHESIS TESTING CONTD.
In our day-to-day life, we see different commercial advertisements on television, in newspapers,
magazines, etc., and if someone is interested in testing such claims or statements, we
come across the problem of testing of hypothesis.
For example, (i) a customer wants to test whether the claim that a motorcycle of a certain
brand gives an average mileage of 60 km/litre is true or false,
(ii) a banana trader wants to test whether the average weight of a banana from Kerala is more
than 200 gm,
(iii) a doctor wants to test whether new medicine is really more effective for controlling blood pressure
than old medicine,
(iv) an economist wants to test whether the variability in incomes differ in two populations,
(v) a psychologist wants to test whether the proportion of literates in two groups of people is the
same, etc.
In all the cases discussed above, the decision maker is interested in making inference about the
population parameter(s). However, he/she is not interested in estimating the value of parameter(s)
but he/she is interested in testing a claim or statement or assumption about the value of population
parameter(s). Such claim or statement is postulated in terms of hypothesis.
In statistics, a hypothesis is a statement or a claim or an assumption about the value of a population
parameter (e.g., mean, median, variance, proportion, etc.).
Similarly, in the case of two or more populations a hypothesis is a comparative statement or a claim or an
assumption about the values of population parameters. (e.g., means of two populations are equal,
variance of one population is greater than other, etc.). The plural of hypothesis is hypotheses.
GENERAL PROCEDURE OF TESTING A HYPOTHESIS
Testing of hypothesis is a statistical tool in great demand across many disciplines and professions. It
is a step-by-step procedure, as you will see in the next three units through a large number of
examples. The aim of this section is just to give you a flavour of that sequence, which involves the
following steps:
Step I: First of all, we have to set up the null hypothesis H0 and the alternative hypothesis H1. Suppose
we want to test the hypothetical / claimed / assumed value θ0 of parameter θ. So we can take
the null and alternative hypotheses as
H0: θ = θ0 against H1: θ ≠ θ0 (two-tailed), or H1: θ > θ0 / H1: θ < θ0 (one-tailed).
Step II: After setting the null and alternative hypotheses, we establish a criterion for rejection or
non-rejection of the null hypothesis, that is, we decide the level of significance (α) at which we want
to test our hypothesis. Generally, it is taken as 5% or 1% (α = 0.05 or 0.01).
Case I: If the alternative hypothesis is right-sided such as H1: θ > θ0 or H1: θ1 > θ2 then
the entire critical or rejection region of size α lies on right tail of the probability curve of
sampling distribution of the test statistic as shown
Case II: If the alternative hypothesis is left-sided such as H1: θ < θ0 or H1: θ1 < θ2 then the entire critical or rejection
region of size α lies on left tail of the probability curve of sampling distribution of the test statistic as shown
Case III: If the alternative hypothesis is two sided such as H1: θ ≠ θ0 or H1: θ1 ≠ θ2 then critical or rejection regions
of size α/2 lies on both tails of the probability curve of sampling distribution of the test statistic as shown
GENERAL PROCEDURE OF TESTING A HYPOTHESIS(3)
Step III: The third step is to choose an appropriate test statistic under H0 for testing the null
hypothesis as given below:
After that, specify the sampling distribution of the test statistic preferably in the standard form like Z
(standard normal), chi-square, t, F, or any other distribution well known in the literature.
Step IV: Calculate the value of the test statistic described in Step III on the basis of observed sample
observations.
Step V: Obtain the critical (or cut-off) value(s) in the sampling distribution of the test statistic and construct
the rejection (critical) region of size α. Generally, critical values for various levels of significance are
put in the form of a table for various standard sampling distributions of the test statistic, such as the Z-table,
chi-square table, t-table, etc.
Step VI: After that, compare the calculated value of the test statistic obtained in Step IV with the critical
value(s) obtained in Step V and locate the position of the calculated test statistic, that is, whether it lies in the
rejection region or the non-rejection region.
Step VII: In testing of hypothesis ultimately we have to reach at a conclusion. It is done as explained below:
(i) If the calculated value of the test statistic lies in the rejection region at α level of significance then we reject the null
hypothesis. It means that the sample data provide us sufficient evidence against the null hypothesis and
there is a significant difference between the hypothesized value and the observed value of the parameter.
(ii) If the calculated value of the test statistic lies in the non-rejection region at α level of significance then we do not
reject the null hypothesis. It means that the sample data fail to provide us sufficient evidence against the
null hypothesis and the difference between the hypothesized value and the observed value of the parameter is due
to sampling fluctuation.
TYPE-I AND TYPE-II ERRORS
Type I Errors
• A Type I error occurs when the sample data appear to show a treatment effect
when, in fact, there is none.
• In this case the researcher will reject the null hypothesis and falsely conclude that
the treatment has an effect.
• Type I errors are caused by unusual, unrepresentative samples. Just by chance the
researcher selects an extreme sample with the result that the sample falls in the
critical region even though the treatment has no effect.
• The hypothesis test is structured so that Type I errors are very unlikely; specifically,
the probability of a Type I error is equal to the alpha level.
Type II Errors
• A Type II error occurs when the sample does not appear to have been affected by
the treatment when, in fact, the treatment does have an effect.
• In this case, the researcher will fail to reject the null hypothesis and falsely conclude
that the treatment does not have an effect.
• Type II errors are commonly the result of a very small treatment effect. Although
the treatment does have an effect, it is not large enough to show up in the research
study.
Difference between Statistic and Parameter
Statistic
 A statistic is a measure that describes a fraction (a sample) of the population.
 Its numerical value is variable and known.
 Statistical notation:
s = sample standard deviation
x = data elements
n = size of sample
r = sample correlation coefficient
Parameter
 A parameter is a measure that describes the whole population.
 Its numerical value is fixed and unknown.
 Statistical notation:
μ = population mean
σ = population standard deviation
P = population proportion
X = data elements
N = size of population
ρ = population correlation coefficient
Parametric Statistical Tests
Parametric statistics is a branch of statistics which assumes that sample data come from a population that follows a probability distribution, typically the normal distribution. When these assumptions are correct, parametric methods produce more accurate and precise estimates.
Assumptions
 The scores must be independent (the selection of any particular score must not bias the chance of any other case being included).
 The observations must be drawn from normally distributed populations.
 The selected population is representative of the general population.
 The data are on an interval or ratio scale.
 The populations (if comparing two or more groups) must have the same variance.
Types of Parametric test
1. Z- test.
2. T-test.
3. ANOVA.
4. F-test.
5. Chi-Square test.
Z-test
The Z-test is generally attributed to R. A. Fisher. A Z-test is a type of hypothesis test or statistical test.
It is used for testing the mean of a population against a standard, or for comparing the means of two populations, with large samples (n > 30).
When we can run a Z-test
 Your sample size is greater than 30.
 Data points should be independent of each other.
 Your data should be randomly selected from the population, where each item has an equal chance of being selected.
 The data should follow a normal distribution.
 The standard deviation of the population is known.
There are two ways to calculate a z-test:
a. one-sample z-test.
b. two-sample z-test.
One-sample z-test
In a one-sample z-test we compare the mean calculated on a single sample of scores with a hypothesized population mean, when the population standard deviation is known.
Ex.: The manager of a candy manufacturer wants to know whether the mean weight of a batch of candy boxes is equal to the target value of 10 pounds known from historical data.
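The candy-box example can be worked numerically. The figures below (sample mean 10.2 lb, σ = 0.5 lb, n = 36) are hypothetical, chosen only to illustrate the computation; the statistic z = (x̄ − μ₀)/(σ/√n) and the two-tailed p-value from the normal CDF are the standard forms.

```python
import math

def one_sample_z(sample_mean, mu0, sigma, n):
    """Two-tailed one-sample z-test (population sigma known, large n)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # P(Z > |z|) from the standard normal CDF, doubled for a two-tailed test
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical batch: mean weight 10.2 lb, sigma 0.5 lb, 36 boxes, target 10 lb
z, p = one_sample_z(10.2, mu0=10, sigma=0.5, n=36)  # z = 2.4, p ≈ 0.016
```

Since p < 0.05, the sample mean differs significantly from the 10-pound target at the 5% level.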
Two-sample z-test
When testing for the differences between two groups can imagine two separate
situation. Comparing the proportion of two population. In two sample z-test both
independent populations.
Ex: 1. Comparing the average engineering salaries of men versus women.
2. Comparing the fraction defectives from two production line.
T-test
It was derived by W. S. Gosset in 1908 and is also called Student’s t-test. The t-test indicates whether or not the difference between two group means is statistically significant.
Assumptions:
 Samples must be random and independent.
 Samples are small (n < 30).
 The population standard deviation is not known.
 The population is normally distributed.
There are two ways to calculate a t-test:
a. Unpaired (independent) t-test.
b. Paired t-test.
Unpaired t-test:
If there is no link between the data, use the unpaired t-test: two separate sets of independent samples are obtained, one from each of the two populations being compared.
Ex.: 1. Comparing the heights of girls and boys.
2. Comparing two stress-reduction interventions, where one group practiced mindfulness meditation while the other learned yoga.
Paired t-test
A paired t-test consists of a sample of matched pairs of similar units, or one group of units that has been tested twice (a “repeated measures” t-test). If there is some link between the data (e.g. before and after), use the paired t-test.
Ex.: 1. Subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure-lowering medication.
2. Testing a person or a group before and after training.
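A minimal sketch of the unpaired (independent) t-test with pooled variance, using made-up height data for the girls-versus-boys example; a t-table (or software) then converts the t statistic and degrees of freedom into a p-value.

```python
import math
import statistics

def unpaired_t(a, b):
    """Independent two-sample t-test assuming equal variances."""
    na, nb = len(a), len(b)
    # pooled estimate of the common variance
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2  # t statistic and degrees of freedom

# Hypothetical heights (cm): t = -5.0 with 8 degrees of freedom
t, df = unpaired_t([150, 152, 154, 156, 158], [160, 162, 164, 166, 168])
```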
ANOVA (Analysis of Variance)
It was developed by Fisher in 1920. ANOVA is a collection of statistical models used to analyse the differences between group means; it compares multiple groups at one time. It is an advanced technique for testing differences among all of the means in an experiment, which is not possible with a t-test.
Assumptions:
 All populations have the same standard deviation.
 Individuals are selected from the populations randomly.
 The samples are independent.
 The populations must be normally distributed.
There are two ways to calculate ANOVA.
One-way ANOVA: compares three or more unmatched groups when the data are categorized in one way.
Ex.: You might study the effect of tea on weight loss across three groups: green tea, black tea and no tea.
Two-way ANOVA
The two-way ANOVA technique is used when the data are classified on the basis of two factors; it analyses two independent variables and one dependent variable.
Ex.: Agricultural output may be classified on the basis of different varieties of seeds and also on the basis of different varieties of fertilizer used.
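The one-way ANOVA computation splits the total variation into a between-group part and a within-group part; F is the ratio of their mean squares. A minimal sketch with three tiny hypothetical groups (the data are illustrative only):

```python
import statistics

def one_way_anova(*groups):
    """One-way ANOVA: F = (between-group mean square) / (within-group mean square)."""
    all_x = [x for g in groups for x in g]
    grand = statistics.fmean(all_x)
    k, n = len(groups), len(all_x)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)  # F statistic and (df_between, df_within)

# Three hypothetical treatment groups (e.g. green tea, black tea, no tea)
f, df = one_way_anova([1, 2, 3], [2, 3, 4], [3, 4, 5])  # f = 3.0, df = (2, 6)
```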
Chi-Square test
Given by Karl Pearson, the chi-square test measures how expectations compare to actual observed data. It is used to investigate whether the distributions of categorical variables differ from one another; as a parametric test it is also used for testing a population variance.
It is denoted by χ².
Formula: χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ
Non-parametric statistical tests
Non-parametric statistics is the branch of statistics in which the data are not required to fit a normal distribution. Non-parametric statistics often uses ordinal data: it does not rely on the numbers themselves but on a ranking or order of sorts. For example, a survey conveying consumer preferences ranging from like to dislike would be considered ordinal data.
Non-parametric statistics does not assume that the data are drawn from a normal distribution; instead, the shape of the distribution is estimated from the observed data, and this applies across descriptive statistics, statistical tests, inferential statistics and models. There is no strict assumption about sample size, and these methods can be used without the mean, standard deviation or estimates of any other parameters.
Non-parametric tests are called “distribution-free” tests since they make no assumptions regarding the population distribution. They are often applied as ranking tests. They are easier to explain and easier to understand, but one should not forget that they are usually less efficient/powerful, precisely because they rest on fewer assumptions. A non-parametric test is always valid, but not always efficient.
Types of Non-parametric statistics test
Rank sum test
Chi-square test
Spearman’s rank correlation
Rank sum test
Rank sum tests are
U test (Wilcoxon-Mann-Whitney test)
H test (Kruskal-Wallis test)
U test: It is a non-parametric test that determines whether two independent samples have been drawn from the same population. It requires data that can be ranked, i.e., ordered from lowest to highest (ordinal data).
U test
For example, the values of one sample are 53, 38, 69, 57, 46 and the values of another sample are 44, 40, 61, 53, 32. We assign ranks to all observations, adopting a low-to-high ranking process as if the items belonged to a single sample (tied values share the average of their ranks):
Value (ascending order)  Rank
32  1
38  2
40  3
44  4
46  5
53  6.5
53  6.5
57  8
61  9
69  10
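From the pooled ranking above, the U statistic follows from the rank sum of one sample via the standard formula U₁ = n₁n₂ + n₁(n₁+1)/2 − R₁ (and U₂ = n₁n₂ − U₁). The sketch below reproduces the ranking, including the averaged rank 6.5 for the tied 53s:

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U from rank sums, assigning average ranks to ties."""
    pooled = sorted(a + b)
    rank = {}
    for v in set(pooled):
        positions = [i + 1 for i, x in enumerate(pooled) if x == v]
        rank[v] = sum(positions) / len(positions)  # average rank for tied values
    r1 = sum(rank[x] for x in a)            # rank sum of the first sample
    n1, n2 = len(a), len(b)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u1, n1 * n2 - u1)            # the smaller of U1 and U2

u = mann_whitney_u([53, 38, 69, 57, 46], [44, 40, 61, 53, 32])  # u = 8.5
```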
Kruskal-Wallis H test
H test: The Kruskal-Wallis H test (also called the “one-way ANOVA on ranks”) is a rank-based non-parametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable.
For example: an H test to understand whether exam performance, measured on a continuous scale from 0 to 100, differs based on test anxiety level (i.e., the dependent variable would be “exam performance” and the independent variable would be “test anxiety level”, which has three independent groups: students with “low”, “medium” and “high” test anxiety levels).
Chi square test
The chi-square test is a non-parametric test used mainly when dealing with nominal variables. The chi-square test has two main methods.
Goodness of fit: Goodness of fit refers to whether a significant difference exists between an observed number and an expected number of responses, people or other objects.
For example: suppose that we flip a coin 20 times and record the frequency of occurrence of heads and tails. We should expect 10 heads and 10 tails.
Suppose our coin-flipping experiment yielded 12 heads and 8 tails: our expected frequencies are (10, 10) and our observed frequencies are (12, 8).
Independence: the test of independence examines the difference between the frequencies of occurrence in two or more categories across two or more groups.
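The coin example works out as follows; the resulting statistic would then be compared against the chi-square table with 1 degree of freedom.

```python
def chi_square(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 20 coin flips: observed (12 heads, 8 tails) vs expected (10, 10)
stat = chi_square([12, 8], [10, 10])  # (2^2)/10 + (2^2)/10 = 0.8
```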
Spearman’s rank correlation test: In this method a measure of association is based on the ranks of the observations and not on the numerical values of the data. It was developed by Charles Spearman in the early 1900s, and as such it is also known as Spearman’s rank correlation coefficient.
English (marks)  Maths (marks)  Rank (English)  Rank (Maths)  Difference of ranks (d)
56  66  9  4  5
75  70  3  2  1
45  40  10  10  0
71  60  4  7  3
62  65  6  5  1
64  56  5  9  4
58  59  8  8  0
80  77  1  1  0
76  67  2  3  1
61  63  7  6  1
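Using the absolute rank differences from the marks table (d = 5, 1, 0, 3, 1, 4, 0, 0, 1, 1, taking the sixth difference as |5 − 9| = 4), Spearman's coefficient follows from the simple no-tie formula ρ = 1 − 6Σd²/(n(n² − 1)):

```python
def spearman_rho(d):
    """Spearman's rank correlation from rank differences (no-tie formula)."""
    n = len(d)
    return 1 - 6 * sum(x * x for x in d) / (n * (n * n - 1))

# Rank differences from the English/Maths marks table above
rho = spearman_rho([5, 1, 0, 3, 1, 4, 0, 0, 1, 1])  # about 0.67
```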
PROBABILITY
In our daily lives we face many situations in which we are unable to forecast the future with complete certainty; many decisions must be made under uncertainty. The need to cope with uncertainty leads to the study and use of probability theory. The first attempt to give a quantitative measure of probability was made by Galileo (1564-1642), an Italian mathematician, when answering the following question at the request of his patron, the Grand Duke of Tuscany, who wanted to improve his performance at the gambling tables: “With three dice, a total of 9 and a total of 10 can each be produced by six different combinations, and yet experience shows that the number 10 is thrown more often than the number 9.” To the mind of his patron the cases were (1, 2, 6), (1, 3, 5), (1, 4, 4), (2, 2, 5), (2, 3, 4), (3, 3, 3) for 9 and (1, 3, 6), (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4) for 10, so he wondered why their chances are not the same. Galileo made a careful analysis of all the cases which can occur, and showed that out of the 216 possible cases 27 are favourable to the appearance of the number 10: the permutations of (1, 3, 6) are (1, 3, 6), (1, 6, 3), (3, 1, 6), (3, 6, 1), (6, 1, 3), (6, 3, 1), i.e. 6 permutations; similarly, the numbers of permutations of (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4) are 6, 3, 6, 3, 3 respectively, so the total number of favourable cases is 6 + 6 + 3 + 6 + 3 + 3 = 27, whereas the number of favourable cases for a total of 9 on three dice is 6 + 6 + 3 + 3 + 6 + 1 = 25. This is the reason why 10 is thrown more often than 9. The first foundation of the subject, however, was laid by the two mathematicians Pascal (1623-62) and Fermat (1601-65), whose correspondence over a gambler's dispute in 1654 led to the creation of a mathematical theory of probability. Later, important contributions were made by various researchers including Huygens (1629-1695), Jacob Bernoulli (1654-1705), Laplace (1749-1827), Abraham De Moivre (1667-1754) and Markov (1856-1922). Thomas Bayes (died 1761, at the age of 59) gave an important technical result known as Bayes’ theorem, published after his death in 1763, by which probabilities can be revised on the basis of new information. Since then probability, an important branch of statistics, has been used worldwide.
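Galileo's enumeration of the 216 ordered outcomes can be verified directly by brute force:

```python
from itertools import product

def ways(total):
    """Count ordered outcomes of three fair dice that sum to `total`."""
    return sum(1 for dice in product(range(1, 7), repeat=3) if sum(dice) == total)

# ways(10) = 27 and ways(9) = 25, out of 6^3 = 216 equally likely outcomes
```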
Probability Distribution
There are two types of probability distribution:
1) Discrete probability distribution: the set of all possible values is at most a finite or a countably infinite number of possible values.
 Poisson distribution
 Binomial distribution
2) Continuous probability distribution: takes on values at every point over a given interval.
 Normal (Gaussian) distribution
Normal (Gaussian) Distribution
• The normal distribution is a descriptive model that describes real-world situations.
• It is defined as a continuous frequency distribution of infinite range (it can take any value, not just integers as in the case of the binomial and Poisson distributions).
• It is the most important probability distribution in statistics and an important tool in the analysis of epidemiological data and in management science.
Characteristics of Normal Distribution
• It links frequency distribution to probability distribution
• Has a Bell Shape Curve and is Symmetric
• It is Symmetric around the mean:
Two halves of the curve are the same (mirror images)
• Hence Mean = Median
• The total area under the curve is 1 (or 100%)
• Normal Distribution has the same shape as Standard Normal Distribution.
• In a Standard Normal Distribution:
The mean (μ ) = 0 and
Standard deviation (σ) =1
Normal (Gaussian) Distribution(2)
Z Score (Standard Score)
• Z = (X − μ) / σ
• Z indicates how many standard deviations the point X lies away from the mean.
• The Z score is usually calculated to 2 decimal places.
Tables
Areas under the standard normal curve
[Diagram: the standard normal (z) curve, showing approximately 34.1% of the area between the mean and 1σ on each side, 13.6% between 1σ and 2σ, 2.2% between 2σ and 3σ, and 0.15% beyond 3σ.]
Normal (Gaussian) Distribution(4)
Distinguishing Features
• The mean ± 1 standard deviation covers about 68.3% of the area under the curve
• The mean ± 2 standard deviations covers about 95.4% of the area under the curve
• The mean ± 3 standard deviations covers about 99.7% of the area under the curve
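These coverage figures follow directly from the standard normal CDF: P(|Z| ≤ k) = erf(k/√2). A quick check:

```python
import math

def coverage_within(k):
    """Probability that a standard normal value lies within k standard deviations."""
    return math.erf(k / math.sqrt(2))

# coverage_within(1) ≈ 0.683, coverage_within(2) ≈ 0.954, coverage_within(3) ≈ 0.997
```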
Application/Uses of Normal Distribution
• Its application goes beyond describing distributions
• It is used by researchers and modelers.
• The major use of normal distribution is the role it plays in
statistical inference.
• The z score along with the t –score, chi-square and F-statistics is
important in hypothesis testing.
• It helps managers/management make decisions.
Binomial Distribution
A widely known discrete distribution constructed by determining the probabilities of X
successes in n trials.
Assumptions of the Binomial Distribution
• The experiment involves n identical trials
• Each trial has only two possible outcomes: success and failure
• Each trial is independent of the previous trials
• The terms p and q remain constant throughout the experiment
• p is the probability of a success on any one trial
• q = (1-p) is the probability of a failure on any one trial
• In the n trials, X is the number of successes, where X is a whole number between 0 and n.
Applications
• Sampling with replacement
• Sampling without replacement causes p to change, but if the sample size n is less than 5% of N, the violation of the independence assumption is not a great concern.
Binomial Distribution Formula
• Probability function:
P(X) = [n! / (X! (n − X)!)] · p^X · q^(n−X),  for 0 ≤ X ≤ n
• Mean value:
μ = n·p
• Variance and standard deviation:
σ² = n·p·q
σ = √(n·p·q)
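The binomial formulas above can be sketched directly; the case evaluated below (n = 4 fair-coin trials) is a hypothetical illustration:

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_mean_var(n, p):
    """Mean n*p and variance n*p*q of the binomial distribution."""
    return n * p, n * p * (1 - p)

prob = binomial_pmf(2, 4, 0.5)         # P(2 successes in 4 trials) = 0.375
mean, var = binomial_mean_var(4, 0.5)  # mean = 2.0, variance = 1.0
```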
Poisson Distribution
The French mathematician Siméon Denis Poisson proposed the Poisson distribution. The Poisson distribution is popular for modelling the number of times an event occurs in an interval of time or space. It is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event.
The Poisson distribution may be useful to model events such as
• The number of meteorites greater than 1 metre in diameter that strike the Earth in a year
• The number of patients arriving in an emergency room between 10 and 11 pm
• The number of photons hitting a detector in a particular time interval
• The number of mistakes committed per page
Poisson Distribution
Assumptions of the Poisson Distribution
• Describes discrete occurrences over a continuum or
interval
• A discrete distribution
• Describes rare events
• Each occurrence is independent of any other occurrence.
• The number of occurrences in each interval can vary
from zero to infinity.
• The expected number of occurrences must hold
constant throughout the experiment.
Poisson Distribution Formula
• Probability function:
P(X) = (λ^X · e^(−λ)) / X!,  for X = 0, 1, 2, 3, ...
where:
λ = the long-run average number of occurrences per interval
e = 2.718282... (the base of natural logarithms)
• Mean value: λ
• Variance: λ
• Standard deviation: √λ
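The Poisson probability function can be sketched in a few lines; the example rate λ = 2 (say, two mistakes per page on average) is hypothetical:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

# With lam = 2 mistakes per page, P(no mistake on a page) = e^-2 ≈ 0.135
p0 = poisson_pmf(0, 2)
```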
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRATanmoy Mishra
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...raviapr7
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptxraviapr7
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxKatherine Villaluna
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptxmary850239
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17Celine George
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxDr. Santhosh Kumar. N
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxheathfieldcps1
 
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxPISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxEduSkills OECD
 

Recently uploaded (20)

How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17
 
The Singapore Teaching Practice document
The Singapore Teaching Practice documentThe Singapore Teaching Practice document
The Singapore Teaching Practice document
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
 
Presentation on the Basics of Writing. Writing a Paragraph
Presentation on the Basics of Writing. Writing a ParagraphPresentation on the Basics of Writing. Writing a Paragraph
Presentation on the Basics of Writing. Writing a Paragraph
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
 
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17
 
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdfPersonal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptx
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptx
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptx
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxPISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
 

Business statistics for NET-JRF SLET according to updated syllabus

  • 1. “LAKSHYA: NET-JRF & KSET” STUDENT’S ACADEMIC DEVELOPMENT PROGRAMME AT SRI H. D. DEVEGOWDA G.F.G.COLLEGE, PADUVALAHIPPE
  • 2. Business Statistics in NET-JRF/KSET Presented by Sundar B. N. Assistant Professor I am the wisest man alive, for I know one thing, and that is that I know nothing. -Plato, The Republic
  • 3. Business Statistics and Research Methods  Measures of central tendency  Measures of dispersion  Measures of Skewness  Correlation and regression of two variables  Probability: Approaches to probability; Bayes’ theorem  Probability distributions: Binomial, Poisson and normal distributions  Research: Concept and types; Research designs  Data: Collection and classification of data  Sampling and estimation: Concepts; Methods of sampling - probability and non-probability methods; Sampling distribution; Central limit theorem; Standard error; Statistical estimation  Hypothesis testing: z-test; t-test; ANOVA; Chi–square test; Mann-Whitney test (U-test); Kruskal-Wallis test (H-test); Rank correlation test  Report writing
  • 4. STATISTICS Measures of central tendency Measures of dispersion Measures of Skewness Correlation and regression of two variables Probability: Approaches to probability; Bayes’ theorem Probability distributions: Binomial, Poisson and normal distributions Hypothesis testing: z-test; t-test; ANOVA; Chi–square test; Mann-Whitney test (U-test); Kruskal-Wallis test (H-test); Rank correlation test Report writing
  • 5. Meaning of Research Research is an investigative process of finding a reliable solution to a problem through the systematic selection, collection, analysis and interpretation of data relating to that problem.
  • 7. Analysis of Data Data analysis is the process of systematically applying statistical and logical techniques to describe and illustrate, condense and recap, and evaluate data. Technically speaking, processing implies editing, coding, classification and tabulation of the collected data so that they are amenable to analysis.
  • 8. Statistics The word ‘Statistics’ seems to have been derived from the Latin word ‘Status’, the Italian word ‘Statista’ or the German word ‘Statistik’. But according to the observations of John Graunt (1620-1674), the word ‘Statistics’ is of Italian origin, derived from the word ‘Stato’; a statista is a person who deals with the affairs of the state. That is, initially kings, monarchs or governments used it to collect information related to the population, agricultural land, wealth, etc. of the state. Their aim was simply to get an idea of the manpower of the state, the force needed in case of a war, and the taxes to be imposed to meet the financial needs of the state. So initially it was used by kings, monarchs or governments for the administrative requirements of the state; that is why its origin lies in statecraft (the art of managing state affairs). On the basis of evidence from papyrus manuscripts and ancient monuments in pharaonic temples, it is assumed that the first census in the world was carried out in Egypt in 3050 BC. Yet China’s census data from around 2000 BC are considered the oldest surviving census data in the world.
  • 9. Statistics in India In the 3rd century BC the “Arthashastra” came into existence, written by one of the greatest geniuses of political administration, Kautilya. In it, he described the details of conducting population, agriculture and economic censuses. An efficient system of collecting official and administrative statistics was in use during the reign of Chandra Gupta Maurya (324-300 BC) under the guidance of Kautilya. Many subjects such as the taxation policy of the state, governance and administration, public finance and the duties of a king are also discussed in this celebrated Arthashastra. Another piece of evidence that statistics was in use during Emperor Akbar’s empire (1556-1605) is the “Ain-i-Akbari”, written by Abul Fazl, one of the nine gems of Akbar. Raja Todar Mal, Akbar’s finance minister and another of the nine gems, kept very good records of land and revenue, and by using his expertise and the recorded data he developed a very systematic revenue collection system in Akbar’s kingdom. The revenue collection system developed by Raja Todar Mal was so systematic that it became a model for later Mughals and, later on, for the British. The British Government, after the transfer of power from the East India Company, started a publication entitled ‘Statistical Abstract of British India’ as a regular annual feature in 1868, in which all the useful statistical information related to local administrations in all the British Provinces was provided. In between, some census reports appeared covering particular areas, but not the national level. The first attempt to get detailed information on the whole population of India was made between 1867 and 1872. The first decennial census was undertaken on 17th February 1881 by W.W. Plowden, the first census commissioner of India. After that a census has been carried out every 10 years in India; the 2011 census was the 15th census in India.
Credit for establishing Statistics as a discipline in India goes to Prasanta Chandra Mahalanobis (P.C. Mahalanobis). He was a professor of physics in the Presidency College, Kolkata. During his study at Cambridge he got a chance to go through the work of Karl Pearson and R. A. Fisher. Continuing his interest in Statistics, he established a statistical laboratory in the Presidency College, Kolkata. On 17 December 1931, this statistical laboratory was given the name Indian Statistical Institute (ISI). The first postgraduate course in Statistics was started by Kolkata University in 1941, while the first undergraduate course in Statistics was started by the Presidency College, Kolkata.
  • 10. DEFINITION OF STATISTICS “Statistics is the science of counting.” – A.L. Bowley. “Statistics is the science of averages.” – A.L. Bowley. Statistics is “the science of the measurement of the social organism, regarded as a whole, in all its manifestations.” – A.L. Bowley. “Statistics are the numerical statements of facts in any department of enquiry placed in relation to each other.” – A.L. Bowley. “By statistics we mean quantitative data affected to a marked extent by multiplicity of causes.” – Yule and Kendall. “Science of estimates and probabilities.” – Boddington. “The method of judging collective natural or social phenomena from the results obtained by the analysis of an enumeration or collection of estimates.” – W.I. King. “Statistics is the science which deals with the collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomena.” – Lovitt. “The science which deals with the collection, tabulation, analysis and interpretation of numerical data.” – Croxton and Cowden. From the above definitions, statistics may be comprehended as “a branch of science which deals with the collection, classification, tabulation, analysis and interpretation of data.”
  • 11. DATA Data play the role of raw material for any statistical investigation and may be defined in a single sentence as “the values of different objects collected in a survey, or the recorded values of an experiment over a time period, taken together constitute what we call data in Statistics”. Each value in the data is known as an observation. Statistical data may be classified as follows, based on the characteristic, the nature of the characteristic, the level of measurement, time, and the way of obtaining the data:
  • 12. Types of Data Based on the characteristic  Qualitative Data  Quantitative Data Based on nature of the characteristic  Discrete data  Continuous data Based on the level of measurement  Nominal Data  Ordinal Data  Interval Data  Ratio Data Based on the Time Component  Time Series data  Cross Sectional data Based on the ways of obtaining the data  Primary Data  Secondary Data
  • 13. Quantitative Data As the name suggests, quantitative data are related to quantity. In fact, data are said to be quantitative if a numerical quantity (which exactly measures the characteristic under study) is associated with each observation. Generally, interval or ratio scales are used as the scale of measurement for quantitative data. Characteristics such as weight, height, age, length, area, volume, money, temperature, humidity and size generally give quantitative data. For example, (i) weights in kilograms of the students of a class; (ii) heights in centimetres of the candidates appearing in a direct recruitment of the Indian army organised by a particular cantonment; (iii) ages of females at the time of marriages celebrated over a period of a week in Delhi; (iv) lengths (in cm) of different tables in a furniture showroom.
  • 14. Qualitative Data As the name suggests, qualitative data are related to the quality of an object or thing. Obviously, quality cannot be measured numerically in exact terms. Thus, if the characteristic/attribute under study is such that it is measured only on the basis of presence or absence, the data thus obtained are known as qualitative data. Generally, nominal and ordinal scales are used as the scale of measurement for qualitative data. Characteristics such as gender, marital status, qualification, colour, religion, satisfaction, types of trees, beauty and honesty generally give qualitative data. For example, i. if the characteristic under study is gender, objects can be divided into two categories, male and female; ii. if the characteristic under study is marital status, objects can be divided into four categories: married, unmarried, divorced, widowed; iii. if the characteristic under study is qualification, say ‘matriculation’, objects can be divided into two categories, ‘matriculation passed’ and ‘not passed’; iv. if the characteristic under study is ‘colour’, objects can be divided into a number of categories: violet, indigo, blue, green, yellow, orange and red.
  • 15. Discrete Data If the nature of the characteristic under study is such that the values of observations are at most countable between two certain limits, the corresponding data are known as discrete data. For example, (i) the number of books on the shelf of an almirah in a library forms discrete data, because the number of books may be 0 or 1 or 2 or 3, …, but cannot take fractional values such as 0.8, 1.32, 1.53245, etc. (ii) If there are 30 students in a class, the number of students present in a lecture forms discrete data, because the number of present students may be 1 or 2 or 3 or … or 30, but cannot take fractional values between 0 and 30 such as 1.8675, 22.56, 29.95, etc. (iii) The number of children in a family in a locality forms discrete data, because the number of children in a family may be 0 or 1 or 2 or 3 or 4 or …, but cannot take fractional values such as 2.3, 3.75, etc. (iv) The number of mistakes on a particular page of a book: obviously the number of mistakes may be 0 or 1 or 2 or 3, …, but cannot be 6.74, 3.9832, etc.
  • 16. Continuous Data Data are said to be continuous if the measurement of the observations of a characteristic under study may be any real value between two certain limits. For example, (i) data obtained by measuring the heights of the students of a class of, say, 30 students form continuous data, because if the minimum and maximum heights are 152 cm and 175 cm, then the heights of the students may take any possible value between 152 cm and 175 cm; for example, 152.2375 cm, 160.31326… cm, etc. (ii) Data obtained by measuring the weights of the students of a class also form continuous data, because the weights of students may be 48.25796… kg, 50.275 kg, 42.314314314… kg, etc.
  • 17. Time Series Data Collection of data is done to serve a purpose in hand. The purpose may have its connection with time, geographical location, or both. If the purpose of the data collection has its connection with time, the data are known as time series data. That is, in time series data time is one of the main variables, and the data, usually collected at regular intervals of time, show how the characteristic(s) under study change over time. For example, the quarterly profit of a company for the last eight quarters, the yearly production of a crop in India for the last six years, the yearly expenditure of a family on different items for the last five years, the weekly rate of inflation for the last ten weeks, etc. all form time series data. If the purpose of the data collection has its connection with geographical location, the data are known as spatial data. For example, (i) the price of petrol in Delhi, Haryana, Punjab and Chandigarh at a particular time; (ii) the number of runs scored by a batsman in different matches of a one-day series in different stadiums. If the purpose of the data collection has its connection with both time and geographical location, the data are known as spatio-temporal data. For example, data related to the population of different states of India in 2001 and 2011 are spatio-temporal data. In time series, spatial and spatio-temporal data the concept of frequency has no significance; hence they are known as non-frequency data. For instance, in the example discussed for time series data, an expenditure of Rs 40000 on food in 2006 is itself important; its frequency, say 3 (repeated three times), does not make any sense. Now consider the marks of 40 students in a class out of 10 (say). Here more than one student may score the same marks in the test. Suppose that out of 40 students 5 score 10 out of 10; it means the mark 10 has frequency 5. This type of data, where frequency is meaningful, is known as frequency data.
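The frequency-data idea above (several students sharing the same mark) can be sketched in Python; the marks below are hypothetical, illustrative values only:

```python
from collections import Counter

# Hypothetical marks out of 10 for a class (illustrative values only).
marks = [10, 7, 8, 10, 6, 10, 7, 9, 10, 10, 8, 6, 7, 9, 8]

# Frequency table: how many students obtained each mark.
freq = Counter(marks)
print(freq[10])  # frequency of the mark 10 -> 5
```

Here the value 10 repeats, so its frequency (5) is meaningful, whereas in a time series each observation stands on its own and repetition carries no information.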
  • 18. Cross Sectional Data Sometimes we are interested in how a characteristic under study (such as income, expenditure, population, votes in an election, etc.) is distributed over different subjects (such as families, countries, political parties, etc.) at one point in time. This type of data, collected at one point in time, is known as cross sectional data. For example, the annual incomes of different families of a locality, a survey of consumer expenditure conducted by a research scholar, opinion polls conducted by an agency, the salaries of all employees of an institute, etc.
  • 19. Primary Data Data which are collected by an investigator, agency or institution for a specific purpose, where these people are the first to use the data, are called primary data. That is, these data are originally collected by these people and they are the first to use them. For example, suppose a research scholar wants to know the mean age of the students of M.Sc. Chemistry of a particular university. If he collects the data on the age of each student of M.Sc. Chemistry of that university by contacting each student personally, the data so obtained are an example of primary data for that research scholar. There are a number of methods of collecting primary data, the choice depending on factors such as the geographical area of the field, money available, time period, accuracy needed, literacy of the respondents/informants, etc. Here we will discuss only the following commonly used methods: (1) Direct Personal Investigation Method (2) Telephone Method (3) Indirect Oral Interviews Method (4) Local Correspondents Method (5) Mailed Questionnaires Method (6) Schedules Method. Let us discuss these methods one by one with some examples, merits and demerits.
  • 20. SECONDARY DATA The discussion in the previous section shows that the collection of primary data requires a lot of time, money, manpower, etc. But sometimes some or all of these resources are not sufficient for the collection of primary data. Also, in some situations it may not be feasible to collect primary data easily. To overcome these difficulties, there is another way of obtaining data, known as secondary data. Data obtained/gathered by an investigator, agency or institution from a source which already exists are called secondary data. That is, these data were originally collected by some investigator, agency or institution and have been used by them at least once, and now they are going to be used at least a second time. Existing data may be available in published or unpublished form, so the sources of secondary data can broadly be classified under the following two heads. (1) Published Sources When an institution or organisation publishes its own collected data (primary data) in the public domain, either in printed or electronic form, these data are said to be secondary data in published form, and the source where they are available is known as a published source of the secondary data of the corresponding institution or organisation. Some of the published sources of secondary data are given below:  International Publications  Government Publications in India  Published Reports of Commissions and Committees  Research Publications  Reports of Trade and Industry Associations  Published Printed Sources  Published Electronic Sources
  • 21. SECONDARY DATA (2) (2) Unpublished Sources: Information collected in the form of data, or data observed through one's own experience, by an individual or an organisation and held in unpublished form is known as an unpublished source of secondary data. Examples: (i) records and statistics maintained by different institutions or organisations, whether government or non-government; (ii) unpublished project works, field works or other research-related works submitted by students to their institutes; (iii) records of the Central Bureau of Investigation; (iv) personal diaries, etc.
  • 22. MEASUREMENT SCALES The two words “counting” and “measurement” are very frequently used by everybody. For example, if you want to know the number of pages in a notebook, you can easily count them. Also, if you want to know the height of a man, you can easily measure it. But in Statistics, the acts of counting and measurement are divided into 4 levels of measurement scales, as follows. (1) Nominal Scale In Latin, ‘Nomen’ means name; the word nominal comes from this Latin word. Under the nominal scale we divide the objects under study into two or more categories by giving them unique names. The classification of objects into at least two categories is done in such a way that (a) each object takes place in only one category, i.e. each object falls in a unique category: it either belongs to a category or it does not (mathematically, we may use the symbols “=” and “≠” to express whether an object falls in a category or not); (b) the number of categories must be sufficient to include all objects, i.e. there should be no scope for missing even a single object which does not fall in any of the categories. That is, in statistical language, the categories must be mutually exclusive and exhaustive. Generally the nominal scale is used when we want to categorise data based on characteristics such as gender, race, region, religion, etc.
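A minimal Python sketch of the two nominal-scale requirements, using hypothetical gender data: counting is the only permissible operation, and the categories are mutually exclusive and exhaustive.

```python
from collections import Counter
from statistics import mode

# Hypothetical nominal data: each respondent falls in exactly one category.
genders = ["male", "female", "female", "male", "female"]

counts = Counter(genders)                    # counting is the only permissible operation
assert sum(counts.values()) == len(genders)  # categories are exhaustive: no object is missed
print(mode(genders))                         # the modal category -> "female"
```

The mode is the only average that makes sense here, since the categories have no order and no arithmetic is defined on them.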
  • 23. (2) Ordinal Scale We have seen that order does not make any sense in the nominal scale. As the name suggests, the ordinal scale, other than the names or codes given to the different categories, also provides an order among the categories. That is, we can place the objects in a series based on the orders or ranks given by using an ordinal scale. But here we cannot find the actual difference between two categories. Generally the ordinal scale is used when we want to measure attitude scores towards levels of liking, satisfaction, preference, etc. Different designations in an institute can also be measured using an ordinal scale. For example, suppose a school boy is asked to list the names of three ice-cream flavours according to his preference, and he lists them in the following order: vanilla, strawberry, tooty-frooty. This indicates that he likes vanilla more than strawberry, and strawberry more than tooty-frooty. But the actual difference between his liking of vanilla and strawberry cannot be measured. In the sixth pay commission, teachers of colleges and universities are designated as Assistant Professor, Associate Professor and Professor. The rank of Professor is higher than that of Associate Professor, and the designation of Associate Professor is higher than Assistant Professor. But you cannot find the actual difference between Professor and Associate Professor, or Professor and Assistant Professor, or Associate Professor and Assistant Professor. This is because one teacher in a designation might have served a certain number of years and done good-quality research work, while another teacher in the same designation might have served fewer years and done unsatisfactory research work. So the actual difference between one designation and another cannot be found: one may be very near to his next higher designation and the other may not.
  • 24. (3) Interval Scale If I = [4, 9] then the length of this interval is 9 − 4 = 5, i.e. the difference between 4 and 9 is 5; we can find the difference between any two points of the interval. For example, the difference between 7 and 7.3 is 0.3. Thus the property of difference holds in the case of intervals. Similarly, the third level of measurement, the interval scale, possesses the property of difference, which was not satisfied in the case of the nominal and ordinal scales. The nominal scale gives only names to the different categories; the ordinal scale, moving one step further, also provides the concept of order between the categories; and the interval scale, moving one step ahead of the ordinal scale, also provides the characteristic of the difference between any two categories. The interval scale is used when we want to measure years/historical time/calendar time, temperature (except on the Kelvin scale), sea level, marks in tests where there is negative marking, etc. Mathematically, this scale supports + and – in addition to >, <, = and ≠. Let us consider some examples. The measurement of the time of a historical event comes under the interval scale because there is no fixed origin of time (i.e. no ‘0’ year). The ‘0’ year differs from calendar to calendar and from society/country to society/country, e.g. the Hindu, Muslim and Hebrew calendars have different origins of time, i.e. the ‘0’ year is not defined. In Indian history also, we may find BC (Before Christ).
  • 25. (4) Ratio Scale The ratio scale is the highest level of measurement: the nominal scale gives only names to the different categories; the ordinal scale provides order between categories in addition to names; the interval scale provides the facility of difference between categories in addition to names and order; but the ratio scale, besides names, order and the characteristic of difference, also provides a natural zero (absolute zero). On a ratio measurement scale, values of the characteristic cannot be negative. The ratio scale is used when we want to measure temperature in kelvin, weight, height, length, age, mass, time, plane angle, etc. The ratio scale includes × and ÷ in addition to +, –, >, <, = and ≠. But be careful never to take ‘0’ in the denominator while finding ratios; for example, 4/0 is meaningless. Let us consider some examples. The measurement of temperature on the Kelvin scale comes under the ratio scale because it has an absolute zero, which is equivalent to −273.15 °C. This characteristic of origin allows us to make statements like “50 K (read as 50 kelvin) is 5 times as hot as 10 K”. Both the heights (in cm) and the ages (in days) of the students of M.Sc. Statistics of a particular university satisfy all the requirements of a ratio scale, because height and age both cannot be negative (i.e. they have an absolute zero).
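The Kelvin example can be checked numerically: the same pair of temperatures gives a meaningful ratio in kelvin (natural zero) but not in Celsius (arbitrary zero). A small sketch:

```python
# Interval vs ratio scale: the same two temperatures in two scales.
t1_k, t2_k = 10.0, 50.0                      # kelvin: absolute zero, ratios meaningful
t1_c, t2_c = t1_k - 273.15, t2_k - 273.15    # the same temperatures in Celsius

print(t2_k / t1_k)   # 5.0 -- "50 K is 5 times as hot as 10 K"
print(t2_c / t1_c)   # nowhere near 5; the Celsius zero is arbitrary
```

This is why temperature in Celsius is only interval-scale data, while temperature in kelvin is ratio-scale data.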
  • 26. Permissible Statistical Tools in measurement scales
Nominal scale - Mode, chi-square test and run test. (Logic/reason: here counting is the only permissible operation.)
Ordinal scale - Median and all positional averages like quartiles, deciles, percentiles; Spearman's rank correlation. (Logic/reason: here, besides counting, the order relation (less than or greater than) also exists.)
Interval scale - Mean, S.D., t-test, F-test, ANOVA, simple, multiple and moment correlations, regression. (Logic/reason: here counting, order and difference operations hold.)
Ratio scale - Geometric mean (G.M.), harmonic mean (H.M.), coefficient of variation. (Logic/reason: here counting, order, difference and a natural zero exist.)
  • 27. Types of Data Analysis 1. Descriptive Statistics - provide an overview of the attributes of a data set. These include measures of central tendency (frequency, histograms, mean, median & mode) and dispersion (range, variance & standard deviation) 2. Inferential Statistics - provide measures of how well your data support your hypothesis and whether your data are generalizable beyond what was tested (significance tests)
  • 28. Types of Data Analysis Descriptive Measures of central tendency Measures of dispersion Measures of Skewness Correlation and regression of two variables Inferential Parametric tests- Hypothesis testing: z-test; t-test; ANOVA(1 Way); Chi–square test; Non-Parametric tests- Mann-Whitney test (U-test); Kruskal-Wallis test (H-test); Rank correlation test
  • 29. Measures of central tendency - According to Professor Bowley, averages are "statistical constants which enable us to comprehend in a single effort the significance of the whole". They throw light on how the values are concentrated in the central part of the distribution; for this reason they are also called measures of central tendency. An average is a single value which is considered the most representative for a given set of data. Measures of central tendency show the tendency of the data to cluster around some central value. Significance of the Measure of Central Tendency The following are two main reasons for studying an average: 1. To get a single representative Measures of central tendency enable us to get a single value from the mass of data and also provide an idea about the entire data. For example, it is impossible to remember the height measurements of all students in a class, but if the average height is obtained, we get a single value that represents the entire class. 2. To facilitate comparison Measures of central tendency enable us to compare two or more populations by reducing the mass of data to one single figure. The comparison can be made either at the same time or over a period of time. For example, if a subject has been taught in two or more classes, a comparison can be made by obtaining the average marks of those classes.
  • 30. Properties of a Good Average 1. It should be simple to understand Since we use the measures of central tendency to simplify the complexity of the data, an average should be easily understandable, otherwise its use is bound to be very limited. 2. It should be easy to calculate An average not only should be easy to understand but also should be simple to compute, so that it can be used as widely as possible. 3. It should be rigidly defined A measure of central tendency should be defined properly so that it has an appropriate interpretation. It should also have an algebraic formula so that if different people compute the average from the same figures, they get the same answer. 4. It should be amenable to algebraic manipulation If there are two sets of data and the individual information is available for both sets, then one should be able to find the measure for the combined set from the individual measures alone. 5. It should be least affected by sampling fluctuations We should prefer a tool which has sampling stability. In other words, if we select 10 different groups of observations from the same population and compute the average of each group, then we should expect to get approximately the same values. There may be a little difference, because of sampling fluctuation only. 6. It should be based on all the observations If any measure of central tendency is used to analyse the data, it is desirable that each and every observation is used in its calculation. 7. It should be possible to calculate it even for open-end class intervals A measure of central tendency should be calculable for data with open-end classes. 8. It should not be unduly affected by extremely small or extremely large observations It is assumed that each and every observation influences the value of the average, but if one or two very small or very large observations affect the average greatly, i.e. increase or decrease its value largely, then the average cannot be considered a good average.
  • 31. Different Measures of central tendency 1) Arithmetic Mean 2) Weighted Mean 3) Geometric Mean 4) Harmonic Mean 5) Median 6) Mode Partition Values 1) Quartiles 2) Deciles 3) Percentiles
  • 32. Arithmetic Mean Arithmetic mean (also called mean) is defined as the sum of all the observations divided by the number of observations. Arithmetic mean fulfils most of the properties of a good average except the last two. It is particularly useful when we are dealing with a sample, as it is least affected by sampling fluctuations. It is the most popular average and should always be our first choice unless there is a strong reason for not using it. Calculate the mean of the weights of five students: 54, 56, 70, 45, 50 (in kg). The sum of the given values is 275 and 275/5 = 55; therefore, the average weight of the students is 55 kg. Merits 1. It utilizes all the observations; 2. It is rigidly defined; 3. It is easy to understand and compute; and 4. It can be used for further mathematical treatment. Demerits 1. It is badly affected by extremely small or extremely large values; 2. It cannot be calculated for open end class intervals; and 3. It is generally not preferred for highly skewed distributions.
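The slide's worked example can be checked with a short Python sketch (the function name here is ours, for illustration):

```python
# Arithmetic mean: sum of all observations divided by their number.
def arithmetic_mean(values):
    return sum(values) / len(values)

weights = [54, 56, 70, 45, 50]   # weights (in kg) of five students, as above
print(arithmetic_mean(weights))  # 55.0
```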
  • 33. WEIGHTED MEAN Weight here refers to the importance of a value in a distribution. A simple logic is that a number is as important in the distribution as the number of times it appears. So, the frequency of a number can also be its weight. But there may be other situations where we have to determine the weight on some other basis. For example, the number of innings in which runs were made may be considered as weight, because runs (50 or 100 or 200) show their importance. In calculating the weighted mean of the scores of several innings of a player, we may take the strength of the opponent (as judged by the proportion of matches lost by a team against that opponent) as the corresponding weight: the higher the proportion, the stronger the opponent and hence the greater the weight.
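As a sketch of the idea, a weighted mean multiplies each value by its weight before averaging; the marks and frequencies below are hypothetical:

```python
# Weighted mean: each value contributes in proportion to its weight
# (here the weight is taken as the frequency of the value).
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

marks = [10, 20, 30]               # hypothetical marks
freq = [1, 2, 1]                   # how often each mark occurs (its weight)
print(weighted_mean(marks, freq))  # 20.0
```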
  • 34. MEDIAN Median is that value of the variable which divides the whole distribution into two equal parts. Here, it may be noted that the data should be arranged in ascending or descending order of magnitude. When the number of observations is odd, the median is the middle value of the data. For an even number of observations there are two middle values, so we take the arithmetic mean of these two middle values. The numbers of observations below and above the median are the same. Median is not affected by extremely large or extremely small values (as it corresponds to the middle value) and it is also not affected by open end class intervals. In such situations, it is preferable to the mean. It is also useful when the distribution is skewed (asymmetric). Find the median of the following observations: 6, 4, 3, 7, 8. First we arrange the given data in ascending order as 3, 4, 6, 7, 8. Since the number of observations, i.e. 5, is odd, the median is the middle value, that is 6. Merits 1. It is rigidly defined; 2. It is easy to understand and compute; 3. It is not affected by extremely small or extremely large values; and 4. It can be calculated even for open end classes (like "less than 10" or "50 and above"). Demerits 1. In case of an even number of observations we get only an estimate of the median by taking the mean of the two middle values; we don't get its exact value; 2. It does not utilize all the observations. The median of 1, 2, 3 is 2. If the observation 3 is replaced by any number higher than or equal to 2 and if the number 1 is replaced by any number lower than or equal to 2, the median value will be unaffected. This means 1 and 3 are not being utilized; 3. It is not amenable to algebraic treatment; and 4. It is affected by sampling fluctuations.
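The odd/even rule above can be sketched directly (function name ours, for illustration):

```python
# Median: middle value of the ordered data; for an even number of
# observations, the mean of the two middle values.
def median(values):
    s = sorted(values)            # arrange in ascending order first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([6, 4, 3, 7, 8]))    # 6  (the slide's example)
print(median([3, 4, 6, 7]))       # 5.0 (even count: mean of 4 and 6)
```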
  • 35. MODE The most frequent observation in the distribution is known as the mode. In other words, mode is that observation in a distribution which has the maximum frequency. For example, when we say that the average size of shoes sold in a shop is 7, it is the modal size, the size sold most frequently. Merits 1. Mode is the easiest average to understand and also easy to calculate; 2. It is not affected by extreme values; 3. It can be calculated for open end classes; and 4. For calculation from grouped data, only the modal class, the pre-modal class and the post-modal class need to be of equal width; mode can be calculated even if the other classes are of unequal width. Demerits 1. It is not rigidly defined: a distribution can have more than one mode; 2. It does not utilize all the observations; 3. It is not amenable to algebraic treatment; and 4. It is greatly affected by sampling fluctuations.
  • 36. Relationship between Mean, Median and Mode For a symmetrical distribution the mean, median and mode coincide. But if the distribution is moderately asymmetrical, there is an empirical relationship between them: Mean – Mode = 3 (Mean – Median), i.e. Mode = 3 Median – 2 Mean Note: Using this formula, we can calculate any one of the mean, median and mode if the other two are known.
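The empirical relation can be applied directly; the mean and median below are hypothetical figures for a moderately skewed distribution:

```python
# Empirical relation for moderately skewed distributions:
# Mode = 3 * Median - 2 * Mean
def empirical_mode(mean, median):
    return 3 * median - 2 * mean

print(empirical_mode(mean=30, median=28))  # 24
```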
  • 37. GEOMETRIC MEAN The geometric mean (GM) of n observations is defined as the n-th root of the product of the n observations. It is useful for averaging ratios or proportions. It is the ideal average for calculating index numbers (index numbers are economic barometers which reflect the change in prices or commodity consumption in the current period with respect to some base period taken as standard). It fails to give the correct average if an observation is zero or negative. Merits 1. It is rigidly defined; 2. It utilizes all the observations; 3. It is amenable to algebraic treatment (the reader should verify that if GM1 and GM2 are the geometric means of two series, Series 1 of size n and Series 2 of size m respectively, then the geometric mean GM of the combined series is given by log GM = (n log GM1 + m log GM2) / (n + m)); 4. It gives more weight to small items; and 5. It is not affected greatly by sampling fluctuations. Demerits 1. It is difficult to understand and calculate; and 2. It becomes imaginary for an odd number of negative observations and becomes zero if a single observation is zero (the logarithmic form then being undefined).
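The logarithmic definition lends itself to a short sketch (all observations must be positive):

```python
import math

# Geometric mean: n-th root of the product of n observations,
# computed through logarithms to avoid overflow for long series.
def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(geometric_mean([2, 8]))  # ≈ 4.0 (the square root of 2 * 8 = 16)
```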
  • 38. HARMONIC MEAN The harmonic mean (HM) is defined as the reciprocal (inverse) of the arithmetic mean of the reciprocals of the observations of a set; equivalently, it is the number of values in the data set divided by the sum of their reciprocals. Merits 1. It is rigidly defined; 2. It utilizes all the observations; 3. It is amenable to algebraic treatment; and 4. It gives greater importance to small items. Demerits 1. It is difficult to understand and compute.
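The "count divided by sum of reciprocals" definition translates directly (observations must be non-zero):

```python
# Harmonic mean: number of values divided by the sum of their reciprocals.
def harmonic_mean(values):
    return len(values) / sum(1 / v for v in values)

print(harmonic_mean([2, 4, 4]))  # 3.0  (reciprocals sum to 1.0; 3 / 1.0)
```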
  • 39. PARTITION VALUES - Partition values are those values of the variable which divide the distribution into a certain number of equal parts. Here it may be noted that the data should be arranged in ascending or descending order of magnitude. Commonly used partition values are quartiles, deciles and percentiles. For example, quartiles divide the data into four equal parts; similarly, deciles and percentiles divide the distribution into ten and hundred equal parts, respectively. Quartiles Quartiles divide the whole distribution into four equal parts. There are three quartiles: the 1st quartile, denoted Q1, the 2nd quartile, denoted Q2, and the 3rd quartile, Q3. Below Q1 lies ¼ of the data, below Q2 lies ½ of the data and below Q3 lies ¾ of the data. Deciles Deciles divide the whole distribution into ten equal parts. There are nine deciles: D1, D2, ..., D9 are known as the 1st decile, 2nd decile, ..., 9th decile respectively, and below the ith decile lies the (iN/10)th part of the data. Percentiles Percentiles divide the whole distribution into 100 equal parts. There are ninety-nine percentiles: P1, P2, ..., P99 are known as the 1st percentile, 2nd percentile, ..., 99th percentile, and below the ith percentile lies the (iN/100)th part of the data. In every case the data should first be arranged in ascending or descending order of magnitude.
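Quartiles can be located under several conventions; one common convention (median-of-halves) can be sketched as follows, purely for illustration:

```python
# Quartiles by the median-of-halves convention: Q2 is the median of the
# whole sorted data; Q1 and Q3 are the medians of the lower and upper
# halves (excluding the middle value when n is odd).
def quartiles(values):
    s = sorted(values)
    n = len(s)

    def med(a):
        m = len(a) // 2
        return a[m] if len(a) % 2 else (a[m - 1] + a[m]) / 2

    return med(s[:n // 2]), med(s), med(s[(n + 1) // 2:])

print(quartiles([1, 2, 3, 4, 5, 6, 7]))  # (2, 4, 6)
```

Other conventions (interpolation-based, as in statistical software) give slightly different values for small samples; all agree for large n.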
  • 40. MEASURES OF DISPERSION Different measures of central tendency give a value around which the data are concentrated, but they give no idea about the nature of scatter or spread. For example, the observations 10, 30 and 50 have mean 30, and the observations 28, 30, 32 also have mean 30. Both distributions are spread around 30, but the variability among units is more in the first than in the second; in other words, there is greater variability or dispersion in the first set of observations in comparison to the other. A measure of dispersion is calculated to get an idea about the variability in the data. According to Spiegel, the degree to which numerical data tend to spread about an average value is called the variation or dispersion of data. There are two basic kinds of measures of dispersion: (i) absolute measures and (ii) relative measures. The absolute measures of dispersion are used to measure the variability of given data expressed in the same unit, while the relative measures are used to compare the variability of two or more sets of observations. Following are the different measures of dispersion: 1. Range 2. Quartile Deviation 3. Mean Deviation 4. Standard Deviation and Variance
  • 41. Properties of a Good Measure of Dispersion The properties of a good measure of dispersion are similar to the properties of a good measure of average. So, a good measure of dispersion should possess the following properties: 1. It should be simple to understand; 2. It should be easy to compute; 3. It should be rigidly defined; 4. It should be based on each and every observation of the data; 5. It should be amenable to further algebraic treatment; 6. It should have sampling stability; and 7. It should not be unduly affected by extreme observations.
  • 42. RANGE Range is the simplest measure of dispersion. It is defined as the difference between the maximum value of the variable and the minimum value of the variable in the distribution. Its merit lies in its simplicity. The demerit is that it is a crude measure because it uses only the maximum and the minimum observations of the variable. However, it still finds applications in Order Statistics and Statistical Quality Control. R = X max - X min, where X max is the maximum value of the variable and X min is the minimum value of the variable. Find the range of the distribution 6, 8, 2, 10, 15, 5, 1, 13. For the given distribution, the maximum value of the variable is 15 and the minimum value is 1; hence range = 15 - 1 = 14. Merits of Range 1. It is the simplest to understand; 2. It can be obtained visually, since one can detect the largest and the smallest observations easily and take the difference without much calculation; and 3. Though it is crude, it has useful applications in areas like order statistics and statistical quality control.
  • 43. QUARTILE DEVIATION As you have already studied about quartiles, Q1 and Q3 are the first quartile and the third quartile respectively, and (Q3 – Q1) gives the inter-quartile range. The semi inter-quartile range, which is also known as Quartile Deviation (QD), is given by Quartile Deviation (QD) = (Q3 – Q1) / 2 The relative measure of Q.D., known as the Coefficient of Q.D., is defined as Coefficient of Q.D. = (Q3 – Q1) / (Q3 + Q1)
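Both measures follow directly from the two quartiles; the quartile values below are hypothetical:

```python
# Quartile deviation (semi inter-quartile range) and its relative
# measure, the coefficient of quartile deviation.
def quartile_deviation(q1, q3):
    return (q3 - q1) / 2

def coefficient_of_qd(q1, q3):
    return (q3 - q1) / (q3 + q1)

print(quartile_deviation(20, 40))   # 10.0
print(coefficient_of_qd(20, 40))    # ≈ 0.333
```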
  • 44. MEAN DEVIATION Mean deviation is defined as the average of the absolute values of the deviations from any arbitrary value viz. mean, median, mode, etc. It is often suggested to calculate it from the median because it gives the least value when measured from the median. The deviation of an observation xi from the assumed mean A is defined as (xi – A). Therefore, the mean deviation can be defined as MD = Σ |xi – A| / n Merits of Mean Deviation 1. It utilizes all the observations; 2. It is easy to understand and calculate; and 3. It is not much affected by extreme values. Demerits of Mean Deviation 1. Negative deviations are straightaway made positive; 2. It is not amenable to algebraic treatment; and 3. It cannot be calculated for open end classes.
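The formula MD = Σ|xi – A|/n can be sketched with hypothetical data, taking A as the median (where the mean deviation is least):

```python
# Mean deviation: average of absolute deviations about a chosen centre A.
def mean_deviation(values, about):
    return sum(abs(x - about) for x in values) / len(values)

data = [2, 4, 6, 8, 10]           # hypothetical data; median = 6
print(mean_deviation(data, 6))    # 2.4  ((4+2+0+2+4)/5)
```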
  • 45. VARIANCE In the previous section, we have seen that while calculating the mean deviation, negative deviations are straightaway made positive. To overcome this drawback we move towards the next measure of dispersion, called variance. Variance is the average of the squares of the deviations of the values taken from the mean. Taking the square of the deviation is a better technique to get rid of negative deviations. Variance is defined as Var(X) = (1/n) Σ (xi – x̄)² and for a frequency distribution the formula is Var(X) = (1/N) Σ fi (xi – x̄)², where N = Σ fi. It should be noted that the sum of squares of deviations is least when the deviations are measured from the mean. This means Σ (xi – A)² is least when A = mean. Merits of Variance 1. It is rigidly defined; 2. It utilizes all the observations; 3. It is amenable to algebraic treatment; 4. Squaring is a better technique to get rid of negative deviations; and 5. It is (with standard deviation) the most popular measure of dispersion. Demerits of Variance 1. In cases where the mean is not a suitable average, variance may not be the appropriate measure of dispersion, e.g. when open end classes are present; in such cases quartile deviation may be used; 2. Although easy to understand, its calculation may require a calculator or a computer; and 3. Its unit is the square of the unit of the variable, due to which it is difficult to judge the magnitude of dispersion compared to standard deviation.
  • 46. Standard Deviation Standard deviation (SD) is defined as the positive square root of variance. The formula is SD = +√Variance = √((1/n) Σ (xi – x̄)²) Merits of Standard Deviation 1. It is rigidly defined; 2. It utilizes all the observations; 3. It is amenable to algebraic treatment; 4. Squaring is a better technique to get rid of negative deviations; and 5. It is the most popular measure of dispersion. Demerits of Standard Deviation 1. In cases where the mean is not a suitable average, standard deviation may not be the appropriate measure of dispersion, e.g. when open end classes are present; in such cases quartile deviation may be used; 2. It is not unit free; and 3. Although it is easy to understand, its calculation may require a calculator or a computer.
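Variance and standard deviation can be sketched together, using the two mean-30 distributions from the dispersion slide to show that the more scattered set gets the larger value:

```python
# Population variance: mean of squared deviations from the mean;
# standard deviation: its positive square root.
def variance(values):
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def std_dev(values):
    return variance(values) ** 0.5

print(variance([10, 30, 50]))  # ≈ 266.67  (more spread around 30)
print(variance([28, 30, 32]))  # ≈ 2.67    (less spread around 30)
print(std_dev([28, 30, 32]))   # ≈ 1.63
```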
  • 47. SKEWNESS We have talked about average and dispersion; they give the location and the scale of the distribution. In addition to measures of central tendency and dispersion, we also need an idea about the shape of the distribution. A measure of skewness gives the direction and the magnitude of the lack of symmetry, whereas kurtosis gives an idea of the flatness of the distribution. Lack of symmetry in a frequency distribution is called skewness: if the distribution is not symmetric, the frequencies will not be uniformly distributed about the centre of the distribution. CONCEPT OF SKEWNESS Skewness means lack of symmetry. In mathematics, a figure is called symmetric if there exists a point in it through which, if a perpendicular is drawn on the X-axis, it divides the figure into two congruent parts, i.e. parts identical in all respects, one of which can be superimposed on the other (mirror images of each other). In Statistics, a distribution is called symmetric if mean, median and mode coincide; otherwise, the distribution is asymmetric. If the right tail is longer, we get a positively skewed distribution, for which mean > median > mode; if the left tail is longer, we get a negatively skewed distribution, for which mean < median < mode. Examples of the symmetrical curve, the positively skewed curve and the negatively skewed curve are given on the next slide.
  • 49. VARIOUS MEASURES OF SKEWNESS Measures of skewness help us to know to what degree and in which direction (positive or negative) the frequency distribution departs from symmetry. Although positive or negative skewness can be detected graphically, depending on whether the right tail or the left tail is longer, we do not get an idea of its magnitude. Besides, borderline cases between symmetry and asymmetry may be difficult to detect graphically. Hence some statistical measures are required to find the magnitude of the lack of symmetry. A good measure of skewness should possess three criteria: 1. It should be a unit-free number, so that the shapes of different distributions, so far as symmetry is concerned, can be compared even if the units of the underlying variables are different; 2. If the distribution is symmetric, the value of the measure should be zero; similarly, the measure should give positive or negative values according as the distribution has positive or negative skewness; and 3. As we move from extreme negative skewness to extreme positive skewness, the value of the measure should vary accordingly. Measures of skewness can be both absolute as well as relative. Since in a symmetrical distribution the mean, median and mode are identical, the more the mean moves away from the mode, the larger the asymmetry or skewness. An absolute measure of skewness cannot be used for purposes of comparison, because the same amount of skewness has different meanings in a distribution with small variation and in a distribution with large variation.
  • 50. Absolute Measures of Skewness Following are the absolute measures of skewness: 1. Skewness (Sk) = Mean – Median 2. Skewness (Sk) = Mean – Mode 3. Skewness (Sk) = (Q3 - Q2) - (Q2 - Q1) For comparing two series, we do not calculate these absolute measures; we calculate the relative measures, which are called coefficients of skewness. Coefficients of skewness are pure numbers, independent of the units of measurement.
  • 51. Relative Measures of Skewness In order to make a valid comparison between the skewness of two or more distributions we have to eliminate the disturbing influence of variation. Such elimination can be done by dividing the absolute skewness by the standard deviation. The following are the important methods of measuring relative skewness:  Karl Pearson’s coefficient of skewness: Sk(P) = (Mean - Mode)/SD; when the mode is ill-defined, the empirical relation gives Sk(P) = 3(Mean - Median)/SD  Bowley’s coefficient of skewness: Sk(B) = (Q3 + Q1 - 2 Median)/(Q3 - Q1)
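Both coefficients reduce to simple arithmetic on summary figures; the numbers below are hypothetical values for a positively skewed distribution:

```python
# Relative (unit-free) measures of skewness from summary figures.
def pearson_skewness(mean, median, sd):
    # Karl Pearson's coefficient using the median: 3*(Mean - Median)/SD
    return 3 * (mean - median) / sd

def bowley_skewness(q1, q2, q3):
    # Bowley's quartile coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

print(pearson_skewness(mean=52, median=50, sd=6))  # 1.0
print(bowley_skewness(q1=20, q2=28, q3=40))        # 0.2
```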
  • 52. CORRELATION CONCEPT In many practical applications, we might come across the situation where observations are available on two or more variables. The following examples will illustrate the situations clearly: 1. Heights and weights of persons of a certain group; 2. Sales revenue and advertising expenditure in business; and 3. Time spent on study and marks obtained by students in exam. If data are available for two variables, say X and Y, it is called bivariate distribution. Let us consider the example of sales revenue and expenditure on advertising in business. A natural question arises in mind that is there any connection between sales revenue and expenditure on advertising? Does sales revenue increase or decrease as expenditure on advertising increases or decreases? If we see the example of time spent on study and marks obtained by students, a natural question appears whether marks increase or decrease as time spent on study increase or decrease. In all these situations, we try to find out relation between two variables and correlation answers the question, if there is any relationship between one variable and another. When two variables are related in such a way that change in the value of one variable affects the value of another variable, then variables are said to be correlated or there is correlation between these two variables.
  • 53. TYPES OF CORRELATION 1. Positive Correlation Correlation between two variables is said to be positive if the values of the variables deviate in the same direction, i.e. if the values of one variable increase (or decrease) then the values of the other variable also increase (or decrease). Some examples of positive correlation are correlations between 1. Heights and weights of a group of persons; 2. Household income and expenditure; 3. Amount of rainfall and yield of crops; and 4. Expenditure on advertising and sales revenue. In the last example, it is observed that as the expenditure on advertising increases, sales revenue also increases. Thus, the change is in the same direction; hence the correlation is positive. In the remaining three examples, usually the value of the second variable increases (or decreases) as the value of the first variable increases (or decreases).
  • 54. 2. Negative Correlation Correlation between two variables is said to be negative if the values of the variables deviate in opposite directions, i.e. if the values of one variable increase (or decrease) then the values of the other variable decrease (or increase). Some examples of negative correlation are correlations between 1. Volume and pressure of a perfect gas; 2. Price and demand of goods; 3. Literacy and poverty in a country; and 4. Time spent on watching TV and marks obtained by students in examinations. In the first example pressure decreases as the volume increases, or pressure increases as the volume decreases. Thus the change is in opposite directions; therefore, the correlation between volume and pressure is negative. In the remaining three examples also, the values of the second variable change in the opposite direction of the change in the values of the first variable.
  • 55. SCATTER DIAGRAM A scatter diagram is a statistical tool for detecting potential correlation between a dependent variable and an independent variable. A scatter diagram does not tell us the exact relationship between two variables, but it indicates whether they are correlated or not. Let (Xi, Yi), i = 1, 2, ..., n, be the bivariate distribution. If the values of the dependent variable Y are plotted against the corresponding values of the independent variable X in the XY plane, such a diagram of dots is called a scatter diagram or dot diagram. It is to be noted that the scatter diagram is not suitable for a large number of observations. Interpretation from a Scatter Diagram If the dots are in the shape of a line and the line rises from left bottom to right top (Fig. 1), then the correlation is said to be perfect positive.
  • 56. If dots in the scatter diagram are in the shape of a line and line moves from left top to right bottom (Fig. 2), then correlation is perfect negative. If dots show some trend and trend is upward rising from left bottom to right top (Fig.3) correlation is positive.
  • 57. If dots show some trend and trend is downward from left top to the right bottom (Fig.4) correlation is said to be negative. If dots of scatter diagram do not show any trend (Fig. 5) there is no correlation between the variables.
  • 58. COEFFICIENT OF CORRELATION A scatter diagram tells us whether variables are correlated or not, but it does not indicate the extent to which they are correlated. The coefficient of correlation gives an exact idea of the extent to which they are correlated. If X and Y are two random variables, then the correlation coefficient between X and Y is denoted by r and defined as r = Cov(X, Y) / (σX σY) The coefficient of correlation measures the intensity or degree of linear relationship between two variables. It was given by the British biometrician Karl Pearson (1857-1936).
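The definition r = Cov(X, Y)/(σX σY) can be computed from scratch; the two series below are hypothetical (y is exactly twice x, so r should come out as 1):

```python
# Karl Pearson's correlation coefficient:
# r = Cov(X, Y) / (SD of X * SD of Y)
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ≈ 1.0 (perfect positive)
```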
  • 59. Assumptions for the Correlation Coefficient 1. Assumption of Linearity The variables being used to compute the correlation coefficient must be linearly related. You can check the linearity of the variables through a scatter diagram. 2. Assumption of Normality Both variables under study should follow the Normal distribution; they should not be skewed in either the positive or the negative direction. 3. Assumption of Cause and Effect Relationship There should be a cause and effect relationship between the two variables, for example, heights and weights of children, demand and supply of goods, etc. When there is no cause and effect relationship between the variables, the correlation coefficient should be zero; if it is non-zero, the correlation is termed chance correlation or spurious correlation, for example, the correlation coefficient between: 1. weight and income of a person over periods of time; and 2. rainfall and literacy in a state over periods of time.
  • 60. LINEAR REGRESSION Prediction or estimation is one of the major problems in most human activities. Predictions of future production of a crop, consumption, the price of a good, sales, income, profit, etc. are very important in the business world. Similarly, predictions of population, consumption of agricultural products, rainfall, revenue, etc. have great importance for the government of any country for effective planning. If two variables are correlated significantly, then it is possible to predict or estimate the values of one variable from the other. This leads us to the very important concept of regression analysis. In fact, regression analysis is a statistical technique which is used to investigate the relationship between variables. The effect of a price increase on demand, the effect of a change in the money supply on the inflation rate, and the effect of a change in expenditure on advertisement on sales and profit in business are examples where investigators or researchers try to establish a cause and effect relationship. To handle these types of situations, investigators collect data on the variables of interest and apply regression methods to estimate the quantitative effect of the causal variables upon the variable that they influence. Regression analysis describes how the independent variable(s) is (are) related to the dependent variable, i.e. regression analysis measures the average relationship between the independent variables and the dependent variable. The literal meaning of regression is “stepping back towards the average”, a term used by the British biometrician Sir Francis Galton (1822-1911) regarding the heights of parents and their offspring. Regression analysis is a mathematical measure of the average relationship between two or more variables.
  • 61. Types of variables in regression analysis Independent variable The variable which is used for prediction is called independent variable. It is also known as regressor or predictor or explanatory variable. Dependent variable The variable whose value is predicted by the independent variable is called dependent variable. It is also known as regressed or explained variable. If scatter diagram shows some relationship between independent variable X and dependent variable Y, then the scatter diagram will be more or less concentrated round a curve, which may be called the curve of regression. When the curve is a straight line, it is known as line of regression and the regression is said to be linear regression. If the relationship between dependent and independent variables is not a straight line but curve of any other type then regression is known as nonlinear regression. Regression can also be classified according to number of variables being used. If only two variables are being used this is considered as simple regression whereas the involvement of more than two variables in regression is categorized as multiple regression.
  • 62. Formulas of Linear Regression The regression line of y on x is (y - ȳ) = byx (x - x̄), where the regression coefficient byx = r (σy / σx) = Cov(X, Y) / Var(X); and the regression line of x on y is (x - x̄) = bxy (y - ȳ), where bxy = r (σx / σy) = Cov(X, Y) / Var(Y).
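The line of y on x can be fitted from first principles with these formulas; the data below are hypothetical points lying exactly on y = 2x + 1 (so the sketch should recover slope 2 and intercept 1):

```python
# Regression line of y on x: (y - mean_y) = b_yx * (x - mean_x),
# where b_yx = Cov(X, Y) / Var(X) = r * (sd_y / sd_x).
def regression_y_on_x(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    b = cov / var_x           # regression coefficient b_yx
    return b, my - b * mx     # slope and intercept of y = a + b*x

b, a = regression_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
print(b, a)  # 2.0 1.0
```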
  • 63. DISTINCTION BETWEEN CORRELATION AND REGRESSION Both correlation and regression play important roles in relationship studies, but there are some distinctions between them, which can be described as follows: (i) Correlation studies the linear relationship between two variables, while regression analysis is a mathematical measure of the average relationship between two or more variables. (ii) Correlation has limited application because it only gives the strength of a linear relationship, while the purpose of regression is to predict the value of the dependent variable for given values of one or more independent variables. (iii) Correlation makes no distinction between independent and dependent variables, while linear regression does, i.e. correlation does not use the concepts of dependent and independent variables, whereas in regression analysis one variable is treated as the dependent variable and the other(s) as independent variable(s).
  • 64. CONCEPT OF HYPOTHESIS TESTING In our day-to-day life, we see different commercial advertisements in television, newspapers, magazines, etc., such as (i) The refrigerator of a certain brand saves up to 20% on the electricity bill, (ii) The motorcycle of a certain brand gives 60 km/litre mileage, (iii) A detergent of a certain brand produces the cleanest wash, (iv) Ninety-nine out of a hundred dentists recommend brand A toothpaste for their patients to protect teeth against cavities, etc. Now, the question may arise in our mind: “can such claims be verified statistically?” Fortunately, in many cases the answer is “yes”. The technique of testing such claims or statements or assumptions is known as testing of hypothesis. The truth or falsity of a claim or statement is never known unless we examine the entire population. But practically this is not possible in most situations, so we take a random sample from the population under study and use the information contained in this sample to decide whether the claim is true or false.
  • 65. CONCEPT OF HYPOTHESIS TESTING CONTD. In our day-to-day life, we see different commercial advertisements in television, newspapers, magazines, etc., and if someone is interested in testing such claims or statements then we come across the problem of testing of hypothesis. For example, (i) a motorcycle customer wants to test whether the claim that a certain brand of motorcycle gives an average mileage of 60 km/litre is true or false, (ii) a banana trader wants to test whether the average weight of a banana from Kerala is more than 200 gm, (iii) a doctor wants to test whether a new medicine is really more effective for controlling blood pressure than the old medicine, (iv) an economist wants to test whether the variability in incomes differs between two populations, (v) a psychologist wants to test whether the proportion of literates is the same in two groups of people, etc. In all the cases discussed above, the decision maker is interested in making inferences about the population parameter(s). However, he/she is not interested in estimating the value of the parameter(s) but in testing a claim or statement or assumption about the value of the population parameter(s). Such a claim or statement is postulated in terms of a hypothesis. In statistics, a hypothesis is a statement or a claim or an assumption about the value of a population parameter (e.g., mean, median, variance, proportion, etc.). Similarly, in the case of two or more populations a hypothesis is a comparative statement or claim or assumption about the values of population parameters (e.g., the means of two populations are equal, the variance of one population is greater than that of another, etc.). The plural of hypothesis is hypotheses.
  • 66. GENERAL PROCEDURE OF TESTING A HYPOTHESIS Testing of hypothesis is a statistical tool in great demand across many disciplines and professions. It is a step-by-step procedure, as you will see in the next three units through a large number of examples. The aim of this section is just to give you a flavour of that sequence, which involves the following steps: Step I: First of all, we have to set up the null hypothesis H0 and alternative hypothesis H1. Suppose we want to test the hypothetical / claimed / assumed value θ0 of parameter θ. Then we can take the null and alternative hypotheses as H0: θ = θ0 against H1: θ ≠ θ0 (or a one-sided alternative H1: θ > θ0 or H1: θ < θ0). Step II: After setting the null and alternative hypotheses, we establish a criterion for rejection or non-rejection of the null hypothesis, that is, we decide the level of significance (α) at which we want to test our hypothesis. Generally, it is taken as 5% or 1% (α = 0.05 or 0.01). Case I: If the alternative hypothesis is right-sided, such as H1: θ > θ0 or H1: θ1 > θ2, then the entire critical or rejection region of size α lies on the right tail of the probability curve of the sampling distribution of the test statistic as shown
  • 67. Case II: If the alternative hypothesis is left-sided, such as H1: θ < θ0 or H1: θ1 < θ2, then the entire critical or rejection region of size α lies on the left tail of the probability curve of the sampling distribution of the test statistic as shown Case III: If the alternative hypothesis is two-sided, such as H1: θ ≠ θ0 or H1: θ1 ≠ θ2, then critical or rejection regions of size α/2 lie on both tails of the probability curve of the sampling distribution of the test statistic as shown
  • 68. GENERAL PROCEDURE OF TESTING A HYPOTHESIS(3) Step III: The third step is to choose an appropriate test statistic under H0 for testing the null hypothesis. After that, specify the sampling distribution of the test statistic, preferably in a standard form like Z (standard normal), chi-square, t, F or any other well known in the literature. Step IV: Calculate the value of the test statistic described in Step III on the basis of the observed sample observations. Step V: Obtain the critical (or cut-off) value(s) in the sampling distribution of the test statistic and construct the rejection (critical) region of size α. Generally, critical values for various levels of significance are tabulated for the standard sampling distributions of test statistics, such as the Z-table, chi-square table, t-table, etc. Step VI: After that, compare the calculated value of the test statistic obtained in Step IV with the critical value(s) obtained in Step V and locate the position of the calculated test statistic, that is, whether it lies in the rejection region or the non-rejection region. Step VII: In testing of hypothesis we ultimately have to reach a conclusion. This is done as explained below: (i) If the calculated value of the test statistic lies in the rejection region at the α level of significance, then we reject the null hypothesis. It means that the sample data provide sufficient evidence against the null hypothesis and there is a significant difference between the hypothesized value and the observed value of the parameter. (ii) If the calculated value of the test statistic lies in the non-rejection region at the α level of significance, then we do not reject the null hypothesis. This means that the sample data fail to provide sufficient evidence against the null hypothesis and the difference between the hypothesized value and the observed value of the parameter is due to sampling fluctuation.
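The seven steps above can be sketched in code. This is a minimal illustration, assuming a two-sided one-sample z-test at the 5% level (critical value ±1.96) with entirely hypothetical numbers:

```python
import math

def one_sample_z(xbar, mu0, sigma, n, crit=1.96):
    """Steps III-VII for H0: mu = mu0 vs H1: mu != mu0.
    crit = 1.96 corresponds to a two-sided test at alpha = 0.05."""
    # Step IV: calculated value of the test statistic
    z = (xbar - mu0) * math.sqrt(n) / sigma
    # Steps V-VII: compare with the critical value and decide
    reject_h0 = abs(z) > crit
    return z, reject_h0

# Hypothetical sample: xbar = 10.5, claimed mu0 = 10, sigma = 1, n = 36
z, reject = one_sample_z(10.5, 10, 1, 36)   # z = 3.0 > 1.96, so reject H0
```

Since |z| = 3.0 exceeds 1.96, the calculated value lies in the rejection region, so H0 is rejected at the 5% level, exactly as Step VII(i) describes.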
  • 69. TYPE-I AND TYPE-II ERRORS Type I Errors • A Type I error occurs when the sample data appear to show a treatment effect when, in fact, there is none. • In this case the researcher will reject the null hypothesis and falsely conclude that the treatment has an effect. • Type I errors are caused by unusual, unrepresentative samples. Just by chance the researcher selects an extreme sample with the result that the sample falls in the critical region even though the treatment has no effect. • The hypothesis test is structured so that Type I errors are very unlikely; specifically, the probability of a Type I error is equal to the alpha level. Type II Errors • A Type II error occurs when the sample does not appear to have been affected by the treatment when, in fact, the treatment does have an effect. • In this case, the researcher will fail to reject the null hypothesis and falsely conclude that the treatment does not have an effect. • Type II errors are commonly the result of a very small treatment effect. Although the treatment does have an effect, it is not large enough to show up in the research study.
  • 71. Difference between Statistic and Parameter Statistic  A statistic is a measure which describes a fraction (sample) of the population  Its numerical value is variable and known  Statistical notation: x̄ = sample mean, s = sample standard deviation, n = size of sample, r = sample correlation coefficient Parameter  A parameter is a measure which describes the whole population  Its numerical value is fixed and unknown  Statistical notation: μ = population mean, σ = population standard deviation, P = population proportion, N = size of population, ρ = population correlation coefficient
  • 72. Parametric Statistical Tests Parametric statistics is a branch of statistics which assumes that sample data come from a population that follows a known probability distribution, typically the normal distribution. When the assumptions are correct, parametric methods produce more accurate and precise estimates. Assumptions  The scores must be independent (in other words, the selection of any particular score must not bias the chance of any other case for inclusion).  The observations must be drawn from normally distributed populations.  The selected population is representative of the general population.  The data are on an interval or ratio scale.  The populations (if comparing two or more groups) must have the same variances. Types of parametric test: 1. Z-test. 2. t-test. 3. ANOVA. 4. F-test. 5. Chi-square test.
  • 73. Z-test The Z-test is attributed to Fisher. A Z-test is a type of hypothesis test or statistical test. It is used for testing the mean of a population against a standard, or for comparing the means of two populations, with large samples (n > 30). When can we run a Z-test?  The sample size is greater than 30.  Data points should be independent of each other.  The data should be randomly selected from a population in which each item has an equal chance of being selected.  The data should follow a normal distribution.  The standard deviation of the population is known. There are two forms of the z-test: a. one-sample z-test. b. two-sample z-test.
  • 74. One-sample z-test In a one-sample z-test we compare the mean calculated from a single sample of scores with a hypothesized value, the population standard deviation being known. Ex. The manager of a candy manufacturer wants to know whether the mean weight of a batch of candy boxes is equal to the target value of 10 pounds known from historical data.
  • 75. Two-sample z-test When testing for the difference between two groups we can imagine two separate situations, e.g. comparing the means or the proportions of two populations. In a two-sample z-test both populations are independent. Ex: 1. Comparing the average engineering salaries of men versus women. 2. Comparing the fractions of defectives from two production lines.
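A two-sample z statistic for the difference of means can be computed directly from summary figures. The salary numbers below are hypothetical, chosen only to mirror the example on the slide:

```python
import math

def two_sample_z(x1, x2, s1, s2, n1, n2):
    """Z = (x1 - x2) / sqrt(s1^2/n1 + s2^2/n2) for two independent large samples."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of the difference
    return (x1 - x2) / se

# Hypothetical mean salaries 100 vs 98 (in thousands), sigma = 5, n = 50 each
z = two_sample_z(100, 98, 5, 5, 50, 50)   # z = 2.0
```

With z = 2.0 > 1.96, the difference would be judged significant at the 5% level in a two-sided test.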
  • 76. t-test It was derived by W. S. Gosset in 1908 and is also called Student's t-test. A t-test indicates whether or not the difference between two group means is statistically significant. Assumptions:  Samples must be random and independent.  Samples are small (n < 30).  The population standard deviation is not known.  The population is normally distributed. There are two forms of the t-test: a. Unpaired t-test (independent). b. Paired t-test.
  • 77. Unpaired t-test: If there is no link between the data, use the unpaired t-test, i.e. when two separate sets of independent samples are obtained, one from each of the two populations being compared. Ex: 1. Comparing the heights of girls and boys. 2. Comparing two stress-reduction interventions, where one group practiced mindfulness meditation while the other learned yoga.
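The unpaired (pooled-variance) t statistic can be sketched in a few lines; the two tiny samples below are hypothetical and serve only to show the arithmetic:

```python
import math

def unpaired_t(a, b):
    """Pooled two-sample t: t = (mean_a - mean_b) / sqrt(sp2 * (1/n1 + 1/n2)),
    where sp2 is the pooled variance with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)   # pooled variance
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = unpaired_t([1, 2, 3], [2, 3, 4])   # hypothetical scores; t = -sqrt(1.5)
```

The sign of t only reflects which group mean is larger; significance is judged from |t| against the t-table with n1 + n2 − 2 degrees of freedom.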
  • 78. Paired t-test A paired t-test consists of a sample of matched pairs of similar units, or one group of units that has been tested twice (a “repeated measures” t-test). If there is some link between the data (e.g. before and after), use the paired t-test. Ex: 1. Subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure-lowering medication. 2. A test on a person or any group before and after training.
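The paired t statistic works on the pairwise differences. The blood-pressure readings below are hypothetical, echoing the slide's before/after example:

```python
import math

def paired_t(before, after):
    """t = dbar / (s_d / sqrt(n)) on the pairwise differences d = before - after."""
    d = [x - y for x, y in zip(before, after)]
    n = len(d)
    dbar = sum(d) / n
    s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))  # sample SD of d
    return dbar / (s_d / math.sqrt(n))

# Hypothetical blood pressures before and after medication (same 5 subjects)
t = paired_t([120, 122, 118, 121, 119], [115, 117, 116, 118, 114])
```

Here the mean drop is 4 units and t = 4·√2.5 ≈ 6.32 on n − 1 = 4 degrees of freedom, a clearly significant reduction.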
  • 79. ANOVA (Analysis of Variance) It was developed by Fisher in the 1920s. ANOVA is a collection of statistical models used to analyse the differences between groups; it compares multiple groups at one time. It is an advanced technique for testing differences among all of the means of several experimental treatments simultaneously, which is not possible with a t-test. Assumptions:  All populations have the same standard deviation.  Individuals in the populations are selected randomly.  Independent samples.  The populations must be normally distributed. There are two types of ANOVA: One-way ANOVA: a one-way ANOVA compares three or more unmatched groups when the data are categorized in one way. Ex: You might study the effect of tea on weight loss in three groups: green tea, black tea, no tea. Two-way ANOVA: the two-way ANOVA technique is used when the data are classified on the basis of two factors; it analyses two independent variables and one dependent variable. Ex: Agricultural output may be classified on the basis of different varieties of seeds and also on the basis of different varieties of fertilizer used.
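The one-way ANOVA F ratio is the between-group mean square divided by the within-group mean square. A minimal sketch with hypothetical weight-loss scores for the three tea groups from the slide's example:

```python
def one_way_anova_f(groups):
    """F = MSB / MSW for k independent groups."""
    all_x = [x for g in groups for x in g]
    n, k = len(all_x), len(groups)
    grand = sum(all_x) / n
    # between-group sum of squares (each group mean vs grand mean)
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # within-group sum of squares (each value vs its own group mean)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical scores: green tea, black tea, no tea
f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])   # F = 3.0
```

The computed F is compared against the F-table with (k − 1, n − k) degrees of freedom, here (2, 6).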
  • 80. Chi-square test It was introduced by Karl Pearson. It is a test that measures how expectations compare to actual observed data, and it is used to investigate whether distributions of categorical variables differ from one another. As a parametric test it is used for comparing variances. It is denoted by χ². Formula: χ² = Σ (Oi − Ei)² / Ei, where Oi is the observed frequency and Ei the expected frequency.
  • 81. Non-parametric statistical tests Non-parametric statistics is a branch of statistics. It refers to statistical methods in which the data are not required to fit a normal distribution. Non-parametric statistics often uses ordinal data, meaning it does not rely on numbers but rather on a ranking or order of sorts. For example, a survey recording consumer preferences ranging from like to dislike would yield ordinal data. Non-parametric statistics does not assume that data are drawn from a normal distribution; instead, the shape of the distribution is estimated under this form of statistical measurement, covering descriptive statistics, statistical tests, inferential statistics and models. There is no assumption about sample size. This type of statistics can be used without the mean, sample size, standard deviation or estimation of any other parameters. Non-parametric tests are called “distribution-free” tests since they make no assumptions regarding the population distribution. These tests may be applied as ranking tests. They are easier to explain and easier to understand, but one should not forget that they are usually less efficient/powerful, as they are based on fewer assumptions. A non-parametric test is always valid, but not always efficient. Types of non-parametric statistical tests: Rank sum test Chi-square test Spearman’s rank correlation
  • 82. Rank sum test The rank sum tests are the U test (Wilcoxon-Mann-Whitney test) and the H test (Kruskal-Wallis test). U test: It is a non-parametric test. This test determines whether two independent samples have been drawn from the same population. It requires data that can be ranked, i.e. ordered from lowest to highest (ordinal data).
  • 83. U test For example The values of one sample: 53, 38, 69, 57, 46 The values of another sample: 44, 40, 61, 53, 32 We assign ranks to all observations, ranking from low to high as if all items belonged to a single sample: 32 → 1, 38 → 2, 40 → 3, 44 → 4, 46 → 5, 53 → 6.5, 53 → 6.5, 57 → 8, 61 → 9, 69 → 10 (the two 53s share the average of ranks 6 and 7).
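The ranking-and-U computation above can be sketched in Python using the slide's own two samples; ties receive the average of the ranks they span, exactly as the 53s do:

```python
def avg_ranks(values):
    """Rank all values low-to-high, averaging ranks for tied values."""
    s = sorted(values)
    rank_of = {}
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        rank_of[s[i]] = (i + 1 + j) / 2   # average of positions i+1 .. j
        i = j
    return [rank_of[v] for v in values]

def mann_whitney_u(a, b):
    r = avg_ranks(a + b)
    r1 = sum(r[:len(a)])                         # rank sum of first sample
    u1 = len(a) * len(b) + len(a) * (len(a) + 1) / 2 - r1
    return min(u1, len(a) * len(b) - u1)         # U = smaller of U1, U2

u = mann_whitney_u([53, 38, 69, 57, 46], [44, 40, 61, 53, 32])   # U = 8.5
```

The computed U is then compared with the tabulated critical value for n1 = n2 = 5 to decide whether the two samples come from the same population.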
  • 84. Kruskal-Wallis H test H test: The Kruskal-Wallis H test (also called the “one-way ANOVA on ranks”) is a rank-based non-parametric test that can be used to determine if there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. For example: an H test to understand whether exam performance, measured on a continuous scale from 0-100, differed based on test anxiety level (i.e., the dependent variable would be “exam performance” and the independent variable would be “test anxiety level”, which has three independent groups: students with “low”, “medium” and “high” test anxiety levels).
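A minimal sketch of the H statistic, assuming hypothetical exam scores with no tied values (tie corrections are omitted for simplicity):

```python
def kruskal_h(groups):
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1),
    where R_i is the rank sum of group i over the pooled data.
    Assumes all values are distinct (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n_total = len(pooled)
    s = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)

# Hypothetical exam scores for low / medium / high anxiety groups
h = kruskal_h([[10, 20], [30, 40], [50, 60]])   # H = 32/7 ~ 4.571
```

The computed H is referred to the chi-square table with k − 1 degrees of freedom (here 2) for moderate-to-large samples.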
  • 85. Chi-square test The chi-square test is a non-parametric test. It is used mainly when dealing with nominal variables. The chi-square test has two main forms. Goodness of fit: Goodness of fit refers to whether a significant difference exists between an observed number and an expected number of responses, people or other objects. For example: suppose that we flip a coin 20 times and record the frequency of occurrence of heads and tails; then we should expect 10 heads and 10 tails. Suppose our coin-flipping experiment yielded 12 heads and 8 tails: our expected frequencies are (10-10) and our observed frequencies (12-8). Independence: the test of independence examines differences between the frequencies of occurrence in two or more categories across two or more groups.
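The goodness-of-fit statistic for the slide's coin example is a one-line computation:

```python
def chi_square_gof(observed, expected):
    """chi2 = sum((O_i - E_i)^2 / E_i)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Coin example from the slide: 12 heads and 8 tails observed, 10-10 expected
chi2 = chi_square_gof([12, 8], [10, 10])   # chi2 = 0.8
```

With 1 degree of freedom the 5% critical value is 3.841, so χ² = 0.8 gives no evidence that the coin is unfair.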
  • 86. Spearman’s rank correlation test In this method we use a measure of association that is based on the ranks of the observations and not on the numerical values of the data. It was developed by Charles Spearman in the early 1900s and as such is also known as Spearman’s rank correlation coefficient.
English (marks) | Maths (marks) | Rank (English) | Rank (Maths) | Difference of ranks
56 | 66 | 9 | 4 | 5
75 | 70 | 3 | 2 | 1
45 | 40 | 10 | 10 | 0
71 | 60 | 4 | 7 | 3
62 | 65 | 6 | 5 | 1
64 | 56 | 5 | 9 | 4
58 | 59 | 8 | 8 | 0
80 | 77 | 1 | 1 | 0
76 | 67 | 2 | 3 | 1
61 | 63 | 7 | 6 | 1
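From the rank differences in the table above, Spearman's coefficient follows from the standard formula ρ = 1 − 6Σd²/(n(n² − 1)), valid when there are no tied ranks:

```python
def spearman_rho(rank_x, rank_y):
    """rho = 1 - 6*sum(d^2) / (n(n^2 - 1)), d = rank differences (no ties)."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Ranks taken from the English/Maths marks table above
rho = spearman_rho([9, 3, 10, 4, 6, 5, 8, 1, 2, 7],
                   [4, 2, 10, 7, 5, 9, 8, 1, 3, 6])
```

Here Σd² = 54 and n = 10, giving ρ = 1 − 324/990 ≈ 0.673, a moderately strong positive association between the English and Maths rankings.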
  • 87. PROBABILITY In our daily lives, we face many situations in which we are unable to forecast the future with complete certainty; that is, many decisions are made under uncertainty. The need to cope with uncertainty leads to the study and use of probability theory. The first attempt to give a quantitative measure of probability was made by Galileo (1564-1642), an Italian mathematician, when he answered the following question at the request of his patron, the Grand Duke of Tuscany, who wanted to improve his performance at the gambling tables: “With three dice a total of 9 and a total of 10 can each be produced by six different combinations, and yet experience shows that the number 10 is thrown more often than the number 9.” To the mind of his patron the cases were (1, 2, 6), (1, 3, 5), (1, 4, 4), (2, 2, 5), (2, 3, 4), (3, 3, 3) for 9 and (1, 3, 6), (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4) for 10, and hence he wondered why they do not occur equally frequently, i.e. why their chances are not the same. Galileo made a careful analysis of all the cases which can occur, and he showed that out of the 216 possible cases 27 are favourable to the appearance of the number 10: the permutations of (1, 3, 6) are (1, 3, 6), (1, 6, 3), (3, 1, 6), (3, 6, 1), (6, 1, 3), (6, 3, 1), i.e. the number of permutations of (1, 3, 6) is 6; similarly, the numbers of permutations of (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4) are 6, 3, 6, 3, 3 respectively, and hence the total number of cases comes out to be 6 + 6 + 3 + 6 + 3 + 3 = 27, whereas the number of favourable cases for getting a total of 9 on three dice is 6 + 6 + 3 + 3 + 6 + 1 = 25. This is the reason 10 is thrown more often than 9. But the first foundation was laid by the two mathematicians Pascal (1623-62) and Fermat (1601-65), whose work on a gambler's dispute in 1654 led to the creation of a mathematical theory of probability.
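Galileo's counting argument can be verified by brute force, enumerating all 6³ = 216 equally likely outcomes of three dice:

```python
from itertools import product

# Enumerate every outcome of three fair dice and tally the totals
totals = [sum(t) for t in product(range(1, 7), repeat=3)]
nines = totals.count(9)    # 25 favourable cases
tens = totals.count(10)    # 27 favourable cases
print(nines, tens)         # 25 27
```

This confirms P(total = 10) = 27/216 > P(total = 9) = 25/216, matching Galileo's analysis.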
Later, important contributions were made by various researchers including Huyghens (1629-1695), Jacob Bernoulli (1654-1705), Laplace (1749-1827), Abraham De Moivre (1667-1754), and Markov (1856-1922). Thomas Bayes (died in 1761, at the age of 59) gave an important technical result known as Bayes’ theorem, published after his death in 1763, by which probabilities can be revised on the basis of new information. Since then probability, an important branch of statistics, has been used worldwide.
  • 104. Probability Distribution There are two types of probability distribution: 1) Discrete probability distribution — the set of all possible values is at most a finite or a countably infinite number of possible values  Binomial Distribution  Poisson Distribution 2) Continuous probability distribution — takes on values at every point over a given interval  Normal (Gaussian) Distribution
  • 105. Normal (Gaussian) Distribution • The normal distribution is a descriptive model that describes real-world situations. • It is defined as a continuous frequency distribution of infinite range (it can take any value, not just integers as in the case of the binomial and Poisson distributions). • This is the most important probability distribution in statistics and an important tool in the analysis of epidemiological data and management science. Characteristics of Normal Distribution • It links frequency distribution to probability distribution • Has a bell-shaped curve and is symmetric • It is symmetric around the mean: the two halves of the curve are the same (mirror images) • Hence mean = median • The total area under the curve is 1 (or 100%) • The normal distribution has the same shape as the standard normal distribution • In a standard normal distribution: the mean (μ) = 0 and the standard deviation (σ) = 1
  • 106. Normal (Gaussian) Distribution(2) Z Score (Standard Score) • Z = (X − μ) / σ • Z indicates how many standard deviations away from the mean the point X lies. • The Z score is calculated to 2 decimal places. • Tables give areas under the standard normal curve.
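The z-score formula is a one-liner; the exam score, mean and standard deviation below are hypothetical:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean: z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical: score 75 in a class with mean 65 and SD 5
z = z_score(75, 65, 5)   # z = 2.0, i.e. two SDs above the mean
```

The resulting z is then looked up in the standard normal table to convert it into an area (probability).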
  • 107. Diagram of Normal Distribution Curve (z distribution) [figure: bell-shaped curve with horizontal axis marked −3, −2, −1, μ, 1, 2, 3; area from the mean to 1σ ≈ 34.1%, from 1σ to 2σ ≈ 13.6%, from 2σ to 3σ ≈ 2.1%, beyond 3σ ≈ 0.15%]
  • 108. Normal (Gaussian) Distribution(4) Distinguishing Features • The mean ± 1 standard deviation covers about 68.3% of the area under the curve • The mean ± 2 standard deviations covers about 95.4% of the area under the curve • The mean ± 3 standard deviations covers about 99.7% of the area under the curve Application/Uses of Normal Distribution • Its application goes beyond describing distributions • It is used by researchers and modellers • The major use of the normal distribution is the role it plays in statistical inference • The z score, along with the t-score, chi-square and F statistics, is important in hypothesis testing • It helps managers/management make decisions
  • 109. Binomial Distribution A widely known discrete distribution, constructed by determining the probabilities of X successes in n trials. Assumptions of the Binomial Distribution • The experiment involves n identical trials • Each trial has only two possible outcomes: success and failure • Each trial is independent of the previous trials • The terms p and q remain constant throughout the experiment • p is the probability of a success on any one trial • q = (1 − p) is the probability of a failure on any one trial • In the n trials, X is the number of successes possible, where X is a whole number between 0 and n Applications • Sampling with replacement • Sampling without replacement causes p to change, but if the sample size n is less than 5% of N, the violation of the independence assumption is not a great concern
  • 110. Binomial Distribution Formula • Probability function: P(X) = [n! / (X! (n − X)!)] p^X q^(n−X), for 0 ≤ X ≤ n • Mean value: μ = n·p • Variance and standard deviation: σ² = n·p·q, σ = √(n·p·q)
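These three formulas can be checked numerically; the fair-coin example below (n = 10 tosses, p = 0.5) is hypothetical:

```python
from math import comb, sqrt

def binom_pmf(n, p, x):
    """P(X = x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical: 10 tosses of a fair coin
p5 = binom_pmf(10, 0.5, 5)          # P(exactly 5 heads) = 252/1024
mean = 10 * 0.5                     # mu = n*p = 5
sd = sqrt(10 * 0.5 * 0.5)           # sigma = sqrt(n*p*q)
```

`math.comb` supplies the n! / (X!(n − X)!) factor directly, so the probability function mirrors the slide's formula term by term.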
  • 111. Poisson Distribution The French mathematician Siméon Denis Poisson proposed the Poisson distribution. It is popular for modelling the number of times an event occurs in an interval of time or space. It is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event. The Poisson distribution may be useful to model events such as • The number of meteorites greater than 1 meter in diameter that strike Earth in a year • The number of patients arriving in an emergency room between 10 and 11 pm • The number of photons hitting a detector in a particular time interval • The number of mistakes committed per page
  • 112. Poisson Distribution Assumptions of the Poisson Distribution • Describes discrete occurrences over a continuum or interval • A discrete distribution • Describes rare events • Each occurrence is independent of any other occurrence • The number of occurrences in each interval can vary from zero to infinity • The expected number of occurrences must hold constant throughout the experiment
  • 113. Poisson Distribution Formula • Probability function: P(X) = (λ^X e^(−λ)) / X!, for X = 0, 1, 2, 3, … where λ is the long-run average and e ≈ 2.718282 (the base of natural logarithms) • Mean value: λ • Variance: λ • Standard deviation: √λ
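The Poisson probability function can be sketched directly from this formula; the arrival rate below (λ = 2 patients per hour) is hypothetical:

```python
import math

def poisson_pmf(lam, x):
    """P(X = x) = lambda^x * e^(-lambda) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Hypothetical: on average 2 patients arrive per hour (lambda = 2)
p0 = poisson_pmf(2, 0)                              # P(no arrivals) = e^-2
total = sum(poisson_pmf(2, x) for x in range(50))   # probabilities sum to ~1
```

Summing the probabilities over x confirms the distribution is proper, and both the mean and the variance of this distribution equal λ = 2.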
