Refresher in Statistics and Analysis Skills
Dr Santam Chakraborty
Statistics - A subject which most statisticians find difficult but in which nearly all physicians are expert.
- Stephen Senn, Statistical Issues in Drug Development
What you will find in this presentation
● Only 1 calculation
● Only 1 formula
● Lots of Cartoons & Quotes !!!
Intro
Cartoon Number 731 xkcd.com by Randall Munroe
Data
Data Types
● Qualitative data
○ Nominal data
○ Ordinal data
● Quantitative data
○ Discrete data
○ Continuous data
Key Points
● Converting quantitative data to qualitative data is not advisable as it leads to loss of information.
● QoL data is qualitative (typically ordinal) but is often analyzed as quantitative data.
● Most medical researchers gather both qualitative and quantitative data but tend to disregard the qualitative data.
Types of Measurements
BINARY, NOMINAL, ORDINAL, COUNT, CONTINUOUS
Variable Types
● Response variable
● Non-response variable
○ Independent variable
○ Experimental variable
○ Confounder variable
Data Collection
Collecting Data
● This is the most neglected yet most vital
part of the process.
● A form provides a structured way to collect data.
● Data collection instruments:
○ Surveys
○ Interviews
○ Focus Groups
Form Design Principles
● Be consistent in choice of font and layout
● Use checkboxes instead of asking people to circle answers.
● Provide visual cues to the format of data required.
● Give instructions in bold and italics.
● Specify units of measurement and decimal places.
● Use skips sparingly and clearly indicate where the respondent should skip to.
● Use precoded answers (e.g. Male / Female).
Resources
1. http://www.slideshare.net/psykoreactor/best-practices-for-form-design
2. https://www.lynda.com/Web-Interactive-User-Experience-tutorials/Web-Form-Design-Best-Practices/83786-2.html?utm_medium=integrated-partnership&utm_source=slideshare
3. Bellary S, Krishnankutty B, Latha MS. Basics of case report form
designing in clinical research. Perspect Clin Res. 2014 Oct;5(4):159–66.
Data Storage
Databases : Advantages
● Allow multi-user access
● Respect data integrity
● Allow data validation
● Avoid data redundancy
● Allow flexible and customized queries
Databases : Disadvantages
● More difficult to learn
● May require an understanding of networking-related concepts
● Software maintenance and updates are an issue.
● Require a clear idea, in advance, of the information that needs to be included.
● Require form design.
Spreadsheet Tips
1. Put the header in the first row only. Don't make fancy two- or three-row headers.
2. Set the locale to UK / India if you plan to use DD/MM/YYYY as the date format.
3. Freeze the first row and first column to ease data entry.
4. Use conditional formatting to pick up mistakes during data entry.
5. Avoid extensive code books; it is easier to recode the data later.
6. Use different sheets sparingly.
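When the spreadsheet is exported for analysis, it is worth checking that dates entered as DD/MM/YYYY survived intact. A minimal Python sketch, assuming the dates arrive as text strings (the helper name `parse_ddmmyyyy` and the sample entries are hypothetical), that parses them strictly and flags entries that do not fit the scheme:

```python
from datetime import datetime


def parse_ddmmyyyy(text):
    """Parse a DD/MM/YYYY string strictly; return None if it does not fit."""
    try:
        return datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        return None


# Hypothetical entries: the last two were typed in other date schemes
entries = ["05/03/2016", "31/12/2015", "12/31/2015", "2016-03-05"]
for raw in entries:
    parsed = parse_ddmmyyyy(raw)
    status = parsed.isoformat() if parsed else "INVALID: check entry"
    print(f"{raw:>12} -> {status}")
```

A strict parser cannot catch a date such as 05/03/2016 that was actually typed as MM/DD, which is why the locale should be set before data entry starts.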
Spreadsheet Tips
1. Remember that Excel is not a relational database, so avoid using the sort option.
2. If you must sort, select all the columns before sorting so that each row stays intact.
3. If you use a formula during data entry, protect or hide the cell to avoid inadvertent changes.
4. Stick to one case: "UPPERCASE" or "lowercase".
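To see why all the columns must be selected before sorting, here is a small Python illustration with made-up rows (patient ID, age, stage): sorting one column on its own breaks the link between a patient and their values, whereas sorting whole rows keeps each record intact.

```python
# Hypothetical data-entry rows: (patient_id, age, stage)
rows = [("P01", 64, "III"), ("P02", 45, "I"), ("P03", 58, "II")]

# WRONG: sort only the age column, as happens when a single column
# is selected in a spreadsheet before using "sort".
ages_sorted_alone = sorted(age for _, age, _ in rows)
broken = [(pid, age, stage)
          for (pid, _, stage), age in zip(rows, ages_sorted_alone)]

# RIGHT: sort whole rows by age so every record stays together.
intact = sorted(rows, key=lambda row: row[1])

print("broken:", broken)  # P01 is now wrongly recorded as 45 years old
print("intact:", intact)
```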
SPSS Tips
1. Never forget to use variable labels. Setting these at the design stage ensures that everyone remembers what is to be entered.
2. Value labels are your friend; don't use them sparingly.
3. Ensure that the data type of each variable is chosen appropriately.
Resources
1. Disciplined use of spreadsheets for data entry :
http://www.reading.ac.uk/ssc/resource-packs/ILRI_2006-Nov/GoodStatisticalPractice/publications/guides/topsde.html
2. Using an Excel data entry form :
https://www.pryor.com/blog/ease-the-pain-of-data-entry-with-an-excel-forms-template/
3. SPSS data entry tips : https://www.youtube.com/watch?v=N-krh4EaELE
Data Analysis
A Statistical Analysis Plan (SAP) is the starting point of your analysis
Tip
If you are at a loss when it comes to writing your SAP, write the results section of the paper first; it will help you visualize the analysis plan.
Elements of a SAP
● Define the research hypothesis
● Define the end-points
● Define the statistical methods
Research Hypothesis
1. Derives from the research question
2. Equally important for prospective or
retrospective studies.
3. Helps in choosing the correct endpoints for
the objectives appropriate to the hypothesis.
4. Often helps us to understand our underlying
motivation for the research
Research Question
A question that is designed to address a “perceived”
gap in the current state of knowledge about a
condition.
“I want to know how many new patients are seen by my
colleague instead of me”
“I want to know how many patients survive for 5 years
after coming to me”
PICO(T)
1. Population - To be defined for all studies
2. Intervention - Essential if you want to study the
effect of an intervention
3. Comparison Groups - Essential if you want to
define the benefit of an intervention
4. Outcome - To be defined for all studies
5. Time - Essential if a time to event endpoint is
chosen.
PICO(T) | Question 1 | Question 2
P | New patients presenting to my hospital | New patients presenting to my hospital
I | Undergo a consultation | Treatment given by me
C | Colleague or me | -
O | Number of patients | Survive their disease
(T) | Over the last week | Till 5 years
See other great examples of PICOs formulated from daily practice questions in the PICO examples provided by the Cochrane Library:
http://learntech.physiol.ox.ac.uk/cochrane_tutorial/cochlibd0e187.php
Always do a systematic review after formulating the PICO
Tip
The Cochrane Handbook is a great way to understand the systematic review process: http://training.cochrane.org/handbook
Alpha and Beta
1. Our research question is framed in terms of the population, but we can rarely study the whole population.
2. The value of an observation in a representative and random sample is considered to approximate the population value.
3. Repeated samples from the same population will likely yield different results for this value.
4. Alpha and Beta quantify the resulting risk of a wrong conclusion: Alpha is the probability of a Type I error and Beta the probability of a Type II error (power = 1 - Beta).
Researcher’s Decision
Reality | Reject Null Hypothesis | Retain Null Hypothesis
Null Hypothesis is True | Type I Error (probability of this occurring = Alpha) | Correct
Null Hypothesis is False | Correct | Type II Error (probability of this occurring = Beta)
Ellis, P.D. (2010), “Effect Size FAQs,”: https://effectsizefaq.com/
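One way to see what Alpha means is to simulate many studies in which the null hypothesis is true and count how often a test at Alpha = 0.05 wrongly rejects it. A minimal sketch, assuming a two-sided z-test with a known standard deviation (a simplification chosen so that only the Python standard library is needed; the sample sizes are arbitrary):

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
N_PER_STUDY = 30
N_STUDIES = 2000
SIGMA = 1.0  # assumed known population SD

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for a two-sided test
rng = random.Random(42)  # fixed seed so the result is reproducible

rejections = 0
for _ in range(N_STUDIES):
    # The null hypothesis is TRUE here: the population mean really is 0.
    sample = [rng.gauss(0.0, SIGMA) for _ in range(N_PER_STUDY)]
    z = fmean(sample) / (SIGMA / N_PER_STUDY ** 0.5)
    if abs(z) > z_crit:
        rejections += 1  # each rejection is a Type I error

type_i_rate = rejections / N_STUDIES
print(f"Observed Type I error rate: {type_i_rate:.3f} (nominal {ALPHA})")
```

The observed rate should land close to 0.05; repeating the exercise with a true difference in the population would estimate power (1 - Beta) instead.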
Resources
1. Hypothesis Testing and statistical Power :
http://my.ilstu.edu/~wjschne/138/Psychology138Lab14.html (with
beautiful animated gifs !!!)
2. Errors in Hypothesis Testing :
http://www.psychstat.missouristate.edu/IntroBook3/sbk20.htm
Descriptive Stats
Before the Analysis
1. Ensure that you make a folder for the data file and take a
backup
2. If analyzing in SPSS ensure that the SPSS viewer file is
saved in the same folder
3. Ensure that the file version is correct if you have used
multiple versions of the same file.
4. Turn off the distractions and turn on some light music.
Describe the data
Always start with descriptives
1. Frequencies for Qualitative Variables
2. Mean and SD for Quantitative Variables.
3. Check for missing values
4. Check for outliers (graphs)
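The first three steps can be sketched with the Python standard library. The values are made up, and `None` is assumed to mark a missing entry:

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical dataset: stage is qualitative, age is quantitative; None = missing
stage = ["II", "III", "II", None, "I", "III", "III"]
age = [54, 61, None, 47, 70, 58, 66]

# 1. Frequencies for the qualitative variable (missing values excluded)
stage_counts = Counter(s for s in stage if s is not None)
print("Stage frequencies:", dict(stage_counts))

# 2. Mean and SD for the quantitative variable, using non-missing values
age_present = [a for a in age if a is not None]
print(f"Age: mean {mean(age_present):.1f}, SD {stdev(age_present):.1f}")

# 3. Count missing values per variable
missing = {"stage": stage.count(None), "age": age.count(None)}
print("Missing values:", missing)
```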
Measures of Central Tendency
1. Mean: Heavily influenced by atypical (extreme) values.
2. Median: Heavily influenced by ties. The median is also not amenable to further calculation and is rarely used in statistical procedures.
3. Mode: Also susceptible to ties, but the only measure of central tendency for nominal data.
Measures of Central Tendency
When do we prefer the median?
1. Extreme scores in the distribution
2. Count or ordinal measures
3. Some of the scores are undetermined
In case of skewed data / bimodal distribution it is better to
report the median and the trimmed mean.
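The effect of a single extreme value on each measure can be seen with a few made-up survival times in months. Python's `statistics` module has no trimmed mean, so `trimmed_mean` below is a small hypothetical helper that drops a fixed proportion of the sorted values from each end:

```python
from statistics import mean, median, multimode


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the sorted values from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each tail
    return mean(data[k:len(data) - k])


months = [6, 8, 8, 9, 10, 11, 12, 13, 14, 120]  # one extreme value

print("Mean:        ", mean(months))          # pulled up by the 120
print("Median:      ", median(months))
print("Mode(s):     ", multimode(months))
print("Trimmed mean:", trimmed_mean(months, 0.1))
```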
Quantiles
● Quantiles divide the ordered data into groups containing the same number of observations. They describe variability as well as central tendency.
● Median can be conceptualized as the 50% quantile
● Tertile: Split by 33% (3 parts)
● Quartile : Split by 25% (4 parts)
● Quintile : Split by 20% (5 parts)
● Decile : Split by 10% (10 parts)
Measures of Spread
● Range : Not useful when you have extreme values
● Interquartile Range : Usually reported along with the median; it is the range between the 25th and 75th percentiles (the 1st and 3rd quartiles)
● Standard deviation and Variance : Useful if the distribution is symmetric
● The 95% confidence interval of the mean is technically a measure of how closely your sample mean approximates the "unknown" population mean. For normally distributed data it is approximately the mean ± 1.96 standard errors (SE = SD / √n). The mean ± 1.96 standard deviations is a different quantity: the range expected to contain about 95% of individual observations.
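The distinction in the last point is easy to see numerically. This sketch uses made-up values; note that `statistics.quantiles` uses its "exclusive" method by default, so other software may report slightly different quartiles, and 1.96 is the normal approximation (a t multiplier is more exact for small samples):

```python
from math import sqrt
from statistics import mean, quantiles, stdev

values = [11.2, 12.5, 13.1, 12.8, 14.0, 10.9, 13.6, 12.2, 13.3, 12.9,
          11.8, 14.4, 12.0, 13.8, 12.6, 11.5, 13.0, 12.4, 14.1, 12.7]

q1, q2, q3 = quantiles(values, n=4)  # quartile cut points; q2 is the median
print(f"Median {q2:.2f}, IQR {q1:.2f} to {q3:.2f}")

m, sd, n = mean(values), stdev(values), len(values)
se = sd / sqrt(n)  # standard error of the mean

# Precision of the sample mean as an estimate of the population mean
print(f"95% CI of the mean:   {m - 1.96 * se:.2f} to {m + 1.96 * se:.2f}")
# Spread of individual observations (much wider)
print(f"Mean +/- 1.96 SD:     {m - 1.96 * sd:.2f} to {m + 1.96 * sd:.2f}")
```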
Box Plot : http://www.physics.csbsju.edu/stats/box2.html
Box Whisker Plots
Data Distribution
1. Binary / Nominal / Ordinal : Frequencies of
categories
2. Continuous Variable:
a. Histogram
b. Cumulative Histogram
c. Quantiles
d. Moments (measures of central tendency & skewness)
3. Skewed data : Nonparametric methods of analysis
(i.e. methods that do not assume that the
distribution is normal).
Density Plots & Histograms
Quick R: Histograms & Density Plots : http://www.statmethods.net/graphs/density.html
Spaghetti Plots
Bar Charts : Best Practices
1. Give the count if your Y axis is in percentages
2. Start the Y axis from 0
3. Try to arrange categories by frequency
4. Use a consistent color scheme; don't use different colors for the bars unless they represent different categories.
5. Avoid stacked bar charts unless you want to show
part to whole relationships
6. Space between bars = 1/2 of the bar width
Dot Plots : A better alternative
Bivariate Associations
Missing Values
Missing Completely at Random (MCAR) : Missingness of a value does not depend on any other variable, observed or unobserved (e.g. patients randomly forget to answer some QOL items).
Missing at Random (MAR) : Missingness of a value depends on another observed variable (e.g. patients presenting in the late afternoon do not fill in QOL forms).
Missing Not at Random (MNAR) : Missingness depends on the unobserved value of the variable itself (e.g. only patients with poor QOL do not fill in QOL forms).
Missing Values
1. Deletion methods : Some of the data is deleted. The most common approach used in SPSS is listwise deletion; the alternative is pairwise deletion.
2. Single Imputation : The most common method is mean / median substitution. Alternatively, dummy (missing-indicator) coding can be used, especially for a categorical variable.
3. Model-based Imputation : Multiple imputation and maximum likelihood based methods.
Missing Values
Property | Listwise Deletion | Pairwise Deletion
Effect on sample size | Reduced | Mostly remains the same
Effect on power | Reduced | Mostly remains the same
Simplicity | Yes | Yes
Model comparison | Yes | No
Unbiased if MCAR | Yes | Yes
Single Value Imputation with Mean / Median
Single Value Imputation with Simple Regression
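A minimal sketch contrasting the approaches on a tiny made-up dataset with three variables, where `None` marks a missing value. Listwise deletion keeps only complete rows; pairwise deletion keeps every row that has both variables needed for a given calculation; mean substitution fills the gaps but, as a side effect, shrinks the variance:

```python
from statistics import mean, variance

# Hypothetical records: (age, qol_score, pain_score); None = missing
records = [
    (54, 70, 3), (61, None, 5), (47, 80, None),
    (70, 55, 6), (58, 65, 4), (66, None, 7),
]

# Listwise deletion: drop any row with a missing value
complete = [r for r in records if None not in r]
print("Listwise n:", len(complete))

# Pairwise deletion: for an age vs pain analysis, keep rows where both exist
age_pain = [(a, p) for a, _, p in records if a is not None and p is not None]
print("Pairwise n (age, pain):", len(age_pain))

# Single imputation: replace missing qol_score with the observed mean
observed_qol = [q for _, q, _ in records if q is not None]
qol_mean = mean(observed_qol)
qol_imputed = [q if q is not None else qol_mean for _, q, _ in records]
print("QoL variance, observed values:", round(variance(observed_qol), 1))
print("QoL variance, after imputation:", round(variance(qol_imputed), 1))
```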
Resources
1. How to diagnose the missing data mechanism:
http://www.theanalysisfactor.com/missing-data-mechanism/
2. Missing data : Pairwise and Listwise Deletions which to use :
http://www-01.ibm.com/support/docview.wss?uid=swg21475199
3. Missing data and how to deal with it ( A nice presentation) :
https://liberalarts.utexas.edu/prc/_files/cs/Missing-Data.pdf
Inferential Stats
Inferential Statistics
1. Hypothesis Testing
2. Comparing 2 proportions
3. Non Parametric Statistical Tests
4. Correlation
5. Linear Models
Hypothesis testing
1. A formal test of whether the data provide evidence against the null hypothesis, i.e. an attempt to disprove the null hypothesis.
2. The null hypothesis is equivalent to a straw man: a sham argument set up to be defeated.
3. Whether the test is one- or two-tailed depends on the nature of the alternative hypothesis.
Failure to reject the null hypothesis is not proof of its truth; in other words, absence of evidence is not evidence of absence.
Hypothesis testing : Tails
● Bill Gates earns the same $$ per month as me: H0
● Bill Gates earns less $$ per month than me: H1 (one-tailed)
● The $$ that Bill Gates earns per month is different from what I earn: H1 (two-tailed)
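For a test statistic that follows a standard normal distribution under H0, the choice of tail changes the p-value. A minimal sketch with an assumed z value of -1.8 (negative meaning Bill Gates earns less, so the one-tailed H1 looks at the lower tail only):

```python
from statistics import NormalDist

z = -1.8  # assumed test statistic
std_normal = NormalDist()  # mean 0, SD 1

p_one_tailed = std_normal.cdf(z)            # H1: earns LESS (lower tail)
p_two_tailed = 2 * std_normal.cdf(-abs(z))  # H1: earns a DIFFERENT amount

print(f"one-tailed p = {p_one_tailed:.3f}")
print(f"two-tailed p = {p_two_tailed:.3f}")
```

The same data give roughly p = 0.036 one-tailed but p = 0.072 two-tailed, which is why the tail must be fixed in the SAP before the data are seen.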
Classifications of “significant" or “highly significant"
are arbitrary, and treating a P-value between 0.05
and 0.1 as indicating a “trend towards significance"
is bogus. If the P-value is 0.08, for example, the
0.95 confidence interval for the effect includes a
“trend” in the opposite (harmful) direction.
- Harrell & Slaughter (2016)
Comparing Means
Which test is to be used for comparing means?
T Test
1. The independent samples t test tests the null hypothesis that the two samples come from two populations with the same mean.
2. The paired t test tests the special null hypothesis that the mean of the differences between paired observations is 0.
Requirements
● Data need to be quantitative
● Data are obtained from a simple random sample*
● Data are (approximately) normally distributed
● The variances of the two populations need to be equal (Welch's t test relaxes this requirement)
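The independent-samples t statistic can be computed by hand. This sketch uses made-up values and shows both Student's pooled version (equal variances assumed) and Welch's version (separate variances, Welch-Satterthwaite degrees of freedom). The p-value needs the t distribution, which the standard library lacks; in practice a package such as SciPy (`scipy.stats.ttest_ind`, with `equal_var=False` for Welch) would report it.

```python
from math import sqrt
from statistics import mean, variance

group_a = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 5.0]
group_b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]

n1, n2 = len(group_a), len(group_b)
m1, m2 = mean(group_a), mean(group_b)
v1, v2 = variance(group_a), variance(group_b)  # sample variances (n - 1)

# Student's t: pooled variance, assumes equal population variances
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Welch's t: separate variances, Welch-Satterthwaite degrees of freedom
se2 = v1 / n1 + v2 / n2
t_welch = (m1 - m2) / sqrt(se2)
df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

print(f"Pooled t = {t_pooled:.2f} on {df_pooled} df")
print(f"Welch t  = {t_welch:.2f} on {df_welch:.1f} df")
```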
Comparing Proportions
1. Chi-square test:
a. Compares dichotomous outcomes in 2 groups
b. 2 x 2 contingency tables
c. Unreliable if the expected count in any cell is < 5
d. Yates continuity correction required if cell frequency < 10
2. Fisher’s exact test
a. An exact test: the exact p value is calculated rather than approximated from the chi-square distribution; it also gives a more conservative estimate
b. Can handle larger contingency tables
c. More computationally intensive
d. Does not have a quantity analogous to the chi-square statistic
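Using the RT table from the following slides (RT: 10 dead / 100 alive; no RT: 5 dead / 10 alive), this sketch computes the expected counts that the "< 5" rule refers to, the Pearson chi-square statistic without Yates correction, and a two-sided Fisher's exact p-value from the hypergeometric distribution. The two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention; software may differ slightly.

```python
from math import comb


def expected_counts(a, b, c, d):
    """Expected cell counts for the 2 x 2 table [[a, b], [c, d]] under H0."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value via the hypergeometric distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability of x in the top-left cell, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # small tolerance so tables exactly as likely as the observed one count
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


a, b, c, d = 10, 100, 5, 10  # RT: dead, alive; no RT: dead, alive
print("Expected:", expected_counts(a, b, c, d))
print("Chi-square:", round(chi_square_2x2(a, b, c, d), 2))
print("Fisher p:", fisher_exact_2x2(a, b, c, d))
```

Here the expected number of deaths without RT is 1.8, below 5, so the chi-square approximation is unreliable and Fisher's exact test is the safer choice.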
Odds Ratio
1. Measure of association between an outcome and exposure
2. Ratio of odds of the outcome in exposed to the odds of the outcome in non
exposed.
3. Can be easily obtained from a 2 x 2 contingency table.
Dead Alive
RT 10 100
No RT 5 10
Risk Ratio
1. Another measure of relative effect size
2. Ratio of risk of outcome in exposed to the risk of outcome in non exposed.
3. Can be easily obtained from a 2 x 2 contingency table.
Dead Alive
RT 10 100
No RT 5 10
Odds vs Risk
1. Odds is the ratio of the probability of an event occurring to that of it not occurring; in this case the odds of dying in the RT group are 10/100.
2. Risk is the probability of an event occurring; in this case the risk of dying in the RT group is 10/110.
Dead Alive
RT 10 100
No RT 5 10
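A minimal sketch of both measures from this table, plus an approximate 95% CI for the odds ratio. The CI uses Woolf's log method (SE of log OR = square root of 1/a + 1/b + 1/c + 1/d), which is one standard approach among several:

```python
from math import exp, log, sqrt

# 2 x 2 table: rows = exposure, columns = outcome
a, b = 10, 100  # RT:    dead, alive
c, d = 5, 10    # No RT: dead, alive

odds_rt, odds_no_rt = a / b, c / d              # 10/100 and 5/10
risk_rt, risk_no_rt = a / (a + b), c / (c + d)  # 10/110 and 5/15

odds_ratio = odds_rt / odds_no_rt
risk_ratio = risk_rt / risk_no_rt

# Woolf (log) method for the 95% CI of the odds ratio
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = exp(log(odds_ratio) - 1.96 * se_log_or)
ci_high = exp(log(odds_ratio) + 1.96 * se_log_or)

print(f"Odds ratio {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Risk ratio {risk_ratio:.2f}")
```

The OR (0.20) lies further from 1 than the RR (0.27) because death is not a rare outcome in the no-RT group (5 of 15).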
Why Odds Ratio
1. Risk ratios are easier to interpret but applicable only to a limited range of baseline risks; e.g. a risk factor that doubles the risk of developing lung cancer cannot apply to a patient whose baseline risk is above 0.5.
2. When the outcome is rare, the odds ratio approximates the risk ratio; when the outcome is common, the odds ratio lies further from 1 than the risk ratio, so it should not be interpreted as a risk ratio.
3. Confidence intervals of ORs can be calculated easily.
Non Parametric Methods
1. Often preferable to parametric alternatives, as they do not require distributional assumptions to be checked.
2. The response variable can be interval or ordinal; no transformations are needed to account for non-normal distributions, and extreme values are handled better.
3. Being less susceptible to extreme values, these methods are considered more robust.
Nonparametric test alternatives
1. One-sample (or paired) t test: Wilcoxon signed-rank test
2. Two-sample t test: Wilcoxon rank-sum test (Mann-Whitney U test)
3. ANOVA: Kruskal-Wallis test
4. Pearson correlation: Spearman's rho
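The rank-based idea behind these tests can be shown with the Mann-Whitney U statistic. A minimal sketch with made-up pain scores, where tied values get the average of the ranks they span; it computes U only, and obtaining the p-value (exact or normal approximation) is left to statistical software:

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return U for sample x and the matching U for sample y."""
    ranks = average_ranks(list(x) + list(y))
    r_x = sum(ranks[: len(x)])  # rank sum of the first sample
    u_x = r_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


pain_a = [3, 5, 4, 6, 2]
pain_b = [7, 6, 8, 5, 9]
print(mann_whitney_u(pain_a, pain_b))
```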
Correlation
1. A method to examine the association between a
continuous predictor and a continuous outcome.
2. A correlation coefficient ranges from -1 to +1 and measures the strength as well as the direction of the association.
3. Scatterplots are a graphical method for
evaluating correlation.
Pearson’s Correlation
1. Requires linear relationship between the two variables.
2. Requires that the variables be normally distributed - ideally bivariate
normality.
3. Outliers have a big impact on the correlation.
Spearman’s Correlation
1. The nonparametric alternative; does not require the variables to be normally distributed.
2. Does not assume a linear relationship, only a monotonic one.
3. Is not affected as much by outliers.
4. Spearman's and Pearson's coefficients can give very different (even opposite) results on the same data.
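Spearman's rho is Pearson's r computed on the ranks, which is why one outlier affects them so differently. A minimal sketch with made-up data (the `ranks` helper assumes no ties, which holds for these values):

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson's correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1..n (assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 100]  # perfectly monotonic, but the last value is an outlier

pearson_r = pearson(x, y)
spearman_rho = pearson(ranks(x), ranks(y))
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```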
Correlation & Causation
Bradford Hill criterion | Caveat
Strength | Major confounding factors may result in a strong correlation
Consistency | Assumes that causal factors are evenly distributed in the population
Specificity | No reason why a risk factor should be specific for an outcome
Temporality | Directionality may not always imply causation (e.g. depression and cancer)
Biological gradient | Only true for events where there is a dose-response gradient
Plausibility | Depends on the state of current scientific knowledge
Coherence | Depends on the quality of additional available information
Experimental evidence | Interventional research may not always be feasible
Analogy | A subjective judgement
Correlation & Agreement
1. High correlation may not indicate agreement; e.g. two methods of measuring height may be correlated but give different measurements.
2. A change in scale does not affect correlation; e.g. if one method measured height as 2 x the other method, the correlation would still be perfect even though the two methods never agree.
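The second point can be checked numerically: if one hypothetical method reads exactly twice the other, Pearson's r is 1 although the methods never agree. This sketch adds the mean difference and 95% limits of agreement (mean difference ± 1.96 SD of the differences), the Bland-Altman approach commonly used to assess agreement; the heights are made up:

```python
from math import sqrt
from statistics import fmean, stdev


def pearson(x, y):
    """Pearson's correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


method_a = [150.0, 160.0, 165.0, 172.0, 180.0]  # heights in cm
method_b = [2 * h for h in method_a]             # a badly scaled second method

diffs = [b - a for a, b in zip(method_a, method_b)]
mean_diff = fmean(diffs)
sd_diff = stdev(diffs)

print(f"Pearson r = {pearson(method_a, method_b):.3f}")
print(f"Mean difference = {mean_diff:.1f} cm")
print(f"95% limits of agreement: {mean_diff - 1.96 * sd_diff:.1f} to "
      f"{mean_diff + 1.96 * sd_diff:.1f} cm")
```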
Linear Model
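Y = a + βc
As you may remember, this is the equation of a straight line (a is the intercept and β the slope).
The job of regression is to find a and β so that any value of c can be used to predict Y.
A statistical method to predict a variable is a model.
Linear regression is usually fitted by ordinary least squares (OLS), which chooses a and β to minimize the sum of the squared residuals.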
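For a single predictor the OLS solution has a closed form: β is the sum of cross-products of c and Y divided by the sum of squares of c (both around their means), and a = mean(Y) - β × mean(c). A minimal sketch with made-up dose-response data, also computing the residuals (observed minus predicted):

```python
from statistics import fmean


def ols_fit(c, y):
    """Return intercept a and slope beta minimising the sum of squared residuals."""
    mc, my = fmean(c), fmean(y)
    s_cy = sum((ci - mc) * (yi - my) for ci, yi in zip(c, y))
    s_cc = sum((ci - mc) ** 2 for ci in c)
    beta = s_cy / s_cc
    a = my - beta * mc
    return a, beta


dose = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

a, beta = ols_fit(dose, response)
predicted = [a + beta * ci for ci in dose]
residuals = [yi - pi for yi, pi in zip(response, predicted)]

print(f"Y = {a:.2f} + {beta:.2f} * c")
print("Residuals:", [round(r, 2) for r in residuals])
```

With an intercept in the model the OLS residuals always sum to zero, so their pattern (not their total) is what the diagnostic plots examine.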
Linear Regression
Linear Regression : Assumptions
1. The two assumptions for correlation still hold: a linear relationship and the absence of outliers.
2. In addition, the residuals should be normally distributed.
3. Homoscedasticity should be present.
4. Observations should be independent (no autocorrelation).
5. Multicollinearity should be absent (relevant when there is more than one predictor).
Homoskedasticity
1. Plot the data points around the fitted linear regression line.
2. If the points are spread evenly along the whole length of the line, at roughly the same distance from it, the data are homoskedastic.
3. Essentially this means that the outcome variable (i.e. the residuals) has the same variance across all values of the predictor variable.
4. In practice it is assessed from the residuals.
Residuals
1. Nothing but the difference between the observed value of the outcome variable and the value predicted by the model.
2. In other words, it is a measure of the error (disagreement) of the model predictions.
3. A plot of the residuals against the predicted values should show a patternless horizontal band of roughly constant width around zero if there is homoskedasticity; a funnel shape suggests heteroskedasticity.
Alternatives to Linear Regression
Logistic regression : If your outcome variable is binary categorical (e.g. dead / alive)
Ordinal regression : Ordinal categorical data
Poisson regression : If you have count data
If a non-linear relationship exists, use a non-linear regression model; alternatively, transform the outcome variable or use segmented regression.
What about survival ?
This is a special regression problem where the outcome is the time survived.
Both linear and nonlinear methods are available, as are parametric and nonparametric tests.
A key point : These methods are required ONLY if not all potential events have occurred in the time frame of observation, i.e. not all patients have died.
N.B. These methods are applicable to any time-to-event end point.
Defining the Time
A baseline date from which observation starts is needed. Ideally this is the time when exposure starts, but that is very rarely known.
In RCTs it is classically the date of randomization.
In retrospective studies it is the date of registration or the date of diagnosis.
If the patient has the event, the date / time of the event is noted; otherwise the date / time of the last follow-up is noted. Logically, the resulting survival time should be greater than 0.
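A minimal sketch of deriving the two variables a survival analysis needs (time and event status) from dates, assuming a hypothetical record with a baseline date, an optional event date and a last follow-up date. Time is expressed in months as days / 30.4375 (the average month length), which is one common convention:

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # 30.4375, average month length


def survival_time(start, event_date, last_follow_up):
    """Return (time in months, event indicator 1/0) from a baseline date."""
    if event_date is not None:
        end, event = event_date, 1  # event observed
    else:
        end, event = last_follow_up, 0  # censored at last follow-up
    days = (end - start).days
    if days <= 0:
        raise ValueError("survival time must be greater than 0; check the dates")
    return days / DAYS_PER_MONTH, event


# Hypothetical patients: one died, one censored at last follow-up
print(survival_time(date(2014, 1, 10), date(2015, 7, 2), date(2015, 7, 2)))
print(survival_time(date(2014, 3, 5), None, date(2016, 3, 5)))
```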
The Censoring Problem
The censoring problem arises because not all events occur within the observation time frame (i.e. some patients remain alive).
We do not know for sure that these remaining patients will not have the event afterwards.
If censoring is ignored and these patients are simply counted as survivors, you get an artificially inflated survival figure.
Right censoring is when the subject has not had the event by the time observation ends. Left censoring is when the event occurred before the subject came under observation.
Hazard
The effect size estimator obtained from survival methods; it can be considered as the risk of developing the event.
The hazard rate is the instantaneous rate of occurrence of the event at a given time among those still at risk. It ignores the accumulation of hazard up to that time point.
The hazard ratio is the ratio of the hazard rates in two groups.
The cumulative hazard is the integral of the hazard rate over a given interval of time.
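The definitions above can be written compactly in standard notation, with T the survival time; the third relation links the cumulative hazard to the survival function S(t) in continuous time:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
H(t) = \int_0^t h(u)\,du,
\qquad
S(t) = \exp\{-H(t)\},
\qquad
\mathrm{HR} = \frac{h_1(t)}{h_0(t)}
```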
Source: SAS Seminar Introduction to Survival Analysis in SAS. Available at http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/
The Kaplan Meier Estimate
Time Death
1 Yes
2 No
3 No
4 Yes
5 No
10 Yes
12 No
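Interval | Entered | Deaths | Censored | Alive | Conditional survival | S (cumulative survival)
0 - 1 | 7 | 1 | 0 | 6 | 6/7 | 6/7 = 86%
1 - 4 | 6 | 1 | 2 | 3 | 3/4* | 3/4 × 6/7 = 64%
4 - 10 | 3 | 1 | 1 | 1 | 1/2* | 1/2 × 3/4 × 6/7 = 32%
*censored individuals are removed from the denominator (4 patients are still at risk at time 4, and 2 at time 10)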
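The calculation in the table can be reproduced with a short product-limit function. A minimal sketch using exact fractions so the results match the hand calculation for the seven patients above (1 = death, 0 = censored); at a tied time, deaths are counted before censored patients leave the risk set:

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Product-limit estimate: list of (death time, S(t)) at each death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)  # deaths + censored at t
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, surv))
        at_risk -= leaving  # censored patients leave the denominator
        i += leaving
    return curve


times = [1, 2, 3, 4, 5, 10, 12]
events = [1, 0, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"t = {t:>2}: S = {s} ({float(s):.0%})")
```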
The Kaplan Meier Estimate
Comparisons
The Kaplan Meier method allows you to compare survival among groups of patients.
While the effect size is important and can be conceptualized as the risk ratio or the hazard ratio, we can also test the null hypothesis that the survival curves are equal.
The commonest such test is the log-rank test.
Log Rank Test
The test calculates the observed number of deaths in each group at each time point where there is an event, and the number expected if there were no difference between the groups.
E.g. with 2 groups of 20 patients each at risk and 1 death at 6 months, the expected number of deaths in each group would be (1/40) × 20 = 0.5 (note this is a number, not a %).
This process is repeated for all time points where there is an event, and the total numbers of observed and expected deaths in each group are calculated. A chi-square test is then used to determine whether the observed and expected numbers differ by more than chance would explain.
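A minimal sketch of this procedure for two groups. At each distinct death time the expected deaths in group 1 are (deaths / number at risk) × number at risk in group 1. The function reports both the simple Σ(O - E)²/E form described above and the variance-based statistic that statistical packages usually report; both are referred to a chi-square distribution with 1 degree of freedom. The example data are hypothetical and reproduce the slide's 2 × 20 patient example:

```python
from math import erfc, sqrt


def log_rank(times, events, groups):
    """Two-group log-rank test. events: 1 = death, 0 = censored; groups: 0/1."""
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    o1 = e1 = var = 0.0
    d_total = 0
    for t in death_times:
        n = sum(1 for tt in times if tt >= t)  # at risk just before t
        n1 = sum(1 for tt, g in zip(times, groups) if tt >= t and g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e == 1 and g == 1)
        o1 += d1
        e1 += d * n1 / n  # expected deaths in group 1 if no difference
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        d_total += d
    o0, e0 = d_total - o1, d_total - e1
    simple = (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
    chi2 = (o1 - e1) ** 2 / var
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return {"O1": o1, "E1": e1, "simple_chi2": simple, "chi2": chi2, "p": p}


# 2 groups of 20; one death at 6 months (group 1), everyone else censored at 12
times = [6] + [12] * 39
events = [1] + [0] * 39
groups = [1] * 20 + [0] * 20
result = log_rank(times, events, groups)
print(result)
```

E1 comes out as 0.5, matching the hand calculation on the slide.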
Alternatives
Since the log-rank test gives equal weight to all time points, some alternatives are available, e.g. the Breslow test, which weights each time point by the number of cases at risk.
The Breslow test is better when you have more deaths at the start of the KM curve and misleading when you have more censoring; it is best to stick to the log-rank test.
Assumptions for KM estimator
1. Patients who are censored have the same survival prospects as those who continue to be followed up.
2. Survival for patients who present earlier is the same as that of patients presenting later.
However, the Kaplan Meier method is a nonparametric estimator, which means the estimate does not depend on the shape of the survival function.
The Cox Regression
1. Allows multivariable regression modelling for survival.
2. Unlike Kaplan Meier, allows continuous predictor variables.
3. Is one of the most (ab)used survival analysis techniques.
4. Can be used to generate a predictive model.
5. Ideal sample size? Number of events = 20 x number of predictors.
Cox Regression
Cox Regression: Output
Cox Regression: Graphs
Cox Regression : Assumptions
1. The proportional hazards assumption should be fulfilled, i.e. the ratio of the hazard functions of the two strata should remain constant over time.
2. Censoring should be non-informative, i.e. the reason a person is censored should be unrelated to their prognosis.
3. There is a linear relationship between the log of the hazard and the covariates.
4. Overly influential data points (outliers) should not be present.
There are diagnostic methods available for each of the above.
How to check for Proportional hazards
1. If the predictor variable is categorical, KM curves can be generated and we can see whether the curves maintain the same separation over time (e.g. they do not cross).
2. Alternatively, you can generate Schoenfeld residuals in SPSS and plot these residuals against time for each covariate; a flat trend supports proportional hazards.
How to check for Proportional hazards
Cox Regression : Advantages
1. It is a semi-parametric model and is less affected by outliers.
2. Unlike parametric survival models, it does not require correct specification of the underlying distribution.
3. Many diagnostic procedures are available.
However, it does not directly estimate the baseline hazard, which makes predictive modelling difficult.
What not to do while modelling (regression)
1. Do not work with sample sizes that are clearly inadequate.
2. Do not use univariable screening to select predictors.
3. Do not use stepwise forward / backward selection methods.
4. Do not blindly assume linearity / proportional hazards; always understand the underlying assumptions as well as the correct checks for them.
5. Read about residuals before jumping into regression.
6. Don’t use split-sample validation; use cross-validation or bootstrapping instead.
DON’T FALL IN LOVE WITH YOUR MODEL
Resources
SAS Seminar: Introduction to Survival Analysis in SAS [Internet]. [cited 2016 Sep 9]. Available from:
http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/
SPSS Library: Understanding contrasts [Internet]. [cited 2016 Sep 9]. Available from:
http://www.ats.ucla.edu/stat/spss/library/contrast.htm
Bian H. Survival Analysis Using SPSS. Available from:
http://core.ecu.edu/ofe/StatisticsResearch/Survival%20Analysis%20Using%20SPSS.pdf
Bland JM, Altman DG. The logrank test. BMJ. 2004 May 1;328(7447):1073.
Practical recommendations for statistical analysis and data presentation in Biochemia Medica journal | Biochemia Medica
[Internet]. [cited 2016 Sep 8]. Available from: http://www.biochemia-medica.com/2012/22/15
Manikandan S. Measures of dispersion. J Pharmacol Pharmacother. 2011 Oct;2(4):315–6.
Manikandan S. Measures of central tendency: Median and mode. J Pharmacol Pharmacother. 2011 Jul;2(3):214–5.
Utley M, Gallivan S, Young A, Cox N, Davies P, Dixey J, et al. Potential bias in Kaplan–Meier survival analysis applied to
rheumatology drug studies. Rheumatology. 2000 Jan 1;39(1):1–2.
How to register at Isocentre
 
Evolving Role of Radiation Therapy in Hodgkins Disease
Evolving Role of Radiation Therapy in Hodgkins DiseaseEvolving Role of Radiation Therapy in Hodgkins Disease
Evolving Role of Radiation Therapy in Hodgkins Disease
 
Induction chemotherapy followed by concurrent ct rt versus ct-rt in advanced ...
Induction chemotherapy followed by concurrent ct rt versus ct-rt in advanced ...Induction chemotherapy followed by concurrent ct rt versus ct-rt in advanced ...
Induction chemotherapy followed by concurrent ct rt versus ct-rt in advanced ...
 
Beam Directed Radiotherapy - methods and principles
Beam Directed Radiotherapy - methods and principlesBeam Directed Radiotherapy - methods and principles
Beam Directed Radiotherapy - methods and principles
 
Isodose curves RADIATION ONCOLOGY
Isodose curves RADIATION ONCOLOGYIsodose curves RADIATION ONCOLOGY
Isodose curves RADIATION ONCOLOGY
 
Radiation therapy and Types of Radiation therapy
Radiation therapy and Types of Radiation therapyRadiation therapy and Types of Radiation therapy
Radiation therapy and Types of Radiation therapy
 
Interaction of Radiation with Matter
Interaction of Radiation with MatterInteraction of Radiation with Matter
Interaction of Radiation with Matter
 
R Project Website
R Project WebsiteR Project Website
R Project Website
 
contents of marketing research
contents of marketing researchcontents of marketing research
contents of marketing research
 
Chemotherapy for Hodgkins disease
Chemotherapy for Hodgkins diseaseChemotherapy for Hodgkins disease
Chemotherapy for Hodgkins disease
 
Maths final compilation Statistic project
Maths final compilation Statistic projectMaths final compilation Statistic project
Maths final compilation Statistic project
 
Evolution of radiation 2012
Evolution of radiation 2012Evolution of radiation 2012
Evolution of radiation 2012
 
statistic project for college level
statistic project for college levelstatistic project for college level
statistic project for college level
 
LDR and HDR Brachytherapy: A Primer for non radiation oncologists
LDR and HDR Brachytherapy: A Primer for non radiation oncologistsLDR and HDR Brachytherapy: A Primer for non radiation oncologists
LDR and HDR Brachytherapy: A Primer for non radiation oncologists
 
Fractionated radiation and dose rate effect
Fractionated radiation and dose rate effectFractionated radiation and dose rate effect
Fractionated radiation and dose rate effect
 
Radiation
RadiationRadiation
Radiation
 

Similar to Refresher in statistics and analysis skill

Statistics for DP Biology IA
Statistics for DP Biology IAStatistics for DP Biology IA
Statistics for DP Biology IAVeronika Garga
 
Statistics for IB Biology
Statistics for IB BiologyStatistics for IB Biology
Statistics for IB BiologyEran Earland
 
LAB 2 Descriptive Statistics1Descriptive statisti.docx
LAB 2 Descriptive Statistics1Descriptive statisti.docxLAB 2 Descriptive Statistics1Descriptive statisti.docx
LAB 2 Descriptive Statistics1Descriptive statisti.docxDIPESH30
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Sherri Gunder
 
Data Analysis & Interpretation and Report Writing
Data Analysis & Interpretation and Report WritingData Analysis & Interpretation and Report Writing
Data Analysis & Interpretation and Report WritingSOMASUNDARAM T
 
Chapter 19Basic Quantitative Data AnalysisData Cleaning.docx
Chapter 19Basic Quantitative Data AnalysisData Cleaning.docxChapter 19Basic Quantitative Data AnalysisData Cleaning.docx
Chapter 19Basic Quantitative Data AnalysisData Cleaning.docxketurahhazelhurst
 
Need a nonplagiarised paper and a form completed by 1006015 before.docx
Need a nonplagiarised paper and a form completed by 1006015 before.docxNeed a nonplagiarised paper and a form completed by 1006015 before.docx
Need a nonplagiarised paper and a form completed by 1006015 before.docxlea6nklmattu
 
Epidemiological Analysis Workshop By Dr Suzanne Campbell
Epidemiological Analysis Workshop By Dr Suzanne Campbell Epidemiological Analysis Workshop By Dr Suzanne Campbell
Epidemiological Analysis Workshop By Dr Suzanne Campbell COUNTDOWN on NTDs
 
Data science notes for ASDS calicut 2.pptx
Data science notes for ASDS calicut 2.pptxData science notes for ASDS calicut 2.pptx
Data science notes for ASDS calicut 2.pptxswapnaraghav
 
Clinical case studies and SPSS
Clinical case studies and SPSSClinical case studies and SPSS
Clinical case studies and SPSSAmit Sharma
 
Audit and stat for medical professionals
Audit and stat for medical professionalsAudit and stat for medical professionals
Audit and stat for medical professionalsNadir Mehmood
 
Spss basic Dr Marwa Zalat
Spss basic Dr Marwa ZalatSpss basic Dr Marwa Zalat
Spss basic Dr Marwa ZalatMarwa Zalat
 
Data analysis powerpoint
Data analysis powerpointData analysis powerpoint
Data analysis powerpointjamiebrandon
 
Will Emerson pt EmpressedBP-15660O2 – 92 on 2LRR- 34.docx
Will Emerson pt EmpressedBP-15660O2 – 92 on 2LRR- 34.docxWill Emerson pt EmpressedBP-15660O2 – 92 on 2LRR- 34.docx
Will Emerson pt EmpressedBP-15660O2 – 92 on 2LRR- 34.docxadolphoyonker
 
Selecting a sample: Writing Skill
Selecting a sample: Writing Skill Selecting a sample: Writing Skill
Selecting a sample: Writing Skill Kum Visal
 
Module 4 data analysis
Module 4 data analysisModule 4 data analysis
Module 4 data analysisILRI-Jmaru
 

Similar to Refresher in statistics and analysis skill (20)

Statistics for DP Biology IA
Statistics for DP Biology IAStatistics for DP Biology IA
Statistics for DP Biology IA
 
Statistics for IB Biology
Statistics for IB BiologyStatistics for IB Biology
Statistics for IB Biology
 
LAB 2 Descriptive Statistics1Descriptive statisti.docx
LAB 2 Descriptive Statistics1Descriptive statisti.docxLAB 2 Descriptive Statistics1Descriptive statisti.docx
LAB 2 Descriptive Statistics1Descriptive statisti.docx
 
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
 
Data Analysis & Interpretation and Report Writing
Data Analysis & Interpretation and Report WritingData Analysis & Interpretation and Report Writing
Data Analysis & Interpretation and Report Writing
 
Chapter 19Basic Quantitative Data AnalysisData Cleaning.docx
Chapter 19Basic Quantitative Data AnalysisData Cleaning.docxChapter 19Basic Quantitative Data AnalysisData Cleaning.docx
Chapter 19Basic Quantitative Data AnalysisData Cleaning.docx
 
Need a nonplagiarised paper and a form completed by 1006015 before.docx
Need a nonplagiarised paper and a form completed by 1006015 before.docxNeed a nonplagiarised paper and a form completed by 1006015 before.docx
Need a nonplagiarised paper and a form completed by 1006015 before.docx
 
Sampling
SamplingSampling
Sampling
 
SPSS FINAL.pdf
SPSS FINAL.pdfSPSS FINAL.pdf
SPSS FINAL.pdf
 
Epidemiological Analysis Workshop By Dr Suzanne Campbell
Epidemiological Analysis Workshop By Dr Suzanne Campbell Epidemiological Analysis Workshop By Dr Suzanne Campbell
Epidemiological Analysis Workshop By Dr Suzanne Campbell
 
How to prepare a thesis
How to prepare a thesisHow to prepare a thesis
How to prepare a thesis
 
Data science notes for ASDS calicut 2.pptx
Data science notes for ASDS calicut 2.pptxData science notes for ASDS calicut 2.pptx
Data science notes for ASDS calicut 2.pptx
 
Clinical case studies and SPSS
Clinical case studies and SPSSClinical case studies and SPSS
Clinical case studies and SPSS
 
Audit and stat for medical professionals
Audit and stat for medical professionalsAudit and stat for medical professionals
Audit and stat for medical professionals
 
Spss basic Dr Marwa Zalat
Spss basic Dr Marwa ZalatSpss basic Dr Marwa Zalat
Spss basic Dr Marwa Zalat
 
Data analysis powerpoint
Data analysis powerpointData analysis powerpoint
Data analysis powerpoint
 
Will Emerson pt EmpressedBP-15660O2 – 92 on 2LRR- 34.docx
Will Emerson pt EmpressedBP-15660O2 – 92 on 2LRR- 34.docxWill Emerson pt EmpressedBP-15660O2 – 92 on 2LRR- 34.docx
Will Emerson pt EmpressedBP-15660O2 – 92 on 2LRR- 34.docx
 
Selecting a sample: Writing Skill
Selecting a sample: Writing Skill Selecting a sample: Writing Skill
Selecting a sample: Writing Skill
 
Module 4 data analysis
Module 4 data analysisModule 4 data analysis
Module 4 data analysis
 
Qaulity tools
Qaulity toolsQaulity tools
Qaulity tools
 

More from Santam Chakraborty

More from Santam Chakraborty (18)

Adjuvant radiation based on genomic risk factors emerging scenarios
Adjuvant radiation based on genomic risk factors   emerging scenariosAdjuvant radiation based on genomic risk factors   emerging scenarios
Adjuvant radiation based on genomic risk factors emerging scenarios
 
Sample size calculation
Sample size calculationSample size calculation
Sample size calculation
 
IGRT in lung cancer
IGRT in lung cancerIGRT in lung cancer
IGRT in lung cancer
 
Hormone Resistant Prostate Cancer
Hormone Resistant Prostate CancerHormone Resistant Prostate Cancer
Hormone Resistant Prostate Cancer
 
How to upload presentation
How to upload presentationHow to upload presentation
How to upload presentation
 
Isocentre Help Forum
Isocentre Help   ForumIsocentre Help   Forum
Isocentre Help Forum
 
Isocentre Help Forum
Isocentre Help   ForumIsocentre Help   Forum
Isocentre Help Forum
 
Isocentre Help Edit Page
Isocentre Help   Edit PageIsocentre Help   Edit Page
Isocentre Help Edit Page
 
Isocentre How to Create a Page
Isocentre How to Create a PageIsocentre How to Create a Page
Isocentre How to Create a Page
 
Helical Tomotherapy
Helical TomotherapyHelical Tomotherapy
Helical Tomotherapy
 
IMRT and 3D CRT in cervical Cancers
IMRT and 3D CRT in cervical CancersIMRT and 3D CRT in cervical Cancers
IMRT and 3D CRT in cervical Cancers
 
Radiation Protection
Radiation ProtectionRadiation Protection
Radiation Protection
 
Using Styles In Bibus
Using Styles In BibusUsing Styles In Bibus
Using Styles In Bibus
 
Hormonal treatment of breast cancer
Hormonal treatment of breast cancerHormonal treatment of breast cancer
Hormonal treatment of breast cancer
 
Management of Gastrointestinal Lymphomas
Management of Gastrointestinal LymphomasManagement of Gastrointestinal Lymphomas
Management of Gastrointestinal Lymphomas
 
Medulloblastomas
MedulloblastomasMedulloblastomas
Medulloblastomas
 
Management of Wilms Tumors
Management of Wilms TumorsManagement of Wilms Tumors
Management of Wilms Tumors
 
Small Cell Carcinoma of Lung
Small Cell Carcinoma of LungSmall Cell Carcinoma of Lung
Small Cell Carcinoma of Lung
 

Recently uploaded

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 

Recently uploaded (20)

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 

Refresher in statistics and analysis skill

  • 1. Refresher in Statistics and Analysis Skills Dr Santam Chakraborty
  • 2. Statistics - A subject which most statisticians find difficult but in which nearly all physicians are expert. - Stephen Senn, Statistical Issues in Drug Development
  • 3. What you will find in this presentation ● Only 1 calculation ● Only 1 formula ● Lots of Cartoons & Quotes !!!
  • 5. Cartoon Number 731 xkcd.com by Randall Munroe
  • 8. Key Points ● Converting quantitative data to qualitative data is not advisable, as it leads to loss of information. ● QoL data is always qualitative but is often analyzed as quantitative data ● Most medical researchers gather both qualitative & quantitative data but then disregard the qualitative data
  • 9. Types of Measurements BINARY NOMINAL ORDINAL COUNT CONTINUOUS
  • 12. Collecting Data ● This is the most neglected yet most vital part of the process. ● Use a structured way to collect data: a form ● Data collection instruments: ○ Surveys ○ Interviews ○ Focus Groups
  • 13. Form Design Principles ● Be consistent in choice of font and layout ● Use checkboxes instead of asking people to circle answers ● Provide visual cues to the format of data required ● Give instructions in bold and italics ● Specify units of measurement and decimal places ● Use skips sparingly and clearly indicate their locations ● Use precoded answers (e.g. Male / Female)
  • 18. Databases: Advantages ● Allow multi-user access ● Enforce data integrity ● Allow data validation ● Avoid data redundancy ● Allow flexible and customized queries
  • 19. Databases: Disadvantages ● More difficult to learn ● May require an understanding of networking-related concepts ● Software maintenance and updates are an issue ● Require a clear idea, up front, of the information that needs to be included ● Require form design
  • 21. Spreadsheet Tips 1. Put the header in the first row only; don't make fancy 2- or 3-row headers. 2. Set the locale to UK / India if you plan to use DD/MM/YYYY as the date format. 3. Freeze the first row and first column to ease data entry. 4. Use conditional formatting to pick up mistakes during data entry. 5. Avoid extensive code books; it is easier to recode data. 6. Use different sheets sparingly.
  • 22. Spreadsheet Tips 1. Remember that Excel is not a relational database, so do not use the sort option. 2. If you do use the sort option, select all the columns before sorting; otherwise the rows become misaligned. 3. If you use a formula during data entry, make the cell protected or hidden to avoid inadvertent changes. 4. Stick to one case: “UPPERCASE” or “lowercase”.
  • 23. SPSS Tips 1. Never forget to use variable labels. Setting them at the design stage ensures that everyone remembers what is to be entered. 2. Value labels are your friend; don't use them sparingly. 3. Ensure that the data type is chosen appropriately.
  • 24. Resources 1. Disciplined use of spreadsheets for data entry: http://www.reading.ac.uk/ssc/resource-packs/ILRI_2006-Nov/GoodStatisticalPractice/publications/guides/topsde.html 2. Using an Excel data entry form: https://www.pryor.com/blog/ease-the-pain-of-data-entry-with-an-excel-forms-template/ 3. SPSS data entry tips: https://www.youtube.com/watch?v=N-krh4EaELE
  • 26. A Statistical Analysis Plan (SAP) is the starting point of your analysis. Tip: If you are at a loss when writing your SAP, write out the results section of the paper first; it will help you visualize the analysis plan.
  • 27. Elements of a SAP: Define the research hypothesis; Define the endpoints; Define the statistical methods
  • 28. Research Hypothesis 1. Derives from the research question 2. Equally important for prospective and retrospective studies 3. Helps in choosing endpoints and objectives appropriate to the hypothesis 4. Often helps us to understand our underlying motivation for the research
  • 29. Research Question A question that is designed to address a “perceived” gap in the current state of knowledge about a condition. “I want to know how many new patients are seen by my colleague instead of me” “I want to know how many patients survive for 5 years after coming to me”
  • 30. PICO(T) 1. Population - To be defined for all studies 2. Intervention - Essential if you want to study the effect of an intervention 3. Comparison Groups - Essential if you want to define the benefit of an intervention 4. Outcome - To be defined for all studies 5. Time - Essential if a time to event endpoint is chosen.
  • 31. PICO examples (question 1: patients seen by my colleague / question 2: 5-year survival)
    P: New patients presenting to my hospital / New patients presenting to my hospital
    I: Undergo a consultation / Treatment given by me
    C: Colleague or me / -
    O: Number of patients / Survive their disease
    (T): Over the last week / Till 5 years
    See other great examples of PICOs formulated from daily practice questions in the PICO examples provided by the Cochrane Library: http://learntech.physiol.ox.ac.uk/cochrane_tutorial/cochlibd0e187.php
  • 32. Always do a systematic review after formulating the PICO. Tip: The Cochrane Handbook is a great way to understand the systematic review process: http://training.cochrane.org/handbook
  • 33. Alpha and Beta 1. Our research question is defined from the perspective of the population, but we can rarely study the whole population. 2. The value of an observation in a representative, random sample is considered to approximate the population value. 3. Repeated samples from the same population will likely yield different results for this value. 4. Alpha and beta are measures of this uncertainty: they are the probabilities of a Type I and a Type II error respectively.
  • 34. Errors in hypothesis testing (researcher's decision vs reality)
    Null hypothesis is true: rejecting it is a Type I error (probability of this occurring = alpha); retaining it is correct.
    Null hypothesis is false: rejecting it is correct; retaining it is a Type II error (probability of this occurring = beta).
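Alpha and beta can be made concrete with a small simulation. The sketch below uses only the Python standard library. The sample size (n = 20), the effect size (0.5 SD), the number of simulations and the `rejection_rate` helper are all arbitrary assumptions chosen for illustration. It repeatedly draws samples and applies a two-sided one-sample z-test with a known SD. When the null hypothesis is true, the proportion of rejections estimates alpha. When it is false, the proportion of rejections estimates power, i.e. 1 - beta.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, n=20, sigma=1.0, alpha=0.05, n_sims=20000, seed=1):
    """Fraction of simulated samples in which a two-sided one-sample z-test
    rejects H0: mean = 0 (sigma is assumed known)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sigma / n ** 0.5                         # standard error of the mean
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(sample) / n) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims

# H0 true (true mean = 0): the rejection rate estimates alpha (Type I error)
type_1_rate = rejection_rate(true_mean=0.0)
# H0 false (true mean = 0.5 SD): the rejection rate estimates power = 1 - beta
power = rejection_rate(true_mean=0.5)
beta = 1 - power
print(f"Type I error ~ {type_1_rate:.3f}, power ~ {power:.3f}, beta ~ {beta:.3f}")
```

Under these assumptions the Type I error rate should come out close to 0.05 and the power close to 0.6. Increasing `n` raises the power and therefore lowers beta.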
  • 35. Ellis, P.D. (2010), “Effect Size FAQs,”: https://effectsizefaq.com/
  • 36. Resources 1. Hypothesis Testing and statistical Power : http://my.ilstu.edu/~wjschne/138/Psychology138Lab14.html (with beautiful animated gifs !!!) 2. Errors in Hypothesis Testing : http://www.psychstat.missouristate.edu/IntroBook3/sbk20.htm
  • 38. Before the Analysis 1. Ensure that you make a folder for the data file and take a backup 2. If analyzing in SPSS ensure that the SPSS viewer file is saved in the same folder 3. Ensure that the file version is correct if you have used multiple versions of the same file. 4. Turn off the distractions and turn on some light music.
  • 39. Describe the data Always start with descriptives 1. Frequencies for Qualitative Variables 2. Mean and SD for Quantitative Variables. 3. Check for missing values 4. Check for outliers (graphs)
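The descriptive checklist above can be sketched in plain Python. This is a minimal illustration on a hypothetical toy dataset; the `records` list and the `describe` helper are invented for this example. SPSS, R or pandas would normally do the same job. The sketch reports frequencies for qualitative variables, and mean and SD for quantitative variables, and it counts missing values (here coded as `None`) for both.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical toy dataset: None marks a missing value
records = [
    {"sex": "M", "age": 54}, {"sex": "F", "age": 61}, {"sex": "F", "age": None},
    {"sex": "M", "age": 47}, {"sex": None, "age": 70}, {"sex": "F", "age": 58},
]

def describe(rows, qualitative, quantitative):
    """Frequencies for qualitative variables; n, mean and SD for quantitative ones."""
    summary = {}
    for var in qualitative:
        values = [r[var] for r in rows]
        summary[var] = {
            "frequencies": Counter(v for v in values if v is not None),
            "missing": sum(v is None for v in values),
        }
    for var in quantitative:
        values = [r[var] for r in rows]
        present = [v for v in values if v is not None]
        summary[var] = {
            "n": len(present),
            "mean": mean(present),
            "sd": stdev(present),  # sample SD (n - 1 denominator)
            "missing": len(values) - len(present),
        }
    return summary

for var, stats in describe(records, ["sex"], ["age"]).items():
    print(var, stats)
```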
  • 40. Measures of Central Tendency 1. Mean: heavily influenced by atypical (extreme) values. 2. Median: heavily influenced by ties. The median is also not amenable to further calculation and is rarely used in statistical procedures. 3. Mode: also susceptible to ties, but the only measure of central tendency for nominal data.
  • 41. Measures of Central Tendency When do we prefer the median? 1. Extreme scores in the distribution 2. Count or ordinal measures 3. Some of the scores are undetermined In case of skewed data / bimodal distribution it is better to report the median and the trimmed mean.
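The effect of a single extreme score on the different measures of central tendency can be seen with Python's `statistics` module. The survival times below are hypothetical. The `trimmed_mean` helper is a simple illustrative implementation: it drops the same number of sorted observations from each tail. It is not a specific package's function, and packages may round the number trimmed differently.

```python
from statistics import mean, median, multimode

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the observations from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of observations cut from each tail
    return mean(data[k:len(data) - k])

# Hypothetical survival times (months) with one extreme value
survival = [3, 5, 6, 6, 7, 8, 9, 10, 12, 96]
print("mean:", mean(survival))                # pulled upwards by the value 96
print("median:", median(survival))
print("mode(s):", multimode(survival))
print("10% trimmed mean:", trimmed_mean(survival, 0.1))
```

Here the mean (16.2) is more than double the median (7.5). The 10% trimmed mean (7.875) stays close to the median because the extreme value is discarded.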
  • 42. Quantiles ● These are measures of variability as well as of central tendency. Quantiles split the ordered data into groups that each contain the same number of observations. ● The median can be conceptualized as the 50% quantile ● Tertile: split by 33% (3 parts) ● Quartile: split by 25% (4 parts) ● Quintile: split by 20% (5 parts) ● Decile: split by 10% (10 parts)
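Python's `statistics.quantiles(data, n=k)` returns the k - 1 cut points that split the data into k equal-sized groups. The data here are a hypothetical sequence 1..20. The function's default interpolation method is "exclusive". Other software (SPSS, Excel, R) may use different interpolation rules, so the cut points can differ slightly between packages.

```python
from statistics import median, quantiles

values = list(range(1, 21))  # hypothetical data: 1, 2, ..., 20

print("median:", median(values))                       # same as the single 2-quantile cut point
print("median cut point:", quantiles(values, n=2))
print("tertile cut points:", quantiles(values, n=3))   # 2 cut points -> 3 parts
print("quartile cut points:", quantiles(values, n=4))  # 3 cut points -> 4 parts
print("quintile cut points:", quantiles(values, n=5))  # 4 cut points -> 5 parts
print("decile cut points:", quantiles(values, n=10))   # 9 cut points -> 10 parts
```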
  • 43. Measures of Spread ● Range: not useful when you have extreme values ● Interquartile range: usually reported along with the median; it is the range between the 25th and 75th percentiles (the first and third quartiles) ● Standard deviation and variance: useful if the distribution is symmetric ● The 95% confidence interval of the mean is technically a measure of how closely your sample mean approximates the “unknown” population mean. For a normal distribution it corresponds to the mean ± 1.96 standard errors (SE = SD/√n). By contrast, the mean ± 1.96 standard deviations covers about 95% of the individual observations.
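The sketch below computes the measures of spread listed above for a hypothetical sample. The `spread_summary` helper is invented for this illustration. It contrasts the 95% confidence interval of the mean (mean ± 1.96 SE) with the interval that covers about 95% of the observations (mean ± 1.96 SD). It uses the normal value 1.96 as stated on the slide. For small samples a t critical value is more appropriate, and statistical packages will use one.

```python
from statistics import mean, quantiles, stdev

def spread_summary(values, z=1.96):
    """Range, IQR, SD, 95% CI of the mean, and the ~95% range of observations."""
    n = len(values)
    q1, _, q3 = quantiles(values, n=4)
    m = mean(values)
    sd = stdev(values)           # sample standard deviation
    se = sd / n ** 0.5           # standard error of the mean
    return {
        "range": (min(values), max(values)),
        "iqr": (q1, q3),
        "sd": sd,
        "95% CI of mean": (m - z * se, m + z * se),        # mean +/- 1.96 SE
        "~95% of observations": (m - z * sd, m + z * sd),  # mean +/- 1.96 SD
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
for name, value in spread_summary(sample).items():
    print(name, value)
```

The confidence interval is narrower than the ±1.96 SD interval by a factor of √n, so it shrinks as the sample grows. The spread of individual observations does not.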
  • 44. Box and whisker plots (source: http://www.physics.csbsju.edu/stats/box2.html)
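Box plots are also a quick way to check for outliers. The conventional (Tukey) box plot draws its whiskers to the most extreme points within 1.5 × IQR of the quartiles and plots anything beyond as an individual point. The helper below is a minimal sketch of that rule applied to hypothetical data. The quartiles come from `statistics.quantiles`, so the fences may differ slightly from those drawn by a given plotting package.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return values beyond Q1 - k*IQR or Q3 + k*IQR (Tukey's box-plot rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

survival = [3, 5, 6, 6, 7, 8, 9, 10, 12, 96]  # hypothetical survival times (months)
print(tukey_outliers(survival))
```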
  • 45. Data Distribution 1. Binary / Nominal / Ordinal : Frequencies of categories 2. Continuous Variable: a. Histogram b. Cumulative Histogram c. Quantiles d. Moments (measures of central tendency & skewness) 3. Skewed data : Nonparametric methods of analysis (i.e. methods that do not assume that the distribution is normal).
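Skewness is the third standardized moment. The sketch below computes the simple moment-based estimate g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. It uses hypothetical data. Many statistical packages report a bias-adjusted version of this coefficient, so their numbers may differ slightly. A value near 0 suggests symmetry, and a clearly positive value indicates a long right tail.

```python
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [3, 5, 6, 6, 7, 8, 9, 10, 12, 96]  # hypothetical survival times
print(sample_skewness(symmetric))     # 0 for perfectly symmetric data
print(sample_skewness(right_skewed))  # large positive value: long right tail
```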
  • 46. Density Plots & Histograms Quick R: Histograms & Density Plots : http://www.statmethods.net/graphs/density.html
  • 48. Bar Charts: Best Practices 1. Give the count if your Y axis is in percentages 2. Start the Y axis from 0 3. Try to arrange categories by frequency 4. Use a consistent color scheme; don't use different colors for the bars unless they represent different categories 5. Avoid stacked bar charts unless you want to show part-to-whole relationships 6. Space between bars = 1/2 of the bar width
  • 49. Dot Plots : A better alternative
  • 51. Missing Values Missing Completely at Random (MCAR): missingness of a value does not depend on any other variable or on the value itself (e.g. patients randomly forget to answer some QoL items). Missing at Random (MAR): missingness of a value depends on another observed variable (e.g. patients presenting in the late afternoon do not fill in QoL forms). Missing Not at Random (MNAR): missingness depends on a characteristic inherent in the variable itself (e.g. only patients with poor QoL do not fill in QoL forms).
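The practical difference between MCAR and MNAR shows up as bias in a complete-case analysis. The simulation below uses hypothetical QoL scores drawn from N(50, 10). The missingness probabilities are arbitrary assumptions chosen for illustration, and MAR is not simulated. Under MCAR, the mean of the observed scores stays close to the mean of all patients. Under MNAR, where poor-QoL patients tend not to respond, the observed mean is biased upwards.

```python
import random
from statistics import mean

rng = random.Random(42)
# Hypothetical QoL scores for 10,000 patients (drawn from N(50, 10))
scores = [rng.gauss(50, 10) for _ in range(10_000)]

# MCAR: every score has the same 30% chance of being missing
mcar_observed = [s for s in scores if rng.random() > 0.30]

# MNAR: poor QoL (score < 45) is far more likely to be missing
def is_missing_mnar(score):
    p_missing = 0.8 if score < 45 else 0.1
    return rng.random() < p_missing

mnar_observed = [s for s in scores if not is_missing_mnar(s)]

print(f"all patients:          {mean(scores):.1f}")
print(f"complete cases (MCAR): {mean(mcar_observed):.1f}")
print(f"complete cases (MNAR): {mean(mnar_observed):.1f}")
```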
  • 52. Missing Values 1. Deletion methods: some part of the data is deleted. The most common approach used in SPSS is listwise deletion; the alternative is pairwise deletion. 2. Single imputation: the most common method is mean / median substitution. Alternatively, dummy coding can be used, especially for a categorical variable. 3. Model-based imputation: multiple imputation and maximum likelihood based methods.
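The two deletion strategies and mean substitution can be sketched on a tiny hypothetical dataset. The variable names and the helpers `listwise`, `pairwise` and `mean_impute` are invented for this illustration. Listwise deletion drops every row with any missing value. Pairwise deletion only requires the variables used in a particular analysis to be present. Mean substitution keeps all rows but makes the imputed variable look less variable than it really is.

```python
from statistics import mean

# Hypothetical dataset; None marks a missing value
rows = [
    {"age": 50, "qol": 60, "pain": 3},
    {"age": 62, "qol": None, "pain": 5},
    {"age": 45, "qol": 72, "pain": None},
    {"age": 70, "qol": 55, "pain": 6},
    {"age": None, "qol": 65, "pain": 2},
]

def listwise(rows):
    """Keep only rows with no missing value in any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, var_a, var_b):
    """Keep rows that are complete for just the two variables being analysed."""
    return [r for r in rows if r[var_a] is not None and r[var_b] is not None]

def mean_impute(rows, var):
    """Return copies of the rows with missing `var` replaced by the observed mean."""
    fill = mean(r[var] for r in rows if r[var] is not None)
    return [{**r, var: fill if r[var] is None else r[var]} for r in rows]

print(len(listwise(rows)))                # rows usable for every analysis
print(len(pairwise(rows, "age", "qol")))  # rows usable for an age vs QoL analysis
print(mean_impute(rows, "qol")[1]["qol"])  # imputed QoL for the second patient
```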
  • 53. Missing Values: listwise vs pairwise deletion
    Property | Listwise deletion | Pairwise deletion
    Effect on sample size | Reduced | Mostly remains the same
    Effect on power | Reduced | Mostly remains the same
    Simplicity | Yes | Yes
    Model comparison | Yes | No
    Unbiased if MCAR | Yes | Yes
  • 54. (Figure) Single-value imputation with the mean / median; single-value imputation with simple regression
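Single-value imputation with simple regression replaces each missing value with its prediction from a line fitted to the complete cases. The sketch below is a minimal version on hypothetical data in which QoL is predicted from a fully observed pain score. The `fit_line` helper is an ordinary least-squares fit written out for illustration. Imputed values fall exactly on the fitted line, so this method also understates variability. That limitation is what multiple imputation is designed to address.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: pain score is fully observed, QoL has two missing values
pain = [2, 3, 5, 6, 4, 7]
qol = [72, 68, None, 56, 64, None]

# Fit the line on complete cases only
observed = [(p, q) for p, q in zip(pain, qol) if q is not None]
slope, intercept = fit_line([p for p, _ in observed], [q for _, q in observed])

# Replace each missing QoL with its predicted value from the fitted line
imputed = [q if q is not None else intercept + slope * p for p, q in zip(pain, qol)]
print(imputed)
```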
  • 55. Resources 1. How to diagnose the missing data mechanism: http://www.theanalysisfactor.com/missing-data-mechanism/ 2. Missing data: Pairwise and Listwise Deletions, which to use: http://www-01.ibm.com/support/docview.wss?uid=swg21475199 3. Missing data and how to deal with it (a nice presentation): https://liberalarts.utexas.edu/prc/_files/cs/Missing-Data.pdf
  • 57. Inferential Statistics 1. Hypothesis Testing 2. Comparing 2 proportions 3. Non Parametric Statistical Tests 4. Correlation 5. Linear Models
  • 58. Hypothesis testing 1. Formal testing of whether the data provide enough evidence to reject (disprove) the null hypothesis 2. The null hypothesis is equivalent to a straw man - a sham argument set up to be defeated. 3. Whether the test is one- or two-tailed depends on the nature of the alternative hypothesis. Failure to reject the null hypothesis is not proof of its truth - in other words, absence of evidence is not evidence of absence.
  • 59. Hypothesis testing: Tails ● Bill Gates earns the same $$ per month as I do - H0 ● Bill Gates earns less $$ per month than I do - H1 (one-tailed) ● The $$ that Bill Gates earns is different from what I earn - H1 (two-tailed)
  • 60. Classifications of “significant” or “highly significant” are arbitrary, and treating a P-value between 0.05 and 0.1 as indicating a “trend towards significance” is bogus. If the P-value is 0.08, for example, the 0.95 confidence interval for the effect includes a “trend” in the opposite (harmful) direction. - Harrell & Slaughter (2016)
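The claim about the confidence interval can be checked numerically. A sketch assuming a normally distributed estimate (a Wald-type interval) with a hypothetical standard error of 1, on a scale where positive values mean benefit and 0 means no effect; the numbers are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()
se = 1.0                                          # hypothetical standard error
estimate = std_normal.inv_cdf(1 - 0.08 / 2) * se  # effect whose two-sided P is 0.08 (z about 1.75)

p_two_sided = 2 * (1 - std_normal.cdf(estimate / se))
z_crit = std_normal.inv_cdf(0.975)                # about 1.96 for a 0.95 interval
lower, upper = estimate - z_crit * se, estimate + z_crit * se

print(f"P = {p_two_sided:.2f}; 95% CI {lower:.2f} to {upper:.2f}")
# The lower limit is below 0, i.e. the interval includes harmful effects.
```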
  • 61. Comparing Means: Which test should be used for comparing means?
  • 62. T Test 1. The independent-samples t-test tests the null hypothesis that the two samples come from two populations with the same mean. 2. The paired t-test tests the special null hypothesis that the mean difference between two related measurements is 0.
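Both statistics can be computed directly. A minimal sketch with hypothetical data and illustrative function names: `independent_t` uses the pooled-variance (Student) form, which assumes equal variances, and `paired_t` is simply a one-sample t statistic on the within-pair differences. The P-value then comes from the t distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` return the statistic and P-value together.

```python
import math
import statistics

def independent_t(x, y):
    """Student's two-sample t statistic with pooled variance (assumes equal variances)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def paired_t(before, after):
    """Paired t statistic: a one-sample t statistic for the differences against 0."""
    d = [a - b for b, a in zip(before, after)]
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1

# Hypothetical measurements in two independent groups.
group_a = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
group_b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]
print("independent:", independent_t(group_a, group_b))

# Hypothetical paired measurements on the same 5 patients.
before = [140, 152, 138, 160, 145]
after = [132, 148, 135, 150, 141]
print("paired:", paired_t(before, after))
```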
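  • 63. Requirements ● Data need to be quantitative ● Data are obtained from a simple random sample* ● Data are (approximately) normally distributed in each group; for the paired t-test, the differences should be ● Variances of the two populations need to be equal (for Student's t-test; Welch's t-test relaxes this)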
  • 64. Comparing Proportions 1. Chi-square test: a. Compares dichotomous outcomes in 2 groups b. 2 x 2 contingency tables c. Unreliable if the expected count in any cell is < 5 d. Yates continuity correction is often applied if an expected cell frequency is < 10 2. Fisher’s exact test a. An exact test: the exact P-value is calculated rather than approximated from the chi-square distribution; it is also more conservative b. Can handle larger contingency tables c. More computationally intensive d. Does not have a test statistic analogous to the chi-square statistic
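As an illustration, both tests can be applied to the RT / No RT table used on the following slides (rows RT / No RT, columns Dead / Alive). A minimal sketch in plain Python with illustrative function names: the chi-square P-value uses the 1-degree-of-freedom identity P = erfc(√(X²/2)), and the two-sided Fisher P-value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. Here the smallest expected count is 1.8, so the chi-square approximation is unreliable and Fisher's exact test is preferable. If SciPy is available, `scipy.stats.chi2_contingency` (which applies the Yates correction to 2 x 2 tables by default) and `scipy.stats.fisher_exact` can serve as a cross-check.

```python
import math

# Rows: RT, No RT; columns: Dead, Alive.
table = [[10, 100],
         [5, 10]]

def expected_counts(t):
    """Expected cell counts under the null hypothesis of no association."""
    rows = [sum(r) for r in t]
    cols = [sum(c) for c in zip(*t)]
    n = sum(rows)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

def chi_square_2x2(t, yates=False):
    """Pearson chi-square statistic (optionally Yates-corrected) and its 1-df P-value."""
    e = expected_counts(t)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            diff = abs(t[i][j] - e[i][j])
            if yates:
                diff = max(0.0, diff - 0.5)
            stat += diff ** 2 / e[i][j]
    return stat, math.erfc(math.sqrt(stat / 2))

def fisher_exact_2x2(t):
    """Two-sided Fisher exact P-value for a 2 x 2 table."""
    (a, b), (c, d) = t
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Small relative tolerance so tables exactly as likely as the observed one are counted.
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))

print("smallest expected count:", min(min(row) for row in expected_counts(table)))
print("chi-square, no correction:", chi_square_2x2(table))
print("chi-square, Yates:", chi_square_2x2(table, yates=True))
print("Fisher exact two-sided P:", fisher_exact_2x2(table))
```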
  • 65. Odds Ratio 1. Measure of association between an outcome and exposure 2. Ratio of odds of the outcome in exposed to the odds of the outcome in non exposed. 3. Can be easily obtained from a 2 x 2 contingency table.
            Dead   Alive
    RT        10     100
    No RT      5      10
  • 66. Risk Ratio 1. Another measure of relative effect size 2. Ratio of risk of outcome in exposed to the risk of outcome in non exposed. 3. Can be easily obtained from a 2 x 2 contingency table.
            Dead   Alive
    RT        10     100
    No RT      5      10
  • 67. Odds vs Risk 1. Odds is the ratio of the probability of an event occurring to that of it not occurring - in this case the odds of dying in the RT group are 10/100 = 0.1. 2. Risk is the probability of an event occurring - in this case the risk of dying in the RT group is 10/110 (about 0.09).
            Dead   Alive
    RT        10     100
    No RT      5      10
  • 68. Why Odds Ratio 1. Risk ratios are easier to interpret but are applicable only to a limited range of prognoses - e.g. a risk factor that doubles the risk of developing lung cancer cannot apply to a patient whose baseline risk is already above 0.5 (the doubled risk would exceed 1). ORs have no such ceiling. 2. Note, however, that the OR is always further from 1 than the corresponding risk ratio, so it is not more conservative; the two are close only when the outcome is rare. 3. Confidence intervals for ORs can be calculated easily (on the log scale)
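The odds, risks and their ratios for the RT table on the previous slides can be computed in a few lines. A sketch: the 95% confidence intervals use the usual large-sample standard errors on the log scale (Woolf's method for the OR), a choice made for this example rather than one stated on the slides. With these data the OR (0.20) lies further from 1 than the RR (about 0.27).

```python
import math

# 2 x 2 table: RT dead/alive and No RT dead/alive.
a, b = 10, 100   # RT: dead, alive
c, d = 5, 10     # No RT: dead, alive

odds_rt, odds_no_rt = a / b, c / d                  # 0.1 and 0.5
risk_rt, risk_no_rt = a / (a + b), c / (c + d)      # 10/110 and 5/15

odds_ratio = odds_rt / odds_no_rt                   # same as (a * d) / (b * c)
risk_ratio = risk_rt / risk_no_rt

# Approximate 95% CI for the OR on the log scale (Woolf's method).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_or = tuple(math.exp(math.log(odds_ratio) + s * 1.96 * se_log_or) for s in (-1, 1))

# Approximate 95% CI for the RR on the log scale.
se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
ci_rr = tuple(math.exp(math.log(risk_ratio) + s * 1.96 * se_log_rr) for s in (-1, 1))

print(f"OR = {odds_ratio:.2f}, 95% CI {ci_or[0]:.2f} to {ci_or[1]:.2f}")
print(f"RR = {risk_ratio:.2f}, 95% CI {ci_rr[0]:.2f} to {ci_rr[1]:.2f}")
```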
  • 69. Non Parametric Methods 1. Often preferable to parametric alternatives because they do not rely on distributional assumptions such as normality (although they are somewhat less powerful when those assumptions do hold) 2. The response variable can be interval / ordinal - no transformations are needed to account for non-normal distributions, and extreme values are handled better 3. Being less susceptible to extreme values, these methods are considered more robust
  • 70. Nonparametric test alternatives 1. One-sample t-test - Wilcoxon signed-rank test 2. Two-sample t-test - Wilcoxon rank-sum test (Mann-Whitney U test) 3. ANOVA - Kruskal-Wallis test 4. Pearson correlation - Spearman's rho
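The rank-sum idea behind the Mann-Whitney test fits in a short function. A minimal sketch with illustrative function names and hypothetical data: ties get average ranks, and the two-sided P-value uses the plain normal approximation without tie or continuity correction, so it is only a rough guide for small samples. If SciPy is available, `scipy.stats.mannwhitneyu` is a tested alternative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of positions i..j, converted to 1-based
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided P-value (normal approximation,
    no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical pain scores in two independent groups.
group_a = [3, 5, 4, 6, 2, 5]
group_b = [7, 8, 6, 9, 7, 8]
print(mann_whitney_u(group_a, group_b))
```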
  • 71. Correlation 1. A method to examine the association between a continuous predictor and a continuous outcome. 2. A correlation coefficient ranges from -1 to +1 and measures both the strength and the direction of the association. 3. Scatterplots are a graphical method for evaluating correlation.
  • 72. Pearson’s Correlation 1. Requires linear relationship between the two variables. 2. Requires that the variables be normally distributed - ideally bivariate normality. 3. Outliers have a big impact on the correlation.
  • 73. Spearman’s Correlation 1. The nonparametric alternative - does not require the variables to be normally distributed. 2. Does not assume a linear relationship, only a monotonic one 3. Is less affected by outliers 4. On the same data, it can give markedly different (even opposite) results from Pearson’s correlation
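The difference between the two coefficients shows up clearly with a relationship that is monotonic but not linear. A sketch with hypothetical data (y = x³) and illustrative function names: Spearman's rho, computed here as Pearson's r on average ranks, is exactly 1, while Pearson's r is lower because the relationship is curved.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [(2 * ordered.index(v) + ordered.count(v) + 1) / 2 for v in values]

def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical monotonic but strongly curved relationship.
x = list(range(1, 11))
y = [v ** 3 for v in x]
print(f"Pearson r    = {pearson(x, y):.3f}")
print(f"Spearman rho = {spearman(x, y):.3f}")   # exactly 1: perfectly monotonic
```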
  • 74. Correlation & Causation
    Criterion | Caveat
    Strength | Major confounding factors may result in strong correlation
    Consistency | Assumes that causal factors are evenly distributed in the population
    Specificity | No reason why a risk factor should be specific for an outcome
    Temporality | Directionality may not always imply causation (e.g. depression & cancer)
    Biological gradient | Only true for events where there is a dose-response gradient
    Plausibility | Depends on the state of current scientific knowledge
    Coherence | Depends on the quality of additional available information
    Experimental evidence | Interventional research may not always be feasible
    Analogy | A subjective judgement
  • 75. Correlation & Agreement 1. High correlation does not necessarily indicate agreement, e.g. 2 methods to measure height may be highly correlated but give different measurements 2. A change in scale does not affect correlation, e.g. if one method measured height as 2 x the other method, the correlation would still be perfect
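The height example can be checked directly: perfect correlation, yet the two methods never agree. A sketch with hypothetical measurements; looking at the differences between the methods, rather than at r, is what reveals the disagreement.

```python
import math
import statistics

def pearson(x, y):
    """Pearson's correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical heights (cm) from method A; method B reads exactly twice as high.
method_a = [150.0, 158.0, 163.0, 170.0, 176.0, 181.0]
method_b = [2 * h for h in method_a]

differences = [b - a for a, b in zip(method_a, method_b)]
print(f"correlation r   = {pearson(method_a, method_b):.3f}")      # perfect correlation
print(f"mean difference = {statistics.mean(differences):.1f} cm")  # yet no agreement
```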
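  • 76. Linear Model Y = a + βc As you may remember, this is the equation for a line. The job of regression is to find a and β so that any value of c can be used to predict Y. A statistical method to predict a variable is a model. Linear regression is usually fitted by ordinary least squares (OLS), which chooses a and β to minimize the sum of squared differences between the observed and predicted values of Y.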
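For a single predictor the OLS estimates have a closed form: β = Σ(c − c̄)(Y − Ȳ) / Σ(c − c̄)² and a = Ȳ − βc̄. A minimal sketch with hypothetical data and an illustrative function name; on Python 3.10+ `statistics.linear_regression(c, y)` returns the same slope and intercept.

```python
import statistics

def ols_fit(c, y):
    """Ordinary least-squares estimates of a and beta in Y = a + beta * c."""
    mc, my = statistics.mean(c), statistics.mean(y)
    sxy = sum((ci - mc) * (yi - my) for ci, yi in zip(c, y))
    sxx = sum((ci - mc) ** 2 for ci in c)
    beta = sxy / sxx
    a = my - beta * mc
    return a, beta

# Hypothetical data.
c = [1, 2, 3, 4, 5]
y = [2.9, 5.1, 7.0, 9.2, 10.8]
a, beta = ols_fit(c, y)
print(f"Y = {a:.2f} + {beta:.2f} * c")
print(f"predicted Y at c = 6: {a + beta * 6:.2f}")
```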
  • 78. Linear Regression: Assumptions 1. The 2 assumptions for correlation hold true - linear relationship & absence of outliers 2. In addition, residuals should be normally distributed 3. Homoscedasticity should be present 4. Observations should be independent - no autocorrelation 5. Multicollinearity should be absent (when there is more than one predictor)
  • 79. Homoskedasticity 1. Plot the outcome against the predictor variable together with the fitted regression line 2. Check whether the points are spread evenly around the line along its whole length 3. Essentially means that the outcome (i.e. the residuals) has the same variance across all values of the predictor variable 4. In practice, it is assessed from the residuals
  • 80. Residuals 1. Nothing but the difference between the observed value of the outcome variable and the value predicted by the model. 2. In other words, a measure of the error / disagreement in the model's predictions. 3. If there is homoskedasticity, a plot of residuals against the predicted values should show a random horizontal band of roughly constant width around 0; a funnel shape suggests heteroskedasticity
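Residuals, and a rough look at whether their spread is constant, can be computed as follows. A sketch on hypothetical data built to have a funnel shape; the split-half variance ratio is a crude illustration chosen for this example, not a formal test, and a residual plot remains the usual check.

```python
import statistics

# Hypothetical data whose scatter around the line grows with c (a funnel shape).
c = list(range(1, 21))
y = [2 + 0.5 * ci + (0.1 * ci if ci % 2 else -0.1 * ci) for ci in c]

# Simple OLS fit of y on c.
mc, my = statistics.mean(c), statistics.mean(y)
slope = sum((ci - mc) * (yi - my) for ci, yi in zip(c, y)) / sum((ci - mc) ** 2 for ci in c)
intercept = my - slope * mc

fitted = [intercept + slope * ci for ci in c]
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observed minus predicted

print(f"mean residual: {statistics.mean(residuals):.2e}")   # essentially 0 for OLS

# Crude split-half comparison (not a formal test): residual variance
# for the lower vs the upper half of the fitted values.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
var_low = statistics.variance([r for _, r in pairs[:half]])
var_high = statistics.variance([r for _, r in pairs[half:]])
print(f"variance ratio (upper / lower half): {var_high / var_low:.1f}")  # far from 1 here
```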
  • 81. Alternatives to Linear Regression Logistic regression: if your outcome variable is binary categorical (e.g. dead / alive) Ordinal regression: ordinal categorical outcome Poisson regression: count outcome If a nonlinear relationship exists, use a nonlinear regression model - alternatively, transform the outcome variable or use segmented regression
  • 82. What about survival? This is a special regression problem where the outcome is the time to an event (e.g. death). Both linear and nonlinear methods are available, as are parametric and nonparametric tests. A key point: these methods are required ONLY if not all events have occurred within the observation time frame - i.e. not all patients have died. N.B. These methods are applicable to any time-to-event end point
  • 83. Defining the Time Needs a baseline date from which observation starts - ideally the time when exposure starts, which is very rarely known. In RCTs, classically the date of randomization. In retrospective studies, the date of registration / date of diagnosis. If the patient has the event, the date / time of the event is noted; otherwise the date / time of last follow-up is noted. Note that, logically, the resulting time should be greater than 0.
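A sketch of how the analysis time and event indicator might be derived for each patient. The function name, the use of days as the unit and the dates are hypothetical; the final check enforces that the time is greater than 0.

```python
from datetime import date

def follow_up_time(baseline, event_date=None, last_follow_up=None):
    """Return (time in days, event indicator) for one patient.

    Uses the event date if the event occurred (indicator 1); otherwise the
    date of last follow-up, i.e. a censored observation (indicator 0)."""
    if event_date is not None:
        end, event = event_date, 1
    elif last_follow_up is not None:
        end, event = last_follow_up, 0
    else:
        raise ValueError("need an event date or a last follow-up date")
    days = (end - baseline).days
    if days <= 0:
        raise ValueError("time must be greater than 0; check the dates")
    return days, event

# Hypothetical patients; baseline = date of randomization.
print(follow_up_time(date(2015, 3, 1), event_date=date(2016, 1, 15)))
print(follow_up_time(date(2015, 6, 10), last_follow_up=date(2016, 9, 1)))
```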
  • 84. The Censoring Problem The censoring problem arises because not all events occur within the observation time frame (i.e. some patients remain alive). We cannot be sure that these patients will not have the event afterwards. If censoring is ignored (e.g. censored patients are simply counted as survivors), you get an artificially inflated survival figure. Right censoring is when the subject has not had the event by the time observation ends. Left censoring is when the event occurred before the start of observation, so its exact time is unknown.
  • 85. Hazard The effect size estimator obtained from survival methods - can be thought of as the risk of developing the event. The hazard rate is the instantaneous rate of the event at a given time among those who are still event-free; it ignores the accumulation of hazard up to that time point. The hazard ratio is the ratio of the hazard rates in two groups. The cumulative hazard is the integral of the hazard rate over a given interval of time.
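In standard continuous-time notation (T is the time to the event; the subscripts 1 and 0 for the two groups are labels chosen here), these quantities are related as follows. The last line is a worked example assuming a constant hazard of 0.1 per year.

```latex
\begin{aligned}
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \\
H(t) &= \int_0^t h(u)\,du \\
S(t) &= e^{-H(t)} \\
\mathrm{HR} &= \frac{h_1(t)}{h_0(t)} \\
\text{if } h(t) = 0.1:\quad H(5) &= 0.1 \times 5 = 0.5, \qquad S(5) = e^{-0.5} \approx 0.61
\end{aligned}
```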
  • 86. Source: SAS Seminar Introduction to Survival Analysis in SAS Available at http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/
  • 87. Source: SAS Seminar Introduction to Survival Analysis in SAS Available at http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/
  • 88. Source: SAS Seminar Introduction to Survival Analysis in SAS Available at http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/
  • 89. The Kaplan Meier Estimate
    Time:   1    2    3    4    5    10   12
    Death:  Yes  No   No   Yes  No   Yes  No
    Interval | Entered | Deaths | Censored | Alive | S | Prob
    0-1  | 7 | 1 | 0 | 6 | 6/7  | 86%
    1-4  | 6 | 1 | 2 | 3 | 3/4* | 3/4 × 6/7 = 64%
    4-10 | 3 | 1 | 1 | 1 | 1/2* | 1/2 × 3/4 × 6/7 = 32%
    *censored individuals are removed from the denominator
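The hand calculation can be reproduced with a short implementation of the product-limit method. A minimal sketch in plain Python (the function name `kaplan_meier` is illustrative); it uses the seven patients from the table and recovers 6/7, 6/7 × 3/4 and 6/7 × 3/4 × 1/2 = 18/56, which is about 32%.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if the event (death) was observed at that time, 0 if censored
    Returns (time, survival) pairs at each time where at least one event occurs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)   # deaths + censored at time t
        if deaths:
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # censored patients drop out of later denominators
        i += leaving
    return curve

# Patient data from the slide: time and whether the patient died (1) or was censored (0).
times = [1, 2, 3, 4, 5, 10, 12]
events = [1, 0, 0, 1, 0, 1, 0]

for t, s in kaplan_meier(times, events):
    print(t, f"{s:.0%}")   # 1 86%, then 4 64%, then 10 32%
```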
  • 90. The Kaplan Meier Estimate
  • 91.
  • 92. Comparisons The Kaplan Meier method allows you to compare survival among groups of patients. While the effect size is important and can be expressed as a risk ratio or a hazard ratio, we can also test the null hypothesis that the survival curves are equal. The commonest test is the log-rank test
  • 93. Log Rank Test Calculates, at each time point where there is an event, the observed number of deaths in each group and the number expected if there were no difference between the groups. E.g. with 2 groups of 20 patients each and 1 death at 6 months, the expected number of deaths in each group would be (1/40)*20 = 0.5 (note this is a number, not a %). This is repeated for every time point with an event, and the total observed and expected deaths in each group are summed; a chi-square test then determines whether the observed and expected numbers differ by more than would be expected by chance.
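The procedure can be sketched for two groups as below. This uses the standard form of the statistic, which divides (O − E)² for one group by its hypergeometric variance; a simpler approximation sums (O − E)²/E over the groups. The function name and the follow-up times are hypothetical; `logrank_test` in the lifelines package (`lifelines.statistics`) is a tested implementation. Applied to the slide's example it gives 0.5 expected deaths in group 1.

```python
import math

def log_rank(times1, events1, times2, events2):
    """Two-group log-rank test.

    events: 1 = event (death) observed, 0 = censored.
    Returns observed and expected events in group 1, the chi-square
    statistic (1 degree of freedom) and its P-value."""
    data = ([(t, e, 1) for t, e in zip(times1, events1)] +
            [(t, e, 2) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                 # at risk, both groups
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)     # at risk, group 1
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)      # deaths, both groups
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n          # deaths expected in group 1 if no difference
        if n > 1:                        # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return observed1, expected1, chi2, p

# The slide's example: 2 groups of 20, one death at 6 months (in group 1),
# everyone else censored at 12 months (hypothetical follow-up).
g1_times, g1_events = [6] + [12] * 19, [1] + [0] * 19
g2_times, g2_events = [12] * 20, [0] * 20
print(log_rank(g1_times, g1_events, g2_times, g2_events))
```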
  • 94. Alternatives Since the log-rank test gives equal weight to all time points, some alternatives are available - e.g. the Breslow test, which weights each time point by the number of cases at risk. The Breslow test is better when there are more deaths early in the KM curve, but misleading when there is more censoring - in most cases it is best to stick to the log-rank test.
  • 95. Assumptions for the KM estimator 1. Patients who are censored have the same survival prospects as those who continue to be followed up 2. Survival of patients recruited early in the study is the same as that of patients recruited later However, the Kaplan Meier method is a nonparametric estimator, which means it makes no assumption about the shape of the survival function.
  • 96. The Cox Regression 1. Allows multivariable regression modelling for survival. 2. Unlike Kaplan Meier, allows continuous predictor variables 3. Is one of the most (ab)used survival analysis techniques 4. Can be used to generate a predictive model 5. Ideal sample size? - rule of thumb: number of events = 20 x number of predictors
  • 100. Cox Regression : Output
  • 102. Cox Regression: Assumptions 1. The proportional hazards assumption should be fulfilled - i.e. the hazard functions for the different strata should remain proportional over time. 2. Censoring should be non-informative, i.e. the reason a patient is censored should be unrelated to their risk of the event 3. There is a linear relationship between the log of the hazard and the covariates 4. Overly influential data points (outliers) should not be present There are diagnostic methods available for each of the above.
  • 103. How to check for Proportional hazards 1. If the predictor variable is categorical, KM curves can be generated to see whether the curves maintain the same separation over time. 2. Alternatively, you can generate Schoenfeld residuals in SPSS and plot them against time for each covariate; a systematic trend suggests that the hazards are not proportional.
  • 104. How to check for Proportional hazards
  • 105. Cox Regression: Advantages 1. It is a semi-parametric model and is less affected by outliers. 2. Unlike parametric survival models, it does not require correct specification of the underlying distribution 3. Many diagnostic procedures are available However, it does not directly estimate the baseline hazard, which makes predictive modelling (absolute risk prediction) difficult
  • 106. What not to do while modelling (regression) 1. Do not work with sample sizes that are clearly inadequate 2. Do not select predictors by univariate screening 3. Do not use stepwise forward / backward selection methods 4. Do not blindly assume linearity / proportional hazards - always understand the underlying assumptions as well as the correct checks for them 5. Read about residuals before jumping into regression 6. Don’t use split-sample validation - use cross-validation or bootstrapping instead DON’T FALL IN LOVE WITH YOUR MODEL
  • 107. Resources SAS Seminar: Introduction to Survival Analysis in SAS [Internet]. [cited 2016 Sep 9]. Available from: http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/ SPSS Library: Understanding contrasts [Internet]. [cited 2016 Sep 9]. Available from: http://www.ats.ucla.edu/stat/spss/library/contrast.htm Bian H. Survival Analysis Using SPSS. Available from: http://core.ecu.edu/ofe/StatisticsResearch/Survival%20Analysis%20Using%20SPSS.pdf Bland JM, Altman DG. The logrank test. BMJ. 2004 May 1;328(7447):1073. Practical recommendations for statistical analysis and data presentation in Biochemia Medica journal | Biochemia Medica [Internet]. [cited 2016 Sep 8]. Available from: http://www.biochemia-medica.com/2012/22/15 Manikandan S. Measures of dispersion. J Pharmacol Pharmacother. 2011 Oct;2(4):315–6. Manikandan S. Measures of central tendency: Median and mode. J Pharmacol Pharmacother. 2011 Jul;2(3):214–5. Utley M, Gallivan S, Young A, Cox N, Davies P, Dixey J, et al. Potential bias in Kaplan–Meier survival analysis applied to rheumatology drug studies. Rheumatology. 2000 Jan 1;39(1):1–2.