Having fun with stats, maths and games in life!
A Study Plan to become a practicing data scientist!
Raman Kannan
Adjunct, MoT and CS&E Department
Tandon School of Engineering
New York University
1/1/2016
Contents
Introduction
Basics: Khan Academy
why now, perfect storm
advances in computing hardware, networking, tools for communication
introduction to data
sample/population
iid
bias
Relationship
univariate regression
multivariate
logistic regression
Linear Algebra
matrices: identity, square, rectangular, symmetric
operations: transpose, inversion, decomposition
roots, positive definiteness, eigenvalues
Cholesky, principal components, singular value decomposition
Applications
analytics
descriptive
predictive
prescriptive
learning and intelligence need big data 3V
dimensionality reduction
unsupervised learning
clustering
supervised
classification
measures of classification: TP, TN, FP, FN, accuracy, precision, sensitivity
semi-supervised, hybrid
network, hidden, feedback, self-correcting
deep learning, Boltzmann machine, Markov chain
Information Retrieval: Entropy, Gain
Introduction
Paraphrasing Einstein: the problem of the "qualified labor" shortfall cannot be solved if we continue with
the same mentality that created it. We need to be disruptive. There is no need for a university or college
degree or any formal structure. Mathematics and analytics are a universal, basic language, and anyone
(returning veterans, dropouts) can become proficient, if they are willing to be disruptive like
Gates or Zuckerberg. So with that hope, this document attempts to lay out a path to become a practicing
data scientist.
It could at first be daunting. But don't be intimidated! Even though I respect Malcolm Gladwell, I have to
encourage you to ignore his 10,000-hour rule. Anyone with passion, determination and discipline
can become a data scientist in far less than 10,000 hours: maybe 6 months, at approximately 4 hours per day *
5 days per week * 4 weeks per month * 6 months = 480 hours. That is because all this stuff is basic and mostly
intuitive, and lurks in the subconscious realm of the cognitive apparatus, even that of monkeys, dogs,
leopards and of course human beings. Otherwise, we could not catch a ball, a frisbee or prey. I assure
you none of this involves string theory, Riemann surfaces, Hilbert spaces or the Tychonoff embedding
theorem. We already do so much of this subconsciously; we just have to transfer it to the conscious
realm.
Let us go!
Basics: Khan Academy
why now, perfect storm
advances in computing hardware, networking, tools for communication
introduction to data
operational filter: transactional vs master data
domain filter: what values can it hold?
categorical/qualitative (nominal, ordinal)
numerical/quantitative (interval, ratio)
Statistics Refresher
sample/population
iid
bias
randomness
outlier, anomaly, Bonferroni test
sample means, convergence to population mean
CLT: central limit theorem
LLN: law of large numbers
Benford's law (small leading digits occur more often)
measures of central tendency, moments
mean (median, mode), variance (standard deviation), skew, kurtosis
comovement, relationship: correlation, covariance
distributions: normal (Gaussian), Poisson, uniform
probability
basic properties,
certainty, uncertainty,
impossibility,
knowables,
unknowables,
known unknowables,
unknown unknowables
counting/frequentist
discrete, conditional, joint probabilities
Bayesian probability
continuous probability
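The two convergence results listed in the statistics refresher (LLN and CLT) are easy to see by simulation. The following is only a minimal sketch using the Python standard library; the choice of a Uniform(0, 1) population, the seed, the sample sizes and the helper name sample_means are all assumptions made for illustration and are not part of the original plan.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed from n iid Uniform(0, 1) draws."""
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the run is repeatable

# Law of large numbers: the sample mean drifts toward the population mean 0.5
# as the sample size n grows.
for n in (10, 1_000, 100_000):
    print(n, statistics.fmean(rng.random() for _ in range(n)))

# Central limit theorem: means of n = 30 draws are approximately normal, with
# standard deviation sigma / sqrt(n), where sigma^2 = 1/12 for Uniform(0, 1).
means = sample_means(30, 5_000, rng)
observed_sd = statistics.stdev(means)
expected_sd = (1 / 12) ** 0.5 / 30 ** 0.5
print(observed_sd, expected_sd)
```

Plotting a histogram of means (with any plotting tool you like) should show the familiar bell shape even though the underlying population is flat.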
Relationship
regression
parametric
nonparametric
independent vs dependent variables
dependent: also known as the response
independent: also known as regressors or predictors
univariate regression
linear relationship y=mx+c
quality of the relationship, goodness of fit
p-value, null hypothesis, R-squared
assumptions
autocorrelation
multicollinearity
heteroskedasticity (nonconstant variance)
tests of normality
tests of randomness
transformation
mixtures
standard normal
lognormal
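For the linear relationship y = mx + c and its goodness of fit, a small worked example may help. This is a sketch of ordinary least squares written from scratch; the function name fit_line and the five data points are made up for illustration, and in practice you would likely reach for a statistics package instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + c; returns (m, c, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    # R-squared = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, c, 1 - ss_res / ss_tot

# Hypothetical data lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c, r2 = fit_line(xs, ys)
print(m, c, r2)
```

An R-squared near 1 says the line explains most of the variation in y; it says nothing about whether the assumptions listed above (no autocorrelation, constant variance, and so on) actually hold.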
multivariate
logistic regression
odds ratio
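The link between logistic regression and the odds ratio can be checked numerically. In the sketch below the coefficients b0 and b1 are hypothetical values chosen only for illustration; the point is that raising x by one unit multiplies the odds by exp(b1).

```python
import math

def logistic(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def odds(p):
    """Odds in favour of an event that has probability p."""
    return p / (1 - p)

def odds_ratio(p1, p0):
    """Ratio of the odds at probability p1 to the odds at probability p0."""
    return odds(p1) / odds(p0)

# Hypothetical fitted model: log-odds = b0 + b1 * x
b0, b1 = -1.0, 0.7
p_at_2 = logistic(b0 + b1 * 2)
p_at_3 = logistic(b0 + b1 * 3)

# A one-unit increase in x multiplies the odds by exp(b1).
print(odds_ratio(p_at_3, p_at_2), math.exp(b1))
```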
Linear Algebra
matrices: identity, square, rectangular, symmetric
operations: transpose, inversion, decomposition
roots, positive definiteness, eigenvalues
Cholesky, principal components, singular value decomposition
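To make the linear algebra items less abstract, here is a small sketch of the transpose and the Cholesky decomposition in plain Python. The helper names and the 3x3 example matrix are assumptions chosen for illustration; a numerical library would normally do this work.

```python
import math

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def cholesky(a):
    """Return lower-triangular L with L * L^T == a, for symmetric positive definite a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower

# A symmetric positive definite example matrix (illustrative values).
a = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
lower = cholesky(a)
print(lower)
print(matmul(lower, transpose(lower)))  # multiplying back should reproduce a
```

The decomposition fails exactly when the matrix is not positive definite, which is one practical way to test for positive definiteness.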
Applications
analytics
descriptive
predictive
prescriptive
learning and intelligence need big data (the 3 Vs: volume, velocity, variety)
dimensionality reduction
unsupervised learning
clustering
supervised
classification
measures of classification: TP, TN, FP, FN, accuracy, precision, sensitivity
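The measures of classification above all come from the four confusion-matrix counts. A minimal sketch follows; the function name and the counts are hypothetical, and the code assumes none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common measures from true/false positive/negative counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are right
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "sensitivity": tp / (tp + fn),   # share of real positives that are found
    }

# Hypothetical counts for a classifier evaluated on 100 cases.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

Sensitivity is also called recall or the true positive rate.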
semi-supervised, hybrid
network, hidden, feedback, self-correcting
deep learning, Boltzmann machine, Markov chain
Information Retrieval
Entropy
Gain
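Entropy and gain can be computed in a few lines. The sketch below uses Shannon entropy in bits and the usual definition of information gain as the drop in entropy after a split; the labels and the split are made-up examples, and the function names are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of the split `groups`."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Hypothetical data: 5 "yes" and 5 "no", split into two groups of five.
labels = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
gain = information_gain(labels, split)
print(entropy(labels), gain)
```

A split that separates the classes perfectly would recover the full entropy of the labels as gain; a split that leaves every group as mixed as the whole gives a gain of zero.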
References (to be continued)
KhanAcademy.com
http://tutors4you.com/probabilitytutorial.htm
http://www.mathportal.org/linear-algebra/vectors/dot-product.php
http://www.stat.berkeley.edu/~brill/Stat153/tstests.pdf
http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
http://singhal.info/ieee2001.pdf Introduction to Information Retrieval
http://www.cs.columbia.edu/~gravano/Qual/Papers/singhal.pdf
http://times.cs.uiuc.edu/course/410/note/mle.pdf
Acknowledgements
To all those who have taught me everything I have learned in life, starting with my mother.

A data scientist's study plan

  • 1. Having fun with stats, maths and games in life! Adjunct, MoT and CS&E Department Tandon School of Engineering N e w Y o r k U n i v e r s i t y 1 / 1 / 2 0 1 6 Raman Kannan A Study Plan to become a practicing data scientist!
  • 2. Outline for having fun with stats, maths and games in life A Study Plan to become a practicing data scientist! Raman Kannan Adjunct, MoT and CS&E Departments Tandon School of Engineering NYU
  • 3. Contents
       Introduction
         Basics: Khan Academy
           why now, perfect storm
           advances for computing hardware, networking, tools for communication
           introduction to data
           sample/population
           iid
           bias
           Relationship
           univariate regression
           multivariate
           logistic regression
         Linear Algebra
           matrices: identity, square, rectangular, symmetric
           operations: transpose, inversion, decomposition
           roots, positive definiteness, eigenvalues
           Cholesky, principal components, singular value decomposition
         Applications
           analytics
           descriptive
           predictive
           prescriptive
           learning and intelligence need big data 3V
         dimensionality reduction
         unsupervised learning
           clustering
         supervised
           classification
           measures of classification: TP, TN, FP, FN, accuracy, precision, sensitivity
         semisupervised, hybrid
           network, hidden, feedback, selfcorrecting
  • 4. deep learning, Boltzmann Machine, Markov Chain
       Information Retrieval: Entropy, Gain
  • 5. Introduction
       Paraphrasing Einstein: the problem of the "qualified labor" shortfall cannot be solved if we continue with the same mentality that created it. We need to be disruptive. There is no need for a university or college degree, or for any formal structure. Mathematics and analytics are a universal, basic language, and anyone (returning veterans, dropouts) can become proficient if they are willing to be disruptive like Gates or Zuckerberg. With that hope, this document attempts to lay out a path to becoming a practicing data scientist.
       It could be daunting at first. But don't be intimidated! Even though I respect Malcolm Gladwell, I have to encourage you to ignore his 10,000-hour rule. Anyone with passion, determination and discipline can become a data scientist in less than 10,000 hours, maybe in 6 months: approximately 4 hours per day × 5 days per week × 4 weeks per month × 6 months = 480 hours. All of this material is basic and mostly intuitive, and it already lurks in the subconscious realm of the cognitive apparatus, even that of monkeys, dogs, leopards and, of course, human beings. Otherwise we could not catch a ball, a frisbee or prey. I assure you none of this involves string theory, Riemann surfaces, Hilbert dimensions or the Tychonoff embedding theorem. We already do so much of this subconsciously; we just have to transfer it to the conscious realm. Let us go!
       Basics: Khan Academy
         why now, perfect storm
         advances for computing hardware, networking, tools for communication
         introduction to data
           operational filter > transactional vs master data
           domain filter > what values can it hold >
             categorical/qualitative (nominal, ordinal)
             numerical/quantitative (interval, ratio)
       Statistics Refresher
  • 6. sample/population
       iid
       bias
       randomness
       outlier, anomaly, Bonferroni test
       sample means, convergence to population mean
         CLT: Central Limit Theorem
         LLN: Law of Large Numbers
       Benford's Law: small digits
       central tendencies (measures of), moments
         mean (median, mode), variance (standard deviation), skew, kurtosis
       comovement, relationship
         correlation, covariance
       distributions
         normal (Gaussian), Poisson, uniform
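The "sample means, convergence to population mean" items above can be tried out in a few lines. The following is a minimal sketch, not part of the original outline: it assumes a uniform population on [0, 1) (mean 0.5, variance 1/12), and the sample sizes and the seed are arbitrary choices for illustration.

```python
import random
import statistics


def draw_uniform(rng):
    """One observation from the assumed population: uniform on [0, 1)."""
    return rng.random()


def sample_means(draw, sample_size, n_samples, rng):
    """Return the mean of each of n_samples iid samples of the given size."""
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Law of Large Numbers: the mean of one ever-larger sample approaches 0.5.
for n in (10, 1_000, 100_000):
    big_sample = [draw_uniform(rng) for _ in range(n)]
    print(n, round(statistics.fmean(big_sample), 4))

# Central Limit Theorem: means of many samples of size 30 cluster around 0.5,
# with a spread close to sqrt(1/12) / sqrt(30).
means = sample_means(draw_uniform, sample_size=30, n_samples=5_000, rng=rng)
theoretical_std = (1 / 12) ** 0.5 / 30 ** 0.5
print("mean of sample means:", round(statistics.fmean(means), 4))
print("std of sample means: ", round(statistics.stdev(means), 4))
print("theoretical std:     ", round(theoretical_std, 4))
```

Plotting a histogram of `means` would show the familiar bell shape even though the underlying population is flat.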
  • 7. probability
       basic properties: certainty, uncertainty, impossibility, knowable, unknowables, known unknowables, unknown unknowables
       counting/frequentist
         discrete, conditional, joint probabilities
       Bayesian probability
       continuous probability
       Relationship
         regression
           parametric
           nonparametric
         independent vs dependent variables
           dependent, also known as response
           independent, aka regressors, predictors
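To make the conditional and Bayesian probability items above concrete, here is a small worked sketch of Bayes' rule. The function name and the numbers (1% prior, 95% hit rate, 5% false-positive rate) are made up for illustration and do not come from the original outline.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H), via Bayes' rule."""
    # Total probability of the evidence: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: the hypothesis holds 1% of the time, the evidence
# shows up 95% of the time when it holds and 5% of the time when it does not.
posterior = bayes_posterior(
    prior=Fraction(1, 100),
    likelihood=Fraction(95, 100),
    false_positive_rate=Fraction(5, 100),
)
print(posterior, float(posterior))
```

Exact fractions are used so the arithmetic can be checked by hand; with these inputs the posterior stays well below one half, which is the usual surprise when the prior is small.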
  • 8. univariate regression
       linear relationship y = mx + c
       quality of the relationship, goodness of fit
         p-value, null hypothesis, R-squared
       assumptions
         autocorrelation
         multicollinearity
         heteroskedasticity (nonconstant variance)
       tests of normality
       tests of randomness
       transformation
       mixtures
       standard normal
       lognormal
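The linear relationship y = mx + c and its goodness of fit can be sketched with ordinary least squares in plain Python. This is an illustrative sketch only: the helper names `fit_line` and `r_squared` and the five data points are assumptions, not material from the original outline.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope m and intercept c in y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def r_squared(xs, ys, m, c):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data lying close to the line y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, c = fit_line(xs, ys)
print("slope m:    ", round(m, 4))
print("intercept c:", round(c, 4))
print("R-squared:  ", round(r_squared(xs, ys, m, c), 4))
```

An R-squared near 1 says the line explains most of the variation; it says nothing about whether the assumptions listed above (no autocorrelation, constant variance, and so on) actually hold.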
  • 9. multivariate
       logistic regression
         odds ratio
       Linear Algebra
         matrices: identity, square, rectangular, symmetric
         operations: transpose, inversion, decomposition
         roots, positive definiteness, eigenvalues
         Cholesky, principal components, singular value decomposition
       Applications
         analytics
         descriptive
         predictive
         prescriptive
         learning and intelligence need big data 3V
       dimensionality reduction
       unsupervised learning
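Of the decompositions listed under Linear Algebra above, Cholesky is short enough to write out by hand, and it doubles as a test of positive definiteness. The following is a minimal pure-Python sketch for small matrices; the function name and the example matrix are illustrative assumptions, and a numerical library would be the practical choice for real work.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    Assumes a is a symmetric matrix given as a list of rows; raises
    ValueError if a is not positive definite.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diagonal = a[i][i] - partial
                if diagonal <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(diagonal)
            else:
                lower[i][j] = (a[i][j] - partial) / lower[j][j]
    return lower


# A small symmetric positive definite matrix chosen for illustration
a = [
    [4, 12, -16],
    [12, 37, -43],
    [-16, -43, 98],
]
lower = cholesky(a)
for row in lower:
    print(row)
```

Multiplying `lower` by its transpose should reproduce `a`, which is a quick way to check the result.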
  • 10. clustering
        supervised
          classification
          measures of classification: TP, TN, FP, FN, accuracy, precision, sensitivity
        semisupervised, hybrid
          network, hidden, feedback, selfcorrecting
        deep learning, Boltzmann Machine, Markov Chain
        Information Retrieval
          Entropy
          Gain
        References (to be continued)
          KhanAcademy.com
          http://tutors4you.com/probabilitytutorial.htm
          http://www.mathportal.org/linear-algebra/vectors/dot-product.php
          http://www.stat.berkeley.edu/~brill/Stat153/tstests.pdf
          http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
          http://singhal.info/ieee2001.pdf
          Introduction to Information Retrieval
          http://www.cs.columbia.edu/~gravano/Qual/Papers/singhal.pdf
          http://times.cs.uiuc.edu/course/410/note/mle.pdf
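The measures of classification named in the outline (TP, TN, FP, FN, accuracy, precision, sensitivity) all come from the four cells of a confusion matrix. Here is a small sketch using the standard definitions; the function name and the counts are made up for illustration.

```python
def classification_measures(tp, tn, fp, fn):
    """Accuracy, precision and sensitivity from confusion-matrix counts.

    tp/tn: true positives/negatives, fp/fn: false positives/negatives.
    """
    total = tp + tn + fp + fn
    return {
        # share of all cases that were classified correctly
        "accuracy": (tp + tn) / total,
        # share of predicted positives that really are positive
        "precision": tp / (tp + fp),
        # share of actual positives that were found (also called recall)
        "sensitivity": tp / (tp + fn),
    }


# Hypothetical counts for a binary classifier evaluated on 165 cases
measures = classification_measures(tp=100, tn=50, fp=10, fn=5)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Looking at all three together matters because accuracy alone can look good on imbalanced data while sensitivity or precision is poor.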
  • 12. Acknowledgements To all those who have taught me everything I have learned in life, starting with my mother.