Machine Learning for Data Science
A Brief Introduction
By
Vaibhav Kumar
Assistant Professor
DIT University, Dehradun
Email: Vaibhav.kumar@dituniversity.edu.in, vaibhav05cse@gmail.com
GitHub: https://github.com/vaibhav05cse/
Vaibhav Kumar@DIT University
Contents
• Introduction to Data Science
• Applications of Data Science
• Foundations of Data Science
• Machine Learning
• Supervised Learning
  • Classification
    • Logistic Regression
    • Decision Tree
    • Random Forest
    • K-Nearest Neighbor
    • Support Vector Machine
  • Regression
    • Simple Linear Regression
    • Multiple Linear Regression
    • Support Vector Regression
    • Decision Tree Regression
    • Random Forest Regression
• Unsupervised Learning
  • Cluster Analysis
  • Principal Component Analysis
Introduction to Data Science
• Data science is a multi-disciplinary field which uses scientific
methods, processes, algorithms and systems to extract knowledge
and insights from structured and unstructured data [1].
• It is a blend of computer science, mathematics, and business/domain
expertise.
Need for Data Science
• The volume of data is growing at an ever faster rate [2].
• Finding insights in this huge amount of data requires effective analytics
techniques.
• Data science has the capacity to meet this requirement.
What Can Data Science Do?
• It unifies statistics, data analysis, machine learning and their related
methods in order to understand and analyze actual phenomena with data
[3].
• It employs techniques and theories drawn from many fields within the
context of mathematics, statistics, computer science, and information
science [4].
• It performs predictive analytics to estimate the likelihood of a particular
event in the future.
• It performs prescriptive analytics to find the best course of action for a
given situation.
• It employs machine learning techniques to discover patterns in the data.
Applications of Data Science
Foundations of Data Science
• Statistics: Descriptive, Inferential.
• Linear Algebra: Matrices, Planes, Vectors, etc.
• Computer Science: Algorithms, Graph Theory, Data Structures, DBMS, etc.
• Machine Learning: Supervised, Unsupervised, Reinforcement.
• Business Analytics: Predictive, Prescriptive, Descriptive, Decision.
• Programming: R/Python, SQL, NoSQL.
Machine Learning
• Machine learning is a subfield of computer science that focuses on
developing algorithms which learn from examples and improve their
performance on a task.
• Machine learning algorithms use training data, which is a set of past
observations.
• There are three broad categories of machine learning:
  • Supervised Learning: learns from labeled examples.
  • Unsupervised Learning: learns from unlabeled examples.
  • Reinforcement Learning: learns from feedback given by the environment.
• It produces predictive analytics models which allow researchers and data
scientists to make predictions about the future based on past and current data.
Supervised Learning
• It is a category of machine learning algorithms. As the name indicates,
learning is supervised by the presence of the output in the training data.
• It learns from labelled data, that is, inputs for which the output is known.
• It builds a mathematical model of a set of data that contains both the
inputs and the desired outputs.
• A supervised learning algorithm analyzes the training data and
produces an inferred function, which can be used to map new
examples to outputs.
• Supervised learning problems are generally divided into
classification and regression problems.
Classification
• Classification in machine learning is a supervised learning problem
where the output variable is a category, such as “yes” or “no”, or
“disease” or “no disease”.
• In this problem, the dependent variable is categorical, and its category
is predicted from several independent variables.
• A classification model attempts to draw a conclusion from
observed values.
• Given one or more inputs, a classification model tries to predict the
value of one or more categorical outcomes.
• A number of classification models are available.
Classification through machine learning algorithms
The following popular machine learning algorithms are used
in classification problems:
• Logistic Regression
• Decision Tree
• Random Forest
• K-Nearest Neighbor
• Support Vector Machine
Logistic Regression
• Logistic regression is a regression model used when the dependent
variable is categorical.
• In its basic form the output is binary: the model estimates the probability
that an observation belongs to one of two categories.
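A minimal Python sketch of binary logistic regression with a single input, fitted by plain gradient descent on the log loss. The data (hours studied versus pass/fail), the learning rate and the number of epochs are assumptions chosen for illustration, not values from the slides.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_class(x, w, b, threshold=0.5):
    """Turn the estimated probability into a binary category."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical toy data: hours studied -> passed (1) or failed (0).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(predict_class(1.0, w, b), predict_class(6.0, w, b))
```

The threshold of 0.5 is a common default; a different cut-off may suit problems where the two kinds of error have different costs.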
Decision Tree
• A decision tree is a flowchart-like tree structure, where each internal
node denotes a test on an attribute, each branch represents an
outcome of the test, and each leaf node holds a class label.
Example:-
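A small Python sketch of this structure, assuming a hand-written tree rather than a learned one. The attributes (`outlook`, `humidity`, `windy`) and class labels are hypothetical; internal nodes are dictionaries holding the tested attribute and its branches, and leaves are plain class labels.

```python
# Hypothetical, hand-built tree: each dict is an internal node (a test on an
# attribute), each key under "branches" is an outcome, each string is a leaf.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}

def classify(node, sample):
    """Follow the branch matching the sample's value until a leaf is reached."""
    while isinstance(node, dict):
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node

sample = {"outlook": "sunny", "humidity": "high", "windy": False}
print(classify(tree, sample))
```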
Random Forest
• A random forest (or random decision forest) is an ensemble learning
method that consists of a large number of decision trees.
• Each individual tree in the random forest outputs a class prediction,
and the class with the most votes becomes the model’s prediction.
Example:
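A minimal Python sketch of the voting step only. The three "trees" are hypothetical hand-written rules standing in for trained decision trees; in practice each tree would be learned from a different random sample of the training data.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class label that occurs most often."""
    return Counter(predictions).most_common(1)[0][0]

def forest_predict(trees, sample):
    """Collect one class prediction per tree and return the majority class."""
    return majority_vote([tree(sample) for tree in trees])

# Hypothetical stand-ins for trained trees: each maps a sample to a class label.
trees = [
    lambda s: "spam" if s["links"] > 2 else "ham",
    lambda s: "spam" if s["exclamations"] > 3 else "ham",
    lambda s: "spam" if s["length"] < 20 else "ham",
]

sample = {"links": 5, "exclamations": 1, "length": 10}
prediction = forest_predict(trees, sample)  # votes: spam, ham, spam
print(prediction)
```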
K-Nearest Neighbor
• In k-NN classification, the output is a class membership of a new
observation.
• An object is classified by a plurality vote of its neighbors, with the
object being assigned to the class most common among its k nearest
neighbors.
• Example:
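A minimal Python sketch of k-NN classification using Euclidean distance and a plurality vote. The two-dimensional points, the labels "A" and "B", and the choice k = 3 are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train is a list of (point, label) pairs; return the plurality label
    among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labelled observations.
train = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.0), "B"), ((6.0, 7.0), "B"),
]

print(knn_classify(train, (2.0, 2.0)))
print(knn_classify(train, (6.0, 5.0)))
```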
Support Vector Machine
• In a Support Vector Machine (SVM), we plot each data item as a point
in n-dimensional space (where n is the number of features)
with the value of each feature being the value of a particular
coordinate.
• Then, we perform classification by finding the hyperplane
that best separates the two classes.
• To identify this hyperplane, we maximize the distance (margin) between
it and the boundary elements of the separated classes.
• A variety of kernel functions are used to separate observations, depending
on whether they are linearly separable or non-linearly separable.
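The margin idea can be sketched in Python as follows. This is not an SVM solver: it only measures the margin of two hand-picked candidate hyperplanes (given by hypothetical weights `w` and offset `b`) on hypothetical data, to show what "maximizing the distance" compares.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane.
    It is positive only if every point lies on the correct side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, b, x) / norm for x, y in zip(points, labels))

# Hypothetical linearly separable data with labels -1 and +1.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

margin_a = geometric_margin((1.0, 1.0), -5.5, points, labels)  # diagonal boundary
margin_b = geometric_margin((1.0, 0.0), -3.0, points, labels)  # vertical boundary
print(margin_a, margin_b)  # the candidate with the larger margin is preferred
```

Both candidates separate the classes, but the first leaves more room to the nearest points. An SVM searches over all hyperplanes for the one with the largest such margin.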
Regression
• Regression in machine learning is a supervised learning problem where
the output variable is a real or continuous value, such as “salary” or
“weight”.
• Many different models can be used; the simplest is linear
regression.
• Linear regression tries to fit the data with the hyperplane that passes as
close as possible to the points.
• Various techniques are used for regression analysis, such as
Linear Regression, Decision Tree Regression, and Random Forest
Regression.
Simple Linear Regression
• Simple linear regression allows us to summarize and study the relationship
between two continuous variables, where:
• One variable, denoted by x, is regarded as the predictor, explanatory,
or independent variable.
• The other variable, denoted by y, is regarded as the response, outcome,
or dependent variable.
• Mathematically, it is expressed as:
y = b0 + b1*x + e, where:
• b0 and b1 are known as the regression (beta) coefficients or parameters.
• b0 is the intercept of the regression line, that is, the predicted value when x = 0.
• b1 is the slope of the regression line.
• e is the error term.
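A minimal Python sketch of estimating b0 and b1 by ordinary least squares, which is one common way to fit this model. The data values are hypothetical and were chosen to lie exactly on the line y = 1 + 2*x, so the error term is zero.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors of y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data lying on y = 1 + 2*x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)
```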
Multiple Linear Regression
• Multiple linear regression is used to explain the relationship
between one continuous dependent variable and two or more
independent variables.
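A minimal Python sketch with two independent variables, assuming the model y = b0 + b1*x1 + b2*x2 and fitting it by gradient descent on the mean squared error. The data, learning rate and epoch count are hypothetical; the targets were generated from y = 1 + 2*x1 + 3*x2 so the fit can be checked by eye.

```python
def predict(weights, bias, row):
    """Linear prediction: bias + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, row))

def fit_multiple_linear(rows, ys, lr=0.1, epochs=5000):
    """Fit the intercept and one weight per independent variable by
    gradient descent on the mean squared error."""
    n, p = len(rows), len(rows[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, r) - y for r, y in zip(rows, ys)]
        grad_b = 2.0 * sum(errors) / n
        grad_w = [2.0 * sum(e * r[j] for e, r in zip(errors, rows)) / n
                  for j in range(p)]
        bias -= lr * grad_b
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return bias, weights

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [1.0, 3.0, 4.0, 6.0, 8.0]

bias, weights = fit_multiple_linear(rows, ys)
print(round(bias, 3), [round(w, 3) for w in weights])
```

Gradient descent is used here only to keep the sketch dependency-free; solving the least-squares problem directly with a linear algebra library is the more usual route.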
Support Vector Regression
• In the case of regression, where a continuous value is to be generated as
output, a non-linear function is learned by a linear learning machine
that maps the data into a high-dimensional, kernel-induced feature space.
• The capacity of the system is controlled by parameters that do not
depend on the dimensionality of the feature space.
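A rough Python sketch of what a kernel-based regression function looks like once it has been trained. The Gaussian (RBF) kernel is one common kernel choice; the support vectors, coefficients, bias and `gamma` below are hypothetical values, not the result of training, and the training step itself is not shown.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: similarity that decays with squared distance."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def kernel_predict(x, support_vectors, coefficients, bias):
    """Prediction as a bias plus a kernel-weighted sum over support vectors.
    The function is non-linear in x although it is linear in the kernel values."""
    return bias + sum(c * rbf_kernel(sv, x)
                      for sv, c in zip(support_vectors, coefficients))

# Hypothetical values standing in for the output of a training procedure.
support_vectors = [(0.0,), (2.0,), (4.0,)]
coefficients = [1.5, -0.5, 2.0]
bias = 0.1

for x in [(0.0,), (1.0,), (3.0,)]:
    print(x, kernel_predict(x, support_vectors, coefficients, bias))
```

The sum has one term per support vector, so the cost of a prediction depends on the number of support vectors rather than on the dimension of the feature space.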
Decision Tree Regression
• The core algorithm for building decision trees is called ID3.
• For regression, the ID3 algorithm uses the method of Standard Deviation
Reduction.
• The standard deviation reduction is the decrease in the standard
deviation of the target after a dataset is split on an attribute.
• Constructing a decision tree is all about finding the attribute that returns
the highest standard deviation reduction.
• The dataset is divided based on the values of the selected attribute. This
process is run recursively on the non-leaf branches until all data is
processed.
• When a leaf node contains more than one instance, we take the average of
their target values as the final value for the target.
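A minimal Python sketch of the standard deviation reduction for one split, assuming the population standard deviation and a size-weighted sum over the resulting subsets. The rows, attribute names and target values are hypothetical.

```python
from statistics import pstdev

def std_reduction(rows, attribute, target):
    """Standard deviation of the target before the split minus the
    size-weighted standard deviation of the target after splitting
    on the given attribute."""
    total_sd = pstdev(r[target] for r in rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted_sd = sum(len(g) / len(rows) * pstdev(g) for g in groups.values())
    return total_sd - weighted_sd

# Hypothetical dataset: predict hours played from two categorical attributes.
rows = [
    {"outlook": "sunny", "windy": False, "hours": 10},
    {"outlook": "sunny", "windy": True, "hours": 20},
    {"outlook": "rainy", "windy": False, "hours": 30},
    {"outlook": "rainy", "windy": True, "hours": 40},
]

sdr_outlook = std_reduction(rows, "outlook", "hours")
sdr_windy = std_reduction(rows, "windy", "hours")
# The attribute with the highest reduction is chosen for the split.
best = max(["outlook", "windy"], key=lambda a: std_reduction(rows, a, "hours"))
print(sdr_outlook, sdr_windy, best)
```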
Random Forest Regression
• The random forest model is an ensemble learning method in which
multiple decision trees are used to generate an output.
• As seen in decision tree regression, a single decision tree outputs
the average of the target values in the leaf node that an observation reaches.
• In the random forest model, the output is generated by taking the mean
of the outputs of all the decision trees in the ensemble.
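A minimal Python sketch of the averaging step only. The three "trees" are hypothetical hand-written rules that return a leaf average; they stand in for regression trees that would normally be trained on the data.

```python
from statistics import mean

def forest_regress(trees, sample):
    """Average the numeric outputs of all trees in the ensemble."""
    return mean([tree(sample) for tree in trees])

# Hypothetical stand-ins for trained regression trees (price in thousands).
trees = [
    lambda s: 50.0 if s["area"] < 100 else 90.0,
    lambda s: 55.0 if s["rooms"] < 3 else 85.0,
    lambda s: 60.0 if s["area"] < 120 else 100.0,
]

sample = {"area": 110, "rooms": 3}
estimate = forest_regress(trees, sample)  # mean of 90.0, 85.0 and 60.0
print(estimate)
```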
Unsupervised Learning
• Unsupervised learning is performed on unlabeled data: no output labels
(categories) are given for the inputs.
• Here the task of the machine is to group unsorted information according
to similarities, patterns and differences without any prior training on
labelled data.
• Two of the main methods used in unsupervised learning are:
• Principal component analysis, and
• Cluster analysis.
Cluster Analysis
• Cluster analysis or clustering is the task of grouping a set of objects in
such a way that objects in the same group (called a cluster) are more
similar (in some sense) to each other than to those in other groups
(clusters).
• Cluster analysis can be achieved by various algorithms that differ
significantly in their understanding of what constitutes a cluster and
how to efficiently find them.
Example:
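A minimal Python sketch of k-means, which is one of many clustering algorithms and is used here only as an illustration. The points are hypothetical, and the initial centroids are simply the first k points so that the run is repeatable; practical implementations usually pick them at random.

```python
import math

def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # simplistic, deterministic initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two visually separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (2.0, 1.0), (9.0, 8.0), (8.0, 9.0)]

centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```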
Principal Component Analysis
• Principal component analysis is a method of deriving a small number of
important variables (principal components) from a large set of variables
available in a data set; each component is a combination of the original variables.
• It extracts a low-dimensional set of features from a high-dimensional
data set with the aim of capturing as much information as possible.
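A minimal Python sketch that finds the first principal component by power iteration on the covariance matrix and then projects the data onto it, reducing two features to one. Power iteration is only one way to obtain the component and is used here to avoid external libraries; the data points and the starting vector are assumptions for illustration.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (means, direction) where direction is the unit vector along
    which the centered data varies the most."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    # Assumes the starting vector is not orthogonal to the leading direction.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(points, means, direction):
    """Replace each point by its coordinate along the principal direction."""
    return [sum((p[j] - means[j]) * direction[j] for j in range(len(direction)))
            for p in points]

# Hypothetical 2-D data lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]

means, direction = first_principal_component(points)
reduced = project(points, means, direction)
print(direction)  # close to the diagonal direction
print(reduced)    # one feature per point instead of two
```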
References
1. Dhar, V. (2013). "Data science and prediction". Communications of the
ACM. 56 (12): 64–73.
2. Seth Familian (2016), “Context: What’s Big Data? Big in Growth too”,
slideshare.net.
3. Hayashi, Chikio (1 January 1998). "What is Data Science? Fundamental
Concepts and a Heuristic Example". In Hayashi, Chikio; Yajima, Keiji; Bock,
Hans-Hermann; Ohsumi, Noboru; Tanaka, Yutaka; Baba, Yasumasa (eds.).
Data Science, Classification, and Related Methods. Studies in
Classification, Data Analysis, and Knowledge Organization. Springer
Japan. pp. 40–51.
4. Stewart Tansley; Kristin Michele Tolle (2009). The Fourth Paradigm:
Data-intensive Scientific Discovery. Microsoft Research.
ISBN 978-0-9825442-0-4.
Thanking You

More Related Content

What's hot

Data science lecture1_doaa_mohey
Data science lecture1_doaa_moheyData science lecture1_doaa_mohey
Data science lecture1_doaa_moheyDoaa Mohey Eldin
 
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
 
Analysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemAnalysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemIJSRD
 
Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey Doaa Mohey Eldin
 
Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey Doaa Mohey Eldin
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive ModellingAmit Kumar
 
Connections b/w active learning and model extraction
Connections b/w active learning and model extractionConnections b/w active learning and model extraction
Connections b/w active learning and model extractionAnmol Dwivedi
 
Object-Oriented Design Fundamentals.pptx
Object-Oriented Design Fundamentals.pptxObject-Oriented Design Fundamentals.pptx
Object-Oriented Design Fundamentals.pptxRaflyRizky2
 
[Women in Data Science Meetup ATX] Decision Trees
[Women in Data Science Meetup ATX] Decision Trees [Women in Data Science Meetup ATX] Decision Trees
[Women in Data Science Meetup ATX] Decision Trees Nikolaos Vergos
 
ICELW Conference Slides
ICELW Conference SlidesICELW Conference Slides
ICELW Conference Slidestoolboc
 
IRJET - A Study on Student Career Prediction
IRJET - A Study on Student Career PredictionIRJET - A Study on Student Career Prediction
IRJET - A Study on Student Career PredictionIRJET Journal
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Machine Learning for Aerospace Training
Machine Learning for Aerospace TrainingMachine Learning for Aerospace Training
Machine Learning for Aerospace TrainingMikhail Klassen
 
Correlation based feature selection (cfs) technique to predict student perfro...
Correlation based feature selection (cfs) technique to predict student perfro...Correlation based feature selection (cfs) technique to predict student perfro...
Correlation based feature selection (cfs) technique to predict student perfro...IJCNCJournal
 

What's hot (19)

Data science lecture1_doaa_mohey
Data science lecture1_doaa_moheyData science lecture1_doaa_mohey
Data science lecture1_doaa_mohey
 
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
 
Analysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemAnalysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry System
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey Data science lecture3_doaa_mohey
Data science lecture3_doaa_mohey
 
Active learning
Active learningActive learning
Active learning
 
Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey Data science lecture2_doaa_mohey
Data science lecture2_doaa_mohey
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
 
RESULT MINING: ANALYSIS OF DATA MINING TECHNIQUES IN EDUCATION
RESULT MINING: ANALYSIS OF DATA MINING TECHNIQUES IN EDUCATIONRESULT MINING: ANALYSIS OF DATA MINING TECHNIQUES IN EDUCATION
RESULT MINING: ANALYSIS OF DATA MINING TECHNIQUES IN EDUCATION
 
Connections b/w active learning and model extraction
Connections b/w active learning and model extractionConnections b/w active learning and model extraction
Connections b/w active learning and model extraction
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
 
Object-Oriented Design Fundamentals.pptx
Object-Oriented Design Fundamentals.pptxObject-Oriented Design Fundamentals.pptx
Object-Oriented Design Fundamentals.pptx
 
[Women in Data Science Meetup ATX] Decision Trees
[Women in Data Science Meetup ATX] Decision Trees [Women in Data Science Meetup ATX] Decision Trees
[Women in Data Science Meetup ATX] Decision Trees
 
ICELW Conference Slides
ICELW Conference SlidesICELW Conference Slides
ICELW Conference Slides
 
IRJET - A Study on Student Career Prediction
IRJET - A Study on Student Career PredictionIRJET - A Study on Student Career Prediction
IRJET - A Study on Student Career Prediction
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Machine Learning for Aerospace Training
Machine Learning for Aerospace TrainingMachine Learning for Aerospace Training
Machine Learning for Aerospace Training
 
De carlo rizk 2010 icelw
De carlo rizk 2010 icelwDe carlo rizk 2010 icelw
De carlo rizk 2010 icelw
 
Correlation based feature selection (cfs) technique to predict student perfro...
Correlation based feature selection (cfs) technique to predict student perfro...Correlation based feature selection (cfs) technique to predict student perfro...
Correlation based feature selection (cfs) technique to predict student perfro...
 

Similar to Machine learning for Data Science

ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptxNIKHILGR3
 
Unit 3 – AIML.pptx
Unit 3 – AIML.pptxUnit 3 – AIML.pptx
Unit 3 – AIML.pptxhiblooms
 
Week_1 Machine Learning introduction.pptx
Week_1 Machine Learning introduction.pptxWeek_1 Machine Learning introduction.pptx
Week_1 Machine Learning introduction.pptxmuhammadsamroz
 
Classification and Prediction.pptx
Classification and Prediction.pptxClassification and Prediction.pptx
Classification and Prediction.pptxSandeepAgrawal84
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 
Machine Learning techniques used in AI.
Machine Learning  techniques used in AI.Machine Learning  techniques used in AI.
Machine Learning techniques used in AI.ArchanaT32
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptxDr.Shweta
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxnarmeen11
 
background.pptx
background.pptxbackground.pptx
background.pptxKabileshCm
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningSanghamitra Deb
 
Researc-paper_Project Work Phase-1 PPT (21CS09).pptx
Researc-paper_Project Work Phase-1 PPT (21CS09).pptxResearc-paper_Project Work Phase-1 PPT (21CS09).pptx
Researc-paper_Project Work Phase-1 PPT (21CS09).pptxAdityaKumar993506
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroSi Krishan
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptxssuser6654de1
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
Machine learning
Machine learningMachine learning
Machine learninghplap
 

Similar to Machine learning for Data Science (20)

ML SFCSE.pptx
ML SFCSE.pptxML SFCSE.pptx
ML SFCSE.pptx
 
Unit 3 – AIML.pptx
Unit 3 – AIML.pptxUnit 3 – AIML.pptx
Unit 3 – AIML.pptx
 
Week_1 Machine Learning introduction.pptx
Week_1 Machine Learning introduction.pptxWeek_1 Machine Learning introduction.pptx
Week_1 Machine Learning introduction.pptx
 
Classification and Prediction.pptx
Classification and Prediction.pptxClassification and Prediction.pptx
Classification and Prediction.pptx
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Machine Learning techniques used in AI.
Machine Learning  techniques used in AI.Machine Learning  techniques used in AI.
Machine Learning techniques used in AI.
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Mini datathon
Mini datathonMini datathon
Mini datathon
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
background.pptx
background.pptxbackground.pptx
background.pptx
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Machine learning meetup
Machine learning meetupMachine learning meetup
Machine learning meetup
 
Researc-paper_Project Work Phase-1 PPT (21CS09).pptx
Researc-paper_Project Work Phase-1 PPT (21CS09).pptxResearc-paper_Project Work Phase-1 PPT (21CS09).pptx
Researc-paper_Project Work Phase-1 PPT (21CS09).pptx
 
Primer on major data mining algorithms
Primer on major data mining algorithmsPrimer on major data mining algorithms
Primer on major data mining algorithms
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptx
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
Machine learning
Machine learningMachine learning
Machine learning
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 

Recently uploaded

LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Recently uploaded (20)

LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 

Machine learning for Data Science

  • 1. Machine Learning for Data Science A Brief Introduction By Vaibhav Kumar Assistant Professor DIT University, Dehradun Email: Vaibhav.kumar@dituniversity.edu.in, vaibhav05cse@gmail.com GitHub: https://github.com/vaibhav05cse/ Vaibhav Kumar@DIT University
  • 2. Contents • Introduction to Data Science • Applications of Data Science • Foundations of Data Science • Machine Learning • Supervised Learning • Classification • Logistic Regression • Decision Tree • Random Forest • K-Nearest Neighbor • Support Vector Machine • Regression • Simple Linear Regression • Multiple Linear Regression • Support Vector Regression • Decision Tree Regression • Random Forest Regression • Unsupervised Learning • Cluster Analysis • Principal Component Analysis
  • 3. Introduction to Data Science • Data science is a multidisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data [1]. • It is a blend of computer science, mathematics and business/domain expertise.
  • 4. The Need for Data Science • The volume of data is growing at an ever faster rate [2]. • To find insights in this huge amount of data, effective analytics techniques are required. • Data science has the capacity to meet this requirement.
  • 5. What Can Data Science Do? • It unifies statistics, data analysis, machine learning and their related methods in order to understand and analyze actual phenomena with data [3]. • It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science [4]. • It performs predictive analytics to estimate the likelihood of a particular event in the future. • It performs prescriptive analytics to find the best course of action for a given situation. • It employs machine learning techniques to discover patterns in the data.
  • 6. Applications of Data Science
  • 7. Foundations of Data Science • Statistics: Descriptive, Inferential. • Linear Algebra: Matrices, Planes, Vectors, etc. • Computer Science: Algorithms, Graph Theory, Data Structures, DBMS, etc. • Machine Learning: Supervised, Unsupervised, Reinforcement. • Business Analytics: Predictive, Prescriptive, Descriptive, Decision. • Programming: R/Python, SQL, NoSQL.
  • 8. Machine Learning • Machine learning is a subfield of computer science that focuses on developing algorithms that learn from examples and improve their performance on a task. • Machine learning algorithms use training data, which is a set of past observations. • There are three broad categories of machine learning: ➢ Supervised Learning: learns from labeled examples. ➢ Unsupervised Learning: learns from unlabeled examples. ➢ Reinforcement Learning: learns from its environment through feedback. • It produces predictive analytics models that allow researchers and data scientists to make predictions about the future based on past and current data.
  • 9. Supervised Learning • It is a category of machine learning algorithms. As the name indicates, learning is supervised by the presence of known outputs in the training data. • It learns from labeled data (inputs for which the output is known). • It builds a mathematical model of a set of data that contains both the inputs and the desired outputs. • A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used to map new examples to outputs. • Supervised learning problems are generally divided into classification and regression problems.
  • 10. Classification • Classification in machine learning is a supervised learning problem where the output variable is a category, such as “yes” or “no”, or “disease” or “no disease”. • In this problem, the dependent variable is categorical, and its category is predicted from several independent variables. • A classification model attempts to draw a conclusion from observed values. • Given one or more inputs, a classification model tries to predict the value of one or more outcomes. • There are a number of classification models.
  • 11. Classification through machine learning algorithms The following popular machine learning algorithms are used in classification problems: • Logistic Regression • Decision Tree • Random Forest • K-Nearest Neighbor • Support Vector Machine
  • 12. Logistic Regression • This regression model is used when the dependent variable is categorical. • In its basic form the output is binary: each observation is assigned to one of two categories.
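To make the binary-output idea concrete, here is a minimal sketch of logistic regression with one input variable, fitted by plain batch gradient descent. The data (hours studied versus pass/fail), the learning rate and the number of epochs are all hypothetical choices made for illustration; the slide itself does not prescribe a fitting procedure.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.4, epochs=10000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Difference between the predicted probability and the true label.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Hypothetical data: hours studied -> passed the exam (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)

def predict(x):
    """Return category 1 if the estimated probability is at least 0.5, else 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in hours])
```

The model outputs a probability; the 0.5 threshold in `predict` is the conventional (but adjustable) rule that turns it into one of the two categories.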
  • 13. Decision Tree • A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label.
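The structure described above can be sketched as nested dictionaries: an internal node names the attribute it tests, its branches are keyed by the test outcomes, and a leaf is simply a class label. The tree, the attribute names and the labels below are hypothetical and hand-written; no learning algorithm is involved.

```python
# Internal node: {"attribute": name, "branches": {outcome: subtree_or_label}}
# Leaf node: a plain class label (here the strings "yes" / "no").
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}

def classify(node, sample):
    """Follow the branch matching each test outcome until a leaf label is reached."""
    while isinstance(node, dict):
        outcome = sample[node["attribute"]]
        node = node["branches"][outcome]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "high"}))
print(classify(tree, {"outlook": "rainy", "windy": False}))
```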
  • 14. Random Forest • A random forest (or random decision forest) is an ensemble learning method that consists of a large number of decision trees. • Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model’s prediction.
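The voting step can be sketched as follows. The three "trees" here are hypothetical hand-written threshold rules standing in for trained decision trees; the sketch only shows how votes are combined and leaves out how a real random forest trains each tree on a random sample of the data and features.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Collect one class vote per tree and return the most common class."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained trees: each maps a sample to a class label.
trees = [
    lambda s: "spam" if s["links"] > 3 else "ham",
    lambda s: "spam" if s["exclamations"] > 5 else "ham",
    lambda s: "spam" if s["length"] < 20 else "ham",
]

sample = {"links": 5, "exclamations": 1, "length": 10}
print(forest_predict(trees, sample))  # two of the three trees vote "spam"
```

Using an odd number of trees avoids ties in a two-class problem.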
  • 15. K-Nearest Neighbor • In k-NN classification, the output is the class membership of a new observation. • An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors.
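A minimal sketch of the plurality vote, assuming Euclidean distance and k = 3 (both are illustrative choices, as are the points and labels):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Assign the query to the class most common among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (Python 3.8+)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-class training data in two dimensions.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (2.0, 2.0)))  # all three nearest neighbors are "A"
```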
  • 16. Support Vector Machine • In a Support Vector Machine (SVM), we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. • Then, we perform classification by finding the hyperplane that best separates the two classes. • To identify the hyperplane, we try to maximize the distance between the boundary elements of the separated classes. • A variety of kernel functions are used, depending on whether the observations are linearly or non-linearly separable.
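The "maximize the distance" criterion can be illustrated without a full SVM solver. The sketch below only computes the margin (the distance from a hyperplane w·x + b = 0 to the closest point) for two hand-picked candidate hyperplanes; the points, labels and candidates are hypothetical. An actual SVM searches over all hyperplanes for the one with the largest margin.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance from the hyperplane w.x + b = 0 to any point.

    Labels are +1 or -1; a positive result means every point lies on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical linearly separable data.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]

margin_a = margin((1.0, 1.0), -2.5, points, labels)  # diagonal separator
margin_b = margin((1.0, 0.0), -1.5, points, labels)  # vertical separator

# Both candidates separate the classes; the one with the larger margin is preferred.
print(margin_a > margin_b)
```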
  • 18. Regression • Regression in machine learning is a supervised learning problem where the output variable is a real or continuous value, such as “salary” or “weight”. • Many different models can be used; the simplest is linear regression. • Linear regression tries to fit the data with the hyperplane that passes as close as possible to the points. • Various techniques are used for regression analysis, such as Linear Regression, Decision Tree Regression and Random Forest Regression.
  • 19. Simple Linear Regression • Simple linear regression allows us to summarize and study the relationship between two continuous variables, where: • One variable, denoted by x, is regarded as the predictor, explanatory, or independent variable. • The other variable, denoted by y, is regarded as the response, outcome, or dependent variable. • Mathematically, it is expressed as y = b0 + b1*x + e, where: • b0 and b1 are known as the regression (beta) coefficients or parameters; • b0 is the intercept of the regression line, that is, the predicted value when x = 0; • b1 is the slope of the regression line; • e is the error term.
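As a sketch of how b0 and b1 can be estimated, the code below uses the standard closed-form least-squares formulas (slope = covariance of x and y divided by variance of x). The slide does not say which estimation method it has in mind, so least squares is an assumption here, and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept b0 and slope b1 in y = b0 + b1*x + e."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data: years of experience (x) and salary in thousands (y).
experience = [1, 2, 3, 4, 5]
salary = [32.0, 35.5, 41.0, 43.5, 48.0]

b0, b1 = fit_line(experience, salary)

# The error term e is what remains after the fitted line is subtracted.
residuals = [y - (b0 + b1 * x) for x, y in zip(experience, salary)]
print(b0, b1)
```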
  • 21. Multiple Linear Regression • Multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables.
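With two or more independent variables the model becomes y = b0 + b1*x1 + b2*x2 + ... + e. One common way to estimate the coefficients, assumed here for illustration, is to solve the least-squares normal equations (XᵀX)b = Xᵀy. The sketch below does this with a small hand-written linear solver so that it needs no external libraries; the data are hypothetical and were generated from y = 1 + 2*x1 + 3*x2 with no noise.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_multiple(X, y):
    """Return [b0, b1, ..., bk] minimizing the squared error of y = b0 + sum(bi * xi)."""
    rows = [[1.0] + list(r) for r in X]  # leading 1.0 gives the intercept b0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical observations of two independent variables (x1, x2) and the response y.
X = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3)]
y = [6, 8, 9, 13, 14]

coefficients = fit_multiple(X, y)
print(coefficients)
```

The solver assumes the independent variables are not perfectly collinear; otherwise the system has no unique solution.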
  • 22. Support Vector Regression • In regression, where a continuous value is to be produced as output, a non-linear function is learned by a linear learning machine operating in a high-dimensional, kernel-induced feature space. • The capacity of the system is controlled by parameters that do not depend on the dimensionality of the feature space.
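The phrase "kernel-induced feature space" can be made concrete with a small check. The sketch below uses the degree-2 polynomial kernel k(x, z) = (x·z)² on two-dimensional inputs as an illustrative choice (the slide does not name a kernel) and shows that it equals an ordinary dot product after mapping each input into a three-dimensional feature space. This is what lets a linear method work in the larger space without ever computing the mapping explicitly.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel: the squared dot product of the original inputs."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

def feature_map(x):
    """Explicit feature map for the kernel above, valid for 2-D inputs only."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x = (1.0, 2.0)
z = (3.0, 4.0)

via_kernel = poly2_kernel(x, z)
via_mapping = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))

# Both routes give the same value, up to floating-point rounding.
print(via_kernel, via_mapping)
```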
  • 24. Decision Tree Regression • A core algorithm for building decision trees is ID3. • For regression, the ID3 algorithm can be used with Standard Deviation Reduction as the splitting criterion. • The standard deviation reduction is the decrease in standard deviation after a dataset is split on an attribute. • Constructing a decision tree is all about finding the attribute that returns the highest standard deviation reduction. • The dataset is divided based on the values of the selected attribute. This process is run recursively on the non-leaf branches until all data is processed. • When there is more than one instance at a leaf node, we take the average as the final value for the target.
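The splitting criterion above can be sketched as follows: compute the standard deviation of the target before the split, subtract the size-weighted standard deviations of the subsets after the split, and choose the attribute with the largest reduction. The data set and attribute names are hypothetical, and the population standard deviation is used as an assumption.

```python
from statistics import pstdev

def sdr(rows, attribute, target):
    """Standard deviation reduction obtained by splitting rows on one attribute."""
    before = pstdev([r[target] for r in rows])
    after = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * pstdev(subset)  # weighted by subset size
    return before - after

# Hypothetical data: weather conditions and hours played.
rows = [
    {"outlook": "sunny", "windy": False, "hours": 25},
    {"outlook": "sunny", "windy": True, "hours": 30},
    {"outlook": "overcast", "windy": False, "hours": 46},
    {"outlook": "overcast", "windy": True, "hours": 43},
    {"outlook": "rainy", "windy": False, "hours": 35},
    {"outlook": "rainy", "windy": True, "hours": 38},
]

# The attribute with the highest reduction is chosen for the split.
best = max(["outlook", "windy"], key=lambda a: sdr(rows, a, "hours"))
print(best)
```

In this data the hours vary little within each outlook value but a lot within each windy value, so splitting on outlook gives the larger reduction.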
  • 25. Random Forest Regression • The random forest model is an ensemble learning method in which multiple decision trees are used to generate an output. • As seen in decision tree regression, a single decision tree outputs the average of the target values at the leaf node that an observation reaches. • In the random forest model, the output is the mean of the outputs generated by all the decision trees in the ensemble.
  • 26. Unsupervised Learning • Unsupervised learning is performed on unlabeled data: no output labels (categories) are given for the inputs. • Here the task of the machine is to group unsorted information according to similarities, patterns and differences, without any prior training on labeled data. • Two of the main methods used in unsupervised learning are: • Principal Component Analysis, and • Cluster Analysis.
  • 27. Cluster Analysis • Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). • Cluster analysis can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to find clusters efficiently.
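As one example of such an algorithm, here is a minimal sketch of k-means, which treats a cluster as the set of points closest to a common centroid. The slide does not name k-means, so it is an illustrative choice, as are the data and the naive initialization (the first k points serve as starting centroids).

```python
import math

def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and updating centroids."""
    centroids = [points[i] for i in range(k)]  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

No labels are supplied anywhere; the grouping comes only from the distances between the points.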
  • 28. Principal Component Analysis • Principal component analysis is a method of extracting the important information from a large set of variables in a data set, in the form of a smaller set of new variables (the principal components). • It extracts a low-dimensional set of features from a high-dimensional data set with the aim of capturing as much information as possible.
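A minimal sketch of the idea, assuming the usual formulation in which the first principal component is the direction of greatest variance, that is, the leading eigenvector of the covariance matrix. Power iteration is used here only because it needs no external libraries; the data are hypothetical.

```python
import math

def first_principal_component(data, iterations=100):
    """Return the direction of greatest variance and the mean-centered data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    # Caveat: this fails if the starting vector is orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centered

# Hypothetical 2-D data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
component, centered = first_principal_component(data)

# Projecting onto the component reduces each 2-D point to a single number.
scores = [sum(a * b for a, b in zip(row, component)) for row in centered]
print(component)
```

Because the points lie almost on a line, the single score per point retains nearly all of the variation in the original two variables.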
  • 29. References 1. Dhar, V. (2013). "Data science and prediction". Communications of the ACM. 56 (12): 64–73. 2. Seth Familian (2016). "Context: What’s Big Data? Big in Growth too". slideshare.net. 3. Hayashi, Chikio (1 January 1998). "What is Data Science? Fundamental Concepts and a Heuristic Example". In Hayashi, Chikio; Yajima, Keiji; Bock, Hans-Hermann; Ohsumi, Noboru; Tanaka, Yutaka; Baba, Yasumasa (eds.). Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer Japan. pp. 40–51. 4. Stewart Tansley; Kristin Michele Tolle (2009). The Fourth Paradigm: Data-intensive Scientific Discovery. Microsoft Research. ISBN 978-0-9825442-0-4.