© 2019 Cloudera, Inc. All rights reserved. 1
Introduction to Machine Learning
Topics
• What is Data?
• What is machine learning?
• Learning system model
• Training and testing
• Performance
• Applications
• Types of Machine learning
• Learning techniques
• Linear Regression – Step By Step
What is Data?
• Traditionally, data is a collection of raw facts and figures.
• Today, data also comes from web logs, mobile devices, sensors, instruments, and transactions.
• IBM estimates that 90 percent of the data in the world today has been created in the past two years.
• Businesses today are accumulating new data at a rate that exceeds their capacity to extract value from it.
• The challenge is how to use this data effectively: not just their own data, but all of the data that is available and relevant.
What is Data? (continued)
• Today's data is more heterogeneous than the data of the past.
• It includes digitized text, audio, and visual content, as well as sensor and blog data.
• Working with this data requires distinctive new skills and tools.
How to Get Meaning out of Data
• Look at data with a mathematical mind-set. Skills such as machine learning, data mining, data analysis, and statistics are crucial; a data scientist needs to interpret and represent data mathematically.
• Use a common language to access, explore, and model data. Languages and tools such as R, Python, MATLAB, Spark ML, and a database query language such as SQL are among the most in-demand skills.
• Data extraction, exploration, and hypothesis testing are central to data science practice.
• Develop a strong computer science and software engineering background. This skill set could include Java, C++, knowledge of algorithms, and Hadoop, and it is used to architect systems that leverage data.
What is Machine Learning
• A branch of artificial intelligence, concerned with the design and development of algorithms
that allow computers to evolve behaviors based on empirical data.
• It is very hard to write programs that solve problems like recognizing a face. We don’t know
what program to write because we don’t know how our brain does it. Even if we had a good
idea about how to do it, the program might be horrendously complicated.
• Instead of writing a program by hand, we collect lots of examples that specify the correct
output for a given input.
• A machine learning algorithm then takes these examples and produces a program that does
the job.
• The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers.
• If we do it right, the program works for new cases as well as for the ones we trained it on.
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
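The contrast between the two pipelines can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the function names, the "small"/"large" labels, and the toy data are assumptions, not part of the slides. `hand_written_program` is a rule a person writes; `learn_program` is given data plus the desired outputs and returns a new function, i.e. a learned "program". The learner here is deliberately simple: it tries each observed input as a threshold and keeps the one that classifies the most examples correctly.

```python
def hand_written_program(x: float) -> str:
    # Traditional programming: a person chooses the rule (here, the threshold 5.0).
    return "large" if x > 5.0 else "small"


def learn_program(data, outputs):
    """Machine learning: derive the rule from example inputs and desired outputs.

    Each observed input is tried as a threshold; the one that labels the most
    training examples correctly is kept and baked into the returned function.
    """
    best_threshold, best_correct = None, -1
    for t in sorted(data):
        correct = sum((x > t) == (y == "large") for x, y in zip(data, outputs))
        if correct > best_correct:
            best_threshold, best_correct = t, correct

    def learned_program(x: float) -> str:
        return "large" if x > best_threshold else "small"

    return learned_program


examples = [1, 2, 3, 7, 8, 9]
labels = ["small", "small", "small", "large", "large", "large"]
program = learn_program(examples, labels)   # the "Program" produced from Data + Output
print(program(6), program(2.5))             # large small
```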
Role of Data Engineer
• They use technology and skills to increase awareness, clarity, and direction for those working with data
• Data scientists use their data and analytical ability to find and interpret rich data sources
• Manage large amounts of data despite hardware, software, and bandwidth constraints
• Merge data sources
• Ensure consistency of datasets
• Create visualizations to aid in understanding data
• Build mathematical models using the data
• Present and communicate data insights and findings
• Conduct undirected research and frame open-ended industry questions
• Extract huge volumes of data from multiple internal and external sources
Role of Data Engineer
• Employ sophisticated analytics programs, machine learning and statistical methods to prepare data
for use in predictive and prescriptive modelling
• Thoroughly clean and prune data to discard irrelevant information
• Explore and examine data from a variety of angles to determine hidden weaknesses, trends and/or
opportunities
• Devise data-driven solutions to the most pressing challenges
• Invent new algorithms to solve problems and build new tools to automate work
• Communicate predictions and findings to management and IT departments through effective data
visualizations and reports
• Recommend cost-effective changes to existing procedures and strategies
Required Skills for Machine Learning
• SAS and/or R – In-depth knowledge of at least one of these analytical tools; for data science, R is generally preferred.
• Python Coding – Python is the most common coding language I typically see required in data science roles, along with Java, Perl, or C/C++.
• Hadoop Platform – Although this isn't always a requirement, it is heavily preferred in many cases. Experience with Hive or Pig is also a strong selling point.
• SQL Database/Coding – Candidates are expected to be able to write and execute complex queries in SQL.
Applications of Machine Learning
Recognizing patterns:
• Facial identities or facial expressions
• Handwritten or spoken words
• Medical images
Generating patterns:
• Generating images or motion sequences
Recognizing anomalies:
• Unusual sequences of credit card transactions
• Unusual patterns of sensor readings in a nuclear power plant or unusual
sound in your car engine.
Prediction:
• Future stock prices or currency exchange rates
Applications of Machine Learning
Spam filtering, fraud detection:
• The enemy adapts so we must adapt too.
Recommendation systems:
• Lots of noisy data. Million dollar prize!
Information retrieval:
• Find documents or images with similar content.
Data Visualization:
• Display a huge database in a revealing way
How Does a Machine Learn?
Input samples → Learning method → System
The system is used in two phases: Training and Testing.
How to Use Available Data
• Universal set (unobserved)
• Training set (observed): data acquisition
• Testing set (unobserved): practical usage
Training and Testing Data
Training is the process of making the system able to learn.
The training set and the testing set are assumed to come from the same distribution.
The learner needs to make some assumptions (an inductive bias).
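A common way to obtain training and testing sets from one pool of data is a random split. The sketch below is a minimal, stdlib-only illustration; `split_train_test`, the 20% test fraction, and the seed are hypothetical choices, not from the slides. Shuffling before splitting means both sets are drawn at random from the same pool, which is one practical way to keep them from the same distribution. Libraries such as scikit-learn provide a ready-made `train_test_split` for the same purpose.

```python
import random


def split_train_test(examples, test_fraction=0.2, seed=42):
    """Randomly split examples into (training set, testing set).

    Hypothetical helper: shuffles a copy of the data with a seeded RNG and
    cuts it so that roughly `test_fraction` of the examples go to the test set.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = [(x, 2 * x + 1) for x in range(10)]  # toy (input, desired output) pairs
train, test = split_train_test(data)
print(len(train), len(test))                # 8 2
```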
Performance of Analysis
There are several factors affecting the performance:
• Types of training provided
• The form and extent of any initial background knowledge
• The type of feedback provided
• The learning algorithms used
Types of Algorithms
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
Types of Algorithms
The success of a machine learning system also depends on the algorithms.
The algorithms control the search to find and build the knowledge structures.
The learning algorithms should extract useful information from training examples.
There are four types of machine learning:
• Supervised: training data includes desired outputs
• Unsupervised: training data does not include desired outputs
• Semi-supervised: training data includes a few desired outputs
• Reinforcement: rewards from a sequence of actions
Supervised Learning
Supervised learning is the machine learning task of learning a function that maps an
input to an output based on example input-output pairs.
It infers a function from labeled training data consisting of a set of training examples.
In supervised learning, each example is a pair consisting of an input object and a
desired output value.
Supervised Learning
Classification: the output takes discrete values. Regression: the output takes continuous values.
Examples of supervised learning algorithms:
• Linear Regression
• Nearest Neighbor
• Gaussian Naive Bayes
• Decision Trees
• Support Vector Machine (SVM)
• Random Forest
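Nearest Neighbor, listed above, is perhaps the simplest supervised classifier to write down: to label a new input, find the closest labeled training example and copy its label. The sketch below is a minimal 1-nearest-neighbor illustration using Euclidean distance (`math.dist`, Python 3.8+); the function name and the toy points and labels "A"/"B" are hypothetical. Practical versions usually vote among the k closest neighbors and use spatial indexes for speed.

```python
import math


def nearest_neighbor_predict(train_inputs, train_labels, query):
    """1-nearest-neighbor classification: return the label of the training
    input closest to `query` under Euclidean distance."""
    closest = min(range(len(train_inputs)),
                  key=lambda i: math.dist(train_inputs[i], query))
    return train_labels[closest]


# Toy labeled training data: (feature1, feature2) -> discrete class label.
inputs = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
labels = ["A", "A", "B", "B"]

print(nearest_neighbor_predict(inputs, labels, (2.0, 1.5)))  # A
print(nearest_neighbor_predict(inputs, labels, (7.0, 8.0)))  # B
```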
Supervised Learning
• Learn to predict the output when given an input vector
Unsupervised Learning
Clustering: finding structure or patterns in a collection of uncategorized data
• K-means clustering
• K-NN (k nearest neighbors)
• Principal Component Analysis (dimensionality reduction)
Association: finding associations between finite elements of a dataset
• Association rules
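K-means, the first clustering method listed, alternates two steps until nothing changes: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The sketch below is a minimal, stdlib-only version of that loop. As a simplifying assumption, the caller passes the starting centroids explicitly; libraries such as scikit-learn's `KMeans` choose them automatically. The function name and the toy points are hypothetical.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Basic k-means: returns (centroids, clusters from the last assignment step)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(c)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, clusters


# Two visually obvious groups of unlabeled points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, initial_centroids=[(1.0, 1.0), (9.0, 9.0)])
print(centroids)  # roughly [(1.17, 1.17), (8.5, 8.5)]
```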
Unsupervised Learning
• Create an internal representation of the input, e.g., form clusters or extract features
Semi-Supervised Learning
Some data is labeled, but most of it is unlabeled, so a mixture of supervised and unsupervised techniques can be used.
Examples:
• Speech Recognition
• Internet Content Classification
• DNA Sequence Classification
Reinforcement Learning
The output (action) depends on the current input (state), and the next input depends on the output produced for the previous input.
Examples:
• Robotics for industrial automation
• Driverless cars
Types:
• Q-Learning
• Temporal Difference (TD)
• Monte-Carlo Tree Search (MCTS)
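Q-Learning, the first type listed, can be shown on a tiny problem. The sketch below is a minimal tabular version under stated assumptions: the environment is a hypothetical five-state corridor where the agent starts at state 0, can step left or right, and receives reward 1 only on reaching the last state; the learning rate, discount, and exploration rate are arbitrary illustrative values. After each action it applies the standard update Q(s, a) ← Q(s, a) + α[r + γ·max Q(s', a') − Q(s, a)], so the next input (state) depends on the previous output (action), and the learned greedy policy should be "always move right".

```python
import random


def q_learning_chain(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    States 0..n_states-1; action 0 = left, 1 = right; reward 1.0 for reaching
    the last state, which ends the episode. Returns the Q table q[state][action].
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Move Q(s, a) toward r + gamma * max_a' Q(s', a'); Q(goal) stays 0.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning_chain()
policy = ["left" if q[s][0] > q[s][1] else "right" for s in range(len(q) - 1)]
print(policy)  # the learned greedy policy moves right in every state
```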
Machine Learning in Action
Linear Regression
• In statistics, linear regression is an approach for modelling the relationship between a scalar dependent variable y and one or more explanatory (independent) variables denoted X.
In a statistical experiment, the dependent variable is the one expected to change when the independent variable is manipulated.
Linear Regression Applications
• Effect of fertilizer on plant growth
• Effect of drug dosage on symptom severity
• Effect of temperature on pigmentation
• Per capita crime rate by town
• Average number of rooms per dwelling
• Student-teacher ratio by town
Types of Linear Regression
• Simple linear regression: the case of one explanatory variable
• Multiple linear regression: more than one explanatory variable
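Multiple linear regression fits one coefficient per explanatory variable plus an intercept. The sketch below is a minimal, stdlib-only illustration: the hypothetical helper `fit_multiple_linear_regression` builds the least-squares normal equations (XᵀX)b = Xᵀy and solves them by Gaussian elimination. The toy data are generated exactly from y = 1 + 2·x1 + 0.5·x2, so the fit should recover those coefficients. For real work, a numerical library such as NumPy (`numpy.linalg.lstsq`) is preferable, since it uses more numerically stable factorizations than the normal equations.

```python
def fit_multiple_linear_regression(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X^T X) b = X^T y with Gaussian elimination
    (partial pivoting) and returns [b0, b1, ..., bk].
    """
    X = [[1.0] + list(row) for row in rows]  # leading 1.0 column gives the intercept b0
    n_obs, n_coef = len(X), len(X[0])

    # Augmented matrix [X^T X | X^T y].
    M = [
        [sum(X[i][a] * X[i][c] for i in range(n_obs)) for c in range(n_coef)]
        + [sum(X[i][a] * y[i] for i in range(n_obs))]
        for a in range(n_coef)
    ]

    # Forward elimination with partial pivoting.
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n_coef):
            factor = M[r][col] / M[col][col]
            for c in range(col, n_coef + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution.
    coef = [0.0] * n_coef
    for r in range(n_coef - 1, -1, -1):
        tail = sum(M[r][c] * coef[c] for c in range(r + 1, n_coef))
        coef[r] = (M[r][n_coef] - tail) / M[r][r]
    return coef


# Toy data generated exactly from y = 1 + 2*x1 + 0.5*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 0.5 * x2 for x1, x2 in rows]
print(fit_multiple_linear_regression(rows, y))  # approximately [1.0, 2.0, 0.5]
```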
Linear Regression - Implementation
Input table with sample data:
X      Y
1.00   1.00
2.00   2.00
3.00   1.30
4.00   3.75
5.00   2.25
Output: [Figure: regression analysis with regression line]
• A simple scatter plot of the given data: [Figure: scatter plot of X vs. Y]
• The regression line is typically computed with statistical software.
• The calculations are based on:
• MX = mean of X
• MY = mean of Y
• sX = standard deviation of X (sample, n − 1 denominator)
• sY = standard deviation of Y (sample, n − 1 denominator)
• r = correlation between X and Y
• For the sample data, the resulting statistics are:
MX    MY     sX     sY     r
3     2.06   1.581  1.072  0.627
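These summary statistics can be reproduced with Python's standard `statistics` module, which is a quick way to check the table. This is a minimal sketch; the variable names simply mirror the slide's symbols. `statistics.stdev` uses the sample (n − 1) denominator, which is what matches the values above. The correlation is computed by hand so the snippet also runs on Python versions older than 3.10, where `statistics.correlation` was added.

```python
import statistics

X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]

MX = statistics.mean(X)
MY = statistics.mean(Y)
sX = statistics.stdev(X)  # sample standard deviation (n - 1 in the denominator)
sY = statistics.stdev(Y)

# Pearson correlation: sample covariance divided by the product of the standard deviations.
n = len(X)
cov_xy = sum((x - MX) * (y - MY) for x, y in zip(X, Y)) / (n - 1)
r = cov_xy / (sX * sY)

print(f"MX={MX:g} MY={MY:.2f} sX={sX:.3f} sY={sY:.3f} r={r:.3f}")
# MX=3 MY=2.06 sX=1.581 sY=1.072 r=0.627
```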
• The formula for a regression line is
Y' = bX + A
where
Y' = predicted score
b = slope of the line
A = Y intercept
The slope (b) can be calculated as:
b = r·sY/sX
The intercept (A) can be calculated as:
A = MY − b·MX
For the given table:
MX    MY     sX     sY     r
3     2.06   1.581  1.072  0.627
b = (0.627)(1.072)/1.581 = 0.425
A = 2.06 − (0.425)(3) = 0.785
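The slope and intercept can be checked two ways: with the slide's formulas applied to the rounded summary statistics, and with ordinary least squares computed directly from the raw data, where b = Σ(x − MX)(y − MY) / Σ(x − MX)². Both routes give b ≈ 0.425 and A ≈ 0.785, which illustrates that b = r·sY/sX is just another way of writing the least-squares slope. This is a minimal sketch; the variable names are illustrative.

```python
# Route 1: the slide's formulas, using the rounded summary statistics.
MX, MY, sX, sY, r = 3, 2.06, 1.581, 1.072, 0.627
b = r * sY / sX   # slope
A = MY - b * MX   # intercept
print(f"b = {b:.3f}, A = {A:.3f}")   # b = 0.425, A = 0.785

# Route 2: ordinary least squares straight from the raw data.
X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]
mx, my = sum(X) / len(X), sum(Y) / len(Y)
b_ls = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
a_ls = my - b_ls * mx
print(f"b = {b_ls:.3f}, A = {a_ls:.3f}")  # b = 0.425, A = 0.785
```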
• The formula for the regression line is Y' = bX + A.
• Substituting the computed slope and intercept gives Y' = 0.425X + 0.785.
• For X = 1: Y' = (0.425)(1) + 0.785 = 1.210
• For X = 2: Y' = (0.425)(2) + 0.785 = 1.635
The predicted values (Y') are shown in the following table:
X      Y      Y'
1.00   1.00   1.210
2.00   2.00   1.635
3.00   1.30   2.060
4.00   3.75   2.485
5.00   2.25   2.910
The error of prediction for a point is the value of the point minus the predicted value (the value on the line). The following table shows the predicted values (Y') and the errors of prediction (Y − Y').
X      Y      Y'      Y − Y'   (Y − Y')²
1.00   1.00   1.210   -0.210   0.044
2.00   2.00   1.635    0.365   0.133
3.00   1.30   2.060   -0.760   0.578
4.00   3.75   2.485    1.265   1.600
5.00   2.25   2.910   -0.660   0.436
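The prediction and error tables can be regenerated in a few lines. This minimal sketch uses the slope and intercept from the worked example (b = 0.425, A = 0.785) and also totals the squared errors, which is the quantity that least-squares regression minimizes; that total is an addition for illustration and does not appear on the slides.

```python
X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]
b, A = 0.425, 0.785  # slope and intercept from the worked example

predicted = [b * x + A for x in X]                  # Y'
errors = [y - yp for y, yp in zip(Y, predicted)]    # Y - Y'
squared = [e ** 2 for e in errors]                  # (Y - Y')^2

print("X     Y     Y'      Y-Y'    (Y-Y')^2")
for x, y, yp, e, e2 in zip(X, Y, predicted, errors, squared):
    print(f"{x:4.2f} {y:5.2f} {yp:6.3f} {e:7.3f} {e2:7.3f}")
print(f"Sum of squared errors: {sum(squared):.4f}")
```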
CONCLUSION
Test Your Understanding
Problem Statement: Last year, five randomly selected students took a math aptitude test before
they began their statistics course. If a student made an 80 on the aptitude test, what grade would
we expect her to make in statistics?
Student xi yi
1 95 85
2 85 95
3 80 70
4 70 65
5 60 70
xi = scores on the aptitude test
yi = statistics grades
• Required Values
Student  xi   yi   (xi − x̄)  (yi − ȳ)  (xi − x̄)²  (yi − ȳ)²  (xi − x̄)(yi − ȳ)
1        95   85    17         8        289        64         136
2        85   95     7        18         49       324         126
3        80   70     2        -7          4        49         -14
4        70   65    -8       -12         64       144          96
5        60   70   -18        -7        324        49         126
Sum     390  385     0         0        730       630         470
Mean     78   77
The regression equation is a linear equation of the form:
ŷ = b0 + b1x
We need to find values for b0 and b1:
b1 = Σ[(xi − x̄)(yi − ȳ)] / Σ[(xi − x̄)²]
b1 = 470/730 = 0.644
b0 = ȳ − b1·x̄
b0 = 77 − (0.644)(78) = 26.768
So the regression equation is: ŷ = 26.768 + 0.644x
If a student scored 80 on the aptitude test, the estimated statistics grade would be:
ŷ = 26.768 + 0.644x
  = 26.768 + 0.644 × 80
  = 26.768 + 51.52
  = 78.288
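The whole exercise can be reproduced in Python as a check. This minimal sketch follows the same steps as the table: means, the sums of products of deviations, then b1 and b0. Because it keeps b1 unrounded (470/730 ≈ 0.64384) instead of rounding it to 0.644 first, the intercept comes out as about 26.781 rather than 26.768; the predicted grade for an aptitude score of 80 is still about 78.29.

```python
x = [95, 85, 80, 70, 60]  # aptitude test scores
y = [85, 95, 70, 65, 70]  # statistics grades

x_bar = sum(x) / len(x)   # 78.0
y_bar = sum(y) / len(y)   # 77.0

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 470.0
sxx = sum((xi - x_bar) ** 2 for xi in x)                         # 730.0

b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept

print(f"y_hat = {b0:.3f} + {b1:.3f}x")                   # y_hat = 26.781 + 0.644x
print(f"Predicted grade for x = 80: {b0 + b1 * 80:.2f}")  # Predicted grade for x = 80: 78.29
```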

More Related Content

Similar to Introduction to Machine Learning

Developing Game-Changing Embedded Intelligence (Francesca Perino, MathWorks)
Developing Game-Changing Embedded Intelligence (Francesca Perino, MathWorks)Developing Game-Changing Embedded Intelligence (Francesca Perino, MathWorks)
Developing Game-Changing Embedded Intelligence (Francesca Perino, MathWorks)Data Driven Innovation
 
An AI Maturity Roadmap for Becoming a Data-Driven Organization
An AI Maturity Roadmap for Becoming a Data-Driven OrganizationAn AI Maturity Roadmap for Becoming a Data-Driven Organization
An AI Maturity Roadmap for Becoming a Data-Driven OrganizationDavid Solomon
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Cloudera, Inc.
 
Data Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiData Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiProfessor Lili Saghafi
 
Unit 9 Technological trends in Information Technology By Sulav Acharya
Unit 9 Technological trends in Information Technology By Sulav AcharyaUnit 9 Technological trends in Information Technology By Sulav Acharya
Unit 9 Technological trends in Information Technology By Sulav AcharyaAchSulav
 
Unit 9 Technological trends in Information Technology By Sulav Acharya
Unit 9 Technological trends in Information Technology By Sulav AcharyaUnit 9 Technological trends in Information Technology By Sulav Acharya
Unit 9 Technological trends in Information Technology By Sulav AcharyaAchSulav
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Cloudera, Inc.
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaNeo4j
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningCloudera, Inc.
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Machine Learning in Customer Analytics
Machine Learning in Customer AnalyticsMachine Learning in Customer Analytics
Machine Learning in Customer AnalyticsCourse5i
 
Manoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentManoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentAgile Impact Conference
 
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudArtificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudJuarez Junior
 
Top 4 Ways to Build Machine Learning Prediction on the Edge for Mobile & IoT
Top 4 Ways to Build Machine Learning Prediction on the Edge for Mobile & IoTTop 4 Ways to Build Machine Learning Prediction on the Edge for Mobile & IoT
Top 4 Ways to Build Machine Learning Prediction on the Edge for Mobile & IoTAmazon Web Services
 
Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise deteo
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and MLQuantUniversity
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceData Science Milan
 

Similar to Introduction to Machine Learning (20)

Developing Game-Changing Embedded Intelligence (Francesca Perino, MathWorks)
Developing Game-Changing Embedded Intelligence (Francesca Perino, MathWorks)Developing Game-Changing Embedded Intelligence (Francesca Perino, MathWorks)
Developing Game-Changing Embedded Intelligence (Francesca Perino, MathWorks)
 
An AI Maturity Roadmap for Becoming a Data-Driven Organization
An AI Maturity Roadmap for Becoming a Data-Driven OrganizationAn AI Maturity Roadmap for Becoming a Data-Driven Organization
An AI Maturity Roadmap for Becoming a Data-Driven Organization
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18
 
Data Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiData Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili Saghafi
 
Unit 9 Technological trends in Information Technology By Sulav Acharya
Unit 9 Technological trends in Information Technology By Sulav AcharyaUnit 9 Technological trends in Information Technology By Sulav Acharya
Unit 9 Technological trends in Information Technology By Sulav Acharya
 
Unit 9 Technological trends in Information Technology By Sulav Acharya
Unit 9 Technological trends in Information Technology By Sulav AcharyaUnit 9 Technological trends in Information Technology By Sulav Acharya
Unit 9 Technological trends in Information Technology By Sulav Acharya
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, Cloudera
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
A journey to faster, repeatable data commercialization
A journey to faster, repeatable data commercializationA journey to faster, repeatable data commercialization
A journey to faster, repeatable data commercialization
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Machine Learning in Customer Analytics
Machine Learning in Customer AnalyticsMachine Learning in Customer Analytics
Machine Learning in Customer Analytics
 
Manoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning DevelopmentManoj Shanmugasundaram - Agile Machine Learning Development
Manoj Shanmugasundaram - Agile Machine Learning Development
 
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudArtificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
 
Top 4 Ways to Build Machine Learning Prediction on the Edge for Mobile & IoT
Top 4 Ways to Build Machine Learning Prediction on the Edge for Mobile & IoTTop 4 Ways to Build Machine Learning Prediction on the Edge for Mobile & IoT
Top 4 Ways to Build Machine Learning Prediction on the Edge for Mobile & IoT
 
Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 

Recently uploaded

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 

Recently uploaded (20)

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx

Introduction to Machine Learning

  • 1. © 2019 Cloudera, Inc. All rights reserved. 1 Introduction to Machine Learning
  • 2. Topics • What is Data? • What is machine learning? • Learning system model • Training and testing • Performance • Applications • Types of Machine learning • Learning techniques • Linear Regression – Step By Step
  • 3. What is Data • Traditionally, data is a collection of raw facts and figures. • Now we are getting data from web logs, mobile devices, sensors, instruments, and transactions. • IBM estimates that 90 percent of the data in the world today has been created in the past two years. • Businesses today are accumulating new data at a rate that exceeds their capacity to extract value from it. • The challenge is how to use this data effectively: not just their own data, but all of the data that's available and relevant.
  • 4. What is Data (continued) • The data is more heterogeneous than data of the past. • For example: digitized text, audio, and visual content, as well as sensor and blog data. • Working with this data requires distinctive new skills and tools.
  • 5. How to Get Meaning out of Data • Look at data with a mathematical mindset. Skills such as machine learning, data mining, data analysis, and statistics are crucial; a data scientist needs to interpret and represent data mathematically. • Use a common language to access, explore, and model data. Languages and tools such as R, Python, MATLAB, and Spark ML, along with a database query language such as SQL, are among the most in-demand skills. • Data extraction, exploration, and hypothesis testing are central to data science practice. • Develop a strong computer science and software engineering background. This involves building a skill set that could include Java, C++, knowledge of algorithms, and Hadoop. These skills are used to architect systems that leverage data.
  • 6. What is Machine Learning • A branch of artificial intelligence concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. • It is very hard to write programs that solve problems like recognizing a face. We don't know what program to write because we don't know how our brain does it. Even if we had a good idea about how to do it, the program might be horrendously complicated. • Instead of writing a program by hand, we collect lots of examples that specify the correct output for a given input. • A machine learning algorithm then takes these examples and produces a program that does the job. • The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers. • If we do it right, the program works for new cases as well as for the ones we trained it on.
  • 7. Traditional Programming vs. Machine Learning • Traditional programming: Data + Program → Computer → Output • Machine learning: Data + Output → Computer → Program
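To make the contrast on slide 7 concrete, here is a minimal sketch on a toy task, assuming the hidden rule is y = 2x + 1 (a hypothetical example, not from the deck). In the traditional version a person writes the rule; in the machine-learning version the computer receives inputs plus correct outputs and produces the "program", which here is just a slope and an intercept fitted by ordinary least squares. The function names are illustrative.

```python
def handwritten_program(x):
    """Traditional programming: a person writes the rule (hypothetical y = 2x + 1)."""
    return 2 * x + 1


def learn_program(xs, ys):
    """Machine learning: fit a slope and intercept to examples by least squares
    and return them wrapped as a new function (the learned "program")."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


examples_x = [0.0, 1.0, 2.0, 3.0]
examples_y = [handwritten_program(x) for x in examples_x]  # the "correct outputs"
learned_program = learn_program(examples_x, examples_y)
print(learned_program(10.0))  # 21.0
```

The learned function was never told the rule; it recovered it from four input-output pairs and then generalizes to an input (10.0) it never saw.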
  • 8. Role of Data Engineer • They use technology and skills to increase awareness, clarity, and direction for those working with data. • Data scientists use their data and analytical ability to find and interpret rich data sources; • manage large amounts of data despite hardware, software, and bandwidth constraints; • merge data sources; • ensure consistency of datasets; • create visualizations to aid in understanding data; • build mathematical models using the data; • present and communicate data insights and findings; • conduct undirected research and frame open-ended industry questions; • extract huge volumes of data from multiple internal and external sources.
  • 9. Role of Data Engineer • Employ sophisticated analytics programs, machine learning and statistical methods to prepare data for use in predictive and prescriptive modelling • Thoroughly clean and prune data to discard irrelevant information • Explore and examine data from a variety of angles to determine hidden weaknesses, trends and/or opportunities • Devise data-driven solutions to the most pressing challenges • Invent new algorithms to solve problems and build new tools to automate work • Communicate predictions and findings to management and IT departments through effective data visualizations and reports • Recommend cost-effective changes to existing procedures and strategies
  • 10. Required Skills for Machine Learning • SAS and/or R: in-depth knowledge of at least one of these analytical tools; for data science, R is generally preferred. • Python coding: Python is the most common coding language I typically see required in data science roles, along with Java, Perl, or C/C++. • Hadoop platform: although this isn't always a requirement, it is heavily preferred in many cases. Experience with Hive or Pig is also a strong selling point. • SQL database/coding: a candidate is expected to be able to write and execute complex queries in SQL.
  • 11. Applications of Machine Learning Recognizing patterns: • Facial identities or facial expressions • Handwritten or spoken words • Medical images Generating patterns: • Generating images or motion sequences Recognizing anomalies: • Unusual sequences of credit card transactions • Unusual patterns of sensor readings in a nuclear power plant or unusual sound in your car engine. Prediction: • Future stock prices or currency exchange rates
  • 12. Applications of Machine Learning Spam filtering, fraud detection: • The enemy adapts, so we must adapt too. Recommendation systems: • Lots of noisy data. Million-dollar prize! Information retrieval: • Find documents or images with similar content. Data visualization: • Display a huge database in a revealing way.
  • 13. How Does a Machine Learn? [Diagram: input samples → learning method → system, shown in two phases: training and testing]
  • 14. How to Use Available Data [Diagram labels: universal set (unobserved); training set (observed), obtained through data acquisition; testing set (unobserved), encountered in practical usage]
  • 15. Training and Testing Data • Training is the process of making the system able to learn. • The training set and testing set come from the same distribution. • We need to make some assumptions, or introduce some bias.
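A minimal sketch of holding out a test set, assuming a simple random split with 20% of the rows held out and a fixed seed for repeatability (both illustrative choices). `split_train_test` is a hypothetical helper written from scratch here, not a library function. Both subsets come from one shuffled pool, which is what lets us assume they share the same distribution.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (training set, testing set)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = [(x, 2 * x + 1) for x in range(10)]  # hypothetical (input, output) pairs
train, test = split_train_test(data)
print(len(train), len(test))  # 8 2
```

The model is fitted only on `train`; `test` stays untouched until evaluation, so its score approximates performance on unseen data.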
  • 16. Performance of Analysis There are several factors affecting performance: • The type of training provided • The form and extent of any initial background knowledge • The type of feedback provided • The learning algorithms used
  • 17. Types of Algorithms • Supervised learning • Unsupervised learning • Semi-supervised learning
  • 18. Types of Algorithms The success of a machine learning system also depends on its algorithms. The algorithms control the search to find and build the knowledge structures, and they should extract useful information from training examples. There are four types of machine learning: • Supervised: training data includes desired outputs • Unsupervised: training data does not include desired outputs • Semi-supervised: training data includes a few desired outputs • Reinforcement: rewards from a sequence of actions
  • 19. [Image-only slide]
  • 20. Supervised Learning Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object and a desired output value.
  • 21. Supervised Learning • Classification: the output is a discrete value • Regression: the output is a continuous value. Examples of supervised learning algorithms: • Linear Regression • Nearest Neighbor • Gaussian Naive Bayes • Decision Trees • Support Vector Machine (SVM) • Random Forest
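As a minimal sketch of one algorithm from this list, here is a 1-nearest-neighbor classifier in plain Python. The fruit features (a roundness score and a color hue on a made-up 0 to 1 scale) and the labeled examples are hypothetical, loosely echoing the apple/banana example in the editor's notes; a real system would use measured features and usually consider more than one neighbor.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training example closest to `query` (Euclidean distance)."""
    _, best_label = min(train, key=lambda pair: math.dist(pair[0], query))
    return best_label


# Hypothetical labeled examples: (roundness, hue) -> fruit label.
labeled = [
    ((0.90, 0.05), "apple"),
    ((0.85, 0.10), "apple"),
    ((0.20, 0.60), "banana"),
    ((0.15, 0.55), "banana"),
]
print(nearest_neighbor_predict(labeled, (0.25, 0.58)))  # banana
```

Because the output is a discrete label, this is classification; predicting a number instead (for example, averaging the neighbors' values) would turn the same idea into regression.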
  • 22. Supervised Learning • Learn to predict output when given an input vector
  • 23. Unsupervised Learning Clustering: finding a structure or pattern in a collection of uncategorized data • K-means clustering • K-NN (k nearest neighbors; note that k-NN itself is usually applied as a supervised method) • Principal Component Analysis (strictly a dimensionality-reduction technique rather than a clustering one) Association: finding associations between elements of a dataset • Association rules
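A minimal k-means sketch (Lloyd's algorithm) on hypothetical 2-D points, assuming k = 2, an iteration cap of 100, and initial centroids simply taken from the first k points. Production implementations typically use smarter initialization such as k-means++; the point here is only the alternating assign/update loop on unlabeled data.

```python
import math


def k_means(points, k, iterations=100):
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, assignments)."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_assignments == assignments and new_centroids == centroids:
            break  # converged: nothing moved
        assignments, centroids = new_assignments, new_centroids
    return centroids, assignments


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied; the two groups emerge purely from distances between points.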
  • 24. Unsupervised Learning • Create an internal representation of the input e.g. form clusters; extract features
  • 25. Semi-Supervised Learning Some data is labeled, but most of it is unlabeled, and a mixture of supervised and unsupervised techniques can be used. Examples: • Speech recognition • Internet content classification • DNA sequence classification
  • 26. Reinforcement Learning The output depends on the state of the current input, and the next input depends on the output of the previous input. Examples: • Robotics for industrial automation • Driverless cars Types: • Q-learning • Temporal Difference (TD) learning • Monte Carlo Tree Search (MCTS)
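A minimal tabular Q-learning sketch, assuming a hypothetical five-cell corridor: the agent starts in the leftmost cell, can step left or right, and receives a reward of 1 only on reaching the rightmost cell. The learning rate, discount factor, exploration rate, and episode count are illustrative values, not tuned settings. It shows the loop the slide describes: each action changes the state, and the next decision depends on where the previous action led.

```python
import random

N_STATES = 5  # hypothetical corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Hypothetical environment: move along the corridor, reward 1 at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice, breaking ties between equal Q-values at random.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: the target uses the best Q-value of the next state.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * future - q[(state, action)])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy for each non-goal cell: +1 means "step right".
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

No labeled data is involved: the agent learns to always step right purely from the reward signal.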
  • 27. Machine Learning in Action
  • 28. Linear Regression • In statistics, linear regression is an approach for modelling the relationship between a scalar dependent variable y and one or more explanatory (independent) variables denoted X. In a statistical experiment, the dependent variable is the one expected to change when the independent variable is manipulated.
  • 29. Linear Regression Applications • Effect of fertilizer on plant growth • Effect of drug dosage on symptom severity • Effect of temperature on pigmentation • Per capita crime rate by town • Average number of rooms per dwelling • Student-teacher ratio by town
  • 30. Types of Linear Regression • Simple linear regression: the case of one explanatory variable • Multiple linear regression: more than one explanatory variable
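To illustrate the multiple-regression case, here is a minimal sketch that fits y = b0 + b1·x1 + b2·x2 by solving the least-squares normal equations (XᵀX)b = Xᵀy with plain Gauss-Jordan elimination. The data are hypothetical, generated from y = 1 + 2·x1 + 3·x2 with no noise, so the fit should recover those coefficients. In practice a numerical library's least-squares solver would be preferred for stability.

```python
def fit_multiple_regression(rows, ys):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    p = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    a = [
        [sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(p)]
        + [sum(X[i][r] * ys[i] for i in range(len(X)))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[r][p] / a[r][r] for r in range(p)]


rows = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]  # hypothetical (x1, x2) pairs
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]  # noise-free targets
print([round(b, 6) for b in fit_multiple_regression(rows, ys)])  # [1.0, 2.0, 3.0]
```

With a single column in `rows`, the same function reduces to simple linear regression.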
  • 31. Linear Regression - Implementation
      Input table with sample data:
        X      Y
        1.00   1.00
        2.00   2.00
        3.00   1.30
        4.00   3.75
        5.00   2.25
      Output: regression analysis with regression line
  • 32. • A simple scatter plot of the given data: [Figure: scatter plot of X vs. Y, not reproduced]
  • 33. • The regression line is typically computed with statistical software, but the calculations are based on: • MX = mean of X • MY = mean of Y • sX = standard deviation of X • sY = standard deviation of Y • r = correlation between X and Y • The resulting statistics are:
        MX   MY     sX      sY      r
        3    2.06   1.581   1.072   0.627
  • 34. • The formula for a regression line is Y' = bX + A, where Y' = predicted score, b = slope of the line, and A = Y intercept. • The slope can be calculated as b = r sY/sX. • The intercept can be calculated as A = MY - bMX. • For the given table (MX = 3, MY = 2.06, sX = 1.581, sY = 1.072, r = 0.627):
        b = (0.627)(1.072)/1.581 = 0.425
        A = 2.06 - (0.425)(3) = 0.785
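As the slides note, these numbers usually come from software. A minimal sketch with Python's standard library reproduces the table and the coefficients, assuming sX and sY are sample standard deviations (n − 1 in the denominator), which is what matches the slides' 1.581 and 1.072. The correlation is computed from deviations about the means so the sketch does not depend on a particular Python version.

```python
import math
import statistics

X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]

mx, my = statistics.mean(X), statistics.mean(Y)
sx, sy = statistics.stdev(X), statistics.stdev(Y)  # sample standard deviations
# Pearson correlation from deviations about the means.
r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / math.sqrt(
    sum((x - mx) ** 2 for x in X) * sum((y - my) ** 2 for y in Y)
)

b = r * sy / sx  # slope: b = r sY / sX
a = my - b * mx  # intercept: A = MY - b MX
print(f"MX={mx:.3f} MY={my:.3f} sX={sx:.3f} sY={sy:.3f} r={r:.3f}")
print(f"Y' = {b:.3f}X + {a:.3f}")
```

Because r·sY/sX simplifies to Σ(x − MX)(y − MY) / Σ(x − MX)², the slope here is 4.25/10 = 0.425 before any rounding.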
  • 35. • The formula for a regression line is Y' = bX + A, so the predicted values are given by Y' = 0.425X + 0.785. • For X = 1: Y' = (0.425)(1) + 0.785 = 1.210. • For X = 2: Y' = (0.425)(2) + 0.785 = 1.635. • The predicted values (Y') are shown in the following table:
        X      Y      Y'
        1.00   1.00   1.210
        2.00   2.00   1.635
        3.00   1.30   2.060
        4.00   3.75   2.485
        5.00   2.25   2.910
  • 36. The error of prediction for a point is the value of the point minus the predicted value (the value on the line). The following table shows the predicted values (Y') and the errors of prediction (Y - Y'):
        X      Y      Y'      Y - Y'   (Y - Y')²
        1.00   1.00   1.210   -0.210   0.044
        2.00   2.00   1.635    0.365   0.133
        3.00   1.30   2.060   -0.760   0.578
        4.00   3.75   2.485    1.265   1.600
        5.00   2.25   2.910   -0.660   0.436
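The predicted values and errors of prediction in the last two tables can be checked with a short sketch, assuming the rounded coefficients b = 0.425 and A = 0.785 from the slides. It also accumulates the sum of squared errors, which the slides do not show; it is included only as an extra check of the (Y - Y')² column.

```python
X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]
b, A = 0.425, 0.785  # slope and intercept from the worked example

print(f"{'X':>5} {'Y':>5} {'Y_pred':>7} {'Y-Y_pred':>9} {'(Y-Y_pred)^2':>13}")
sse = 0.0
for x, y in zip(X, Y):
    y_pred = b * x + A  # Y' = bX + A
    error = y - y_pred  # error of prediction
    sse += error ** 2
    print(f"{x:5.2f} {y:5.2f} {y_pred:7.3f} {error:9.3f} {error ** 2:13.3f}")
print(f"Sum of squared errors: {sse:.3f}")
```

The printed rows match the table, and the squared errors add up to about 2.791. Least-squares regression chooses b and A to make that total as small as possible.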
  • 37. CONCLUSION
  • 38. Test Your Understanding. Problem statement: Last year, five randomly selected students took a math aptitude test before they began their statistics course. If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
        Student   xi   yi
        1         95   85
        2         85   95
        3         80   70
        4         70   65
        5         60   70
      xi = scores on the aptitude test; yi = statistics grades.
  • 39. • Required values (x̄ = mean of xi, ȳ = mean of yi):
        Student   xi    yi    xi-x̄   yi-ȳ   (xi-x̄)²   (yi-ȳ)²   (xi-x̄)(yi-ȳ)
        1         95    85     17      8      289        64        136
        2         85    95      7     18       49       324        126
        3         80    70      2     -7        4        49        -14
        4         70    65     -8    -12       64       144         96
        5         60    70    -18     -7      324        49        126
        Sum       390   385                   730       630        470
        Mean      78    77
  • 40. The regression equation is a linear equation of the form $\hat{y} = b_0 + b_1 x$, so we need to find values for $b_0$ and $b_1$:
        $b_1 = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \dfrac{470}{730} = 0.644$
        $b_0 = \bar{y} - b_1 \bar{x} = 77 - (0.644)(78) = 26.768$
      So the regression equation is $\hat{y} = 26.768 + 0.644x$.
  • 41. If a student scored 80 on the aptitude test, the estimated statistics grade would be: ŷ = 26.768 + 0.644 × 80 = 26.768 + 51.52 = 78.288
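The whole exercise from slides 38 to 41 fits in a few lines. This sketch computes the deviation sums, the coefficients, and the prediction for an aptitude score of 80. It keeps full precision throughout, so b0 comes out slightly different from the slides, which round b1 to 0.644 before computing b0.

```python
x = [95, 85, 80, 70, 60]  # aptitude test scores
y = [85, 95, 70, 65, 70]  # statistics grades

x_bar = sum(x) / len(x)  # 78
y_bar = sum(y) / len(y)  # 77
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 470
sxx = sum((xi - x_bar) ** 2 for xi in x)  # 730

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat_80 = b0 + b1 * 80
print(f"y_hat = {b0:.3f} + {b1:.3f}x; prediction at x = 80: {y_hat_80:.2f}")
```

At full precision b0 ≈ 26.781 and the prediction is about 78.29, essentially the slides' 78.288.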

Editor's Notes

  1. In simpler terms, a machine “learns” by looking for patterns among massive data loads, and when it sees one, it adjusts the program to reflect the “truth” of what it found. The more data you expose the machine to, the “smarter” it gets. And when it sees enough patterns, it begins to make predictions. Unlike humans, however, machines cannot generalize knowledge or transfer learning from one application to another.
  2. In the 20th century, computer programmers had to get their electronic charges to do things by tapping out lines of code specifying exactly what needed to be done. Machine learning shifts some of that work away from humans, forcing the computer to figure things out for itself.
  3. Why do we use train and test sets? Creating a train/test split of your dataset is one method to quickly evaluate the performance of an algorithm on your problem. The training dataset is used to prepare, or train, a model. We pretend the test dataset is new data whose output values are withheld from the algorithm. We gather predictions from the trained model on the inputs of the test dataset and compare them to the test set's known output values. Comparing the predictions with the known outputs on the test dataset allows us to compute a performance measure for the model on the test dataset. This is an estimate of the skill of the algorithm trained on the problem when making predictions on unseen data. When we evaluate an algorithm, we are in fact evaluating all steps in the procedure, including how the training data was prepared (e.g. scaling), the choice of algorithm (e.g. kNN), and how the chosen algorithm was configured (e.g. k=3). The performance measure calculated on the predictions is an estimate of the skill of the whole procedure. We generalize the performance measure from “the skill of the procedure on the test set” to “the skill of the procedure on unseen data”. This is quite a leap and requires that: the procedure is sufficiently robust that the estimate of skill is close to what we actually expect on unseen data; the choice of performance measure accurately captures what we are interested in measuring in predictions on unseen data; the choice of data preparation is well understood and repeatable on new data, and reversible if predictions need to be returned to their original scale or related to the original input values; and the choice of algorithm makes sense for its intended use and operational environment (e.g. complexity or chosen programming language). A lot rides on the estimated skill of the whole procedure on the test set.
In fact, the train/test method of estimating the skill of the procedure on unseen data often has high variance (unless we have a heck of a lot of data to split). This means that when it is repeated, it gives different results, often very different results. As a result, we may be quite uncertain about how well the procedure actually performs on unseen data and how one procedure compares to another.
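The high-variance point can be seen with a small experiment. This sketch assumes a hypothetical noisy linear dataset of 20 points (generated with a fixed seed), a 75/25 split, and a least-squares line as the "procedure". It repeats the split with ten different seeds and reports how much the test mean squared error moves. The helper names are illustrative, not from any library.

```python
import random
import statistics


def fit_line(pairs):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def holdout_mse(pairs, seed, test_fraction=0.25):
    """Shuffle, hold out a test set, fit on the rest, and return the test MSE."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    slope, intercept = fit_line(train)
    return statistics.mean((y - (slope * x + intercept)) ** 2 for x, y in test)


data_rng = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + data_rng.gauss(0, 3)) for x in range(20)]  # hypothetical data
scores = [holdout_mse(data, seed) for seed in range(10)]
print(f"test MSE ranged from {min(scores):.2f} to {max(scores):.2f}")
```

Same data, same algorithm, different random split: the spread between the smallest and largest score is the variance the note warns about. Repeating the split and averaging (as in k-fold cross-validation) is the usual remedy.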
  4. Supervised learning is learning in which we teach or train the machine using well-labeled data, meaning each example is already tagged with the correct answer. The machine is then given a new set of examples (data), and the supervised learning algorithm, having analyzed the training data (the set of training examples), produces a correct outcome from what it learned from the labeled data. For instance, suppose you are given a basket filled with different kinds of fruit. The first step is to train the machine with all the different fruits one by one, like this: if the object is round with a depression at the top and is red, it is labeled “Apple”; if the object is a long, curving cylinder and is green-yellow, it is labeled “Banana”. Now suppose that after training, you are given a new, separate fruit from the basket, say a banana, and asked to identify it.
  5. Supervised learning algorithms fall into two categories. Classification: a classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”. Regression: a regression problem is when the output variable is a real value, such as “dollars” or “weight”.
  6. Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the machine's task is to group unsorted information according to similarities, patterns, and differences, without any prior training on data. Unlike supervised learning, no teacher is provided, which means no training is given to the machine. The machine is therefore left to find the hidden structure in unlabeled data by itself. For instance, suppose it is given a set of images containing dogs and cats that it has never seen before. The machine has no idea about the features of dogs and cats, so it cannot label them as “dogs” and “cats”. But it can group them according to their similarities, patterns, and differences, i.e., it can easily split the pictures into two parts: the first may contain all the pictures with dogs in them and the second all the pictures with cats. Nothing was learned beforehand; there was no training data and there were no examples. Unsupervised learning algorithms fall into two categories. Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior. Association: an association rule learning problem is where you want to discover rules that describe large portions of your data, such as people who buy X also tend to buy Y.
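The “people who buy X also tend to buy Y” idea is usually quantified with support (how often items appear together) and confidence (how often Y appears given X). A minimal sketch over a handful of hypothetical shopping baskets, assuming a rule is worth reporting when support ≥ 0.4 and confidence ≥ 0.7 (illustrative thresholds). Full algorithms such as Apriori also search itemsets larger than pairs.

```python
from itertools import permutations

baskets = [  # hypothetical transactions
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


items = set().union(*baskets)
rules = []
for x, y in permutations(sorted(items), 2):
    sup = support({x, y})
    if sup >= 0.4 and sup / support({x}) >= 0.7:
        rules.append((x, y, sup, sup / support({x})))  # confidence estimates P(y | x)

for x, y, sup, conf in rules:
    print(f"{x} -> {y}: support={sup:.2f}, confidence={conf:.2f}")
```

On this data it reports bread → butter and butter → bread (confidence 0.75), plus jam → bread and milk → butter (confidence 1.00). No labels were needed; the rules come only from co-occurrence.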
  7. The basic difference between supervised and unsupervised learning is that supervised learning datasets have an output label associated with each tuple, while unsupervised learning datasets do not. The most basic disadvantage of any supervised learning algorithm is that the dataset has to be hand-labeled, either by a machine learning engineer or a data scientist. This is a very costly process, especially when dealing with large volumes of data. The most basic disadvantage of unsupervised learning is that its application spectrum is limited. To counter these disadvantages, the concept of semi-supervised learning was introduced. In this type of learning, the algorithm is trained on a combination of labeled and unlabeled data. Typically, this combination contains a very small amount of labeled data and a very large amount of unlabeled data. The basic procedure is that the programmer first clusters similar data using an unsupervised learning algorithm and then uses the existing labeled data to label the rest of the unlabeled data. One may imagine the three types of learning as follows: supervised learning, where a student is under the supervision of a teacher at both home and school; unsupervised learning, where a student has to figure out a concept himself; and semi-supervised learning, where a teacher teaches a few concepts in class and gives homework questions based on similar concepts.
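The cluster-then-label procedure described in this note can be sketched as follows, assuming hypothetical 1-D measurements that form two well-separated groups, a tiny k-means with k = 2 for the clustering step, and a majority vote of the few labeled points in each cluster to label everything else. This is one simple semi-supervised strategy among several; the data and names are illustrative.

```python
from collections import Counter


def two_means_1d(values, iterations=20):
    """Tiny 1-D k-means with k = 2; returns a cluster index (0 or 1) per value."""
    centers = [min(values), max(values)]  # start at the extremes
    assign = [0] * len(values)
    for _ in range(iterations):
        assign = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]
        for c in (0, 1):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign


values = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]  # hypothetical measurements
labels = {0: "small", 4: "large"}  # only two of the eight points are labeled

# Step 1 (unsupervised): cluster all points, labeled or not.
clusters = two_means_1d(values)
# Step 2: give each cluster the majority label of its labeled members
# (assumes every cluster contains at least one labeled point).
cluster_label = {
    c: Counter(lab for i, lab in labels.items() if clusters[i] == c).most_common(1)[0][0]
    for c in set(clusters)
}
# Step 3: label the unlabeled points from their cluster.
predicted = [labels.get(i, cluster_label[clusters[i]]) for i in range(len(values))]
print(predicted)
```

Two hand-labeled points are enough to label all eight, which is exactly the cost saving the note attributes to semi-supervised learning.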
  8. Reinforcement learning is a machine learning paradigm in which a learning algorithm is trained not on preset data but through a feedback system. These algorithms are touted as the future of machine learning because they eliminate the cost of collecting and cleaning data.