Support Vector Machine without tears
Ankit Sharma
Digg Data | www.diggdata.in
Content
SVM and its application
Basic SVM
• Hyperplane
• Understanding of basics
• Optimization
Soft margin SVM
Non-linear decision boundary
SVMs in “loss + penalty” form
Kernel method
• Gaussian kernel
SVM usage beyond classification
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 2
SVM: Support Vector Machine
• In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis.
• Properties of SVM:
– Duality
– Kernels
– Margin
– Convexity
– Sparseness
Application of SVM
• Time series analysis
• Classification
• Anomaly detection
• Regression
• Machine vision
• Text categorization
Basic concept of SVM
Find a linear decision surface (“hyperplane”) that separates the classes and has the largest distance (i.e., the largest “gap” or “margin”) between the borderline data points of the two classes (i.e., the “support vectors”).
Hyperplane as a decision boundary
• A hyperplane is a linear decision surface that splits the space into two parts.
• A hyperplane can therefore serve as a binary classifier: a point is assigned to a class according to the side of the hyperplane on which it lies.
Equation of a hyperplane
A hyperplane is defined by a point ($P_0$) on it and a vector ($w$) perpendicular to the plane at that point.
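As a worked step (a sketch: the symbols $x$ and $b$ are introduced here to match the notation of the later slides and do not appear on this slide), a point $x$ lies on the hyperplane exactly when $x - P_0$ is perpendicular to $w$:

```latex
w \cdot (x - P_0) = 0
\quad\Longleftrightarrow\quad
w \cdot x + b = 0, \qquad b = -\,w \cdot P_0
```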
Understanding the basics
• $g(x)$ is a linear function: $g(x) = w^{T}x + b$
– A hyperplane in the feature space: $w^{T}x + b = 0$
– On one side of the hyperplane $w^{T}x + b > 0$, on the other side $w^{T}x + b < 0$
– (Unit-length) normal vector of the hyperplane: $n = \dfrac{w}{\|w\|}$
[Figure: the hyperplane and its normal vector $n$ in the ($x_1$, $x_2$) plane]
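A minimal sketch of the linear discriminant described on this slide, in plain Python. The function names and the example hyperplane $x_1 + x_2 - 1 = 0$ are hypothetical and chosen only for illustration.

```python
import math

def g(w, b, x):
    """Linear discriminant g(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def unit_normal(w):
    """Unit-length normal n = w / ||w|| of the hyperplane g(x) = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane (g(x) = 0 counts as +1)."""
    return 1 if g(w, b, x) >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 1 = 0
w, b = [1.0, 1.0], -1.0
print(classify(w, b, [2.0, 2.0]))   # a point on the positive side
print(classify(w, b, [0.0, 0.0]))   # a point on the negative side
print(unit_normal(w))
```

Only the sign of $g(x)$ matters for classification, which is why $w$ and $b$ can later be rescaled freely.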
Understanding the basics
• How to classify these points using a linear discriminant function in order to minimize the error rate?
– Infinite number of answers!
– Which one is the best?
[Figure: points of two classes, labeled +1 and −1, in the ($x_1$, $x_2$) plane]
Understanding the basics
• The linear discriminant function (classifier) with the maximum margin is the best.
– The margin is defined as the width by which the boundary could be increased before hitting a data point; it acts as a “safe zone”.
– Why is it the best? It is robust to outliers and thus has strong generalization ability.
[Figure: maximum-margin boundary and its margin (“safe zone”) between the +1 and −1 points in the ($x_1$, $x_2$) plane]
Understanding the basics
• Given a set of data points $\{(x_i, y_i)\},\ i = 1, 2, \ldots, n$, where
– for $y_i = +1$: $w^{T}x_i + b > 0$
– for $y_i = -1$: $w^{T}x_i + b < 0$
• With a scale transformation on both $w$ and $b$, the above is equivalent to
– for $y_i = +1$: $w^{T}x_i + b \ge +1$
– for $y_i = -1$: $w^{T}x_i + b \le -1$
[Figure: +1 and −1 points in the ($x_1$, $x_2$) plane]
Understanding the basics
• We know that, for the support vectors $x^{+}$ and $x^{-}$,
– $w^{T}x^{+} + b = +1$
– $w^{T}x^{-} + b = -1$
• The margin width is:
$M = (x^{+} - x^{-}) \cdot n = (x^{+} - x^{-}) \cdot \dfrac{w}{\|w\|} = \dfrac{2}{\|w\|}$
[Figure: margin between the support vectors $x^{+}$ and $x^{-}$, with normal vector $n$, in the ($x_1$, $x_2$) plane]
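The margin width $2/\|w\|$ follows from subtracting the two support-vector equations. A short worked step, using only the symbols of this slide:

```latex
\begin{aligned}
(w^{T}x^{+} + b) - (w^{T}x^{-} + b) &= (+1) - (-1) \\
w^{T}(x^{+} - x^{-}) &= 2 \\
M = (x^{+} - x^{-}) \cdot \frac{w}{\|w\|} &= \frac{w^{T}(x^{+} - x^{-})}{\|w\|} = \frac{2}{\|w\|}
\end{aligned}
```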
Understanding the basics
• Formulation:
maximize $\dfrac{2}{\|w\|}$
such that
– for $y_i = +1$: $w^{T}x_i + b \ge +1$
– for $y_i = -1$: $w^{T}x_i + b \le -1$
[Figure: margin, support vectors $x^{+}$ and $x^{-}$, and normal vector $n$ in the ($x_1$, $x_2$) plane]
Understanding the basics
• Formulation (maximizing $2/\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^{2}$):
minimize $\frac{1}{2}\|w\|^{2}$
such that
– for $y_i = +1$: $w^{T}x_i + b \ge +1$
– for $y_i = -1$: $w^{T}x_i + b \le -1$
[Figure: margin, support vectors $x^{+}$ and $x^{-}$, and normal vector $n$ in the ($x_1$, $x_2$) plane]
Understanding the basics
• Formulation (both constraints combined into one):
minimize $\frac{1}{2}\|w\|^{2}$
such that $y_i(w^{T}x_i + b) \ge 1$ for all $i$
[Figure: margin, support vectors $x^{+}$ and $x^{-}$, and normal vector $n$ in the ($x_1$, $x_2$) plane]
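To make the formulation concrete, the following sketch checks the constraint $y_i(w^{T}x_i + b) \ge 1$ and evaluates the objective $\frac{1}{2}\|w\|^{2}$ for a few hand-picked candidates. The toy data, the candidate values, and the function names are all hypothetical; no optimizer is run here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_feasible(w, b, X, y, tol=1e-9):
    """True if y_i (w . x_i + b) >= 1 holds for every training point."""
    return all(yi * (dot(w, xi) + b) >= 1 - tol for xi, yi in zip(X, y))

def objective(w):
    """Primal objective (1/2) ||w||^2."""
    return 0.5 * dot(w, w)

def margin(w):
    """Margin width 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

# Hypothetical toy data: two points per class
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [+1, +1, -1, -1]

candidates = {
    "tight": ([0.5, 0.5], -1.0),   # touches (2, 2) and (0, 0)
    "loose": ([1.0, 1.0], -2.0),   # also feasible, but with a larger ||w||
    "bad":   ([0.1, 0.1], -0.2),   # violates the constraints
}
for name, (w, b) in candidates.items():
    print(name, is_feasible(w, b, X, y), objective(w), margin(w))
```

Among the feasible candidates, the one with the smaller objective has the wider margin, which is the trade-off the minimization expresses.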
Basics of optimization: Convex functions
• A function is called convex if, for any two points in the interval, its graph lies on or below the straight line segment connecting those two points.
• Property: Any local minimum is a global minimum!
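The chord condition above can be probed numerically. This is a sampled check only (it can expose non-convexity but cannot prove convexity), and the helper names and test functions are hypothetical.

```python
def violates_convexity(f, a, b, steps=50, tol=1e-12):
    """Return True if f rises above the chord between a and b at some sampled point."""
    for k in range(steps + 1):
        t = k / steps
        x = t * a + (1 - t) * b
        chord = t * f(a) + (1 - t) * f(b)
        if f(x) > chord + tol:
            return True
    return False

def looks_convex(f, points):
    """Sampled check of the chord condition on every pair of the given points."""
    return not any(violates_convexity(f, a, b) for a in points for b in points)

grid = [x / 2 for x in range(-6, 7)]                   # -3.0, -2.5, ..., 3.0
print(looks_convex(lambda x: x * x, grid))             # quadratic
print(looks_convex(lambda x: x ** 3 - 3 * x, grid))    # cubic with a local maximum
```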
Basics of optimization: Quadratic programming
• Quadratic programming (QP) is a special optimization problem: the function to optimize (the “objective”) is quadratic, subject to linear constraints.
• Convex QP problems have convex objective functions.
• These problems can be solved efficiently by iterative descent-type (“greedy”) algorithms, because every local minimum is a global minimum.
SVM optimization problem: Primal formulation
• Minimize $\frac{1}{2}\|w\|^{2}$ subject to $y_i(w^{T}x_i + b) \ge 1$ for every training sample $i$.
• This is called the “primal formulation of linear SVMs”.
• It is a convex quadratic programming (QP) optimization problem with n variables ($w_i$, $i = 1, \ldots, n$), where n is the number of features in the dataset.
SVM optimization problem: Dual formulation
• The previous problem can be recast in the so-called “dual form”, giving rise to the “dual formulation of linear SVMs”.
• Apply the method of Lagrange multipliers: $L_P(w, b, \alpha) = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{N} \alpha_i \left[ y_i (w^{T}x_i + b) - 1 \right]$
• We need to minimize this Lagrangian with respect to $w$ and $b$ and simultaneously require that the derivative with respect to $\alpha$ vanishes, all subject to the constraints that $\alpha_i \ge 0$.
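As a worked step, assuming the standard hard-margin Lagrangian written out below (the slide's own formula did not survive extraction), setting the derivatives with respect to $w$ and $b$ to zero gives the two conditions that lead to the dual:

```latex
\begin{aligned}
L_P(w, b, \alpha) &= \tfrac{1}{2}\|w\|^{2} - \sum_{i=1}^{N} \alpha_i \left[ y_i (w^{T}x_i + b) - 1 \right] \\
\frac{\partial L_P}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i \\
\frac{\partial L_P}{\partial b} = 0 &\;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0
\end{aligned}
```

Substituting these back into $L_P$ removes $w$ and $b$ and leaves a problem in the multipliers $\alpha_i$ alone.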
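SVM optimization problem: Dual formulation (contd.)
• Maximize $\sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i^{T}x_j$ subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{N} \alpha_i y_i = 0$.
• It is also a convex quadratic programming problem, but with N variables ($\alpha_i$, $i = 1, \ldots, N$), where N is the number of samples.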
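A small sketch of how a dual solution maps back to the primal variables, assuming the standard relations $w = \sum_i \alpha_i y_i x_i$ and $y_i(w^{T}x_i + b) = 1$ on support vectors. The two-point dataset is hypothetical and small enough that the optimal multipliers can be found by hand.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def recover_w(alphas, X, y):
    """w = sum_i alpha_i * y_i * x_i (stationarity of the Lagrangian in w)."""
    dim = len(X[0])
    return [sum(a * yi * xi[d] for a, xi, yi in zip(alphas, X, y)) for d in range(dim)]

def recover_b(w, X, y, alphas, tol=1e-12):
    """b from any support vector (alpha_i > 0), where y_i (w . x_i + b) = 1."""
    for a, xi, yi in zip(alphas, X, y):
        if a > tol:
            return yi - dot(w, xi)   # valid because y_i is +1 or -1
    raise ValueError("no support vector found")

# Hypothetical two-point problem: one sample per class.
X = [[1.0, 1.0], [-1.0, -1.0]]
y = [+1, -1]
# With alpha_1 = alpha_2 = a the dual objective is 2a - 4a^2, maximized at a = 1/4.
alphas = [0.25, 0.25]

w = recover_w(alphas, X, y)
b = recover_b(w, X, y, alphas)
print(w, b)
```

Samples with $\alpha_i = 0$ contribute nothing to $w$, which is the sparseness property listed among the SVM properties.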
SVM optimization problem: Benefits of using the dual formulation
1) No need to access the original data; only the dot products between samples are needed.
2) The number of free parameters is bounded by the number of support vectors and not by the number of variables (beneficial for high-dimensional problems).
Non-linearly separable data: “Soft-margin” linear SVM
Assign a “slack variable” $\xi_i \ge 0$ to each instance; it can be thought of as the distance from the separating hyperplane if the instance is misclassified, and is 0 otherwise.
Primal formulation: minimize $\frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\xi_i$ subject to $y_i(w^{T}x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$.
Dual formulation: maximize $\sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j\, x_i^{T}x_j$ subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{N}\alpha_i y_i = 0$.
• When C is very large, the soft-margin SVM is equivalent to the hard-margin SVM;
• When C is very small, we admit misclassifications in the training data in exchange for a w-vector with small norm;
• C has to be selected for the distribution at hand, as will be discussed later in this tutorial.
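The role of the slack variables and of C can be illustrated with a short sketch. It assumes the common closed form $\xi_i = \max(0,\ 1 - y_i(w^{T}x_i + b))$ for the smallest feasible slack. The 1-D data, the fixed $(w, b)$, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slacks(w, b, X, y):
    """xi_i = max(0, 1 - y_i (w . x_i + b)): zero for points on or outside the margin."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]

def soft_margin_objective(w, b, X, y, C):
    """(1/2) ||w||^2 + C * sum_i xi_i."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))

# Hypothetical 1-D data; the last point lies on the wrong side of the boundary.
X = [[2.0], [1.0], [-1.0], [-2.0], [0.5]]
y = [+1, +1, -1, -1, -1]
w, b = [1.0], 0.0

print(slacks(w, b, X, y))
for C in (0.1, 10.0):
    print(C, soft_margin_objective(w, b, X, y, C))
```

For the same $(w, b)$, a larger C makes the single violation far more expensive, which is why large C pushes the solution toward hard-margin behavior.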
SVMs in “loss + penalty” form
• Many statistical learning algorithms (including SVMs) search for a decision function by solving the following optimization problem:
Minimize (Loss + λ Penalty)
– Loss measures the error of fitting the data
– Penalty penalizes the complexity of the learned function
– λ is the regularization parameter that balances Loss and Penalty
• Overfitting → Poor generalization
• The soft-margin SVM can also be stated in this form: minimize $\sum_{i=1}^{N}\max\!\left(0,\ 1 - y_i(w^{T}x_i + b)\right) + \lambda\|w\|^{2}$ (hinge loss plus an L2 penalty).
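A minimal sketch of the loss + penalty view, assuming the hinge loss as the Loss and $\|w\|^{2}$ as the Penalty. It uses plain full-batch subgradient descent, which is swapped in here for simplicity and is not the QP approach of the earlier slides. The data, step size, and iteration count are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(w, b, X, y, lam):
    """Sum of hinge losses plus lam * ||w||^2."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return hinge + lam * dot(w, w)

def train(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the hinge-loss + L2-penalty objective."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w = [2.0 * lam * wj for wj in w]   # gradient of the penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:     # hinge is active for this sample
                grad_w = [g - yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical, linearly separable toy data
X = [[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]]
y = [+1, +1, -1, -1]
w, b = train(X, y)
predictions = [1 if dot(w, xi) + b >= 0 else -1 for xi in X]
print(w, b, predictions, objective(w, b, X, y, lam=0.01))
```

Here the penalty keeps shrinking $w$ once every sample sits outside the margin, while the hinge term pushes back as soon as a sample falls inside it.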
Nonlinear decision boundary
[Figure: a nonlinear decision boundary, obtained with the kernel method]
Kernel method
• Kernel methods involve
– Nonlinear transformation of data to a higher dimensional feature space induced by a Mercer kernel
– Detection of optimal linear solutions in the kernel feature space
• Transformation to a higher dimensional space is expected to be helpful in conversion of nonlinear relations
into linear relations (Cover’s theorem)
– Nonlinearly separable patterns to linearly separable patterns
– Nonlinear regression to linear regression
– Nonlinear separation of clusters to linear separation of clusters
• Pattern analysis methods are implemented in such a way that the kernel feature space representation is
not explicitly required. They involve computation of pair-wise inner-products only.
• The pair-wise inner-products are computed efficiently directly from the original representation of data
using a kernel function (Kernel trick)
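The idea that a nonlinear map can turn a non-separable problem into a linearly separable one can be sketched on 1-D data. The feature map $x \mapsto (x, x^{2})$, the data, and the hand-picked hyperplane are hypothetical and serve only as an illustration.

```python
def phi(x):
    """Hypothetical nonlinear map from a 1-D input to a 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)

# 1-D data: the -1 class sits between the +1 points, so no single threshold separates them.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [+1, +1, -1, -1, -1, +1, +1]

# In feature space the hyperplane 0*z1 + 1*z2 - 2.5 = 0 (i.e. x^2 = 2.5) separates the classes.
w, b = (0.0, 1.0), -2.5
predictions = [1 if w[0] * phi(x)[0] + w[1] * phi(x)[1] + b > 0 else -1 for x in xs]
print(predictions == ys)
```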
Kernel trick
Not every function $\mathbb{R}^{N} \times \mathbb{R}^{N} \to \mathbb{R}$ can be a valid kernel; it has to satisfy the so-called Mercer conditions. Otherwise, the underlying quadratic program may not be solvable.
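As an illustration of the kernel trick, the sketch below compares a kernel value computed in the original space with the inner product of explicitly mapped vectors. The homogeneous degree-2 polynomial kernel is used as an assumed example; the slide's own formula did not survive extraction.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the homogeneous degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without forming phi(x) or phi(z)."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))
```

Both numbers agree up to rounding, but the kernel needs only one dot product in the original 2-D space.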
Popular kernels
Gaussian kernel
Consider the Gaussian kernel: $K(x, x_j) = \exp\!\left(-\dfrac{\|x - x_j\|^{2}}{2\sigma^{2}}\right)$
Geometrically, this is a “bump” or “cavity” centered at the training data point $x_j$.
The resulting mapping function is a combination of bumps and cavities.
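The “bumps and cavities” picture can be sketched directly. This assumes the common parameterization $K(x, x_j) = \exp(-\|x - x_j\|^{2} / (2\sigma^{2}))$ and a decision function of the form $f(x) = \sum_j \alpha_j y_j K(x, x_j) + b$. The support vectors and multipliers are hypothetical, chosen by hand rather than obtained from training.

```python
import math

def gaussian_kernel(x, xj, sigma=1.0):
    """K(x, x_j) = exp(-||x - x_j||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision_function(x, support_vectors, labels, alphas, b=0.0, sigma=1.0):
    """f(x) = sum_j alpha_j * y_j * K(x, x_j) + b: bumps for y_j = +1, cavities for y_j = -1."""
    return sum(a * yj * gaussian_kernel(x, xj, sigma)
               for a, yj, xj in zip(alphas, labels, support_vectors)) + b

# Hypothetical support vectors and multipliers, chosen by hand for illustration.
support_vectors = [(0.0, 0.0), (3.0, 3.0)]
labels = [+1, -1]
alphas = [1.0, 1.0]

for point in [(0.0, 0.0), (3.0, 3.0), (1.5, 1.5)]:
    print(point, decision_function(point, support_vectors, labels, alphas))
```

The function is positive near the +1 support vector, negative near the −1 one, and zero midway between them, where the decision boundary passes.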
SVM usage beyond classification
• Regression analysis (ε-Support vector regression)
• Anomaly detection (One-class SVM)
• Clustering analysis (Support Vector Domain Description)
Thank you