Lecture12 - SVM


  1. 1. Introduction to Machine Learning. Lecture 12: Support Vector Machines. Albert Orriols i Puig. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.
  2. 2. Recap of Lecture 11. First-generation neural networks: perceptrons and others, including multi-layer perceptrons.
  3. 3. Recap of Lecture 11. Second-generation neural networks: some people figured out how to adapt the weights of the internal layers. These networks seemed very powerful and able to solve almost anything, but reality showed that this was not exactly true.
  4. 4. Today’s Agenda. Moving to SVM. Linear SVM: the separable case and the non-separable case. Non-linear SVM.
  5. 5. Introduction. SVM (Vapnik, 1995): a clever type of perceptron. Instead of hand-coding the layer of non-adaptive features, each training example is used to create a new feature using a fixed recipe. A clever optimization technique is used to select the best subset of features. Many NN researchers switched to SVM in the 1990s because they work better. Here, we’ll take a slow path into SVM concepts.
  6. 6. Shattering Points with Oriented Hyperplanes. Remember the idea: I want to build hyperplanes that separate points of two classes. In a two-dimensional space these are lines, e.g., a linear classifier. Which is the best separating line? Remember, a hyperplane is represented by the equation $\mathbf{w} \cdot \mathbf{x} + b = 0$.
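As a minimal sketch of the hyperplane equation, the snippet below classifies a point by the sign of w · x + b. The function names and the sample weights are illustrative assumptions, not taken from the lecture.

```python
def decision_value(w, x, b):
    """Return w . x + b for weight vector w, example x, and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if decision_value(w, x, b) >= 0 else -1


# Hypothetical hyperplane x1 - x2 = 0 in two dimensions.
w = [1.0, -1.0]
b = 0.0
print(classify(w, [2.0, 1.0], b))  # 2 - 1 = 1, so the class is 1
print(classify(w, [0.0, 3.0], b))  # 0 - 3 = -3, so the class is -1
```

Many different (w, b) pairs separate the same training set, which is why the lecture goes on to ask which separating line is best.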
  7. 7. Linear SVM. I want the line that maximizes the margin between examples of both classes! The examples closest to that line are the support vectors.
  8. 8. Linear SVM in more detail. Let’s assume two classes, $y_i \in \{-1, +1\}$. Each example is described by a set of features $\mathbf{x}$ ($\mathbf{x}$ is a vector; for clarity, we will mark vectors in bold in the remainder of the slides). The problem can be formulated as follows. All training examples must satisfy (in the separable case) $\mathbf{x}_i \cdot \mathbf{w} + b \geq +1$ for $y_i = +1$ and $\mathbf{x}_i \cdot \mathbf{w} + b \leq -1$ for $y_i = -1$. This can be combined into $y_i(\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \geq 0$ for all $i$.
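A small sketch of the separable-case requirement, assuming the usual combined form y_i (w · x_i + b) - 1 >= 0 for every training example. The toy data and both candidate hyperplanes are made up for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def satisfies_separable_constraints(w, b, examples, labels):
    """Check y_i * (w . x_i + b) - 1 >= 0 for every training example."""
    return all(y * (dot(w, x) + b) - 1 >= 0 for x, y in zip(examples, labels))


# Hypothetical toy data: two positive and two negative examples.
examples = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
labels = [1, 1, -1, -1]

# This hyperplane separates the classes and meets the scaling of the constraints.
print(satisfies_separable_constraints([0.5, 0.5], -1.0, examples, labels))  # True
# This one also separates the classes, but its scaling violates the constraints.
print(satisfies_separable_constraints([0.1, 0.1], -0.2, examples, labels))  # False
```

The second call shows that the constraints fix the scale of w and b as well as the orientation of the hyperplane.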
  9. 9. Linear SVM. What are the support vectors? Let’s find the points that lie on the hyperplane $H_1$: $\mathbf{x}_i \cdot \mathbf{w} + b = +1$. The perpendicular distance of $H_1$ from the origin is $|1 - b| / \|\mathbf{w}\|$. Let’s find the points that lie on the hyperplane $H_2$: $\mathbf{x}_i \cdot \mathbf{w} + b = -1$. The perpendicular distance of $H_2$ from the origin is $|-1 - b| / \|\mathbf{w}\|$. The margin is $2 / \|\mathbf{w}\|$.
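The sketch below computes the margin width 2 / ||w|| and picks out the examples that lie exactly on the two margin hyperplanes, i.e., those with y_i (w · x_i + b) = 1. The data, the hyperplane, and the tolerance are illustrative assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def margin_width(w):
    """Distance between the two margin hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


def support_vectors(w, b, examples, labels, tol=1e-9):
    """Return the examples for which y_i * (w . x_i + b) equals 1 (within tol)."""
    return [x for x, y in zip(examples, labels)
            if abs(y * (dot(w, x) + b) - 1.0) <= tol]


# Hypothetical toy data and a hyperplane that satisfies the constraints.
examples = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
labels = [1, 1, -1, -1]
w, b = [0.5, 0.5], -1.0

print(margin_width(w))                            # about 2.828
print(support_vectors(w, b, examples, labels))    # [[2.0, 2.0], [0.0, 0.0]]
```

In this toy case the margin equals the distance between the two support vectors (0, 0) and (2, 2), which is the square root of 8.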
  10. 10. Linear SVM. Therefore, the problem is: find the hyperplane that minimizes $\frac{1}{2}\|\mathbf{w}\|^2$ subject to $y_i(\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \geq 0$ for all $i$. But let us change to the Lagrangian formulation, because (1) the constraints will be placed on the Lagrange multipliers themselves (easier to handle), and (2) the training data will appear only in the form of dot products between vectors.
  11. 11. Linear SVM. The Lagrangian formulation becomes $L_P = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_i \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{w} + b) + \sum_i \alpha_i$, where $\alpha_i$ are the Lagrange multipliers. So now we need to minimize $L_P$ w.r.t. $\mathbf{w}$ and $b$, and simultaneously require that the derivatives of $L_P$ w.r.t. all the $\alpha_i$ vanish, all subject to the constraints $\alpha_i \geq 0$.
  12. 12. Linear SVM. Transformation to the dual problem. This is a convex problem, so we can equivalently solve the dual problem. That is, maximize $L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$ w.r.t. $\alpha_i$, subject to the constraint $\sum_i \alpha_i y_i = 0$ and with $\alpha_i \geq 0$.
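To make the dual concrete, here is a small sketch that evaluates the standard dual objective, sum(alpha_i) - 1/2 * sum over i, j of alpha_i alpha_j y_i y_j (x_i · x_j), and recovers w as the sum of alpha_i y_i x_i. The two-point data set and the multiplier values are assumptions chosen so the optimum can be checked by hand.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_objective(alphas, examples, labels):
    """L_D = sum(alpha_i) - 1/2 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(alphas)
    quadratic = sum(
        alphas[i] * alphas[j] * labels[i] * labels[j] * dot(examples[i], examples[j])
        for i in range(n) for j in range(n)
    )
    return sum(alphas) - 0.5 * quadratic


def weights_from_alphas(alphas, examples, labels):
    """Recover w = sum_i alpha_i y_i x_i from the multipliers."""
    dim = len(examples[0])
    return [sum(a * y * x[d] for a, y, x in zip(alphas, labels, examples))
            for d in range(dim)]


# Hypothetical two-point problem: one positive and one negative example.
examples = [[2.0, 2.0], [0.0, 0.0]]
labels = [1, -1]
# With sum(alpha_i * y_i) = 0 both multipliers are equal; 0.25 maximizes L_D here.
alphas = [0.25, 0.25]

w = weights_from_alphas(alphas, examples, labels)
# Any support vector gives the bias: y_i * (w . x_i + b) = 1.
b = labels[0] - dot(w, examples[0])

print(w, b)                                      # [0.5, 0.5] -1.0
print(dual_objective(alphas, examples, labels))  # 0.25
print(0.5 * dot(w, w))                           # 0.25, the primal objective
```

The dual value and the primal value ||w||^2 / 2 coincide at the optimum, which is what convexity lets us rely on when solving the dual instead of the primal.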
  13. 13. Linear SVM. This is a quadratic programming problem. You can solve it with many methods, such as (projected) gradient descent. We’ll not see these methods in class.
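The lecture does not cover solvers, so the following is only a sketch of one gradient-based option: projected gradient ascent on the dual. It makes a simplifying assumption that is not in the slides: a constant feature 1 is appended to every example so that the bias becomes an ordinary weight, which removes the equality constraint on the multipliers but also regularizes the bias. The function names, step size, iteration count, and upper bound `c` are all illustrative choices.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def train_dual_projected_gradient(examples, labels, c=10.0, lr=0.01, steps=5000):
    """Maximize the dual by gradient ascent, clipping each alpha_i to [0, c]."""
    # Assumption: fold the bias into the weights via a constant feature.
    z = [list(x) + [1.0] for x in examples]
    n = len(z)
    q = [[labels[i] * labels[j] * dot(z[i], z[j]) for j in range(n)]
         for i in range(n)]
    alphas = [0.0] * n
    for _ in range(steps):
        # Gradient of sum(alpha) - 1/2 * alpha^T Q alpha with respect to alpha_i.
        grads = [1.0 - dot(q[i], alphas) for i in range(n)]
        # Projection step: keep every multiplier inside the box [0, c].
        alphas = [min(c, max(0.0, a + lr * g)) for a, g in zip(alphas, grads)]
    w = [sum(alphas[i] * labels[i] * z[i][d] for i in range(n))
         for d in range(len(z[0]))]
    return alphas, w


def predict(w, x):
    """Classify x with the augmented weight vector (last entry acts as the bias)."""
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


# Hypothetical toy data.
examples = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
labels = [1, 1, -1, -1]
alphas, w = train_dual_projected_gradient(examples, labels)
print([predict(w, x) for x in examples])  # matches the labels on this toy set
```

Dedicated quadratic programming solvers converge much faster in practice; the loop above is only meant to show that the dual can be optimized with very little machinery.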
  14. 14. The Non-Separable Case. What if I cannot separate the two classes? We will not be able to solve the Lagrangian formulation proposed. Any idea?
  15. 15. The Non-Separable Case. Just relax the constraints by permitting some errors.
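  16. 16. The Non-Separable Case. That means that the Lagrangian is rewritten. We change the objective function to be minimized to $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i$, where $\xi_i \geq 0$ is the error permitted for example $i$. Therefore, we are maximizing the margin and minimizing the error. $C$ is a constant to be chosen by the user. The dual problem becomes: maximize $L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$ subject to $0 \leq \alpha_i \leq C$ and $\sum_i \alpha_i y_i = 0$.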
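A sketch of the relaxed objective, assuming the common form ||w||^2 / 2 + C * sum(xi_i), where the error xi_i = max(0, 1 - y_i (w · x_i + b)) measures how far example i falls on the wrong side of its margin. The data and the hyperplane are made up for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def slacks(w, b, examples, labels):
    """Per-example error: xi_i = max(0, 1 - y_i * (w . x_i + b))."""
    return [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(examples, labels)]


def soft_margin_objective(w, b, examples, labels, c):
    """||w||^2 / 2 + C * sum(xi_i): margin term plus penalized errors."""
    return 0.5 * dot(w, w) + c * sum(slacks(w, b, examples, labels))


# Hypothetical data: the third example is a negative point inside the margin.
examples = [[2.0, 2.0], [0.0, 0.0], [1.0, 1.0]]
labels = [1, -1, -1]
w, b = [0.5, 0.5], -1.0

print(slacks(w, b, examples, labels))                       # [0.0, 0.0, 1.0]
print(soft_margin_objective(w, b, examples, labels, 1.0))   # 1.25
print(soft_margin_objective(w, b, examples, labels, 10.0))  # 10.25
```

A larger C makes the same error more expensive, pushing the solution toward fewer violations at the cost of a narrower margin.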
  17. 17. Non-Linear SVM. What happens if the decision function is a linear function of the data? In our equations, data appears in the form of dot products $\mathbf{x}_i \cdot \mathbf{x}_j$. Wouldn’t you like to have polynomial, logarithmic, … functions to fit the data?
  18. 18. Non-Linear SVM. The kernel trick: map the data into a higher-dimensional space. Mercer’s theorem: any continuous, symmetric, positive semi-definite kernel function $K(\mathbf{x}, \mathbf{y})$ can be expressed as a dot product in a high-dimensional space. Now, we have a kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$. An example. All we have talked about still holds when using the kernel function. The only difference is that now my function will be $f(\mathbf{x}) = \sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$.
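One way to see the kernel trick at work is the quadratic kernel K(x, y) = (x · y)^2 on two-dimensional inputs, which equals an ordinary dot product after mapping each point to (x1^2, sqrt(2) x1 x2, x2^2). This particular kernel and feature map are a textbook illustration chosen here, not necessarily the example shown on the slide.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def quadratic_kernel(x, y):
    """K(x, y) = (x . y)^2, computed in the original two-dimensional space."""
    return dot(x, y) ** 2


def feature_map(x):
    """Explicit three-dimensional map whose dot product reproduces the kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x = [1.0, 2.0]
y = [3.0, 4.0]
print(quadratic_kernel(x, y))                # 121.0
print(dot(feature_map(x), feature_map(y)))   # about 121.0, up to rounding
```

The kernel never builds the three-dimensional vectors, which is the point: the dual only needs dot products, so the mapping can stay implicit.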
  19. 19. Non-Linear SVM. Some typical kernels. A visual example of a polynomial kernel with p=3.
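The kernel formulas on this slide are not preserved in the transcript. As a hedged illustration, three kernels that are commonly listed in SVM tutorials are sketched below: polynomial, Gaussian radial basis function, and sigmoid. The parameter names (`p`, `sigma`, `kappa`, `delta`) and default values are assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def polynomial_kernel(x, y, p=3):
    """Polynomial kernel of degree p: (x . y + 1)^p."""
    return (dot(x, y) + 1.0) ** p


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian radial basis function: exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    """Sigmoid kernel: tanh(kappa * x . y - delta)."""
    return math.tanh(kappa * dot(x, y) - delta)


x = [1.0, 2.0]
y = [3.0, 4.0]
print(polynomial_kernel(x, y, p=3))  # (11 + 1)^3 = 1728.0
print(rbf_kernel(x, x))              # 1.0 for identical points
print(sigmoid_kernel(x, y))          # a value between -1 and 1
```

Note that the sigmoid kernel is positive semi-definite only for some parameter settings, so it does not always meet the Mercer condition.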
  20. 20. Some Further Issues. We have to classify data described by nominal attributes and continuous attributes, probably with missing values, that may have more than two classes. How do SVMs deal with them? SVMs are defined over continuous attributes: no problem! Nominal attributes: map them into a continuous space. Multiple classes: build SVMs that discriminate each pair of classes.
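A short sketch of the two workarounds, under assumptions of my own: nominal values are mapped to a continuous space with one-hot encoding (one common choice among several), and the multi-class case is handled by enumerating every pair of classes, each of which would get its own two-class SVM. The helper names and sample values are hypothetical.

```python
from itertools import combinations


def one_hot(value, categories):
    """Map a nominal value to a vector with a single 1.0 at its category."""
    return [1.0 if value == category else 0.0 for category in categories]


def class_pairs(classes):
    """List every unordered pair of classes; each pair needs one two-class SVM."""
    return list(combinations(sorted(classes), 2))


colors = ["red", "green", "blue"]
print(one_hot("green", colors))       # [0.0, 1.0, 0.0]

print(class_pairs(["a", "b", "c"]))   # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

With k classes the pairwise scheme trains k(k - 1)/2 classifiers, and a new example is typically assigned to the class that wins the most pairwise decisions.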
  21. 21. Some Further Issues. I’ve seen lots of formulas… but I want to program an SVM builder. How do I get my SVM? We have already mentioned that there are many methods to solve the quadratic programming problem. Many algorithms have been designed for SVM. One of the most significant is Sequential Minimal Optimization. Currently, there are many new algorithms.
  22. 22. Next Class: Association Rules.