aka SVM
A method for discriminative Supervised learning, that is, for Classification and Regression analysis. Like Artificial neural networks, it allows learning non-linear functions.
They correspond to Kernel methods which use the Hinge loss function. See this video. Because the loss function is non-differentiable, we use the Subgradient method (See here).
Andrew Ng lec intro – Notes – Wiki – Notation
Support Vector Machine Intro and Application - Practical Machine Learning Tutorial with Python p.20
See page 39 here, and here, and these notes
Generalization bounds – These bounds seem to not apply to what we observe (see Deep learning theory)
Relationships between Gaussian processes, Support Vector machines and Smoothing Splines
You can intuitively see that Hinge loss + Tikhonov regularization (norm squared) pushes you towards large margin. The Hinge loss penalizes any point whose geometric distance from the boundary is less than $1/\|w\|$, so the smaller the norm of the parameter vector, the farther from the boundary points must be to avoid a penalty. Therefore, pushing towards low parameter vector norms pushes the classifier toward larger margins, so that points are less close to the boundary.
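To make this concrete (using the notation of the Ng notes): the geometric margin is the functional margin divided by the norm of $w$,
$$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right) = \frac{\hat{\gamma}^{(i)}}{\|w\|},$$
so with the functional margin fixed at $\hat{\gamma} = 1$, the geometric margin is $1/\|w\|$, and shrinking $\|w\|$ enlarges the margin.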
Definition of the optimization (learning) problem: choose the classification hyperplane (parametrized by $w, b$) that maximizes the Geometric margin $\gamma$ w.r.t. the training data set, with the constraint $\|w\| = 1$.
Alternative formulation, which, however, leaves the functional margin $\hat{\gamma}$ (i.e. the scaling of $w, b$) arbitrary.
Here is a constraint that fixes the arbitrariness of the magnitude of $(w, b)$: $\hat{\gamma} = 1$ (Functional margin). We get the following optimization problem:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2$$
$$\text{s.t.}\quad y^{(i)}(w^T x^{(i)} + b) \ge 1,\quad i = 1, \dots, m$$
This is an example of a quadratic program
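As an illustrative sketch (not the method from the linked notes), the hard-margin primal above can be handed to a generic QP solver. This assumes cvxopt is available; the toy 2D dataset and the variable ordering $z = (w_1, w_2, b)$ are chosen just for this example.

```python
# Minimal sketch: hard-margin SVM primal as a quadratic program, using cvxopt.
# Toy data and the variable ordering z = (w1, w2, b) are illustrative choices.
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

# Linearly separable toy data: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, n = X.shape

# Objective: minimize (1/2)||w||^2  ->  (1/2) z^T P z with z = (w, b).
# A tiny weight on b keeps P strictly positive definite for the solver.
P = matrix(np.diag([1.0, 1.0, 1e-8]))
q = matrix(np.zeros(n + 1))

# Constraints y_i (w^T x_i + b) >= 1, rewritten as G z <= h.
G = matrix(np.hstack([-y[:, None] * X, -y[:, None]]))
h = matrix(-np.ones(m))

sol = solvers.qp(P, q, G, h)
z = np.array(list(sol['x']))          # z = (w1, w2, b)
w, b = z[:n], z[n]
print("w =", w, "b =", b)
print("functional margins:", y * (X @ w + b))  # should all be >= 1
```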
This is the same as maximizing the margin under the constraint $\hat{\gamma} = 1$. See page 8 here, and divide the constraints by the norm of $w$, giving the geometric margin.
An SVM works by solving the dual problem of the optimal margin classifier (OMC) optimization problem defined above. The good thing is that the resulting optimization problem is convex, so there are no local minima and very efficient methods exist for solving it.
Primal and dual Optimization problems: intro –> primal problem –> Dual problem (Skip some algebra)
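For reference, the dual of the optimal margin classifier problem (as derived in the linked notes) is:
$$\max_{\alpha}\ W(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2}\sum_{i,j=1}^m y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle$$
$$\text{s.t.}\quad \alpha_i \ge 0,\quad \sum_{i=1}^m \alpha_i y^{(i)} = 0,$$
with the optimal weights recovered as $w = \sum_{i=1}^m \alpha_i y^{(i)} x^{(i)}$.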
Support vectors
(Definition of support vectors, where the $\alpha_i$s are Lagrange multipliers)
Support vectors satisfy the constraint inequality $y^{(i)}(w^T x^{(i)} + b) \ge 1$ with equality.
The approach for deriving the optimal margin classifier for a support vector machine is to solve the dual optimization problem above.
Prediction can be written down using inner products: $w^T x + b = \sum_i \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b$; see here.
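A minimal sketch of what this looks like in code; it assumes the multipliers alphas and the intercept b have already been obtained from the dual problem, and the function and variable names here are just illustrative.

```python
# Minimal sketch: SVM prediction using only inner products / a kernel.
# Assumes alphas and b have already been obtained by solving the dual problem.
import numpy as np

def linear_kernel(x1, x2):
    return np.dot(x1, x2)

def svm_predict(x, X_train, y_train, alphas, b, kernel=linear_kernel):
    """Sign of sum_i alpha_i y_i K(x_i, x) + b; only support vectors
    (alpha_i > 0) contribute to the sum."""
    s = b
    for x_i, y_i, a_i in zip(X_train, y_train, alphas):
        if a_i > 1e-8:            # non-support vectors have alpha_i = 0
            s += a_i * y_i * kernel(x_i, x)
    return np.sign(s)
```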
See Kernel methods
Kernels Introduction - Practical Machine Learning Tutorial with Python p.29
Kernels allow the inner products that appear in the SVM formulas to be computed more efficiently.
How to check if a function is a valid Kernel
If $K$ is a valid kernel, then the matrix with elements $K_{ij} = K(x^{(i)}, x^{(j)})$, for any set of points $\{x^{(1)}, \dots, x^{(m)}\}$, must be symmetric positive semi-definite. It turns out the converse is true.
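A quick numerical illustration of the necessary condition: build the Gram matrix of a candidate kernel on some arbitrary points and check symmetry and positive semi-definiteness. The Gaussian kernel and the random data below are just example choices.

```python
# Quick numerical check of the necessary condition for a valid kernel:
# the Gram matrix K_ij = K(x_i, x_j) must be symmetric positive semi-definite.
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                       # arbitrary set of points
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])

print("symmetric:", np.allclose(K, K.T))
print("min eigenvalue:", np.linalg.eigvalsh(K).min())  # ~>= 0 up to tolerance
```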
How to use an SVM with a kernel
Examples of kernels
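As a usage sketch, assuming scikit-learn is available; the dataset and hyperparameter values are arbitrary examples.

```python
# Usage sketch: fitting SVMs with different kernels via scikit-learn.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0, gamma="scale", degree=3)
    clf.fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```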
Soft Margin SVM - Practical Machine Learning Tutorial with Python p.31
Soft margin is a method to deal with data that is not linearly separable, or that contains outliers.
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^m \xi_i$$
$$\text{s.t.}\quad y^{(i)}(w^T x^{(i)} + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, \dots, m$$
The slack variables $\xi_i$ basically weight how much each point is allowed to violate the margin, so that a single outlier doesn't break the algorithm. Explanation
We need $\xi_i \ge 0$ so that we don't reward easy-to-classify points with negative $\xi_i$.
This constrained optimization problem can be reformulated as an unconstrained optimization problem with a regularized Hinge loss function (with $\lambda = \frac{1}{2C}$):
$$\min_{w,b}\ \sum_{i=1}^m \max\left(0,\ 1 - y^{(i)}(w^T x^{(i)} + b)\right) + \lambda \|w\|^2$$
To see this, we have to minimize w.r.t. $\xi$ first, and then w.r.t. $w, b$ (which can always be done). So we consider fixing the parameters $w, b$ and minimizing w.r.t. $\xi_i$, which gives $\xi_i = \max\left(0,\ 1 - y^{(i)}(w^T x^{(i)} + b)\right)$, i.e. the Hinge loss.
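Since the hinge is non-differentiable at the kink, this unconstrained formulation can be trained with the Subgradient method mentioned at the top. A minimal sketch (the step size, $\lambda$, and the toy data are arbitrary illustrative choices):

```python
# Minimal sketch: subgradient descent on the regularized hinge loss
#   J(w, b) = sum_i max(0, 1 - y_i (w^T x_i + b)) + lam * ||w||^2
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        # Subgradient of the hinge term: -y_i x_i for violated margins, else 0.
        margins = y * (X @ w + b)
        viol = margins < 1
        grad_w = -(y[viol, None] * X[viol]).sum(axis=0) + 2 * lam * w
        grad_b = -y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy separable data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
print("w =", w, "b =", b, "margins:", y * (X @ w + b))
```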
Can also work out the dual of this optimization problem (see more on Optimization and Linear programming about dual optimization problems). See slides here
Sequential minimal optimization (SMO) is an optimization algorithm that is a variation of Coordinate descent, applied to the SVM dual optimization problem. This is the reason we use it.
– Description of the algorithm – algorithm (we are maximizing a function $W(\alpha_1, \dots, \alpha_m)$ subject to the dual constraints)
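Why two multipliers are updated at a time: the equality constraint $\sum_i \alpha_i y^{(i)} = 0$ means a single $\alpha_i$ cannot be changed on its own. Fixing all but $\alpha_1, \alpha_2$ (following the Ng notes, with the box constraint $0 \le \alpha_i \le C$ from the soft-margin dual):
$$\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = -\sum_{i=3}^m \alpha_i y^{(i)} =: \zeta \quad\Rightarrow\quad \alpha_1 = \left(\zeta - \alpha_2 y^{(2)}\right) y^{(1)},$$
so $W(\alpha)$ becomes a quadratic function of $\alpha_2$ alone, which is maximized in closed form and then clipped to an interval $[L, H]$ so that both multipliers stay in $[0, C]$.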