ruk·si

Support Vector Machines

Updated at 2017-06-19 20:44

An SVM can be seen as a linear classifier that maximizes the separation (the margin) between classes. When used for classification, a support vector machine is sometimes called Support Vector Classification (SVC).

Think of a minefield for which you have a map of the mines. To stay safe, you plot a path between the mines, and you want that path to keep the largest possible margin between you and ANY of the mines.

An SVM creates a (d - 1)-dimensional hyper-plane to separate samples with d features. Think of two groups of 2D data points: the SVM aims to divide the groups with a straight line that maximizes the margin, meaning the distance from the line to the nearest points of either group.
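To make the margin idea concrete, here is a minimal sketch that compares two candidate separating lines on made-up 2D data. The helper names (`signed_distance`, `margin`), the points, and both lines are assumptions chosen for illustration. The sketch only scores given lines; it does not search for the best one, which is what an actual SVM solver does.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyper-plane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Smallest label-adjusted distance; negative if any point is misclassified."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

# Made-up 2D data: two groups labelled -1 and +1.
points = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]
labels = [-1, -1, 1, 1]

# Two candidate lines (w, b) that both separate the groups.
line_a = ((1.0, 1.0), -5.0)  # x1 + x2 = 5
line_b = ((1.0, 0.0), -2.5)  # x1 = 2.5

margin_a = margin(*line_a, points, labels)
margin_b = margin(*line_b, points, labels)
print(margin_a, margin_b)
```

Both lines classify every point correctly, but the first keeps a larger distance to the nearest points, so it is the one a margin-maximizing classifier would prefer.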

SVMs are used for classification, regression and outlier detection. For regression, a common method is Support Vector Regression (SVR). For outlier detection, a common method is One-class SVM.

SVMs are effective in high-dimensional spaces. The complexity of the model comes from the number of support vectors, not from the dimensionality of the training data.
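The role of the support vectors can be sketched with the usual kernel form of the decision function, f(x) = sum_i c_i * k(sv_i, x) + b, where each coefficient c_i combines a learned weight with the label of support vector sv_i. The support vectors, coefficients, and bias below are made up for illustration rather than produced by training.

```python
import math

def rbf(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def decision_function(x, support_vectors, coefs, bias, kernel):
    """One kernel evaluation per support vector, then a weighted sum."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + bias

# Hypothetical model: one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
coefs = [-1.0, 1.0]  # sign carries the class label of each support vector
bias = 0.0

def predict(x):
    value = decision_function(x, support_vectors, coefs, bias, rbf)
    return 1 if value >= 0 else -1

print(predict((0.1, 0.2)), predict((1.9, 2.2)))
```

The prediction loop runs once per support vector, so the cost of evaluating the model grows with the number of support vectors.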

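SVMs don't give probability estimates directly; they output a signed distance from the decision boundary. To get probabilities, you must extend the SVM with a calibration step, for example Platt scaling, which is usually fitted with cross-validation and is therefore expensive.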
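One common calibration method is Platt scaling, which passes the decision value f through a sigmoid, P(y = 1 | f) = 1 / (1 + exp(A * f + B)). The sketch below only shows that mapping. The values of `a` and `b` are made up; in practice they are fitted on held-out decision values, which is where the extra cost comes from.

```python
import math

def platt_probability(decision_value, a=-2.0, b=0.0):
    """Map an SVM decision value f to 1 / (1 + exp(a * f + b)).

    a and b are hypothetical here; normally they are fitted to data.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

# Points far on the positive side get probabilities near 1,
# points on the boundary (f = 0) get 0.5 with these parameters.
for f in (-1.5, 0.0, 1.5):
    print(f, round(platt_probability(f), 3))
```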

Neural networks often outperform SVMs, but SVMs can resist overfitting even in high dimensions. Unlike deep networks, an SVM is a shallow model: viewed as a network, it always has only one layer.

SVMs have been used for:

  • text categorization
  • image classification
  • protein classification and other biological science problems

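SVMs are versatile. Changing the kernel function used inside the decision function turns the linear classifier into a nonlinear one. The separating hyper-plane stays flat in the kernel's feature space, but the decision boundary it produces in the original input space becomes curved. A more flexible kernel raises the risk of overfitting on small datasets but performs well with more training data.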

In a machine learning context, "kernel" usually refers to the "kernel trick": a method of using a linear classifier to solve a non-linear problem. The features are mapped to a higher-dimensional representation where the classes can be separated with a hyper-plane, and the kernel function computes inner products in that space directly, so the mapping itself never has to be computed.
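A small worked example can make the trick visible. For 2D inputs, the degree-2 polynomial kernel k(x, y) = (x . y)^2 gives the same number as an ordinary dot product after mapping each point to three dimensions with phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2). The function names and sample points below are chosen for illustration.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def phi(x):
    """Explicit feature map from 2D to 3D for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, y):
    """Same inner product, computed without ever building phi(x) or phi(y)."""
    return dot(x, y) ** 2

x = (1.0, 2.0)
y = (3.0, 4.0)

explicit = dot(phi(x), phi(y))  # map to 3D first, then take the dot product
implicit = poly2_kernel(x, y)   # stay in 2D and apply the kernel
print(explicit, implicit)
```

Both values agree up to floating-point rounding. For kernels such as RBF the corresponding feature space is infinite-dimensional, so only the kernel form is computable.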

  • Kernel function k(x, .) defines the distribution of similarities of points around a given point x.
  • Kernel function k(x, y) defines the similarity of point x with point y.

Common kernel functions:

  • Linear
  • Polynomial
  • RBF (Gaussian)
  • Sigmoid
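As a rough sketch, the four kernels above can be written as plain functions. The parameter names `gamma`, `coef0` and `degree` follow a convention used by several SVM libraries, and the default values are assumptions picked for the example, not recommended settings.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=2):
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

x = (1.0, 2.0)
y = (3.0, 4.0)
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```

Each function returns a single similarity score for a pair of points. The RBF kernel, for instance, returns 1 for identical points and decays toward 0 as they move apart.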
