*Support Vector Machines*

*Support Vector Machines*

**SVMs can be seen as a linear classifier that maximizes separation.** Support vector machine is sometimes called Support Vector Classification (SVC).

Think of a minefield and you have a map of the mines. To maximize safety, you create a path between those mines, but you still want to maximize margin between you and ANY of the mines.

**SVMs create (d - 1) dimensional hyper-planes to group samples of d features.** Think of two groups of 2D data points, SVM aims to divide the groups with a linear line that maximizes margin between either of the groups.

**SVMs are used for classification, regression and outliers detection.** In regression context, one common method is Support Vector Regression (SVR). In outliers detection, one common method is One-class SVM.

**SVMs are effective in high dimensional spaces.** Complexity of the model comes from the number of support vectors, not dimensionality of the training data.

**SVMs don't give probability estimates.** To get probabilities, you must extend SVM with some expensive validation methods.

**Neural networks usually outperform SVMs.** But SVMs can resist overfitting even in high dimensions. SVMs always have only one layer.

SVMs have been used for:

- text categorization
- image classification
- protein classification and other biological science problems

**SVMs are versatile.** Changing the decision function (a kernel function) allows utilizing nonlinear classifiers. This effectively changes the shape of the resulting hyper-planes. This increases generalization error but performs well with more training data.

**In machine learning context, kernel usually refers to "kernel trick".** A method of using a linear classifier to solve a non-linear problem where kernel function is used to transform features to higher dimension (`d`

) representation so they can be separated with a `d - 1`

hyper-planes.

- Kernel function
`k(x, .)`

defines the distribution of similarities of points around a given point`x`

. - Kernel function
`k(x, y)`

defines the similarity of point`x`

with point`y`

.

Common kernel functions:

- Linear
- Polynomial
- RBF (Gaussian)
- Sigmoid

# Sources

- Wikipedia - Support vector machine
- sklearn - Support Vector Machines
- The Master Algorithm, Pedro Domingos