Nonlinear Dimensionality Reduction

Updated at 2017-06-18 13:12

High-dimensional data is hard to interpret directly. Nonlinear dimensionality reduction aims to find a lower-dimensional nonlinear manifold, embedded in the original space, that represents the data.

A manifold is a space that locally resembles Euclidean space near each point, but not necessarily globally. For example, at small enough scales the Earth appears flat and the rules of Euclidean geometry work well enough, even though its surface is curved as a whole. Likewise, spacetime looks flat locally, but that is not how the universe behaves on a larger scale.

Manifold learning is an approach to nonlinear dimensionality reduction. It is mostly used for visualization and rarely generates more than two new features. The underlying assumption is that the dimensionality of the dataset is only artificially high: the points lie on or near a manifold of much lower dimension.
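To make the "artificially high" assumption concrete, here is a minimal sketch that builds a swiss-roll-shaped dataset with NumPy. Every point has three coordinates, yet it is fully determined by two hidden parameters. The names t, h and X and the specific constants are illustrative choices, not something from the sources.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two intrinsic parameters per point (illustrative ranges).
t = 1.5 * np.pi * (1 + 2 * rng.random(n))  # position along the spiral
h = 21 * rng.random(n)                     # position along the roll's axis

# Three observed features, all computed from (t, h) alone.
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])

print(X.shape)  # three features per point, but only two degrees of freedom
```

A manifold learning method applied to X would ideally recover something equivalent to the (t, h) coordinates, that is, the roll "unrolled" into a flat sheet.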

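t-Distributed Stochastic Neighbor Embedding (t-SNE, TSNE in sklearn): finds a two-dimensional representation of the data that preserves the distances between points as well as possible, with the emphasis on keeping points that are close together in the original space close together in the embedding.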
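A minimal sketch of t-SNE with scikit-learn, assuming the bundled digits dataset as example input. The subset size, perplexity and random_state are arbitrary choices made to keep the run short and repeatable.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 8x8 digit images flattened to 64 features; a small subset keeps it fast.
digits = load_digits()
X = digits.data[:300]

# Embed the 64-dimensional points into two dimensions.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

print(X.shape, "->", X_2d.shape)
```

Note that TSNE only offers fit_transform and has no separate transform method, so it cannot embed new, unseen points. This is one reason it is used for exploring and plotting a fixed dataset rather than as a preprocessing step.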

Sources

sklearn - Manifold learning
Wikipedia - Nonlinear dimensionality reduction
Introduction to Machine Learning with Python, Andreas C. Müller, Sarah Guido