
Machine Learning
Basics

Updated at 2018-03-09 04:25

80% of machine learning work goes into collecting and cleaning the data.

Machine learning is a subfield of artificial intelligence that gives computers the ability to learn without being explicitly programmed.

Machine learning is using data to answer questions. It is about building parameterized programs that are tuned automatically to improve their behavior by adapting to previously seen data.
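
As a minimal sketch of this idea, here is a straight line whose two parameters are tuned to previously seen data points using NumPy; the numbers are made up for illustration:

    import numpy as np

    # Previously seen data: hours studied vs. exam score (made-up numbers).
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([52.0, 57.0, 61.0, 68.0, 70.0])

    # The "program" is y = a * x + b; learning tunes the parameters a and b
    # so that the squared error on the seen data is minimized.
    a, b = np.polyfit(x, y, deg=1)

    # The tuned program can now answer questions about unseen inputs.
    print(a * 6.0 + b)  # predicted score for 6 hours of studying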

Machine learning algorithms are used daily around you without you even noticing:

  • Gmail detects spam based on machine learning algorithms.
  • Facebook uses machine learning to choose which content to show.
  • Netflix recommends shows based on your watch history.
  • Amazon recommends books and other products based on your purchase history.
  • Hospitals detect skin cancer from pictures.

Machine learning has 4 major use-cases:

  • Prediction: e.g. predicting the price of an object.
  • Classification: e.g. which bird is in the picture.
  • Clustering: e.g. grouping similar customers together.
  • Anomaly Detection: e.g. detecting increasing accident risk.

All of these are commonly called "predictions" though.
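
As a rough sketch, each use-case maps to a different scikit-learn estimator family; the data below is synthetic and only for illustration:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import IsolationForest
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(100, 2)  # 100 samples, 2 features each

    # Prediction: estimate a continuous value, e.g. a price.
    LinearRegression().fit(X, X @ [3.0, 1.5])

    # Classification: assign one of a fixed set of labels.
    DecisionTreeClassifier().fit(X, (X[:, 0] > 0.5).astype(int))

    # Clustering: group similar samples without any labels.
    KMeans(n_clusters=3, n_init=10).fit(X)

    # Anomaly detection: flag samples that look unlike the rest.
    IsolationForest(random_state=0).fit(X)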

There are 5 main ways to think about machine learning on a theoretical level:

  • Symbolism: Learning is the inverse of deduction; this view draws from philosophy, logic and psychology. Learning is achievable with interconnected symbolic (human-readable) representations of problems and logic. (Inverse Deduction, Induction, Decision Trees)
  • Connectionism: Learning can be achieved by reverse engineering how the brain works. Knowledge is stored in connections between simple units, like neurons in our brains. (Backpropagation, Neural Networks, Deep Learning)
  • Evolutionary: Learning is a by-product of biological evolution, so we should be able to create a learning machine by mimicking evolutionary mechanisms such as natural selection and survival of the fittest. (Fitness Functions, Point Mutation, Parasite Coevolution, Genetic Programming)
  • Statistical: Learning can be modeled using statistics and probabilities. Why not cut out the middleman if we can define learning ourselves? (Bayesian Inference, Bayesian Networks)
  • Similarity: Learning is all about recognizing which things are similar, how much and in which way, and recording this knowledge for later use. (Clustering, Similarity Analysis)

These machine learning theories are not contradictory. Each approach works well in some use cases, meaning that the ultimate learning machine will most likely borrow ideas from all of these areas.

90% of actual machine learning work is about data logistics: how to store your data, how to move your data, how to preprocess your data and how to serve predictions in a scalable way.

Terminology:

  • A sample is an item to process, e.g. to classify. A sample can be a document, a picture, a video, a text or anything else that can be expressed using quantitative traits.
  • A feature is a distinct trait used to describe part of a sample. A feature is frequently called a "predictor variable", "explanatory variable", "regressor" or "independent variable".
  • The number of features is commonly fixed in advance, the same set of features should be extracted from every sample, and a single sample can have even millions of features. The features of a particular sample are frequently called its "feature vector". The complete collection of utilized features is called "x" or "input".
  • A label is a predefined or learned trait of a sample. A common use of machine learning is to train a program to predict the label of an unlabeled sample using previously seen feature-to-label examples. A label is frequently called an "explained variable", "response variable", "regressand" or "dependent variable". The complete collection of utilized labels is called "y" or "output".
  • Training is feeding data to a machine learning model so that it "learns" how to make predictions. Training is frequently called "fitting".
  • Hyperparameters are the parameters that control the learning itself, for example the learning rate or the dropout rate.
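
To make the terminology concrete, here is a hedged sketch with scikit-learn; the bird samples, features and numbers are made up:

    from sklearn.neighbors import KNeighborsClassifier

    # Four samples, each with the same two features: weight in g, wingspan in cm.
    x = [[9.0, 16.0], [10.0, 17.0], [300.0, 60.0], [320.0, 63.0]]  # input: feature vectors

    # One label per sample: 0 = sparrow, 1 = crow.
    y = [0, 0, 1, 1]  # output

    # n_neighbors is a hyperparameter: it controls the learning but is not learned.
    model = KNeighborsClassifier(n_neighbors=3)

    # Training ("fitting"): the model adapts to the seen feature-to-label examples.
    model.fit(x, y)

    # Predict the label of an unlabeled sample.
    print(model.predict([[310.0, 61.0]]))  # [1], a crow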

Examples of features and labels:

Iris Dataset:
  Features:
    0 = sepal length in cm
    1 = sepal width in cm
    2 = petal length in cm
    3 = petal width in cm
  Labels:
    0 = Iris Setosa
    1 = Iris Versicolour
    2 = Iris Virginica
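
The Iris dataset ships with scikit-learn, so the feature and label layout above can be inspected directly:

    from sklearn.datasets import load_iris

    iris = load_iris()
    print(iris.feature_names)  # the 4 features listed above
    print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
    print(iris.data.shape)     # (150, 4): 150 samples, 4 features each
    print(iris.target[:5])     # labels of the first 5 samples, all 0 = setosa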

MNIST Dataset:
  Features:
    0-783 = 28x28 greyscale image where each pixel is darkness from 0.0 to 1.0
  Labels:
    0-9 = numbers from 0 to 9
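
A short sketch of why MNIST has 784 features per sample: each 28x28 image is flattened into one feature vector (random pixels stand in for a real image here):

    import numpy as np

    rng = np.random.RandomState(0)
    image = rng.rand(28, 28)      # one greyscale image, darkness from 0.0 to 1.0

    features = image.reshape(-1)  # flatten into a feature vector
    print(features.shape)         # (784,): features 0-783 of this one sample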
