ruk·si

Machine Learning Platforms

Updated at 2019-05-09 08:27

Machine learning platforms are services or tools that let you outsource some components of a machine learning system.

Machine learning platforms focus on three major categories:

  • Pre-training, a.k.a. Data Preparation
  • Training
  • Post-training, a.k.a. Deployment

Each category has various subcategories, and some subcategories span two categories:

  • Pre-training
    • Data Acquisition
    • Data Anonymization
    • Data Augmentation
    • Data Synthetization
    • Data Labeling
  • Pre-training AND Training
    • Simulation
    • Data Transformation
    • Data Management / Selection
  • Training
    • Hyperparameter Optimization
    • Training Framework
    • Offline Model Evaluation
  • Training AND Post-training
    • Model Management / Archive
    • Model Analysis
    • Model Interpretability / Transparency
    • Batch Inference
  • Post-training
    • Model Validation
    • Model Serving
    • Online Experimentation
    • Model Monitoring
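
One way to make the overlap between categories explicit is to record, for each subcategory, the set of categories it belongs to. The following is a minimal Python sketch of the list above. The names `Category`, `SUBCATEGORIES`, `subcategories_for` and `spanning` are illustrative assumptions, not part of any platform's API.

```python
from enum import Enum


class Category(Enum):
    PRE_TRAINING = "Pre-training"
    TRAINING = "Training"
    POST_TRAINING = "Post-training"


PRE = Category.PRE_TRAINING
TRAIN = Category.TRAINING
POST = Category.POST_TRAINING

# Each subcategory maps to the set of categories it belongs to.
# The assignments mirror the list above; the data structure is only a sketch.
SUBCATEGORIES = {
    "Data Acquisition": {PRE},
    "Data Anonymization": {PRE},
    "Data Augmentation": {PRE},
    "Data Synthetization": {PRE},
    "Data Labeling": {PRE},
    "Simulation": {PRE, TRAIN},
    "Data Transformation": {PRE, TRAIN},
    "Data Management / Selection": {PRE, TRAIN},
    "Hyperparameter Optimization": {TRAIN},
    "Training Framework": {TRAIN},
    "Offline Model Evaluation": {TRAIN},
    "Model Management / Archive": {TRAIN, POST},
    "Model Analysis": {TRAIN, POST},
    "Model Interpretability / Transparency": {TRAIN, POST},
    "Batch Inference": {TRAIN, POST},
    "Model Validation": {POST},
    "Model Serving": {POST},
    "Online Experimentation": {POST},
    "Model Monitoring": {POST},
}


def subcategories_for(category):
    """Return every subcategory that touches the given category, in list order."""
    return [name for name, cats in SUBCATEGORIES.items() if category in cats]


def spanning():
    """Return the subcategories that belong to more than one category."""
    return [name for name, cats in SUBCATEGORIES.items() if len(cats) > 1]
```

For example, `subcategories_for(Category.TRAINING)` includes "Simulation" and "Batch Inference" as well as the training-only entries, which is a quick way to check which concerns a training-focused platform may be expected to cover.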

Other:

  • A development environment, such as Jupyter Notebook or a specific IDE, is also frequently labeled a machine learning platform.
  • Experiment tracking and version control are usually part of all of these categories.
  • Job management, scheduling and queueing are usually part of all of these categories.
  • Hardware orchestration and scaling are usually part of all of these categories.

Analytics platforms, data science platforms, machine learning platforms, and deep learning platforms are used almost synonymously. The main differences are their core focus and how they change the workflow of a data scientist.

Data scientists have different preferences in workflow, which leads to very different machine learning platforms.

  1. some prefer to write plain Python, R or other code
  2. some prefer to just train models on spreadsheets
  3. some prefer point-and-click interfaces

Machine learning platforms have various target audiences:

  1. Solo Data Scientist: works on independent, contained problems and solutions; prefers open source tools.
  2. Specialized Data Science Team: focuses on a specialization such as marketing, risk management or CRM.
  3. General Data Science Team: works across disciplines on broad problems and solutions.
  4. Non-Data-Scientist Users: the platform adds abstractions on top of the learning process so that domain experts can use machine learning without the help of a data scientist.
