ML Platforms

Updated at 2019-05-09 11:27

Machine learning platforms are services or tools that let you outsource some components of a machine learning system.

Machine learning platforms focus on three major categories:

  • Pre-training, a.k.a. Data Preparation
  • Training
  • Post-training, a.k.a. Deployment

Each has various subcategories, and some subcategories span two of them:

  • Pre-training
    • Data Acquisition
    • Data Anonymization
    • Data Augmentation
    • Data Synthesis
    • Data Labeling
  • Pre-training AND Training
    • Simulation
    • Data Transformation
    • Data Management / Selection
  • Training
    • Hyperparameter Optimization
    • Training Framework
    • Offline Model Evaluation
  • Training AND Post-training
    • Model Management / Archive
    • Model Analysis
    • Model Interpretability / Transparency
    • Batch Inference
  • Post-training
    • Model Validation
    • Model Serving
    • Online Experimentation
    • Model Monitoring
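
The list above can also be written down as a small lookup table, which makes it easier to ask which subcategories touch a given category. This is only a minimal sketch: the names `Stage`, `TAXONOMY` and `subcategories_for` are hypothetical and chosen for illustration, and the entries are copied from the list above.

```python
from enum import Flag, auto


class Stage(Flag):
    """The three major categories; a subcategory may span two of them."""
    PRE_TRAINING = auto()
    TRAINING = auto()
    POST_TRAINING = auto()


# Subcategory -> category or categories, copied from the list above.
# (Hypothetical structure for illustration only.)
TAXONOMY = {
    "Data Acquisition": Stage.PRE_TRAINING,
    "Data Anonymization": Stage.PRE_TRAINING,
    "Data Augmentation": Stage.PRE_TRAINING,
    "Data Synthetization": Stage.PRE_TRAINING,
    "Data Labeling": Stage.PRE_TRAINING,
    "Simulation": Stage.PRE_TRAINING | Stage.TRAINING,
    "Data Transformation": Stage.PRE_TRAINING | Stage.TRAINING,
    "Data Management / Selection": Stage.PRE_TRAINING | Stage.TRAINING,
    "Hyperparameter Optimization": Stage.TRAINING,
    "Training Framework": Stage.TRAINING,
    "Offline Model Evaluation": Stage.TRAINING,
    "Model Management / Archive": Stage.TRAINING | Stage.POST_TRAINING,
    "Model Analysis": Stage.TRAINING | Stage.POST_TRAINING,
    "Model Interpretability / Transparency": Stage.TRAINING | Stage.POST_TRAINING,
    "Batch Inference": Stage.TRAINING | Stage.POST_TRAINING,
    "Model Validation": Stage.POST_TRAINING,
    "Model Serving": Stage.POST_TRAINING,
    "Online Experimentation": Stage.POST_TRAINING,
    "Model Monitoring": Stage.POST_TRAINING,
}


def subcategories_for(stage: Stage) -> list[str]:
    """Return every subcategory that touches the given category."""
    return [name for name, stages in TAXONOMY.items() if stage in stages]
```

For example, `subcategories_for(Stage.TRAINING)` returns ten names: the three pure training subcategories plus the seven that are shared with pre-training or post-training. A flag enum is used here so that a shared subcategory is stored once instead of being duplicated under two headings.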

Other:

  • Development environments, like Jupyter Notebook or a specific IDE, are frequently labeled as machine learning platforms. They are not.
  • Experiment tracking and version control are usually part of platforms in every category above.
  • Job management, scheduling and queueing are usually part of platforms in every category above.
  • Hardware orchestration and scaling are usually part of platforms in every category above.

It's worth noting that the terms 'Analytics Platform', 'Data Science Platform', 'Machine Learning Platform', and 'Deep Learning Platform' are often used interchangeably. Their differences lie mainly in their core focus and in how they change the work process of a data scientist.

Data scientists have different preferences in workflow, which leads to very different machine learning platforms.

  1. Some prefer to write plain Python, R or other code.
  2. Some prefer to just train models on spreadsheets.
  3. Some prefer point-and-click interfaces, the good old clickops.

Machine learning platforms target various audiences:

  1. Solo Data Scientist: independent, contained problems; prefer open source tools.
  2. Specialized Data Science Team: heavy focus on their specialization like marketing, risk management, CRM or whatever their domain might be.
  3. General Data Science Team: cross-discipline, broad problems and solutions.
  4. Non-Data Scientist Users: the platform focuses on adding abstractions on top of the learning process so that domain experts can use machine learning without the help of a data scientist.

Sources