ruk·si

MLflow

Updated at 2018-07-20 02:07

MLflow is an open-source machine learning platform from Databricks that you host yourself. It aims to manage three stages of the machine learning lifecycle: 1) data preparation, 2) model training and 3) model deployment.

Alternatively, you can pay Databricks to host MLflow for you.

MLflow...

  • is designed to work with any machine learning tool
  • is built around a REST API
  • has three major parts: tracking, projects and models

Tracking

You use the tracking API to log parameters, metrics and other run data, such as output files, from your own code.

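import mlflow

# Parameters are the inputs of a run.
mlflow.log_param("num_dimensions", 8)
mlflow.log_param("regularization", 0.1)
# Metrics are numeric results; this name and value are only illustrative.
mlflow.log_metric("accuracy", 0.1)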

Projects

Projects are a format for packaging data science code in a reusable and reproducible way.

A project's MLproject file declares the expected environment and the commands, or "entry points", that you can run:

name: My Project
conda_env: my_env.yaml
entry_points:
  main:
    command: "python train.py -r {regularization} {data_file}"
    parameters:
      data_file: path
      regularization: { type: float, default: 0.1 }
  validate:
    command: "python validate.py {data_file}"
    parameters:
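      data_file: path

You run a project from a local directory or a Git URL and pass parameters as -P name=value. The commands below assume a project whose entry point declares an alpha parameter.

mlflow run example/project -P alpha=0.5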
mlflow run git@github.com:user/repository.git -P alpha=0.5
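
How the {placeholders} in an entry point's command get filled in can be sketched in a few lines of Python. This is only an illustration of the idea, not MLflow's implementation: build_command and ENTRY_POINT are hypothetical names, the dictionary mirrors the main entry point from the project file above, and real MLflow may also validate types and resolve paths.

```python
# Hypothetical helper, not part of MLflow: fills an entry point's command
# template with -P style parameters, falling back to declared defaults.

ENTRY_POINT = {
    "command": "python train.py -r {regularization} {data_file}",
    "parameters": {
        "data_file": "path",
        "regularization": {"type": "float", "default": 0.1},
    },
}


def build_command(entry_point, overrides):
    """Return the shell command for an entry point given parameter overrides."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in overrides:
            # A value given on the command line wins.
            values[name] = overrides[name]
        elif isinstance(spec, dict) and "default" in spec:
            # Otherwise use the default declared in the project file.
            values[name] = spec["default"]
        else:
            # No value and no default: the parameter is required.
            raise ValueError(f"missing required parameter: {name}")
    return entry_point["command"].format(**values)


# Roughly what "-P data_file=data.csv" would expand to in this sketch.
print(build_command(ENTRY_POINT, {"data_file": "data.csv"}))
```

Because data_file has no default, leaving it out raises an error in this sketch, while regularization silently falls back to 0.1.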

Models

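Models are a convention for packaging a machine learning model in multiple formats called flavors, so that different tools can load the same model. The available flavors are listed in the model's MLmodel file: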

flavors:
  sklearn:
    sklearn_version: 0.19.1
    pickled_model: model.pkl
  python_function:
    loader_module: mlflow.sklearn
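
The point of flavors is that each consuming tool picks the format it understands. A minimal sketch of that selection, assuming the flavors above have already been parsed into a dictionary; pick_flavor and FLAVORS are hypothetical names and this is not MLflow's own loading code:

```python
# Hypothetical sketch, not MLflow's loader: choose which flavor of a saved
# model a tool should use, given the flavors that tool understands.

FLAVORS = {
    "sklearn": {"sklearn_version": "0.19.1", "pickled_model": "model.pkl"},
    "python_function": {"loader_module": "mlflow.sklearn"},
}


def pick_flavor(flavors, supported):
    """Return the first flavor name in `supported` that the model provides."""
    for name in supported:
        if name in flavors:
            return name
    raise LookupError(f"model has none of the flavors {supported}")


# A scikit-learn aware tool prefers the native flavor.
print(pick_flavor(FLAVORS, ["sklearn", "python_function"]))
# A generic tool only knows the python_function flavor.
print(pick_flavor(FLAVORS, ["python_function"]))
```

A tool that supports none of the listed flavors gets an error instead of a model it cannot load.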
