ruk·si

MLflow

Updated at 2018-07-19 23:07

MLflow is a self-hosted, open-source machine learning platform by Databricks. It aims to manage the whole machine learning lifecycle: 1) data preparation, 2) model training and 3) model deployment. If you don't want to run it yourself, Databricks also offers it as a paid hosted service.

MLflow...

  • is designed to work with any machine learning tool.
  • is built around a REST API.
  • has three major components: tracking, projects and models.
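Because everything goes through the REST API, any language that can send HTTP requests can log to a tracking server. The sketch below assumes a few things. The server was started locally with mlflow server on its default port 5000. The endpoint path is the 2.0 runs/log-parameter path from the REST API reference. The run ID is a placeholder. Older releases named the field run_uuid rather than run_id. The code only builds the request, so it runs without a server:

```python
import json
import urllib.request

# Assumption: a tracking server started with `mlflow server` on its default
# local port; change this to wherever your server actually runs.
TRACKING_URI = "http://localhost:5000"


def build_log_param_request(run_id: str, key: str, value: str) -> urllib.request.Request:
    """Build (but do not send) a log-parameter call to the tracking REST API."""
    # Parameter values are sent as strings.
    body = json.dumps({"run_id": run_id, "key": key, "value": value}).encode("utf-8")
    return urllib.request.Request(
        url=f"{TRACKING_URI}/api/2.0/mlflow/runs/log-parameter",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "hypothetical-run-id" is a placeholder; a real ID comes from creating a run.
req = build_log_param_request("hypothetical-run-id", "regularization", "0.1")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running server.
```

The Python client used below is a convenience wrapper over calls like this one.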

MLflow Tracking

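Used to log the parameters, metrics and output files of training runs so they can be compared later.

import mlflow

# Everything logged inside the block is attached to a single run.
with mlflow.start_run():
    mlflow.log_param("num_dimensions", 8)
    mlflow.log_param("regularization", 0.1)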
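Without a tracking server, runs are written to a local mlruns directory. Below is a minimal sketch for reading logged parameters back. It assumes the file-based store layout, where each parameter is a small text file under mlruns/<experiment_id>/<run_id>/params/ named after its key. read_params is a hypothetical helper, not part of MLflow. In practice, the MLflow client API is the stable way to query runs:

```python
from pathlib import Path


def read_params(run_dir):
    """Return {param_name: value} for one run in a local file store.

    Hypothetical helper. Assumes each parameter is a text file at
    <run_dir>/params/<key> whose contents are the logged value.
    """
    params_dir = Path(run_dir) / "params"
    if not params_dir.is_dir():
        return {}  # run logged no parameters, or the path is wrong
    return {
        path.name: path.read_text().rstrip("\n")  # drop a trailing newline if present
        for path in sorted(params_dir.iterdir())
        if path.is_file()
    }


# Usage, with a real run ID substituted (0 is the default experiment):
# read_params("mlruns/0/<run_id>")
```

Note that values come back as strings, because that is how parameters are stored.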

MLflow Projects

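Used to package code in a reusable, reproducible form. A project is a directory or Git repository with an MLproject file that declares its environment and entry points:

name: My Project
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
    command: "python train.py -r {regularization} {data_file}"
  validate:
    parameters:
      data_file: path
    command: "python validate.py {data_file}"

Projects are run from a local path or a Git URI, with parameters passed via -P:

mlflow run example/project -P alpha=0.5
mlflow run git@github.com:user/repository.git -P alpha=0.5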
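mlflow run fills the entry point's command template with the -P values and falls back to declared defaults. The sketch below is a simplified version of that substitution, not MLflow's actual code. render_command, the hand-copied main entry point and the file name data.csv are all hypothetical. The sketch treats parameters without a default as required. It appends undeclared parameters (like alpha above) as --key value, which is how MLflow's documentation describes handling them. Real MLflow also validates types and resolves path parameters:

```python
import shlex

# Hypothetical hand-copied version of the "main" entry point above.
# "data_file: path" has no default, so it is required.
MAIN = {
    "parameters": {
        "data_file": {"type": "path"},
        "regularization": {"type": "float", "default": 0.1},
    },
    "command": "python train.py -r {regularization} {data_file}",
}


def render_command(entry_point, user_params):
    """Fill the command template from -P values and declared defaults."""
    declared = entry_point["parameters"]
    values = {}
    for name, spec in declared.items():
        if name in user_params:
            values[name] = user_params[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
    # Quote each value so the shell receives it as a single argument.
    command = entry_point["command"].format(
        **{name: shlex.quote(str(value)) for name, value in values.items()}
    )
    # Parameters the entry point does not declare are appended as --key value.
    extras = [
        f"--{name} {shlex.quote(str(value))}"
        for name, value in user_params.items()
        if name not in declared
    ]
    return " ".join([command, *extras])


print(render_command(MAIN, {"data_file": "data.csv"}))
# python train.py -r 0.1 data.csv
print(render_command(MAIN, {"data_file": "data.csv", "alpha": 0.5}))
# python train.py -r 0.1 data.csv --alpha 0.5
```

Calling render_command(MAIN, {"alpha": 0.5}) raises ValueError, because data_file has no default in this MLproject.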

MLflow Models

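Used to package machine learning models in a standard format that different tools can consume. A saved model is a directory containing an MLmodel file that lists the model's flavors, i.e. the different ways it can be loaded. A scikit-learn model, for example, can be loaded as a native sklearn model or as a generic Python function: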

time_created: 2018-02-21T13:21:34.12
flavors:
  sklearn:
    sklearn_version: 0.19.1
    pickled_model: model.pkl
  python_function:
    loader_module: mlflow.sklearn
    pickled_model: model.pkl
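The point of listing several flavors is that a consumer only needs to understand one of them. A deployment tool that knows nothing about scikit-learn can still use the python_function flavor, whose loader_module names the Python module responsible for loading the model. Below is a minimal sketch of that lookup. It assumes the MLmodel file above has already been parsed into a dict, since parsing YAML would need a third-party library such as PyYAML. pick_flavor is a hypothetical helper, not part of MLflow:

```python
# The MLmodel file above, hand-copied into a dict.
MLMODEL = {
    "time_created": "2018-02-21T13:21:34.12",
    "flavors": {
        "sklearn": {"sklearn_version": "0.19.1", "pickled_model": "model.pkl"},
        "python_function": {"loader_module": "mlflow.sklearn", "pickled_model": "model.pkl"},
    },
}


def pick_flavor(mlmodel, supported):
    """Return (name, config) for the first flavor in `supported` that the model provides."""
    for name in supported:
        if name in mlmodel["flavors"]:
            return name, mlmodel["flavors"][name]
    raise LookupError(f"no supported flavor among {sorted(mlmodel['flavors'])}")


# A generic serving tool that only understands python_function:
name, config = pick_flavor(MLMODEL, ["python_function"])
print(name, config["loader_module"])  # python_function mlflow.sklearn
```

A scikit-learn-aware tool would pass ["sklearn", "python_function"] and get the native flavor first.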
