MLflow
Updated at 2018-07-20 02:07
MLflow is an open-source machine learning platform from Databricks that you host yourself. It aims to manage three stages of the workflow: 1) data preparation, 2) model training and 3) model deployment.
If you would rather not host it, you can pay Databricks to run MLflow for you.
MLflow...
- is designed to work with any machine learning tool
- is built around a REST API
- has three major parts: tracking, projects and models
Tracking
You use the tracking API to log parameters, metrics and other data about each run.
import mlflow
mlflow.log_param("num_dimensions", 8)
mlflow.log_param("regularization", 0.1)
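To make the difference between parameters and metrics concrete, here is a minimal in-memory sketch of what a tracking run records. The `ToyRun` class is hypothetical and is not MLflow's implementation. It only assumes that a parameter is a single key-value pair, while a metric can be logged repeatedly and keeps its history.

```python
from collections import defaultdict


class ToyRun:
    """Hypothetical stand-in for a tracking run (not part of MLflow)."""

    def __init__(self):
        self.params = {}                  # key -> single value, stored as a string
        self.metrics = defaultdict(list)  # key -> values in the order they were logged

    def log_param(self, key, value):
        # Assumption: a parameter is set once per run.
        if key in self.params:
            raise ValueError(f"param {key!r} already logged")
        self.params[key] = str(value)

    def log_metric(self, key, value):
        # Assumption: a metric may be logged many times, e.g. once per epoch.
        self.metrics[key].append(float(value))


run = ToyRun()
run.log_param("num_dimensions", 8)
run.log_param("regularization", 0.1)
for loss in (0.9, 0.5, 0.3):
    run.log_metric("loss", loss)
```

With the real library, the corresponding calls are `mlflow.log_param(key, value)` and `mlflow.log_metric(key, value)`.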
Projects
Projects are a format for packaging data science code in a reusable and reproducible way.
In a project file (named MLproject) you declare the environment the code expects and the commands, or "entry points", that can be run.
name: My Project
conda_env: my_env.yaml
entry_points:
  main:
    command: "python train.py -r {regularization} {data_file}"
    parameters:
      data_file: path
      regularization: { type: float, default: 0.1 }
  validate:
    command: "python validate.py {data_file}"
    parameters:
      data_file: path
You run a project from a local directory or from a Git URL, and pass parameters with -P.
mlflow run example/project -P alpha=0.5
mlflow run [email protected]:user/repository.git -P alpha=0.5
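An entry point's `command` is a template, and values passed with `-P name=value` fill its placeholders. The following is a simplified sketch of that substitution, assuming plain `str.format`-style replacement with declared defaults. The `render_command` helper and the `data.csv` file name are hypothetical, and MLflow's own handling (type checking, path resolution, quoting) is more involved.

```python
def render_command(entry_point, supplied):
    """Fill an entry point's command template from supplied values and defaults."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        # A parameter is declared either as a bare type ("path")
        # or as a mapping with "type" and an optional "default".
        default = spec.get("default") if isinstance(spec, dict) else None
        if name in supplied:
            values[name] = supplied[name]
        elif default is not None:
            values[name] = default
        else:
            raise ValueError(f"missing required parameter: {name}")
    return entry_point["command"].format(**values)


# The "main" entry point from the project file above, written as a Python dict.
main = {
    "command": "python train.py -r {regularization} {data_file}",
    "parameters": {
        "data_file": "path",
        "regularization": {"type": "float", "default": 0.1},
    },
}

# Only data_file is supplied, so regularization falls back to its default.
command = render_command(main, {"data_file": "data.csv"})
print(command)
```

Here `regularization` has a default and may be omitted, while `data_file` has none and must always be supplied.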
Models
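Models are a convention for packaging a machine learning model in several formats, called flavors, so that different tools can load the same model. The flavors a model supports are listed in a descriptor file saved alongside it.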
flavors:
  sklearn:
    sklearn_version: 0.19.1
    pickled_model: model.pkl
  python_function:
    loader_module: mlflow.sklearn
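Because the same model is described under several flavors, a consuming tool can pick whichever one it understands. Below is a minimal sketch of that lookup, using the flavor names from the file above. The `pick_flavor` helper and the preference lists are hypothetical and not part of MLflow's API.

```python
# The flavors section of the model file above, written as a Python dict.
flavors = {
    "sklearn": {"sklearn_version": "0.19.1", "pickled_model": "model.pkl"},
    "python_function": {"loader_module": "mlflow.sklearn"},
}


def pick_flavor(flavors, supported):
    """Return (name, config) for the first supported flavor the model provides."""
    for name in supported:
        if name in flavors:
            return name, flavors[name]
    raise LookupError(f"model has none of the flavors: {supported}")


# A tool that only knows generic Python functions still finds a usable flavor.
name, config = pick_flavor(flavors, ["tensorflow", "python_function"])
```

A scikit-learn-aware tool would instead ask for `sklearn` first and load the pickled model directly.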