Airbnb Bighead
Updated at 2018-09-24 20:40
Bighead is Airbnb's internal machine learning system.
Situation in 2016-Q4:
- machine learning used for search rankings, smart pricing and fraud detection
- building a model took 10 weeks on average
- built with Aerosolve, Spark and Scala
- no support for the latest tooling like TensorFlow, Torch or sklearn
- no consistency between machine learning workflows
- new teams struggled to use machine learning
- build-and-forget problems all around
Situation in 2018-Q1:
- machine learning used for classifying lists, room type classification, experience ranking, personalization, host availability, business travel classifier, making listing a space easier, customer service ticket routing
- everything standardized and automated using Bighead
The main focus of Bighead is to:
- remove incidental complexity by providing generic, reusable solutions
- simplify the workflow
- provide tools, libraries and environments for machine learning
- share feature data and model components across the company
- make it easy to do the right thing, e.g. consistent training/streaming/scoring logic
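The "consistent training/streaming/scoring logic" point can be pictured with a small sketch. This is not Bighead's actual API: `transform`, `build_training_rows`, `score`, `toy_model` and the record fields are hypothetical names chosen for illustration. The idea is only that the feature logic is defined once and called from both the offline and the online path.

```python
import math


def transform(raw):
    """Hypothetical feature logic, defined once and reused everywhere."""
    return [
        math.log1p(raw["price"]),  # compress the price range
        1.0 if raw["room_type"] == "entire_home" else 0.0,
    ]


def build_training_rows(records):
    """Offline path: turn labeled records into (features, label) pairs."""
    return [(transform(r), r["label"]) for r in records]


def score(model, raw):
    """Online path: apply the same transform before calling the model."""
    return model(transform(raw))


def toy_model(features):
    """Stand-in for a trained model: a fixed linear scorer (assumption)."""
    weights = [0.5, 2.0]
    return sum(w * x for w, x in zip(weights, features))
```

Because both paths call the same `transform`, a change to the feature logic cannot silently diverge between training and scoring.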
Bighead architecture:
- Zipline: data management framework, define features, data quality monitoring
- ML Automator: offline training and inference, periodic training, alert on score changes, uses Airflow to orchestrate these tasks
- Bighead Library: provides standard transformations for NLP and images, provides visualizations for the data, passes along metadata about the original data samples, simple serialization and deserialization
- Bighead Service: a single source of truth for model history that makes model training reproducible and keeps evaluation metrics in one place; tracks model health data, model version (model code and Docker image) and model artifact (weights learned during training); models are always wrapped with a lightweight model API to integrate with other services
- Bighead UI: deployment rollback, review changes, model health metrics, alerts, split traffic to two or more models
- DeepThought: online inference service, lets data scientists launch new models in production, shares data transformations between training and inference, median response time 4 ms
- Redspot: multi-tenant Jupyter Notebook environment, JupyterHub with remote instances running containerized notebooks, persistent notebooks in EFS
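Two of the items above, the lightweight model API around every model and splitting traffic between two or more models, can be sketched together. This is an illustration under assumptions, not the real Bighead Service or Bighead UI interface: `ModelWrapper`, `TrafficSplitter` and their fields are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ModelWrapper:
    """Hypothetical uniform wrapper: predict() plus version metadata."""
    name: str
    version: str
    predict_fn: Callable[[Sequence[float]], float]

    def predict(self, features):
        # Every response carries the model identity alongside the score.
        return {
            "model": self.name,
            "version": self.version,
            "score": self.predict_fn(features),
        }


class TrafficSplitter:
    """Routes each request to one wrapped model according to fixed weights."""

    def __init__(self, models, weights, seed=None):
        if not models or len(models) != len(weights):
            raise ValueError("need at least one model and one weight per model")
        self.models = list(models)
        self.weights = list(weights)
        self.rng = random.Random(seed)

    def predict(self, features):
        # Weighted random choice of a single model for this request.
        chosen = self.rng.choices(self.models, weights=self.weights, k=1)[0]
        return chosen.predict(features)
```

A caller would construct two wrappers for different versions of the same model and hand them to `TrafficSplitter` with weights such as `[9, 1]` to send roughly a tenth of requests to the newer version. Since every response names the version that produced it, health metrics can be compared per version.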
Sources
- Airbnb meetup presentation about the system