⛅️ Google Cloud Machine Learning
Updated at 2018-09-24 19:55
Google Cloud Machine Learning (GCML) runs TensorFlow code, so you need to set up TensorFlow first.
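For example, a minimal TensorFlow setup in a fresh virtualenv could look like this (a sketch; the venv path and TensorFlow version are your choice):
# create a virtualenv and install TensorFlow into it
virtualenv ~/venvs/tensortest
source ~/venvs/tensortest/bin/activate
pip install tensorflow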
Running a Google ML-wrapped trainer locally.
# install Google Cloud SDK https://cloud.google.com/sdk/
git clone git@github.com:GoogleCloudPlatform/cloudml-samples.git
cd cloudml-samples/mnist/trainable/
# move the "trainer" directory to your venv with tensorflow installed
# cd to directory containing "trainer"
CLOUDSDK_PYTHON=/Users/ruksi/venvs/tensortest/bin/python gcloud beta ml \
local train \
--package-path=trainer \
--module-name=trainer.task
# let it finish
tensorboard --logdir=data/ --port=8080
# navigate to http://localhost:8080/ in browser
Preparing a storage bucket for Google ML.
- Create a new project, e.g. "machine-learning-test"; this will generate a unique identifier for the project, e.g. "machine-learning-test-666".
- Go to Top-left Menu > API Manager.
- Click Enable API, search for Machine Learning and enable the API.
- Wait for the API to become enabled.
- Go to Top-left Menu > Machine Learning.
- Go to Top-left Menu > Storage.
- Click Create Bucket, name it machine-learning-test-666-ml and set the storage class to "Nearline - United States (any region)" (or create the bucket from the command line, as sketched below).
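If you prefer the command line, creating the bucket with gsutil should look roughly like this (a sketch; the storage class and location flags are assumptions matching the console choices above):
# create a Nearline bucket in the US for training artifacts
gsutil mb -c nearline -l US gs://machine-learning-test-666-ml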
Running the trainer in the cloud.
gcloud config set project machine-learning-test-666
JOB_NAME=mnist_${USER}_$(date +%Y%m%d_%H%M%S)
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
TRAIN_BUCKET=gs://${PROJECT_ID}-ml
TRAIN_PATH=${TRAIN_BUCKET}/${JOB_NAME}
gsutil rm -rf ${TRAIN_PATH}
gcloud beta ml jobs submit training ${JOB_NAME} \
--package-path=trainer \
--module-name=trainer.task \
--staging-bucket="${TRAIN_BUCKET}" \
--region=us-central1 \
-- \
--train_dir="${TRAIN_PATH}/train"
# to check the job status:
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}
gsutil ls ${TRAIN_PATH}/train
tensorboard --logdir=${TRAIN_PATH}/train --port=8080
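You can also follow the job's logs from the terminal while it runs; the SDK has a stream-logs subcommand for this (verify with gcloud beta ml jobs --help that your SDK version includes it):
# stream the training job's logs to the terminal
gcloud beta ml jobs stream-logs ${JOB_NAME}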
Running a distributed trainer locally.
cd cloudml-samples/mnist/distributed/
rm -rf output/
CLOUDSDK_PYTHON=/Users/ruksi/venvs/tensortest/bin/python gcloud beta ml local train \
--package-path=trainer \
--module-name=trainer.task \
--distributed \
-- \
--train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz \
--eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz \
--output_path=output
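As with the single-process run, you can inspect the results with TensorBoard afterwards (assuming the trainer wrote its summaries under output/):
tensorboard --logdir=output/ --port=8080
# navigate to http://localhost:8080/ in browser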
Running the distributed trainer in the cloud.
JOB_NAME=mnist_distributed_${USER}_$(date +%Y%m%d_%H%M%S)
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
TRAIN_BUCKET=gs://${PROJECT_ID}-ml
TRAIN_PATH=${TRAIN_BUCKET}/${JOB_NAME}
gsutil rm -rf ${TRAIN_PATH}
cat << EOF > config.yaml
trainingInput:
  # Use a cluster with many workers and a few parameter servers.
  scaleTier: STANDARD_1
EOF
gcloud beta ml jobs submit training ${JOB_NAME} \
--package-path=trainer \
--module-name=trainer.task \
--staging-bucket="${TRAIN_BUCKET}" \
--region=us-central1 \
--config=config.yaml \
-- \
--train_data_paths="gs://cloud-ml-data/mnist/train.tfr.gz" \
--eval_data_paths="gs://cloud-ml-data/mnist/eval.tfr.gz" \
--output_path="${TRAIN_PATH}/output"
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}
# Wait for it to complete.
gsutil ls ${TRAIN_PATH}/output
# You can inspect the logs through ML > Jobs > the job > View logs
tensorboard --logdir=${TRAIN_PATH}/output --port=8080
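If you would rather inspect the output files locally than through gsutil, a parallel recursive copy works (a sketch; the destination directory is arbitrary):
# download the job output for local inspection
gsutil -m cp -r ${TRAIN_PATH}/output ./mnist-output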
Predictions in the Cloud
cd cloudml-samples/mnist/deployable/
rm -f data/{checkpoint,events,export}*
gcloud beta ml local train \
--package-path=trainer \
--module-name=trainer.task
# the exported model will be in the files data/export and data/export.meta
JOB_NAME=mnist_deployable_${USER}_$(date +%Y%m%d_%H%M%S)
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
TRAIN_BUCKET=gs://${PROJECT_ID}-ml
TRAIN_PATH=${TRAIN_BUCKET}/${JOB_NAME}
gsutil rm -rf ${TRAIN_PATH}
gcloud beta ml jobs submit training ${JOB_NAME} \
--package-path=trainer \
--module-name=trainer.task \
--staging-bucket="${TRAIN_BUCKET}" \
--region=us-central1 \
-- \
--train_dir="${TRAIN_PATH}/train" \
--model_dir="${TRAIN_PATH}/model"
# wait for the job to finish
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}
# you could also do the following through the web interface
MODEL_NAME=mnist_${USER}_$(date +%Y%m%d_%H%M%S)
gcloud beta ml models create ${MODEL_NAME}
gcloud beta ml models versions create \
--origin=${TRAIN_PATH}/model/ \
--model=${MODEL_NAME} \
v1
gcloud beta ml models versions set-default --model=${MODEL_NAME} v1
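To verify that the model and version were created and that v1 is the default (these list subcommands should sit alongside create and set-default, but double-check with --help):
gcloud beta ml models list
gcloud beta ml models versions list --model=${MODEL_NAME}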
# we can ask for predictions now using a sample dataset
head -n2 data/predict_sample.tensor.json
# {"image": [0.0, 0.0, ...], "key": 0}
# {"image": [0.0, 0.0, ...], "key": 1}
gcloud beta ml predict --model=${MODEL_NAME} \
--instances=data/predict_sample.tensor.json
# we can also send batch prediction requests
JOB_NAME=predict_mnist_${USER}_$(date +%Y%m%d_%H%M%S)
gsutil cp data/predict_sample.tensor.json ${TRAIN_PATH}/data/
gcloud beta ml jobs submit prediction ${JOB_NAME} \
--model=${MODEL_NAME} \
--data-format=TEXT \
--input-paths=${TRAIN_PATH}/data/predict_sample.tensor.json \
--output-path=${TRAIN_PATH}/output \
--region=us-central1
# wait for it to finish
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}
# check the predictions
gsutil ls ${TRAIN_PATH}/output/
gsutil cat ${TRAIN_PATH}/output/prediction.results-00000-of-00003
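The results are sharded and the shard count varies per job, so a wildcard is handy for reading all of them at once:
# print every result shard
gsutil cat ${TRAIN_PATH}/output/prediction.results-*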
Hyperparameter Optimization
Currently allowed hyperparameter definitions:
goal: MAXIMIZE, MINIMIZE
maxTrials
maxParallelTrials
params:
    type: INTEGER, DOUBLE, CATEGORICAL, DISCRETE
    scaleType: UNIT_LINEAR_SCALE, UNIT_LOG_SCALE
    minValue, maxValue, categoricalValues
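The INTEGER and DOUBLE types are demonstrated in the full example further below; CATEGORICAL instead takes an explicit list of values. A minimal sketch (the parameter name and values here are made up for illustration):
cat << EOF > config.yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    maxTrials: 6
    maxParallelTrials: 2
    params:
      # CATEGORICAL picks one value per trial from a fixed set and
      # passes it to the trainer, here via a hypothetical --optimizer flag.
      - parameterName: optimizer
        type: CATEGORICAL
        categoricalValues: ["adam", "sgd", "rmsprop"]
EOF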
Hyperparameter tuning locally.
cd cloudml-samples/mnist/hptuning/
rm -rf output/
gcloud beta ml local train \
--package-path=trainer \
--module-name=trainer.task \
--distributed \
-- \
--train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz \
--eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz \
--output_path=output
Hyperparameter tuning in the cloud.
JOB_NAME=mnist_hptuning_${USER}_$(date +%Y%m%d_%H%M%S)
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
TRAIN_BUCKET=gs://${PROJECT_ID}-ml
TRAIN_PATH=${TRAIN_BUCKET}/${JOB_NAME}
gsutil rm -rf ${TRAIN_PATH}
cat << EOF > config.yaml
trainingInput:
  # Use a cluster with many workers and a few parameter servers.
  scaleTier: STANDARD_1
  # Hyperparameter-tuning specification.
  hyperparameters:
    # Maximize the objective value.
    goal: MAXIMIZE
    # Run at most 10 trials with different hyperparameters.
    maxTrials: 10
    # Run two trials at a time.
    maxParallelTrials: 2
    params:
      # Allow the size of the first hidden layer to vary between 40 and 400.
      # One value in this range will be passed to each trial via the
      # --hidden1 command-line flag.
      - parameterName: hidden1
        type: INTEGER
        minValue: 40
        maxValue: 400
        scaleType: UNIT_LINEAR_SCALE
      # Allow the size of the second hidden layer to vary between 5 and 250.
      # One value in this range will be passed to each trial via the
      # --hidden2 command-line flag.
      - parameterName: hidden2
        type: INTEGER
        minValue: 5
        maxValue: 250
        scaleType: UNIT_LINEAR_SCALE
      # Allow the learning rate to vary between 0.0001 and 0.5.
      # One value in this range will be passed to each trial via the
      # --learning_rate command-line flag.
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.5
        scaleType: UNIT_LOG_SCALE
EOF
gcloud beta ml jobs submit training ${JOB_NAME} \
--package-path=trainer \
--module-name=trainer.task \
--staging-bucket="${TRAIN_BUCKET}" \
--region=us-central1 \
--config=config.yaml \
-- \
--train_data_paths="gs://cloud-ml-data/mnist/train.tfr.gz" \
--eval_data_paths="gs://cloud-ml-data/mnist/eval.tfr.gz" \
--output_path="${TRAIN_PATH}/output"
# wait for it to finish:
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}
# check the output, there should be 10 output directories, one for each trial
gsutil ls ${TRAIN_PATH}/output
tensorboard --logdir=${TRAIN_PATH}/output/10 --port=8080
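Pointing TensorBoard at a single trial directory, as above, shows just that run; pointing it at the parent output directory picks up all trials as separate runs, which makes them easier to compare:
# compare all trials side by side
tensorboard --logdir=${TRAIN_PATH}/output --port=8080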
Cloud Datalab
An interactive notebook environment, based on Jupyter, for working with data.
cd Projects
mkdir -p datalab
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
docker run -it \
-p "127.0.0.1:8081:8080" \
-v "${HOME}/Projects/datalab:/content" \
-e "PROJECT_ID=${PROJECT_ID}" \
gcr.io/cloud-datalab/datalab:local
# navigate to http://localhost:8081/
Feature Extraction
The Google Cloud Machine Learning Python SDK has alpha-stage feature extraction for CSV source data, but that's it.