
⛅️ Google Cloud Machine Learning

Updated at 2018-09-24 19:55

Google Cloud Machine Learning (GCML) runs TensorFlow code, so you need to set up TensorFlow first.
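
A minimal setup sketch, assuming virtualenv and pip (the paths are just examples):

pip install virtualenv
virtualenv ~/venvs/tensortest
source ~/venvs/tensortest/bin/activate
pip install tensorflow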

Running a Google ML-wrapped trainer locally.

# install Google Cloud SDK https://cloud.google.com/sdk/
git clone git@github.com:GoogleCloudPlatform/cloudml-samples.git
cd cloudml-samples/mnist/trainable/
# move the "trainer" directory to your venv with tensorflow installed
# cd to directory containing "trainer"
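# CLOUDSDK_PYTHON points gcloud at the virtualenv interpreter so the
# trainer runs against the TensorFlow installed there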
CLOUDSDK_PYTHON=/Users/ruksi/venvs/tensortest/bin/python gcloud beta ml \
    local train \
    --package-path=trainer \
    --module-name=trainer.task
# let it finish
tensorboard --logdir=data/ --port=8080
# navigate to http://localhost:8080/ in browser

Preparing a storage bucket for Google ML.

  • Create a new project, e.g. "machine-learning-test"; this will generate a unique identifier for the project, e.g. "machine-learning-test-666".
  • Go to Top-left Menu > API Manager.
  • Click Enable API, search for Machine Learning and enable the API.
  • Wait for the API to become enabled.
  • Go to Top-left Menu > Machine Learning.
  • Go to Top-left Menu > Storage.
  • Click Create Bucket, name it machine-learning-test-666-ml and set storage class to "Nearline - United States (any region)".
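
The bucket can also be created from the command line; a sketch assuming the names above:

gsutil mb -c nearline -l us gs://machine-learning-test-666-ml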

Running the trainer in the cloud.

gcloud config set project machine-learning-test-666
JOB_NAME=mnist_${USER}_$(date +%Y%m%d_%H%M%S)
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
TRAIN_BUCKET=gs://${PROJECT_ID}-ml
TRAIN_PATH=${TRAIN_BUCKET}/${JOB_NAME}
gsutil rm -rf ${TRAIN_PATH}
gcloud beta ml jobs submit training ${JOB_NAME} \
  --package-path=trainer \
  --module-name=trainer.task \
  --staging-bucket="${TRAIN_BUCKET}" \
  --region=us-central1 \
  -- \
  --train_dir="${TRAIN_PATH}/train"
# to check the job status:
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}
gsutil ls ${TRAIN_PATH}/train
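
Depending on your gcloud release, you may also be able to tail the training logs in the terminal (the beta command surface changed over time, so treat this as a sketch):

gcloud beta ml jobs stream-logs ${JOB_NAME}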

tensorboard --logdir=${TRAIN_PATH}/train --port=8080

Running a distributed trainer locally.

cd cloudml-samples/mnist/distributed/
rm -rf output/
CLOUDSDK_PYTHON=/Users/ruksi/venvs/tensortest/bin/python gcloud beta ml local train \
  --package-path=trainer \
  --module-name=trainer.task \
  --distributed \
  -- \
  --train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz \
  --eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz \
  --output_path=output
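
With --distributed, gcloud starts a master, workers and parameter servers as local processes and tells each one its role through the TF_CONFIG environment variable, roughly like this (a sketch; ports and counts vary):

# TF_CONFIG='{"cluster": {"master": ["localhost:27000"],
#                         "worker": ["localhost:27001", "localhost:27002"],
#                         "ps":     ["localhost:27003"]},
#             "task": {"type": "worker", "index": 0}}'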

Running the distributed trainer in the cloud.

JOB_NAME=mnist_distributed_${USER}_$(date +%Y%m%d_%H%M%S)
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
TRAIN_BUCKET=gs://${PROJECT_ID}-ml
TRAIN_PATH=${TRAIN_BUCKET}/${JOB_NAME}
gsutil rm -rf ${TRAIN_PATH}

cat << EOF > config.yaml
trainingInput:
  # Use a cluster with many workers and a few parameter servers.
  scaleTier: STANDARD_1
EOF
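
STANDARD_1 is one of several predefined scale tiers; BASIC (a single worker) and PREMIUM_1 (a larger cluster) also exist, and CUSTOM lets you pick the machines yourself. A sketch of a CUSTOM tier, with machine type names taken from the Cloud ML docs:

trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m
  workerType: complex_model_m
  parameterServerType: large_model
  workerCount: 9
  parameterServerCount: 3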

gcloud beta ml jobs submit training ${JOB_NAME} \
  --package-path=trainer \
  --module-name=trainer.task \
  --staging-bucket="${TRAIN_BUCKET}" \
  --region=us-central1 \
  --config=config.yaml \
  -- \
  --train_data_paths="gs://cloud-ml-data/mnist/train.tfr.gz" \
  --eval_data_paths="gs://cloud-ml-data/mnist/eval.tfr.gz" \
  --output_path="${TRAIN_PATH}/output"

gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}
# Wait for it to complete.

gsutil ls ${TRAIN_PATH}/output
# You can inspect the logs through ML > Jobs > the job > View logs
tensorboard --logdir=${TRAIN_PATH}/output --port=8080

Predictions in the Cloud

cd cloudml-samples/mnist/deployable/
rm -f data/{checkpoint,events,export}*
gcloud beta ml local train \
    --package-path=trainer \
    --module-name=trainer.task
# the exported model will be in the files data/export and data/export.meta

JOB_NAME=mnist_deployable_${USER}_$(date +%Y%m%d_%H%M%S)
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
TRAIN_BUCKET=gs://${PROJECT_ID}-ml
TRAIN_PATH=${TRAIN_BUCKET}/${JOB_NAME}
gsutil rm -rf ${TRAIN_PATH}
gcloud beta ml jobs submit training ${JOB_NAME} \
  --package-path=trainer \
  --module-name=trainer.task \
  --staging-bucket="${TRAIN_BUCKET}" \
  --region=us-central1 \
  -- \
  --train_dir="${TRAIN_PATH}/train" \
  --model_dir="${TRAIN_PATH}/model"

# wait for the job to finish
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}

# you could also do the following through the web interface
MODEL_NAME=mnist_${USER}_$(date +%Y%m%d_%H%M%S)
gcloud beta ml models create ${MODEL_NAME}
gcloud beta ml models versions create \
  --origin=${TRAIN_PATH}/model/ \
  --model=${MODEL_NAME} \
  v1
gcloud beta ml models versions set-default --model=${MODEL_NAME} v1
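# the default version serves any prediction request that doesn't name a version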

# we can ask for predictions now using a sample dataset
head -n2 data/predict_sample.tensor.json
{"image": [0.0, 0.0, ...], "key": 0}
{"image": [0.0, 0.0, ...], "key": 1}

gcloud beta ml predict --model=${MODEL_NAME} \
  --instances=data/predict_sample.tensor.json

# we can also send batch prediction requests
JOB_NAME=predict_mnist_${USER}_$(date +%Y%m%d_%H%M%S)
gsutil cp data/predict_sample.tensor.json ${TRAIN_PATH}/data/
gcloud beta ml jobs submit prediction ${JOB_NAME} \
    --model=${MODEL_NAME} \
    --data-format=TEXT \
    --input-paths=${TRAIN_PATH}/data/predict_sample.tensor.json \
    --output-path=${TRAIN_PATH}/output \
    --region=us-central1

# wait for it to finish
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}

# check the predictions
gsutil ls ${TRAIN_PATH}/output/
gsutil cat ${TRAIN_PATH}/output/prediction.results-00000-of-00003

Hyperparameter Optimization

Currently allowed hyperparameter definitions:

goal: MAXIMIZE, MINIMIZE
maxTrials
maxParallelTrials

params:
    type: INTEGER, DOUBLE, CATEGORICAL, DISCRETE
    scaleType: UNIT_LINEAR_SCALE, UNIT_LOG_SCALE
    minValue, maxValue, categoricalValues
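
For example, a CATEGORICAL parameter is defined with explicit values instead of a range (the parameter name here is made up):

params:
  - parameterName: optimizer
    type: CATEGORICAL
    categoricalValues: ["sgd", "adam"]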

Hyperparameter tuning locally.

cd cloudml-samples/mnist/hptuning/
rm -rf output/
gcloud beta ml local train \
  --package-path=trainer \
  --module-name=trainer.task \
  --distributed \
  -- \
  --train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz \
  --eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz \
  --output_path=output
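
Note that a local run only verifies that the trainer code works; the actual hyperparameter search happens only in the cloud service.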

Hyperparameter tuning in the cloud.

JOB_NAME=mnist_hptuning_${USER}_$(date +%Y%m%d_%H%M%S)
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
TRAIN_BUCKET=gs://${PROJECT_ID}-ml
TRAIN_PATH=${TRAIN_BUCKET}/${JOB_NAME}
gsutil rm -rf ${TRAIN_PATH}

cat << EOF > config.yaml
trainingInput:
  # Use a cluster with many workers and a few parameter servers.
  scaleTier: STANDARD_1
  # Hyperparameter-tuning specification.
  hyperparameters:
    # Maximize the objective value.
    goal: MAXIMIZE
    # Run at most 10 trials with different hyperparameters.
    maxTrials: 10
    # Run two trials at a time.
    maxParallelTrials: 2
    params:
      # Allow the size of the first hidden layer to vary between 40 and 400.
      # One value in this range will be passed to each trial via the
      # --hidden1 command-line flag.
      - parameterName: hidden1
        type: INTEGER
        minValue: 40
        maxValue: 400
        scaleType: UNIT_LINEAR_SCALE
      # Allow the size of the second hidden layer to vary between 5 and 250.
      # One value in this range will be passed to each trial via the
      # --hidden2 command-line flag.
      - parameterName: hidden2
        type: INTEGER
        minValue: 5
        maxValue: 250
        scaleType: UNIT_LINEAR_SCALE
      # Allow the learning rate to vary between 0.0001 and 0.5.
      # One value in this range will be passed to each trial via the
      # --learning_rate command-line flag.
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.5
        scaleType: UNIT_LOG_SCALE
EOF

gcloud beta ml jobs submit training ${JOB_NAME} \
  --package-path=trainer \
  --module-name=trainer.task \
  --staging-bucket="${TRAIN_BUCKET}" \
  --region=us-central1 \
  --config=config.yaml \
  -- \
  --train_data_paths="gs://cloud-ml-data/mnist/train.tfr.gz" \
  --eval_data_paths="gs://cloud-ml-data/mnist/eval.tfr.gz" \
  --output_path="${TRAIN_PATH}/output"

# wait for it to finish:
gcloud beta ml jobs describe --project ${PROJECT_ID} ${JOB_NAME}
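# the describe output lists every finished trial and its objective value
# under trainingOutput, which is how you pick the best trial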

# check the output, there should be 10 output directories, one for each trial
gsutil ls ${TRAIN_PATH}/output

tensorboard --logdir=${TRAIN_PATH}/output/10 --port=8080
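
# or point TensorBoard at the parent directory to compare trials; each
# numbered directory shows up as a separate run
tensorboard --logdir=${TRAIN_PATH}/output --port=8080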

Cloud Datalab

An interactive notebook environment, based on Jupyter, for working with data.

cd Projects
mkdir -p datalab
PROJECT_ID=`gcloud config list project --format "value(core.project)"`
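# container port 8080 is published as localhost:8081 on the host; notebooks
# are persisted to ~/Projects/datalab through the volume mount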
docker run -it \
  -p "127.0.0.1:8081:8080" \
  -v "${HOME}/Projects/datalab:/content" \
  -e "PROJECT_ID=${PROJECT_ID}" \
  gcr.io/cloud-datalab/datalab:local

# navigate to http://localhost:8081/

Feature Extraction

The Google Cloud Machine Learning Python SDK has alpha-level feature extraction for CSV source data, but nothing more.
