ruk·si

Azure Machine Learning

Updated at 2017-11-20 07:41

Microsoft Azure Machine Learning is a drag-and-drop exploration, predictive modeling and deployment system. The visual editor part of it is called Azure Machine Learning Studio.

Azure ML Studio is used to create "experiments".

  • Experiments are directed graphs.
  • Graph nodes are called modules.
  • Edges show the data flow.
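
To make the graph idea concrete, here is a minimal Python sketch. It is not something Studio exposes. It models a typical experiment as a directed graph and derives one valid run order from the edges. The module names are real Studio modules, but the wiring and the `run_order` helper are assumptions made up for illustration.

```python
from graphlib import TopologicalSorter

# Each edge is (upstream module, downstream module): data flows left to right.
EDGES = [
    ("Import Data", "Split Data"),
    ("Split Data", "Train Model"),
    ("Two-Class Logistic Regression", "Train Model"),
    ("Train Model", "Score Model"),
    ("Split Data", "Score Model"),
    ("Score Model", "Evaluate Model"),
]


def run_order(edges):
    """Return one valid execution order for the modules (hypothetical helper)."""
    predecessors = {}
    for upstream, downstream in edges:
        predecessors.setdefault(upstream, set())
        predecessors.setdefault(downstream, set()).add(upstream)
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    return list(TopologicalSorter(predecessors).static_order())


order = run_order(EDGES)
```

A module can only run once all of its inputs are ready, which is why the order always puts "Train Model" before "Score Model".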

Azure ML can be automated with PowerShell. You can use it to simplify or script many tasks in Azure Machine Learning.

Don't reinvent the wheel. Cortana Intelligence Gallery has a lot of experiment examples to copy, so you might want to check those out for your specific problem before making your own.

Modules

Modules can be used to do various operations:

  • Importing datasets from data source.
  • Exporting datasets to storage or database.
  • Transforming, filtering, combining, sampling and splitting datasets.
  • Automated feature selection.
  • Initializing, training, scoring and evaluating machine learning models.
  • Prepackaged OpenCV, statistical and text analysis modules.
  • Custom modules with R or Python.

You can visualize outputs by right-clicking a module and selecting "Visualize" under the output you want to examine. This shows only a limited part of the data, so download the output if you need to explore it in more detail.

Machine Learning Studio supports datasets of up to 10 GB. The limit applies to the total size of all inputs to a module, and some modules can't handle even that much. You can still analyze larger datasets with some "Learning with Counts" module trickery.

You can create your own modules in R or Python. Write the code in the web browser: take some inputs and return some outputs. Alternatively, upload your code as a zip file and import it in the "Execute Python Script" module. Custom Python scripts can also generate visualizations.

# R (Execute R Script)
dataset1 <- maml.mapInputPort(1)
dataset2 <- maml.mapInputPort(2)
data.set <- rbind(dataset1, dataset2)
maml.mapOutputPort("data.set")

# Python (Execute Python Script)
import pandas as pd

# both inputs are pandas data frames
def azureml_main(dataframe1 = None, dataframe2 = None):
    # "import Hello" if you provide zip with "Hello.py" to this module
    # Do your transformation here.
    return dataframe1,
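
As a sketch of what the "transformation" step could look like, the following Python version stacks the rows of both inputs, much like `rbind` in the R snippet above. The sample data and the local call at the bottom are made up for illustration. Inside Studio, the module calls `azureml_main` for you.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Stack the rows of both inputs, like rbind in the R version.
    combined = pd.concat([dataframe1, dataframe2], ignore_index=True)
    # The function returns a sequence of data frames, hence the trailing comma.
    return combined,


# Local smoke test with made-up data.
first = pd.DataFrame({"id": [1, 2], "label": ["a", "b"]})
second = pd.DataFrame({"id": [3], "label": ["c"]})
(result,) = azureml_main(first, second)
```

Calling the function locally like this is a cheap way to check the script before pasting it into the module.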

Use the prepackaged machine learning model modules. You can do a custom model in R, but if you need customization like that, you will be better off with some other machine learning system than Azure ML.

Data

Azure Machine Learning Studio works with tabular data. Your datasets should be all rows and columns.

Studio offers common data transformation modules, for example for removing rows with missing values.
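
Studio has a built-in module for missing values (Clean Missing Data), so you rarely need code for this. Purely as an illustration, here is roughly what removing rows with missing values would look like in an Execute Python Script module. The column names and sample data are invented.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Drop every row that has at least one missing value.
    cleaned = dataframe1.dropna(axis=0, how="any").reset_index(drop=True)
    return cleaned,


# Local smoke test with made-up data: only the first row is complete.
raw = pd.DataFrame({"age": [31.0, None, 45.0], "city": ["Oslo", "Turku", None]})
(cleaned,) = azureml_main(raw)
```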

Supported data formats for datasets:

  • Plain text (.txt)
  • CSV with a header (.csv)
  • CSV without (.nh.csv)
  • TSV with a header (.tsv)
  • TSV without (.nh.tsv)
  • Excel file
  • Azure table
  • Hive table
  • SQL database table
  • OData values
  • SVMLight data (.svmlight)
  • Attribute Relation File Format data (.arff)
  • Zip file (.zip)
  • R object or workspace file (.RData)
  • Images (filename, RGB for each pixel)
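
The header and no-header variants matter when you prepare files locally before uploading them. A small pandas sketch of reading a headerless CSV follows. The data and column names are invented, and this runs on your own machine, not in Studio.

```python
import io

import pandas as pd

# Stand-in for the contents of a headerless file such as "iris.nh.csv".
CSV_TEXT = "5.1,3.5,setosa\n6.2,2.9,versicolor\n"

# header=None stops pandas from treating the first row as column names.
frame = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=None,
    names=["sepal_length", "sepal_width", "species"],
)
```

Without `header=None`, the first data row would be consumed as column names and silently lost.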

Supported data types for column values:

  • String
  • Integer
  • Double
  • Boolean
  • DateTime
  • TimeSpan

Internally, Studio passes data between modules as "data tables". When a module accepts another data format, the data is silently converted to a data table. You can also convert data explicitly with the Convert to Dataset module.

You can get your data to Azure Machine Learning Studio in four ways:

  1. Enter data manually in the browser with the Enter Data Manually module.
  2. Upload data yourself through the browser to create a dataset module.
       • Press "+ New" at the bottom left and "FROM LOCAL FILE".
       • Tick the "This is the new version..." checkbox if updating an old file.
  3. Import data from an online data source using the Import Data module.
       • Web URL
       • Hadoop/HiveQL
       • Azure blob storage, table, SQL database, CosmosDB
       • OData data feed provider
       • Any SQL Server database
  4. Import a dataset saved in another experiment. You save datasets by right-clicking a module, going to the output and selecting "Save as Dataset".

You can easily access Studio datasets in local Python and R. Just right click the module, select the output and select "Generate Data Access Code".

Deployment

You can deploy the trained model using Azure Machine Learning Web Service. Then external applications can get predictions from your models.

There are two ways to use the web services:

  1. Request-Response Service (RRS): Synchronously request a prediction for a single sample or "row".
  2. Batch Execution Service (BES): Asynchronously request predictions for multiple samples or "rows".
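
A Request-Response call is a plain HTTPS POST with a JSON body. The sketch below builds such a request without sending it. The payload layout follows the sample code Studio generates for a web service, but treat the details as assumptions: the URL and API key are placeholders, `input1` and the column names depend on your experiment, and `build_rrs_request` is a hypothetical helper. Check the API help page of your own service for the exact format.

```python
import json
import urllib.request

# Placeholders: copy the real values from your web service dashboard.
SERVICE_URL = "https://example.invalid/execute?api-version=2.0"
API_KEY = "your-api-key"


def build_rrs_request(column_names, row):
    """Build (but do not send) a Request-Response call for a single row."""
    payload = {
        "Inputs": {
            # "input1" is assumed to be the name of the web service input.
            "input1": {"ColumnNames": column_names, "Values": [row]},
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        SERVICE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )


request = build_rrs_request(["age", "city"], ["31", "Oslo"])
# Sending it would be: urllib.request.urlopen(request).read()
```

Batch execution works differently: you submit a job that points at input data in storage and poll for the result, so it does not fit in a single call like this.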

You can also deploy models made in Jupyter Notebooks or RStudio. In that case you bypass most of ML Studio, though.

You cannot deploy a model locally. Models can only be hosted on Azure, because most of the modules used in training are code owned by Microsoft.

Here is how you create a web service:

  1. Add a Web service input module and attach it to the Score Model module's "Dataset" input.
  2. Add a Web service output module and attach it to the Score Model module's output.
  3. Hover over "Set up web service" in the lower menu and select "Predictive Web Service"; this will create another experiment where everything is set up.
  4. Press "Run".

Deployment is not ready for production use. You can use it for internal products though.

Other

You can save a trained model to your Azure workspace. Right-click the "Train Model" module, hover over "Trained model" and click "Save as Trained Model". The module menu now has a "Trained Models" section from which you can drag your model into an experiment.

Sources