Page 46 - MSDN Magazine, July 2017
P. 46
• Defining the Machine Learning “experiment” that will gen- erate the estimated outcome by choosing the appropriate algorithm for the process.
• Executing the Training, Scoring and Evaluation processes to build the predictive model.
• Deploying a REST service that external applications can con- sume to obtain the predicted outcome “as a service.” This is pure Machine Learning as a Service (MLaaS).
Let’s go through these steps in detail.
Dataset
To develop and train a predictive analytics solution in Azure Machine Learning Studio, it’s necessary to import a dataset to analyze. There are several options for importing data and using it in a machine learning experiment:
1. Upload a local file. This is a manual task. Azure Machine Learning Studio, at the time of this writing, supports the upload of .csv and .tsv files, plain text (.txt), SVM Light (.svmlight), Attribute Relation File Format (.arff ), R Object or Workspace (.RData), and .zip files.
2. Access data from an existing data source. The currently supported data sources are a Web URL using HTTP, Hadoop using HiveQL, Azure Blob or Table storage, Azure SQL Database or SQL Server on Azure Virtual Machine, on-premises SQL Server database, and any OData feed.
3. From another Azure Machine Learning experiment saved as a dataset.
More information about importing training data into Azure Machine Learning Studio from various data sources is available at bit.ly/2pXhgus.
Experiment
Once the dataset is available, and assuming you call it “cache hits,” it’s possible to build the machine learning experiment that will analyze data, identify patterns of usage of objects in cache and forecast the future demand. This estimation, expressed as a number of hits under certain conditions, will be used for populating the L2 cache in Redis.
Building a new experiment consists of defining a flow of tasks that are executed by the machine learning engine. In its more linear representation, a demand estimation experiment includes the following steps:
• Add the dataset as the starting point of the experiment. On the left-hand panel, existing datasets are available under Saved Datasets | My Datasets.
• Optionally, apply a transformation to the existing dataset. This can be done for several reasons, including data cleansing or improv- ing data accuracy. You’ll use the Select Columns in Dataset (bit.ly/2qTUQZ3) and the Split Data (bit.ly/2povZKP) modules to reduce the number of columns to analyze, and divide your dataset into two distinct sets.
This is a necessary step when you want to separate data into training and testing sets, so that you can evaluate a model on a holdout dataset. The two modules are available under the Data Transformation section.
• You can now train the model on the first split of the dataset by adding the Train Model (bit.ly/2qU2J0N) module as a next step in the experiment flow, and connect it to the left-hand side connection point of the Split Data module (the fraction of rows to consider for training). The Train Model module can be found under Machine Learning | Train. In the Column Set option, select the label column in the training dataset. The column should contain the known values for the class or outcome you want to predict. This is a numeric data type, in the example project the cache hits by object, on a given day.
• Training a model requires to connect and configure one of the classifications or regression models provided in Azure Machine Learning Studio. You’ll use the Linear Regression (bit.ly/2qj01ol) module available in the Machine Learning | Initialize Model | Regression section.
Linear Regression
When a value is predicted, as with cache hits, the learning process is called “regression.” Training a regression model is a form of supervised machine learning (bit.ly/2qj01ol). That means you must provide a dataset that contains historical data from which to learn patterns. The data should contain both the outcome you’re trying to predict, and related factors (variables). The machine learning model uses the data to extract statistical patterns and build a model.
Regression algorithms (bit.ly/2qj6uiX) are algorithms that learn to predict the value of a real function for a single instance of data.
40 msdn magazine
Machine Learning
Figure 7 Cache Hits Estimation Experiment

