Page 25 - MSDN Magazine, July 2017

Consider an arboretum that has an inventory of many species of flowers from all around the world. The organization wants to find and classify the types of iris flowers in that inventory. The arboretum’s data scientist used R on a single machine to train a model that labels types of iris flowers, but the inventory is so large that a single machine running the pre-trained model can’t complete even this simple task of identifying iris flowers. What’s needed is an intelligent, scalable data processing engine. The overall process for using the U-SQL R extensions to do prediction at scale is simply:
• Use the REFERENCE ASSEMBLY statement to include the R U-SQL extension to run R code in the U-SQL script.
• Use the DEPLOY RESOURCE operation to upload the pre-trained model as a resource on executing nodes.
• Use DECLARE to inline the R script in the U-SQL script.
• Use the EXTRACT operation to load data into a rowset.
• Use the Extension.R.Reducer UDO in a REDUCE operation to run the R script to score each row in the rowset using the uploaded pre-trained model.
• Use the OUTPUT operation to store the result into a persistent store.
Figure 4 shows the U-SQL script that carries out this process.
In this simple U-SQL query, we’re using the U-SQL R extension to do scoring at scale. The R and Python U-SQL extensions get automatically installed and registered with the ADLA account database when you install the U-SQL Advanced Analytics Extensions. In the U-SQL script, we first deploy the pre-existing model, which was trained using R on a single machine. This highlights the fact that it wasn’t trained using the ADLA/U-SQL framework. Next, we extract and de-serialize the iris dataset into columns using the system-provided .csv format extractor, Extractors.Csv, and load the data into a rowset. Next, we generate a random number for each row that will be used later to partition the data to enable parallel processing. Then, we use the U-SQL R extension UDO Extension.R.Reducer and pass the R script that
Figure 3 Code Snippets for Other Cognitive APIs Supported in U-SQL
Figure 4 Using Pre-Existing Model in U-SQL Script
REFERENCE ASSEMBLY [ExtR];
DEPLOY RESOURCE @"/usqlext/samples/R/my_model_LM_Iris.rda";

// R script to score using pre-trained R model
DECLARE @MyRScript = @"
load(""my_model_LM_Iris.rda"")
outputToUSQL=data.frame(predict(lm.fit, inputFromUSQL, interval=""confidence""))
";

DECLARE @PartitionCount int = 10;

@InputData =
  EXTRACT SepalLength double,
          SepalWidth double,
          PetalLength double,
          PetalWidth double,
          Species string
  FROM @"/usqlext/samples/R/iris.csv"
  USING Extractors.Csv();

@ExtendedData =
  SELECT Extension.R.RandomNumberGenerator.GetRandomNumber(@PartitionCount) AS Par,
         SepalLength,
         SepalWidth,
         PetalLength,
         PetalWidth
  FROM @InputData;

// Predict Species
@RScriptOutput =
  REDUCE @ExtendedData
  ON Par
  PRODUCE Par,
          fit double,
          lwr double,
          upr double
  READONLY Par
  USING new Extension.R.Reducer(command:@MyRScript, rReturnType:"dataframe", stringsAsFactors:false);

OUTPUT @RScriptOutput
  TO @"/Output/LMPredictionsIris.txt"
  USING Outputters.Tsv();
does the prediction, along with the model. Finally, we output the confidence interval for each flower from the inventory.
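The partition-then-reduce pattern described above can be sketched outside U-SQL. The following Python fragment is only an illustration under stated assumptions: the function names (assign_partitions, reduce_by_key, score_group), the sample rows, the model coefficients and the key range of 0 to PartitionCount minus 1 are all hypothetical and are not taken from the U-SQL R extension or from the article's model. It shows why a random key makes the scoring parallelizable: each group of rows is scored independently of the others.

```python
import random
from collections import defaultdict

PARTITION_COUNT = 10  # mirrors @PartitionCount in the U-SQL script


def assign_partitions(rows, partition_count, seed=None):
    """Tag each row with a random key; the range [0, partition_count) is an assumption."""
    rng = random.Random(seed)
    return [(rng.randrange(partition_count), row) for row in rows]


def reduce_by_key(keyed_rows, reducer):
    """Group rows by key and call the reducer once per group, like REDUCE ... ON Par."""
    groups = defaultdict(list)
    for key, row in keyed_rows:
        groups[key].append(row)
    output = []
    for key in sorted(groups):
        # Each group is independent, so a distributed engine could run these in parallel.
        for result in reducer(groups[key]):
            output.append((key, result))
    return output


def score_group(rows):
    """Hypothetical stand-in for the R script: returns one prediction per input row."""
    # Made-up coefficients for illustration only; this is NOT the article's lm.fit model.
    return [0.5 * sepal_width + 0.3 * petal_length for sepal_width, petal_length in rows]


# Hypothetical sample rows: (sepal_width, petal_length)
rows = [(3.5, 1.4), (3.0, 1.4), (3.2, 4.7), (2.8, 5.6)]
keyed = assign_partitions(rows, PARTITION_COUNT, seed=0)
predictions = reduce_by_key(keyed, score_group)
```

Because the key is random rather than derived from the data, rows are spread roughly evenly across groups, and the set of predictions is the same as scoring all rows in a single pass; only the grouping, and therefore the achievable parallelism, changes.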
We started with a simple U-SQL script to understand the content of images, which is typically considered opaque. The script automatically scales across hundreds of machines to transform images efficiently into actionable insights that can power intelligent applications. We also showcased how you can reuse an existing model that was trained using the popular R/Python environment and apply the model to do prediction on a massive amount of data using the U-SQL R extension. This is what can power the intelligence revolution.
Hiren Patel is a senior technical program manager at Microsoft. He has been part of the Big Data group since 2011 and worked on designing and developing various aspects of the Cosmos/ADLA distributed execution engine, including language, optimizer, runtime and scheduling.
Shravan Matthur Narayanamurthy is a senior engineering manager at Microsoft leading the Big Data Machine Learning team. He has several years of experience researching and developing machine learning algorithms that operate at cloud scale and distributed systems.
Thanks to the following Microsoft technical experts who reviewed this article: Saveen Reddy and Michael Rys
// Estimate age and gender for human faces
@faces =
  PROCESS @imgs
  PRODUCE FileName,
          NumFaces int,
          FaceAge string,
          FaceGender string
  READONLY FileName
  USING new Cognition.Vision.FaceDetector();

// Apply OCR
@ocrs =
  PROCESS @imgs
  PRODUCE FileName,
          Text string
  READONLY FileName
  USING new Cognition.Vision.OcrExtractor();

// Sentiment Analysis on War and Peace
@sentiment =
  PROCESS @WarAndPeace
  PRODUCE No, Year, Book, Chapter, Text,
          Sentiment string,
          Conf double
  READONLY No, Year, Book, Chapter, Text
  USING new Cognition.Text.SentimentAnalyzer(true);