Page 39 - MSDN Magazine, July 2017
P. 39
After running the commands, you’ll see the following output:
Configuration option 'external scripts enabled' changed from 1 to 1. Run the RECONFIGURE statement to install.
Hello, AI! You’re now ready to do data science and AI with SQL Server. Using either Visual Studio, Visual Studio Code or SQL Server Management Studio, you can connect to the SQL Server instance with RorPython.YoucanrunthecodeprovidedinFigure3tomakesure that R and Python are installed correctly. The code will output “Hello AI.”
If you need to install additional R libraries, you should set lib.SQL as the location of the SQL Server library. Similarly, if you need to install additional Python libraries, you can do a pip install of the relevant Python libraries and set the PYTHONPATH to point to where Python has been installed.
You can find the default locations of the R library files at <Drive>:\\Program Files\\Microsoft SQL Server\\MSSQL14.MS- SQLSERVER\\R_SERVICES\\library and Python library files at <Drive>:\\Program Files\\Microsoft SQL Server\\MSSQL14.MS- SQLSERVER\\PYTHON_SERVICES\\library.
Let’s dive deeper into what’s in a stored procedure (with R code) that’s used for training a customer churn classification model. Figure 4 shows a stored procedure that trains a customer churn classification model by building an ensemble of decision trees using R. Figure 5 shows a similar stored procedure that trains a similar churn model using Python. Database developers leverage familiar skills of using T-SQL queries to select data from SQL server tables. The data is used as inputs to the R or Python code.
Figure 3 Code to Ensure R and Python Are Installed Correctly
Figure 5 Creating a Stored Procedure to Train a Random Forest Model Using Python
CREATE PROCEDURE trainPythonRandomForestModel (@trained_model varbinary(max) OUTPUT) AS
BEGIN
execute sp_execute_external_script @language = N'Python', @script = N'
df = churn_train_data
# Get all the columns columns = df.columns.tolist()
# Specify the label column target = "churn"
import numpy as np
from sklearn.ensemble import RandomForestClassifier
churn_model = RandomForestClassifier(n_estimators=20, max-depth=5) churn_model.fit(df[columns], df[target])
import pickle
#Serialize the model as a binary object trained_model = pickle.dumps(lin_model)
@input_data_1 = N'select "TotalMinutesUsedLastMonth", "State", "CallDropRate", "UnPaidBalance", "TotalCallDuration", "TotalDataUsageMB" from dbo.telco_churn_data where Year = 2017'
@input_data_1_name = N'churn_train_data'
@params = N'@trained_model varbinary(max) OUTPUT' @trained_model = @trained_model OUTPUT;
END;
Code for R
exec sp_execute_external_script @language =N'R',
@script=N'OutputDataSet<-InputDataSet', @input_data_1 =N'select ''Hello AI'' as txt' with result sets (([txt] nvarchar(8)));
go
Code for Python
exec sp_execute_external_script @language =N'Python',
@script=N'OutputDataSet = InputDataSet', @input_data_1 =N'select ''Hello AI'' as txt' with result sets (([txt] nvarchar(8)));
Figure 4 Creating a Stored Procedure
to Train a Decision Forest Model Using R
Once the model is trained, it’s serialized and returned as varbi- nary(max). The model can then be stored in a SQL Server table.
The complete sample code for a customer churn model can be found at aka.ms/telcochurnsamples.
With SQL Server 2017 CTP2, you can run Python code in the stored procedures. By supporting Python in SQL Server, it opens up new opportunities for you to use many of the deep learning toolkits (CNTK, TensorFlow and more), which provide Python APIs. In addition, the deep-learning toolkits enable you to specify the use of GPUs when training your model. You can now use SQL Server 2017 to perform intensive deep-learning jobs on text, images, and unstructured data that are stored in SQL Server, and then operationalize and do inference with SQL Server. Very exciting!
Wrapping Up
SQL Server has evolved over the years into a top-notch, enterprise- ready, scalable and hybrid data platform. This lets companies build intelligent, mission-critical applications, backed by decades of database innovations from indexes, spatial indexes, in-memory, column stores, high availability, resource governance and more. With the SQL Server 2017 release, with built-in R and Python support, SQL Server is in a unique position to fuel innovations that database professionals and developers can co-create with the data science and AI communities. The possibilities are endless.n
Wee Hyong has worn many hats in his career: developer, program/product man- ager, data scientist, researcher and strategist. His gamut of experience spanning industry and research has given him unique abilities to help organizations accel- erate their digital transformations using data science and artificial intelligence. You can follow him on Twitter: @weehyong.
THanks to the following Microsoft technical expert for reviewing this article: Joy Qiao
CREATE PROCEDURE trainRDecisionForestModel AS BEGIN
execute sp_execute_external_script @language = N'R', @script = N'
require("RevoScaleR");
labelVar = "churn"
trainVars <- rxGetVarNames(telcoCDR_Data_train) trainVars <- trainVars[!trainVars %in% c(labelVar)]
temp <- paste(c(labelVar, paste(trainVars, collapse = "+")), collapse = "~") formula <- as.formula(temp)
rx_forest_model <- rxDForest(formula = formula,
data = telcoCDR_Data_train,
nTree = 8, maxDepth = 32, mTry = 2, minBucket=1, replace = TRUE, importance = TRUE, seed=8, parms=list(loss=c(0,4,1,0)))
rxDForest_model <- data.frame(
payload = as.raw(serialize(rx_forest_model, connection=NULL))); ' @input_data_1 = N'select * from telcoCDR_Data_train' @input_data_1_name = N'telcoCDR_Data_train'
@output_data_1_name = N'rxDForest_model'
with result sets ((model varbinary(max))); end;
msdnmagazine.com
July 2017 33