This repository has been archived by the owner on Oct 28, 2019. It is now read-only.

Bug bash instructions

Andrie de Vries edited this page Nov 17, 2015 · 18 revisions

Getting started

1. Installing the package

To install the package directly from GitHub, use:

# Install devtools
if(!require("devtools")) install.packages("devtools")
devtools::install_github("RevolutionAnalytics/AzureML")
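To confirm that the install step worked, a quick check along these lines may help. This is only a sketch using base R; the variable name `azureml_installed` is made up for illustration.

```r
# TRUE if the AzureML package can be loaded, FALSE otherwise.
# requireNamespace() does not attach the package to the search path.
azureml_installed <- requireNamespace("AzureML", quietly = TRUE)

if (azureml_installed) {
  message("AzureML version: ", as.character(packageVersion("AzureML")))
} else {
  message("AzureML is not installed yet; rerun the install step above.")
}
```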

2. Find your AzureML credentials

To use any of the functions, you need your AzureML credentials. To find these, read the vignette Getting Started with the AzureML Package.

You will need these credentials to use the function workspace(). This function sets up your credentials in R and allows you to use all of the other functions in the package.

3. Create a JSON file with your credentials

The easiest way to use the workspace() function is to create a JSON file at ~/.azureml/settings.json.

Copy the following and modify with your own credentials:

{"workspace":{
"id"                  : "Add your id here",
"authorization_token" : "Add your authorisation token here",
"api_endpoint"        : "",
"management_endpoint" : ""
}}

Then save the file as ~/.azureml/settings.json. On Windows, this is C:\Users\<yourname>\Documents\.azureml\settings.json

If you are unsure where "~/" points on your machine, try:

> path.expand("~/")
[1] "C:/Users/adevries/Documents/"
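If you prefer to create the settings file from R rather than by hand, something like the following sketch could work. The helper `write_azureml_settings()` is hypothetical (it is not part of the AzureML package) and uses only base R. The example writes to a temporary folder so that no real settings file is overwritten.

```r
# Hypothetical helper (not part of AzureML): writes a settings.json
# skeleton with the same fields as the template above into `dir`.
write_azureml_settings <- function(dir, id, token) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, "settings.json")
  lines <- c(
    '{"workspace":{',
    sprintf('"id"                  : "%s",', id),
    sprintf('"authorization_token" : "%s",', token),
    '"api_endpoint"        : "",',
    '"management_endpoint" : ""',
    '}}'
  )
  writeLines(lines, path)
  invisible(path)
}

# Written to a temporary folder here; for real use, pass
# path.expand("~/.azureml") as `dir` and your own credentials.
settings_path <- write_azureml_settings(
  file.path(tempdir(), ".azureml"),
  id    = "Add your id here",
  token = "Add your authorisation token here"
)
readLines(settings_path)
```

Building the file with sprintf() keeps the sketch free of extra dependencies, but it does not escape special characters, so check the result if your token contains quotes or backslashes.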

Work with AzureML datasets

4. Download an AzureML dataset to your workspace

Try:

  • Read the help for ?workspace, ?datasets and ?download.datasets
  • Create a workspace object
  • Get a listing of available datasets in your workspace
  • Download a specific dataset from AzureML as a data frame

ws <- workspace()
d <- datasets(ws)
dat <- download.datasets(d, "Movie Ratings")
head(dat)

Publish an R script as an AzureML API endpoint

5. Publish an API endpoint

You can publish any R function as an API endpoint in AzureML.

  • Read the help for ?publish
  • Try some of the examples in ?publish or ?consume

Here is a more complicated example showing how to create a function that takes ordered factors as input:

# Train a model using diamonds in ggplot2

library(ggplot2)
data(diamonds)
str(diamonds)
train <- diamonds[sample.int(nrow(diamonds), 30000), ]

model <- glm(price ~ carat + clarity + color + cut - 1, data = train, family = Gamma(link = "log"))
summary(model)

# The model works reasonably well, except for some outliers

dd <- diamonds[sample(nrow(diamonds), 1000), ]
plot(exp(predict(model, dd)) ~ dd$price)

# Create a function to publish. The function takes care of converting characters correctly to factors

predictDiamonds <- function(x){
  x$cut     <- factor(x$cut,     levels = levels(diamonds$cut), ordered = TRUE)
  x$clarity <- factor(x$clarity, levels = levels(diamonds$clarity), ordered = TRUE)
  x$color   <- factor(x$color,   levels = levels(diamonds$color), ordered = TRUE)
  exp(predict(model, newdata = x))
}

# Create sample data to score

toScore <- diamonds[1:5, c("price", "cut", "color", "clarity", "carat")]
toScore <- as.data.frame(
  lapply(toScore, function(x) if (is.factor(x) || is.ordered(x)) as.character(x) else x),
  stringsAsFactors = FALSE
)

# Publish the model

ws <- workspace()

# Helper function to convert types correctly

azuremlTypes <- function(x){
  # Map the class of each column to the type name used in the schema
  lapply(x, function(column){
    switch(class(column)[1],
           integer = "integer",
           numeric = "numeric",
           double = "numeric",
           character = "character",
           factor = "character",
           ordered = "character",
           logical = "logical",
           stop("unknown class: ", class(column)[1])
    )
  })
}

azuremlTypes(toScore)

# deleteWebService(ws, name = "diamonds") # Use this line to delete duplicate services
# Sys.sleep(1)

# Publish the service

publishWebService(ws, fun = predictDiamonds, name = "diamonds",
                  inputSchema = azuremlTypes(toScore),
                  outputSchema = list(ans = "numeric"),
                  data.frame = TRUE
)
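The reason predictDiamonds() rebuilds the ordered factors is that the sample data is sent as plain character strings, and a bare factor() call would not restore the original level order. The following self-contained sketch illustrates this with a made-up `sizes` factor instead of the diamonds data; all variable names are for illustration only.

```r
# An ordered factor with a meaningful, non-alphabetical level order
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

# What the scoring function receives: plain character strings
as_text <- as.character(sizes)

# Naive conversion: levels are sorted alphabetically and the ordering is lost
naive <- factor(as_text)
levels(naive)
is.ordered(naive)

# Conversion with explicit levels, as in predictDiamonds()
restored <- factor(as_text, levels = levels(sizes), ordered = TRUE)
identical(restored, sizes)
```

Passing the levels from the training data explicitly keeps the factor coding used at prediction time consistent with the coding the model was fitted on.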

6. Consume the model from R

Now that you've published an API, you can send data for scoring by using the function consume().

  • Read the help for ?consume
  • Try some of the examples

To consume the model you published in the previous section, try:

# Define the endpoint to call

Sys.sleep(2) # AzureML can respond with an error if you call too soon after the previous call
api <- services(ws, name = "diamonds")
Sys.sleep(1)
ep <- endpoints(ws, api)
Sys.sleep(1)
discoverSchema(ep$HelpLocation)

# Consume the service

Sys.sleep(1)
consume(ep, toScore[1, ], retryDelay = 1)
consume(ep, toScore, retryDelay = 1)
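The Sys.sleep() calls above guard against errors from calling the service too soon. An alternative is to retry a failed call after a short pause. The sketch below shows one way to do that in base R; `with_retry()` and `flaky()` are hypothetical names, and `flaky()` only simulates a service that fails on its first two calls.

```r
# Hypothetical helper: call f() up to `tries` times, pausing `delay`
# seconds after each failure. Rethrows the last error if every attempt fails.
with_retry <- function(f, tries = 3, delay = 1) {
  stopifnot(tries >= 1)
  for (i in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (i < tries) Sys.sleep(delay)
  }
  stop(result)
}

# Simulated service: fails on the first two calls, then succeeds
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("service not ready")
  "ok"
}

res <- with_retry(flaky, tries = 5, delay = 0.1)
res
attempts
```

With a workspace set up, the same helper could in principle wrap a call such as services(ws, name = "diamonds") by passing it inside an anonymous function.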

Reporting issues and problems

To report issues or problems, use the issue log or send me a direct message.
