This repository contains various end-to-end use case examples using the DataRobot API. Each use case directory contains instructions for its own use.
A simple example to get you started can be found here. It is also executable through Google Colab.
For each guide, follow the instructions in its own .ipynb or .Rmd file.
Please pay attention to the different DataRobot API Endpoints.
The API endpoint you specify for accessing DataRobot is dependent on the deployment environment, as follows:
- AI Platform Trial—https://app2.datarobot.com/api/v2
- US Managed AI Cloud—https://app.datarobot.com/api/v2
- EU Managed AI Cloud—https://app.eu.datarobot.com/api/v2
- On-Premise—https://{datarobot.example.com}/api/v2 (replacing {datarobot.example.com} with your specific deployment endpoint)
The DataRobot API Endpoint is used to connect your IDE to DataRobot.
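As a minimal sketch of how the endpoint choice above might be handled in a script, the helper below maps a deployment environment to its URL. The function name `resolve_endpoint` and the dictionary keys are hypothetical labels chosen for illustration; only the URLs come from the list above.

```python
# Endpoints copied from the list above; the keys are illustrative labels only.
ENDPOINTS = {
    "trial": "https://app2.datarobot.com/api/v2",
    "us": "https://app.datarobot.com/api/v2",
    "eu": "https://app.eu.datarobot.com/api/v2",
}


def resolve_endpoint(environment, host=None):
    """Return the API endpoint for a deployment environment (hypothetical helper)."""
    if environment == "on-premise":
        if not host:
            raise ValueError("On-Premise deployments need a host name")
        return f"https://{host}/api/v2"
    try:
        return ENDPOINTS[environment]
    except KeyError:
        raise ValueError(f"Unknown environment: {environment!r}") from None


endpoint = resolve_endpoint("eu")
print(endpoint)

# With the DataRobot Python client installed, the endpoint and your API token
# would then be passed to the client, for example:
#   import datarobot as dr
#   dr.Client(token="YOUR_API_TOKEN", endpoint=endpoint)
```

Keeping the endpoint in one place like this makes it easier to move a notebook between the trial, managed cloud, and on-premise environments without editing each connection call.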
- To learn to use DataRobot, visit DataRobot University.
- For articles on using DataRobot, feature deep dives, and example workflows, visit DataRobot Community.
- For simple example scripts, visit Examples for Data Scientists.
- Lead Scoring for selling online courses: Predict who is likely to become a customer by using a binary classification strategy. Create a custom feature list. Get the ROC Curve, Feature Impact, and Feature Effects, and plot them for analysis. Retrain your model and make predictions. Python
- Predict Hospital Readmissions: Predict which patients are likely to be readmitted within 30 days of discharge by using binary classification. Install the software, find your API token, choose the best model, get the evaluation metrics, and make predictions. Python R
- Predict COVID-19 at the County Level: Predict high-risk counties with a look-alike modeling strategy. Build a binary classification model and rank each county by the probability of seeing cases. Set up the project, get evaluation and interpretability metrics, plot results, and get prediction explanations. Python
- Predict Medical Fraud: Predict fraudulent medical claims with binary classification. Connect to a SQL database, create a data store, write custom functions to build multiple projects, conduct anomaly detection, and deploy the model using the prediction server. Save the results for a custom dashboard. Python
- Lead Scoring Bank Marketing: Predict which customers are likely to purchase a product or service in response to a bank telemarketing campaign. Upload data, create a project, and get and plot the ROC Curve and Feature Impact. Get the holdout predictions. R
API Training: The DataRobot API Training is targeted at data scientists and motivated individuals with at least basic coding skills who want to take automation with DataRobot to the next level. Python R
Here you can learn how to use the DataRobot API through a series of exercises that challenge you and teach you how to solve some of the most common problems that people run into.
Start by carefully reading the "API Training - Introductory Notebook" Python or R. This will help you learn the basics and provide a concrete overview of the API. Afterwards, go to the /Exercises folder, then download and solve the exercises.
The list of exercises is as follows:
- Classification Model Factory: Create a model factory for a binary classification problem using our readmissions dataset. Predict the likelihood of patient readmission. Build a single project and find the best model. Then, build more projects based on admission id. Find the best model for each subproject. Make this model ready for deployment. Python R
- Time Series Model Factory: Create a time series model factory using our store sales multiseries dataset. Set up a time series multiseries project. Get the best model and its performance. Cluster the data and create plots over time. Create a project for each cluster and evaluate the results. Python R
- Automated Retraining and Replacement of Models: Automatically retrain and replace models with this automated continuous training pipeline. Python/cURL
- Monitoring Drift and Replacing Models: Monitor your deployment for data drift and replace the model once a criterion is met. Connect to a SQL server and create a data store. Create a project based on the data source. Deploy the recommended model and configure drift tracking. Upload and make predictions on a dataset with drift. Check the drift results and replace the model. Python
- Multiclass one-vs-rest Modeling: Create a one-vs-rest model to do geophysical classification with 9 potential classes. Preprocess the data and split up the dataset. Use a loop to build nine projects and collect the results in a DataFrame. Then, get the predictions and plot them with an advanced visualization technique. Python
- Predicting Product Type Based on Customer Complaints: Use the free text from customer complaints to predict which product the customers are addressing. Python
- Predict CO2 levels of Mauna Loa: Create an OTV (out-of-time validation) project to predict CO2 levels. This project trains on older data and then validates on newer data, a strategy suited to data that is known to change over time. Import your data, create lagged features, define date-time partitioning, select a model, and get Feature Impact. Python
- Double Pendulum with Eureqa Models: Solve a regression problem using Eureqa blueprints. Eureqa makes no prior assumptions about the dataset, instead fitting models to the data dynamically. The models are presented as mathematical equations, so end users can seamlessly understand results and recommendations. Set up a manual mode project and select Eureqa blueprints from the repository. Use advanced tuning on the default model and print the mathematical expression. Python
- Analyzing Residuals to Build Better Models: Use residuals created by DataRobot insights to evaluate your models and make them better. Python
- Forecasting US COVID-19 Cases Using Time-Series: Create an AutoTS model on historical data taken from the US, France, and Spain. Clean and prepare the data. Create the time series project and build models. Forecast 10 days ahead for each country and write the results to a CSV file. (Tutorial in DataRobot Community.) R
- VisualAI Heartbeats: Create a Visual AI project to classify images of sound. Heartbeats of people with normal and atypical heart conditions were recorded as WAV files. This code shows you how to create spectrograms from the recordings and import them into DataRobot for Visual AI classification. (Tutorial in DataRobot Community.) Python
- Detecting Droids with DataRobot: Create a Visual AI project to classify images of droids and create a custom Shiny application. Build file paths to images and set up folders for Visual AI. Import the data into the platform and create image classification models. Get evaluation metrics and plot them with ggplot. Create a deployment using the prediction server. Make a Shiny app that calls the deployment. (Tutorial in DataRobot Community.) R
- Visual AI Oxford Pets: Create a Visual AI project to classify dog breeds! (Tutorial in DataRobot Community.) Python
- Anti-Money Laundering with Outlier Detection: Create an unsupervised model that can detect transactions related to money laundering. Use a small set of labeled data to evaluate how the different models perform. (Tutorial in DataRobot Community.) Python
- Amazon Web Services (AWS): This repository has a number of solutions for using DataRobot with AWS. Use DataRobot Prime and Scoring Code models with AWS Lambda. Also use DataRobot Scoring Code models with AWS SageMaker. cURL
- Database Connections and Writebacks: DataRobot provides a “self-service” JDBC product for database connectivity setup. Once configured, you can read data from production databases for model building and predictions. This allows you to quickly train and retrain models on that data, and avoids the unnecessary step of exporting data from your enterprise database to a CSV file for ingestion into DataRobot. Python
- Feature Discovery with Instacart Dataset: An example of how to use Feature Discovery through the Python API. Python
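The Mauna Loa item above mentions lagged features and training on older data while validating on newer data. The sketch below illustrates that idea in plain Python, independent of the DataRobot client. The helper names `add_lags` and `out_of_time_split` are hypothetical and the readings are made-up values for illustration, not real measurements. In the actual use case, DataRobot's date-time partitioning handles the split.

```python
from datetime import date


def add_lags(series, lags=(1, 2)):
    """Turn [(date, value), ...] into rows carrying lagged values as features."""
    ordered = sorted(series)  # oldest reading first
    max_lag = max(lags)
    rows = []
    # Skip the first max_lag readings, which have no complete lag history.
    for i in range(max_lag, len(ordered)):
        when, target = ordered[i]
        features = {f"lag_{k}": ordered[i - k][1] for k in lags}
        rows.append({"date": when, "target": target, **features})
    return rows


def out_of_time_split(rows, cutoff):
    """Train on rows before the cutoff date, validate on rows from the cutoff on."""
    train = [r for r in rows if r["date"] < cutoff]
    valid = [r for r in rows if r["date"] >= cutoff]
    return train, valid


# Made-up monthly readings, for illustration only.
readings = [(date(2020, month, 1), 410.0 + month) for month in range(1, 9)]
rows = add_lags(readings)
train, valid = out_of_time_split(rows, cutoff=date(2020, 7, 1))
print(len(train), len(valid))
```

Because the validation rows are all newer than the training rows, the evaluation reflects how a model would behave on future data instead of on a random sample of the same period.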
If you'd like to report an issue or bug, suggest improvements, or contribute code to this project, please refer to CONTRIBUTING.md.
This project has adopted the Contributor Covenant for its Code of Conduct. See CODE_OF_CONDUCT.md to read it in full.
Licensed under the Apache License 2.0. See LICENSE to read it in full.