Server side application:
This package provides a modern RESTful predictor API created using the Python programming language and the Flask-RESTful library. It is designed to serve various predictors thanks to its flexible input/output data definition (multi-dimensional features, multiple subjects, etc.). On top of that, the predictor API provides endpoints for user authentication and JWT-based request authorization, and it supports cross-origin resource sharing, request-response caching, advanced logging, etc. It also comes with basic support for containerization via Docker (Dockerfile and docker-compose).
Client side application:
To make the use of the Predictor API as easy as possible, there is a PyPI-installable lightweight client side application named Predictor API client that provides method-based calls to all endpoints accessible on the API. For more information about the Predictor API client, please read the official readme and documentation.
Endpoints:
- predictor endpoints (`api/resources/predict` and `api/resources/predict_proba`)
  - `/predict` - calls `.predict` on the specified predictor. This endpoint is designed to be used to get the predicted values (e.g. classification: class label, regression: predicted value).
  - `/predict_proba` - calls `.predict_proba` on the specified predictor. This endpoint is supposed to be used to get the predicted probabilities (e.g. classification: class probabilities).
- security endpoints (`api/resources/security`)
  - `/signup` - signs up a new user.
  - `/login` - logs in an existing user (obtains access and refresh JWT tokens).
  - `/refresh` - refreshes an expired access token (obtains a refreshed JWT access token).
The full Sphinx-generated programming docs can be found in the official documentation.
Contents:
# Clone the repository
git clone https://github.com/BDALab/predictor-api.git
# Install packaging utils
pip install --upgrade pip
pip install --upgrade virtualenv
# Change directory
cd predictor-api
# Create and activate a virtual environment

# Linux
virtualenv .venv
source .venv/bin/activate

# Windows
virtualenv venv
venv\Scripts\activate.bat
# Install dependencies
pip install -r requirements.txt
# Three necessary steps (see the configuration section below):
#
# 1. create .env file with the JWT secret key at api/.env
# 2. add dependencies of the predictors to be used at requirements_predictors.txt
# 3. configure the location of the serialized predictors at api/configuration/ml.json
To make the Predictor API work, three steps must be performed:

1. create a `.env` file with the JWT secret key at `api/.env` to enable proper user authorization of the requests (more information in the next sub-section; point 2 - authorization)
2. add the dependencies of the serialized predictors to be used in the API to `requirements_predictors.txt` to enable automatic installation of the libraries used to train the predictors (more information in the next sub-section; point 6 - machine learning)
3. configure the location of the serialized predictors to be used in the API in `api/configuration/ml.json` to enable loading, i.e. deserialization, of the models (more information in the next sub-section; point 6 - machine learning)
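The first two steps above can be sketched as shell commands, run from the repository root (the secret key value is a placeholder and `scikit-learn` is only an example dependency; use your own):

```shell
# 1. create the .env file with the JWT secret key
#    (mkdir is a no-op in the repository, where api/ already exists)
mkdir -p api
echo 'JWT_SECRET_KEY="<your-secret-key>"' > api/.env

# 2. add the predictors' dependencies (scikit-learn is just an example)
echo "scikit-learn" >> requirements_predictors.txt

# 3. the predictors' location is then configured in api/configuration/ml.json
```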
The package provides various configuration files stored at `api/configuration`. More specifically, the following configuration is provided:

- authentication (`api/configuration/authentication.json`): supports the configuration of the database of users. In this version, the `sqlite` database is used for simplicity. The main configuration is the URI of the `*.db` file (pre-set to `api/authentication/database/database/database.db`). An empty database file is created automatically.
- authorization (`api/configuration/authorization.json`): supports the configuration of request authorization. In this version, JWT authorization is supported. The main configuration is the name of the `.env` file that stores the JWT secret key. For security reasons, the `.env` file is not part of this repository, i.e. before using the API, it is necessary to create the `.env` file at the `api` level, i.e. `api/.env`, and set the `JWT_SECRET_KEY` field (e.g. `JWT_SECRET_KEY="wfTHu38GpF5y60djwKC0EkFj586jdyZR"`).
- cors (`api/configuration/cors.json`): supports the configuration of cross-origin resource sharing. In this version, no sources are added to `origins` (to be updated per deployment).
- caching (`api/configuration/caching.json`): supports the configuration of API request-response caching. In this version, simple in-memory caching with a TTL of 60 seconds is used.
- logging (`api/configuration/logging.json`): supports the configuration of logging. The package provides logging on three levels: (a) request, (b) response, (c) werkzeug. The log files are created in the `logs` directory located at the predictor's root directory.
- machine learning (`api/configuration/ml.json`): supports the configuration of the predictors. First, the dependencies of the serialized predictor models must be added to `requirements_predictors.txt` (e.g. when using serialized scikit-learn models, `scikit-learn` must be added). The API automatically installs all predictor dependencies specified in this file. Next, the location of the serialized models must be set via `predictors.location` (a full path is needed; by default, it is set to `api/ml/models`). All serialized models must be placed at `predictors.location` to be loadable at runtime. Only models serialized as `joblib` files are supported.
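As a sketch, `api/configuration/ml.json` might then look as follows. Only the `predictors.location` key is described in this document; any other keys present in the real file may differ:

```json
{
    "predictors": {
        "location": "/absolute/path/to/predictor-api/api/ml/models"
    }
}
```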
In order for a user to use the API, the following steps are required:
1. a new user must be created via the `/signup` endpoint
2. the existing user must log in to get the access and refresh tokens via the `/login` endpoint
3. calls to the `/predict` or `/predict_proba` endpoints can be made
4. if the access token expires, a new one must be obtained via the `/refresh` endpoint
For specific examples for each step of the workflow, see the Examples section.
Structure of the input data is the following: it is a `dict` object with these field-value pairs (example below):

- `features` (`dict`, mandatory; placeholder for the feature values/labels)
- `features.values` (`numpy.array`, mandatory; feature values)
- `features.labels` (`list`, optional; feature labels)
- `model` (`str`, mandatory; predictor identifier)
Shape:
Shape of the feature values: (first dimension, (inner dimensions), last dimension)
- the first dimension is dedicated to subjects
- the inner dimensions are dedicated to the dimensionality of the features
- the last dimension is dedicated to features
An important requirement is to provide the predictor with data it can process (shape, format, etc.).
# Dimensions: M subjects, N features of (... dimensions)
{
"features": {
"labels": ["feature 1", ... "feature N"],
"values": array of shape (M, ..., N)
},
"model": "model_identifier"
}
Examples:
- 100 subjects, each having 30 1-D features (shape `(1,)` or shape `(1, 1)`): `shape = (100, 1, 30)`
- 250 subjects, each having 20 2-D features (shape `(2,)` or shape `(1, 2)`): `shape = (250, 2, 20)`
- 500 subjects, each having 10 features with the shape of `(3, 4)`: `shape = (500, 3, 4, 10)`
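The shape convention in the examples above can be checked with a quick `numpy` sketch:

```python
import numpy

# 100 subjects, each having 30 1-D features
features_1d = numpy.random.rand(100, 1, 30)

# 250 subjects, each having 20 2-D features
features_2d = numpy.random.rand(250, 2, 20)

# 500 subjects, each having 10 features of shape (3, 4)
features_nd = numpy.random.rand(500, 3, 4, 10)

# First dimension: subjects; last dimension: features
print(features_1d.shape[0], features_1d.shape[-1])  # 100 30
```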
Structure of the output data is the following: it is a `dict` object with this field-value pair (example below):

- `predicted` (`numpy.array`, mandatory; predicted values)
Shape:
Shape of the predicted values: (first dimension, last dimension)
- the first dimension is dedicated to subjects
- the last dimension is dedicated to the dimensionality of the predicted values
# Dimensions: M subjects, C-dimensional predicted values
{
"predicted": array of shape (M, C)
}
Examples:
- 100 subjects, `/predict` (classification): `shape = (100, 1)` or `shape = (100, 1, 1)` (1 class label)
- 250 subjects, `/predict_proba` (classification): `shape = (250, 1, 10)` (10 classes; class probabilities)
- 500 subjects, `/predict` (regression): `shape = (500, 1)` or `shape = (500, 1, 1)` (1 predicted value)
As the feature values/predictions are stored as a `numpy.array`, they must be JSON-serialized/deserialized. For this purpose, the package provides the `api.wrappers.data.DataWrapper` class.
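The exact (de)serialization scheme of `DataWrapper` is internal to the package, but the idea can be illustrated with a minimal stand-in: flatten the array and keep its `dtype` and `shape` so the data survives the JSON round trip. The `wrap`/`unwrap` functions below are hypothetical names for illustration, not the package's API:

```python
import json
import numpy

def wrap(array):
    # numpy arrays are not JSON-serializable; store values, dtype and shape
    return {
        "values": array.ravel().tolist(),
        "dtype": str(array.dtype),
        "shape": list(array.shape),
    }

def unwrap(payload):
    # Rebuild the array from the JSON-friendly payload
    return numpy.array(
        payload["values"], dtype=payload["dtype"]).reshape(payload["shape"])

# Round trip through a JSON string, as the data would travel over HTTP
original = numpy.random.rand(10, 1, 100)
restored = unwrap(json.loads(json.dumps(wrap(original))))
```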
import requests
# Prepare the sign-up data (new user to be created)
body = {
"username": "user123",
"password": "pAsSw0rd987!"
}
# Call the sign-up endpoint (locally deployed API)
response = requests.post(
"http://localhost:5000/signup",
json=body)
import requests
# Prepare the log-in data (already created user)
body = {
"username": "user123",
"password": "pAsSw0rd987!"
}
# Call the log-in endpoint (locally deployed API)
response = requests.post(
"http://localhost:5000/login",
json=body)
# Get the access and refresh tokens from the response
if response.ok:
access_token = response.json().get("access_token")
refresh_token = response.json().get("refresh_token")
import numpy
import requests
from pprint import pprint
from api.wrappers.data import DataWrapper
# Set the number of subjects (10)
num_subjects = 10
# Set the shape of the features for each subject (1, 100): 1-D feature vector with 100 features
features_shape = (1, 100)
# Prepare the feature values/labels (labels are optional)
values = numpy.random.rand(num_subjects, *features_shape)
labels = [f"feature {i}" for i in range(features_shape[-1])]
# Serialize the feature values
values = DataWrapper.wrap_data(values)
# Prepare the model identifier
model = "a3ed6e799cd755286138a53e5fd43102"
# Prepare the predictor data
body = {
"features": {
"labels": values,
"values": labels
},
"model": model
}
# Prepare the authorization header (take the access_token obtained via /login endpoint)
headers = {
"Authorization": f"Bearer <access_token>"
}
# Call the predict endpoint (locally deployed API; endpoints: /predict or /predict_proba)
response = requests.post(
url="http://localhost:5000/predict",
json=body,
headers=headers,
verify=True,
timeout=10)
if response.ok:
# Get the predictions
predicted = response.json().get("predicted")
# Deserialize the predictions
predicted = DataWrapper.unwrap_data(predicted)
pprint(predicted)
import requests
# Prepare the refresh headers (take the refresh_token obtained via /login endpoint)
headers = {
"Authorization": f"Bearer <refresh_token>"
}
# Call the refresh endpoint (locally deployed API)
response = requests.post(
"http://localhost:5000/refresh",
headers=headers)
# Get the refreshed access token
if response.ok:
access_token = response.json().get("access_token")
This project is licensed under the MIT License - see the LICENSE file for details.
This package is developed by the members of the Brain Diseases Analysis Laboratory. For more information, please contact the head of the laboratory, Jiri Mekyska ([email protected]), or the main developer, Zoltan Galaz ([email protected]).