Deployment on Sherlock

Tianyu Du edited this page Sep 13, 2022 · 4 revisions

Memo on deploying torch-choice and bemb on Sherlock.

This tutorial guides you through installing and deploying torch-choice and bemb on the Sherlock cluster. Sherlock is a computing cluster available only to Stanford members; you can skip this tutorial if you are using our packages on your own machine.


  • For text surrounded by angle brackets, like <SUNET ID>, replace it with your own information.
  • For steps with a $(\dagger)$ symbol, you only need to run them once during the initial installation. You don't need to re-run them afterward.

This tutorial assumes you are familiar with running jobs on Sherlock.

Deployment Steps:

  1. Log in to the Sherlock cluster using ssh <SUNET ID>
  2. Request a compute node with a GPU accelerator using srun -p athey --gpus 1 -t 2:00:00 -c 8 --pty bash. You can use torch-choice and bemb without a GPU, but we request a GPU machine here to verify the installation. Feel free to modify the configuration as needed.
  3. Run ml py-pytorch/1.8.1_py39 to load the PyTorch installation provided on Sherlock.
  4. $(\dagger)$ To check that GPU-enabled PyTorch has loaded successfully, type python3.9 to open a Python interpreter. The following commands help you verify the installation; the last one should print the name of your GPU.
>>> import torch
>>> torch.__version__
>>> torch.cuda.is_available()
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 2080 Ti'
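
The same check can be run non-interactively, for example at the top of a job script. The following is a minimal sketch, not part of torch-choice or bemb; the helper name check_torch is hypothetical, and it only uses the torch calls shown above.

```python
def check_torch():
    """Return a small report on the PyTorch install (hypothetical helper).

    Does not raise if torch is missing; it reports installed=False instead.
    """
    report = {"installed": False, "version": None, "cuda": False, "device": None}
    try:
        import torch
    except ImportError:
        return report

    report["installed"] = True
    report["version"] = str(torch.__version__)
    report["cuda"] = bool(torch.cuda.is_available())
    # Only query the device name when a GPU is visible; otherwise the call would fail.
    if report["cuda"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


print(check_torch())
```

On a GPU node with the module loaded, you would expect cuda to be True and device to hold the GPU name; on a login node, cuda will usually be False.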
  5. $(\dagger)$ Install the required dependencies. Some of them might already be installed; in that case, you will be told "Requirement already satisfied: ...". Because Sherlock is a shared system, we need to add --user so that each package is installed for you only.
pip3.9 install numpy --user
pip3.9 install termcolor --user
pip3.9 install scikit-learn --user
pip3.9 install pandas --user
pip3.9 install tabulate --user
pip3.9 install pytorch-lightning --user
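
To see which of these dependencies are still missing before or after running the commands above, a short check like the one below can help. It is a sketch using only the Python standard library; the helper name missing_packages is hypothetical. Note that two packages are imported under a different name than the one passed to pip (scikit-learn as sklearn, pytorch-lightning as pytorch_lightning).

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED = {
    "numpy": "numpy",
    "termcolor": "termcolor",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "tabulate": "tabulate",
    "pytorch-lightning": "pytorch_lightning",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found (hypothetical helper)."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


print("Missing packages:", missing_packages())
```

An empty list means every dependency in the table can be located by the current Python interpreter.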

The last required dependency is torch-scatter, which is a bit tricky to install. The key is to match the version of torch-scatter to your PyTorch version. For example, torch.__version__ above returns '1.8.1+cu111', which means we are using PyTorch 1.8.1 built against CUDA 11.1. The -f (--find-links) option takes the URL of the wheel index for that PyTorch/CUDA combination, and --no-cache-dir ensures that pip downloads the package from the Internet instead of reusing a cached copy. Installing torch-scatter might take a while (specifically, the Building wheel for torch-scatter ( ... / step might take 5 to 10 minutes). You can find the list of supported versions on the torch-scatter GitHub repo. Some warning or error messages might be shown during the installation, but you are fine as long as you see the Successfully installed torch-scatter-X.X.X message at the very end.

pip3.9 install torch-scatter -f <WHEEL INDEX URL> --user --no-cache-dir
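
To read off the PyTorch and CUDA versions that torch-scatter has to match, you can split the string returned by torch.__version__. The sketch below is illustrative only; the function names are hypothetical, and it assumes the CUDA tag follows the pattern seen above (cu followed by digits, with the last digit as the minor version).

```python
def split_torch_version(version):
    """Split '1.8.1+cu111' into ('1.8.1', 'cu111').

    Versions without a '+' suffix return None as the second element.
    """
    base, _, local = version.partition("+")
    return base, (local or None)


def cuda_version(tag):
    """Turn a tag such as 'cu111' into '11.1'; return None for other tags (e.g. 'cpu')."""
    if tag is None or not tag.startswith("cu") or not tag[2:].isdigit():
        return None
    digits = tag[2:]
    # Assumption: the final digit is the minor version, the rest is the major version.
    return f"{digits[:-1]}.{digits[-1]}"


base, tag = split_torch_version("1.8.1+cu111")
print(base, tag, cuda_version(tag))  # prints: 1.8.1 cu111 11.1
```

In practice you would pass str(torch.__version__) instead of the literal string.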

To verify the torch-scatter installation, open a Python interpreter by typing python3.9 and run the following commands:

>>> import torch
>>> import torch_scatter
>>> torch_scatter.__version__
  6. $(\dagger)$ Install torch-choice and bemb
pip3.9 install torch-choice --user
pip3.9 install bemb --user
  7. $(\dagger)$ Verify the installation by running a mini example. You can copy and paste all the commands below into a Python interpreter. You should see a progress bar showing the training progress. The final performance of this mini example should be around 50% accuracy (there are 50 items!) and around -2.5 log-likelihood on the test set. The whole example takes about 3-5 minutes to run.
import numpy as np
import pandas as pd
import torch
from torch_choice.data import ChoiceDataset
from bemb.model import LitBEMBFlex
from bemb.utils.run_helper import run
import matplotlib.pyplot as plt
import seaborn as sns

# simulate dataset
num_users = 1500
num_items = 50
data_size = 1000

user_index = torch.LongTensor(np.random.choice(num_users, size=data_size))
Us = np.arange(num_users)
Is = np.sin(np.arange(num_users) / num_users * 4 * np.pi)
Is = (Is + 1) / 2 * num_items
Is = Is.astype(int)

PREFERENCE = dict((u, i) for (u, i) in zip(Us, Is))

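# construct choices: start from random items, then let about half of the
# observations follow the user's preferred item.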
item_index = torch.LongTensor(np.random.choice(num_items, size=data_size))

for idx in range(data_size):
    if np.random.rand() <= 0.5:
        item_index[idx] = PREFERENCE[int(user_index[idx])]

user_obs = torch.zeros(num_users, num_items)
user_obs[torch.arange(num_users), Is] = 1

item_obs = torch.eye(num_items)

dataset = ChoiceDataset(user_index=user_index, item_index=item_index, user_obs=user_obs, item_obs=item_obs)

idx = np.random.permutation(len(dataset))
train_size = int(0.8 * len(dataset))
val_size = int(0.1 * len(dataset))
train_idx = idx[:train_size]
val_idx = idx[train_size: train_size + val_size]
test_idx = idx[train_size + val_size:]

dataset_list = [dataset[train_idx], dataset[val_idx], dataset[test_idx]]

bemb = LitBEMBFlex(
    learning_rate=0.03,  # set the learning rate, feel free to play with different levels.
    pred_item=True,  # let the model predict item_index, don't change this one.
    num_seeds=32,  # number of Monte Carlo samples for estimating the ELBO.
    utility_formula='theta_user * alpha_item',  # the utility formula.
    # whether to turn on obs2prior for each parameter.
    obs2prior_dict={'theta_user': True, 'alpha_item': True},
    # the dimension of latents, since the utility is an inner product of theta and alpha, they should have
    # the same dimension.
    coef_dim_dict={'theta_user': 10, 'alpha_item': 10}
)

# move the model to the GPU requested earlier.
bemb = bemb.to('cuda')
# use the provided run helper to train the model.
# we set batch size to be 5% of the data size, and train the model for 50 epochs.
# there would be 20*50=1,000 gradient update steps in total.
bemb = bemb.fit_model(dataset_list, batch_size=len(dataset) // 20, num_epochs=50)
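
The performance figures quoted for this mini example can be sanity-checked from the way the data are simulated: each choice starts as a uniformly random item and is replaced by the user's preferred item with probability 0.5. The sketch below computes what a model that had learned this distribution exactly would achieve. It assumes the reported log-likelihood is an average per observation; a fitted model will land near these values, not exactly on them.

```python
import math

num_items = 50
p_follow = 0.5  # probability that a choice is overwritten with the preferred item

# The preferred item is chosen either by the overwrite or by the random draw.
p_preferred = p_follow + (1 - p_follow) / num_items  # 0.51
# Every other item can only come from the random draw.
p_other = (1 - p_follow) / num_items  # 0.01

# Always predicting the preferred item is correct with probability p_preferred.
best_accuracy = p_preferred

# Expected log-likelihood per observation under the true choice probabilities.
expected_ll = (
    p_preferred * math.log(p_preferred)
    + (num_items - 1) * p_other * math.log(p_other)
)

print(f"accuracy = {best_accuracy:.2f}")       # prints: accuracy = 0.51
print(f"log-likelihood = {expected_ll:.2f}")   # prints: log-likelihood = -2.60
```

These values are consistent with the roughly 50% accuracy and roughly -2.5 log-likelihood mentioned in the step description.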