# Deployment on Sherlock
This tutorial guides you through installing and deploying `torch-choice` and `bemb` on the Sherlock cluster. Sherlock is a computing cluster exclusive to Stanford affiliates; you can skip this tutorial if you are using our packages on your own machine.
## Notes
- For text surrounded by angle brackets like `<SUNET ID>`, replace it with your own information.
- For steps marked with a (†) symbol, you only need to run them once during the initial installation; you don't need to re-run them afterward.
- This tutorial assumes you are familiar with running jobs on Sherlock.
- Log into the Sherlock cluster using `ssh <SUNET ID>@login.sherlock.stanford.edu`.
- Request a computing node with a GPU accelerator using `srun -p athey --gpus 1 -t 2:00:00 -c 8 --pty bash`. You can use `torch-choice` and `bemb` without a GPU, but we request a GPU machine to verify the installation. Feel free to modify the configuration as needed; an `sbatch` alternative is sketched after the verification snippet below.
- We will be calling `ml py-pytorch/1.8.1_py39` to load the PyTorch installation on Sherlock.
- (†) To check that we have successfully loaded GPU-enabled PyTorch, type `python3.9` to open a Python terminal. The following Python commands help you verify the installation:
```
>>> import torch
>>> torch.__version__
'1.8.1+cu111'
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 2080 Ti'
```
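If you prefer submitting batch jobs to working interactively, a minimal `sbatch` script along the following lines should also work. This is only a sketch: the partition, GPU, time, and CPU settings mirror the `srun` command above, and `my_script.py` is a placeholder for your own script.

```bash
#!/bin/bash
#SBATCH -p athey       # same partition as the srun example above
#SBATCH --gpus 1       # one GPU accelerator
#SBATCH -t 2:00:00     # two-hour time limit
#SBATCH -c 8           # eight CPU cores

# load the same PyTorch module used throughout this tutorial
ml py-pytorch/1.8.1_py39

# run your own script (placeholder name)
python3.9 my_script.py
```

Submit it with `sbatch my_job.sh` (or whatever you name the file).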
- (†) Install the required dependencies. Some of them might have been installed already, in which case you will see "Requirement already satisfied: ...". Because Sherlock is a shared system, we need to add `--user` so that each package is installed for you only.
```bash
pip3.9 install numpy --user
pip3.9 install termcolor --user
pip3.9 install scikit-learn --user
pip3.9 install pandas --user
pip3.9 install tabulate --user
pip3.9 install pytorch-lightning --user
```
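To confirm everything installed cleanly, you can optionally run a quick import check in a `python3.9` terminal. This loop is just a convenience sketch; some packages may not expose a `__version__` attribute.

```python
# try importing each dependency and print its version if available
for pkg in ["numpy", "termcolor", "sklearn", "pandas", "tabulate", "pytorch_lightning"]:
    module = __import__(pkg)
    print(pkg, getattr(module, "__version__", "(version not reported)"))
```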
The last required dependency is `torch-scatter`, which is a bit tricky to install. The key is to make the version of `torch-scatter` match your PyTorch version. For example, `torch.__version__` above returns `'1.8.1+cu111'`, which means we are using PyTorch 1.8.1 built against CUDA 11.1. The `--no-cache-dir` flag ensures we download the correct version of the package directly from the Internet. Installing `torch-scatter` might take a while (specifically, the `Building wheel for torch-scatter (setup.py) ... /` step might take 5 to 10 minutes). You can find the list of supported versions on the `torch-scatter` GitHub repo. Some warning/error messages might be shown during the installation process, but you are fine as long as you see the `Successfully installed torch-scatter-X.X.X` message at the very end.
```bash
pip3.9 install torch-scatter -f https://data.pyg.org/whl/torch-1.8.1+cu111.html --user --no-cache-dir
```
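If your PyTorch build differs from `1.8.1+cu111`, a short snippet like this (just a convenience sketch) prints the install command with the matching wheel index URL:

```python
import torch

# the wheel index URL follows the pattern https://data.pyg.org/whl/torch-<version>.html,
# where <version> is exactly what torch.__version__ reports, e.g. '1.8.1+cu111'
print(f"pip3.9 install torch-scatter -f https://data.pyg.org/whl/torch-{torch.__version__}.html "
      "--user --no-cache-dir")
```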
You can open a Python terminal by typing `python3.9` and run the following commands to verify the `torch-scatter` installation:
```
>>> import torch
>>> import torch_scatter
>>> torch_scatter.__version__
'2.0.9'
```
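As an optional extra check that the compiled CUDA kernels work, you can run a tiny scatter operation on the GPU. This is a sketch; the expected output assumes the inputs below.

```python
import torch
from torch_scatter import scatter_add

# sum six ones into three groups on the GPU; this exercises the CUDA kernels
src = torch.ones(6, device='cuda')
index = torch.tensor([0, 0, 1, 1, 2, 2], device='cuda')
print(scatter_add(src, index))  # expected: tensor([2., 2., 2.], device='cuda:0')
```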
- (†) Install `torch-choice` and `bemb`:
```bash
pip3.9 install torch-choice --user
pip3.9 install bemb --user
```
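To confirm both packages are visible to `python3.9`, one option (a sketch using only the standard library) is:

```python
from importlib.metadata import version

# print the installed version of each package
print("torch-choice:", version("torch-choice"))
print("bemb:", version("bemb"))
```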
- (†) Verify the installation by running a mini example. You can copy-paste all the commands below into a Python terminal. You should see a progress bar showing training progress. The final performance of this mini example should be around 50% accuracy (impressive given there are 50 items!) and around -2.5 log-likelihood on the test set; by construction, only half of the simulated choices follow the user's preference while the rest are uniformly random, so roughly 50% is the best achievable accuracy. The whole example takes about 3 to 5 minutes to run.
```python
import numpy as np
import torch
from torch_choice.data import ChoiceDataset
from bemb.model import LitBEMBFlex

# simulate a dataset: each user has a single preferred item,
# and half of the recorded choices follow that preference.
num_users = 1500
num_items = 50
data_size = 1000
user_index = torch.LongTensor(np.random.choice(num_users, size=data_size))
Us = np.arange(num_users)
Is = np.sin(np.arange(num_users) / num_users * 4 * np.pi)
Is = (Is + 1) / 2 * num_items
Is = Is.astype(int)
PREFERENCE = dict((u, i) for (u, i) in zip(Us, Is))

# construct item choices: with probability 0.5, the user picks their preferred item.
item_index = torch.LongTensor(np.random.choice(num_items, size=data_size))
for idx in range(data_size):
    if np.random.rand() <= 0.5:
        item_index[idx] = PREFERENCE[int(user_index[idx])]

# observables: a one-hot encoding of each user's preferred item, and item identities.
user_obs = torch.zeros(num_users, num_items)
user_obs[torch.arange(num_users), Is] = 1
item_obs = torch.eye(num_items)

dataset = ChoiceDataset(user_index=user_index, item_index=item_index, user_obs=user_obs, item_obs=item_obs)

# 80%/10%/10% train/validation/test split.
idx = np.random.permutation(len(dataset))
train_size = int(0.8 * len(dataset))
val_size = int(0.1 * len(dataset))
train_idx = idx[:train_size]
val_idx = idx[train_size: train_size + val_size]
test_idx = idx[train_size + val_size:]
dataset_list = [dataset[train_idx], dataset[val_idx], dataset[test_idx]]

bemb = LitBEMBFlex(
    learning_rate=0.03,  # set the learning rate; feel free to play with different levels.
    pred_item=True,  # let the model predict item_index; don't change this one.
    num_seeds=32,  # number of Monte Carlo samples for estimating the ELBO.
    utility_formula='theta_user * alpha_item',  # the utility formula.
    num_users=num_users,
    num_items=num_items,
    num_user_obs=dataset.user_obs.shape[1],
    num_item_obs=dataset.item_obs.shape[1],
    # whether to turn on obs2prior for each parameter.
    obs2prior_dict={'theta_user': True, 'alpha_item': True},
    # the dimension of latents; since the utility is an inner product of theta and alpha,
    # they should have the same dimension.
    coef_dim_dict={'theta_user': 10, 'alpha_item': 10}
)
# move the model to the GPU.
bemb = bemb.to('cuda')

# train the model with the built-in fit_model helper.
# we set the batch size to 5% of the data size and train for 50 epochs,
# so there are 20 * 50 = 1,000 gradient update steps in total.
bemb = bemb.fit_model(dataset_list, batch_size=len(dataset) // 20, num_epochs=50)
```
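If you are curious why ~50% is the accuracy ceiling, this optional check (a sketch reusing the variables defined in the example above) measures how often a recorded choice equals the user's preferred item:

```python
# fraction of simulated choices that equal the user's preferred item;
# expect roughly 0.5, slightly higher since random picks can coincide with the preference.
matches = sum(int(item_index[i]) == PREFERENCE[int(user_index[i])] for i in range(data_size))
print(matches / data_size)
```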