This repository contains the code for our survey paper "基于自监督的预训练在推荐系统中的研究综述" (A Survey of Self-Supervised Pre-training in Recommender Systems), accepted by CCIR 2023. It collects the code and datasets of several self-supervised sequential recommendation baselines.
We reproduce the CL4Rec and SGL models and run the experiments with RecBole.
Compared with the original papers, the code is slightly modified; for example, we added code to report the evaluation metrics separately for different sequence-length groups.
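For reference, RecBole experiments can usually be launched through its quick-start API. The model name, dataset name, and config path below are only illustrative placeholders and may differ from the identifiers actually used in this repo:

```python
# Minimal sketch of launching a RecBole experiment (names are illustrative).
from recbole.quick_start import run_recbole

# "SGL", "ml-1m", and the YAML path are example identifiers; replace them
# with the model, dataset, and config file used in this repository.
run_recbole(
    model="SGL",
    dataset="ml-1m",
    config_file_list=["config/sgl_ml1m.yaml"],  # hypothetical config path
)
```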
| | Beauty | ML-1M | Yelp |
|---|---|---|---|
| Users | 22364 | 6040 | 22845 |
| Items | 12102 | 3352 | 16552 |
| Interactions | 194687 | 269721 | 237004 |
| Total file size | 4.18M | 5.30M | 5.19M |
| Min. sequence length | 5 | 17 | 5 |
| Max. sequence length | 50 | 50 | 50 |
| Avg. sequence length | 8.7057 | 44.6557 | 10.37443 |
| Density | 0.07194251% | 1.3322134% | 0.06267784% |
| Attributes | 2320 | 18 | 1158 |
| Min. attributes / item | 1 | 1 | 0 |
| Max. attributes / item | 9 | 6 | 33 |
| Avg. attributes / item | 3.9391 | 1.7072 | 4.9205 |
| Length | Beauty (#users) | Beauty (%) | ML-1M (#users) | ML-1M (%) | Yelp (#users) | Yelp (%) |
|---|---|---|---|---|---|---|
| [0, 20) | 21228 | 94.9202% | 177 | 2.9305% | 20744 | 90.8032% |
| [20, 30) | 655 | 2.9289% | 684 | 11.3245% | 1094 | 4.7888% |
| [30, 40) | 231 | 1.0330% | 543 | 8.9901% | 511 | 2.2368% |
| [40, 50] | 250 | 1.1179% | 4636 | 76.7550% | 496 | 2.1712% |
| Overall | 22364 | 100% | 6040 | 100% | 22845 | 100% |
We follow the dataset processing procedure in [1, 2, 3]. If a user interacts with an item, the interaction is converted from its explicit rating into implicit positive feedback. The interactions are then grouped by user, and each user's items are sorted by the timestamp of the interaction. Since this work does not aim to investigate the cold-start problem, we iteratively filter out users and items with fewer than 5 interactions. In addition, some users in these datasets have very long interaction histories, so we limit the maximum length of each user interaction sequence to 50. Because the Yelp dataset is very large, we adopt a processing step similar to [3] and only keep the interactions from 2019.
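As a rough illustration, the sketch below implements this pipeline on a generic pandas DataFrame. The column names (`user_id`, `item_id`, `rating`, `timestamp`), the choice of keeping the most recent 50 items, and the helper function itself are assumptions for illustration, not the exact preprocessing script shipped in this repo:

```python
# Hedged sketch of the preprocessing described above; column names and
# details are illustrative, not the exact code used in this repository.
import pandas as pd

MIN_INTERACTIONS = 5  # 5-core filtering threshold for users and items
MAX_SEQ_LEN = 50      # maximum kept sequence length per user


def preprocess(df: pd.DataFrame) -> dict:
    """df columns (assumed): user_id, item_id, rating, timestamp."""
    # Treat every rated interaction as implicit positive feedback.
    df = df.drop(columns=["rating"], errors="ignore")

    # Iteratively (circularly) drop users/items with < 5 interactions,
    # since removing one side can push the other below the threshold.
    while True:
        user_counts = df["user_id"].value_counts()
        item_counts = df["item_id"].value_counts()
        keep = (
            df["user_id"].map(user_counts).ge(MIN_INTERACTIONS)
            & df["item_id"].map(item_counts).ge(MIN_INTERACTIONS)
        )
        if keep.all():
            break
        df = df[keep]

    # Group by user, sort by timestamp, and keep at most the last 50 items
    # (keeping the most recent items is one common choice).
    return {
        user: group.sort_values("timestamp")["item_id"].tolist()[-MAX_SEQ_LEN:]
        for user, group in df.groupby("user_id")
    }
```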
References:
[1] Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In ICDM. IEEE, 197–206.
[2] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In CIKM. 1441–1450.
[3] Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020. S3-Rec: Self-supervised learning for sequential recommendation with mutual information maximization. In CIKM.
For each model, a `requirements.txt` is provided. You can install the dependencies with

```bash
pip install -r requirements.txt
```

if you use pip, or

```bash
conda install --yes --file requirements.txt
```

if you use conda.
Besides, we provide a Slurm execution script in each baseline folder.
More details about Slurm usage can be found in the official documentation: https://slurm.schedmd.com/documentation.html

(Note: please modify the `conda activate xxx` line to activate your own environment.)
```bash
#!/bin/bash
#SBATCH -e sas_ans_FT.err        # stderr log file
#SBATCH -o sas_ans_FT.out        # stdout log file
#SBATCH -J sas4recFT             # job name
#SBATCH --partition=debug        # partition (queue) to submit to
#SBATCH --nodelist=gpuxx         # specific node to run on
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --cpus-per-task=4        # CPU cores per task
#SBATCH --time=999:00:00         # wall-time limit

conda activate xxx               # activate your conda environment
python main.py
```
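Assuming the script is saved as, e.g., `run.sh` (the file name here is only an example), the job can be submitted and monitored with:

```bash
sbatch run.sh      # submit the job to Slurm
squeue -u $USER    # check the status of your jobs
```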