A collection of contrastive learning methods for text, reproducing existing text-based methods or adapting methods from the computer vision (CV) domain.
All model encoders are based on pretrained BERT (a minimal SimCSE-style training sketch follows the model list below).
- SimCSE: [original paper] [code]
- DirectCSE: [original paper]
- BYOLSE: [original paper]
- DirectBYOLSE: a combination of DirectCSE and BYOLSE
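
For reference, here is a minimal sketch of the unsupervised SimCSE objective on top of a pretrained BERT encoder, written against the Hugging Face `transformers` API. It only illustrates the technique and is not the code used in this repository; the checkpoint name, pooling choice, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Assumption: a standard Hugging Face checkpoint; the repo may load a local path instead.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.train()  # keep dropout active: two passes over the same sentence give two "views"

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    outputs = encoder(**batch)
    return outputs.last_hidden_state[:, 0]  # [CLS] pooling (one common choice)

def simcse_loss(sentences, temperature=0.05):
    z1, z2 = embed(sentences), embed(sentences)  # same batch, different dropout masks
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sim.size(0))           # matching pairs sit on the diagonal
    return F.cross_entropy(sim, labels)          # InfoNCE over in-batch negatives
```

A real training loop would back-propagate this loss through the encoder with a standard optimizer.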
- PyTorch 1.9
- Python 3.8
- transformers 4.8
- Jupyter Notebook
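
To confirm your environment roughly matches these versions, a quick check (not part of the repository):

```python
import sys

import torch
import transformers

# Print installed versions to compare against the requirements list above.
print("Python      :", sys.version.split()[0])
print("PyTorch     :", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```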
- Prepare the datasets (see the data-prepare notebook) and the pretrained BERT weights.
- Change all file paths in PATH.py to your own paths.
- Follow the example-usage notebook to initialize and train models (a hypothetical outline is sketched below).
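
The actual class names, arguments, and training loop live in the example-usage notebook. The outline below is only a hypothetical sketch of the flow after PATH.py has been edited; every import, class name, and attribute in it is invented for illustration.

```python
# Hypothetical outline only: the real names are defined in the
# example-usage notebook, not here.
import PATH                            # your edited path configuration
from models import SimCSE              # hypothetical import
from train import train_model          # hypothetical import

model = SimCSE(pretrained_path=PATH.BERT_PATH)    # hypothetical constructor argument
train_model(model, data_path=PATH.TRAIN_DATA)     # hypothetical training entry point
```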
Note: only the SimCSE model achieves good results (STS ~76); the other models are more experimental. Feel free to change the model architectures or training parameters to try for better results!
If you have any questions or suggestions, feel free to open an issue!