Implementation from scratch of SPT-LSA : Training ViT for small size Datasets

valentin-fngr/SPT-LSA-ViT

SPT-LSA ViT: Training ViT on Small-Size Datasets

This is an unofficial PyTorch implementation of the paper Vision Transformer for Small-Size Datasets.

The configuration has been trained on CIFAR-10 and shows interesting results.

The main components of the paper are:

The ViT architecture:

image

The Shifted Patch Tokenizer (SPT), which increases the locality inductive bias:

image
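
As a rough illustration of the idea, here is a minimal sketch of a shifted patch tokenizer. It is not the code from models.py: the class name `ShiftedPatchTokenization`, its parameters, and the zero-padding used for the shifts are assumptions made for this example. The image is shifted by half a patch in the four diagonal directions, the shifted copies are concatenated with the original along the channel axis, and the result is split into patches, normalized, and linearly projected.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShiftedPatchTokenization(nn.Module):
    """Hypothetical sketch of SPT; names and defaults are illustrative."""

    def __init__(self, in_channels=3, patch_size=4, dim=64):
        super().__init__()
        self.patch_size = patch_size
        self.shift = patch_size // 2  # shift by half a patch
        # original image + 4 shifted copies, flattened per patch
        patch_dim = 5 * in_channels * patch_size * patch_size
        self.norm = nn.LayerNorm(patch_dim)
        self.proj = nn.Linear(patch_dim, dim)

    def forward(self, x):
        s = self.shift
        h, w = x.shape[-2:]
        # Zero-pad every side by s (pad order: left, right, top, bottom).
        padded = F.pad(x, (s, s, s, s))
        # Cropping the padded image at the four corners yields the four
        # diagonally shifted copies, each with the original H x W size.
        shifted = [
            padded[..., i:i + h, j:j + w]
            for i in (0, 2 * s)
            for j in (0, 2 * s)
        ]
        x = torch.cat([x, *shifted], dim=1)  # (B, 5C, H, W)
        # Non-overlapping patches: (B, 5C*P*P, N) -> (B, N, 5C*P*P)
        patches = F.unfold(x, kernel_size=self.patch_size, stride=self.patch_size)
        patches = patches.transpose(1, 2)
        return self.proj(self.norm(patches))
```

With CIFAR-10-sized inputs (32x32) and a patch size of 4, this sketch produces 64 tokens per image, each embedding information from its spatial neighbours through the shifted copies.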

The Locality Self-Attention (LSA):

image
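
The following is a minimal sketch of locality self-attention, assuming the two ingredients usually associated with it: a learnable softmax temperature and a diagonal mask that stops each token from attending to itself. It is not the code from models.py; the class name `LocalitySelfAttention`, the `num_tokens` argument, and the initial temperature value are assumptions for illustration. The diagonal mask is stored with `register_buffer` so that it follows the module across devices without being treated as a trainable parameter.

```python
import torch
import torch.nn as nn


class LocalitySelfAttention(nn.Module):
    """Hypothetical sketch of LSA; names and defaults are illustrative."""

    def __init__(self, dim=64, num_heads=4, num_tokens=65):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        head_dim = dim // num_heads
        # Learnable temperature, initialised to the usual sqrt(d_k) scaling.
        self.temperature = nn.Parameter(torch.tensor(head_dim ** 0.5))
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.out = nn.Linear(dim, dim)
        # Boolean identity matrix marking the self-token positions to mask.
        # A buffer moves with .to(device) but receives no gradient.
        self.register_buffer("diag_mask", torch.eye(num_tokens, dtype=torch.bool))

    def forward(self, x):
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, d // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        scores = q @ k.transpose(-2, -1) / self.temperature
        # Diagonal masking: a token cannot attend to itself.
        scores = scores.masked_fill(self.diag_mask[:n, :n], float("-inf"))
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.out(out)
```

Because the mask is a buffer rather than a tensor created inside `forward`, it is allocated once and is placed on the right device when the model is moved.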

These components can be found in models.py.

Todo

  • Use register_buffer for the -inf mask in the Locality Self-Attention
  • Use warmup
  • Visualize Attention layers
  • Track scaling coefficient in attention using TensorBoard
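
For the warmup item, one possible approach is a linear warmup followed by cosine decay, expressed as a multiplicative factor on the base learning rate. The function name `warmup_cosine_factor` and the choice of cosine decay are assumptions for this sketch; the repository does not specify a schedule.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps):
    """Return a learning-rate multiplier for the given step.

    Hypothetical helper: linear warmup to 1.0 over `warmup_steps`,
    then cosine decay towards 0.0 at `total_steps`.
    """
    if step < warmup_steps:
        # Linear ramp; step 0 already gets a small non-zero rate.
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)  # clamp if training runs past total_steps
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

A function like this can be passed to `torch.optim.lr_scheduler.LambdaLR`, for example as `lr_lambda=lambda step: warmup_cosine_factor(step, 500, 10000)`, where the step counts are placeholder values.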
