Project for the Computer Vision course.
This project evaluates the zero-shot capabilities of CLIP [1] on downstream tasks and its few-shot capabilities on the CIFAR-100 dataset. In particular, it aims to replicate the results reported by CLIP's authors for the few-shot classifier trained with 12 samples per class, using a linear layer instead of the logistic regression classifier used in the original work.
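The two evaluation settings can be illustrated with a minimal, dependency-free sketch. It does not load CLIP: it assumes precomputed, hypothetical embeddings in place of real CLIP image and text encoder outputs. Zero-shot prediction picks the class whose prompt embedding has the highest cosine similarity with the image embedding, as described in [1]. The few-shot classifier is a single linear layer trained with softmax cross-entropy by plain full-batch gradient descent. The function names, the toy 3-class data and the hyperparameters (`epochs`, `lr`) are illustrative assumptions, not the project's actual code.

```python
import math
import random

# NOTE: all embeddings below are hypothetical toy vectors standing in for
# CLIP encoder outputs, so the sketch runs without any model weights.


def normalize(v):
    """Scale a vector to unit L2 norm (CLIP compares normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def zero_shot_predict(image_emb, class_text_embs):
    """Zero-shot: return the index of the class prompt embedding
    (e.g. for "a photo of a {label}") most cosine-similar to the image."""
    img = normalize(image_emb)
    scores = [dot(img, normalize(t)) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train_linear_probe(features, labels, num_classes, epochs=200, lr=0.5):
    """Few-shot: fit a linear layer (W, b) on frozen features by minimizing
    the mean softmax cross-entropy with full-batch gradient descent."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax([dot(W[c], x) + b[c] for c in range(num_classes)])
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y]
                err = probs[c] - (1.0 if c == y else 0.0)
                gb[c] += err / n
                for d in range(dim):
                    gW[c][d] += err * x[d] / n
        for c in range(num_classes):
            b[c] -= lr * gb[c]
            for d in range(dim):
                W[c][d] -= lr * gW[c][d]
    return W, b


def linear_predict(W, b, x):
    logits = [dot(W[c], x) + b[c] for c in range(len(W))]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy usage: 3 classes, 3-d "embeddings", 12 shots per class.
rng = random.Random(0)
centers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train_x, train_y = [], []
for label, center in enumerate(centers):
    for _ in range(12):
        train_x.append([v + rng.gauss(0.0, 0.05) for v in center])
        train_y.append(label)

W, b = train_linear_probe(train_x, train_y, num_classes=3)
print(zero_shot_predict([0.9, 0.1, 0.0], centers))  # → 0
print(linear_predict(W, b, [0.0, 0.0, 1.0]))        # → 2
```

In the real setting the features would be CLIP image embeddings of CIFAR-100 training images (12 per class, 100 classes), and the layer would typically be trained with a framework such as PyTorch; the pure-Python loops only keep the sketch self-contained. A linear layer trained with softmax cross-entropy is itself a multinomial logistic-regression model, so the difference from the original setup lies mainly in how the model is optimized and regularized.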
[1] A. Radford et al., "Learning Transferable Visual Models From Natural Language Supervision" (CLIP), 2021. https://arxiv.org/abs/2103.00020