Given a user input prompt, GPT generates visual features for each ImageNet class, and CLIP then uses those features for zero-shot classification to compute test-set accuracy.
In the classify_by_description_release directory:
- descriptors : GPT-generated visual feature data, stored as .json in this directory
- output : directory where the accuracy results are written
- run_prompt_options.sh : computes accuracy using the CLIP model
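
The pipeline above can be illustrated with a minimal sketch. It is not the code in run_prompt_options.sh. It assumes that a descriptor file maps each class name to a list of descriptor strings, that each descriptor is turned into a text prompt, and that a class score is the mean cosine similarity between the image embedding and that class's descriptor embeddings. The function names, the prompt template, and the tiny 2-D vectors standing in for CLIP embeddings are all hypothetical.

```python
import json
import math


def build_prompts(descriptors):
    # Hypothetical template: "<class>, which has <descriptor>".
    return {
        cls: [f"{cls}, which has {d}" for d in descs]
        for cls, descs in descriptors.items()
    }


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(image_emb, class_text_embs):
    # Score each class by the mean similarity over its descriptor embeddings
    # (an assumed aggregation rule), then pick the best-scoring class.
    scores = {
        cls: sum(cosine(image_emb, t) for t in embs) / len(embs)
        for cls, embs in class_text_embs.items()
    }
    return max(scores, key=scores.get)


def accuracy(image_embs, labels, class_text_embs):
    # Fraction of images whose predicted class matches the label.
    correct = sum(
        classify(emb, class_text_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return correct / len(labels)


# Toy descriptor data in the assumed JSON layout.
descriptors = json.loads(
    '{"hen": ["a red comb", "feathers"], "goldfish": ["orange scales", "fins"]}'
)
prompts = build_prompts(descriptors)

# Made-up 2-D vectors in place of real CLIP text and image embeddings.
class_text_embs = {
    "hen": [[1.0, 0.0], [0.8, 0.2]],
    "goldfish": [[0.0, 1.0], [0.2, 0.8]],
}
image_embs = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.4]]
labels = ["hen", "goldfish", "goldfish"]

acc = accuracy(image_embs, labels, class_text_embs)
print(prompts["hen"])
print(f"accuracy = {acc:.3f}")
```

In a real run, the toy vectors would be replaced by embeddings of the prompts and test images produced by the CLIP text and image encoders.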
For the full execution guide, see https://www.notion.so/611ade73f01e4ce79d42d6439b3eadd6?pvs=4