(Image from https://scikit-image.org/)
- Zero-Shot Prediction
### predicts the most likely top5 labels among input textual labels ###
a cat: 98.40%
a human: 1.35%
a dog: 0.24%
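Percentages like the ones above are what CLIP-style zero-shot scoring typically produces: the image embedding is compared with each text embedding by cosine similarity, the similarities are scaled, and a softmax turns them into probabilities. The sketch below is illustrative only and is not taken from clip.py. The function names, the toy 3-dimensional embeddings, and the scale factor of 100 are assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_feature, text_features, logit_scale=100.0):
    """Return one probability per label (hypothetical helper, not from clip.py)."""
    image_feature = l2_normalize(image_feature)
    logits = []
    for text_feature in text_features:
        text_feature = l2_normalize(text_feature)
        cosine = sum(a * b for a, b in zip(image_feature, text_feature))
        logits.append(logit_scale * cosine)  # logit_scale=100 is an assumption
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings standing in for the encoder outputs (made-up values).
labels = ["a human", "a dog", "a cat"]
image_feature = [0.9, 0.1, 0.1]
text_features = [[0.1, 0.9, 0.1], [0.1, 0.1, 0.9], [0.9, 0.1, 0.1]]

probs = zero_shot_probs(image_feature, text_features)
ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
for label, prob in ranked[:5]:
    print(f"{label}: {prob * 100:.2f}%")
```

Because the softmax runs only over the labels you supply, the probabilities always sum to 100% across those labels, whatever the image actually shows.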
This model requires an additional module.
pip3 install ftfy
The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that point.
To run on the sample image,
$ python3 clip.py
If you want to specify the input image, put the image path after the --input option.
$ python3 clip.py --input IMAGE_PATH
You can use the --text option to specify the text labels to input into the model. Repeat the option once per label.
The default labels are "a human", "a dog" and "a cat".
$ python3 clip.py --text "a human" --text "a dog" --text "a cat"
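A repeatable option like this can be collected with argparse's append action. The following is a minimal sketch of that pattern, not the actual argument handling in clip.py. The helper name parse_labels and the fallback logic are assumptions.

```python
import argparse

# Default labels as listed in this README.
DEFAULT_LABELS = ["a human", "a dog", "a cat"]


def parse_labels(argv):
    """Collect every --text value into a list (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--text",
        action="append",  # each occurrence of --text adds one label
        default=None,
        help="text label to score against the image; may be repeated",
    )
    args = parser.parse_args(argv)
    # Fall back to the defaults when no --text option was given.
    return args.text if args.text else list(DEFAULT_LABELS)


print(parse_labels(["--text", "a bird", "--text", "a fish"]))
print(parse_labels([]))
```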
If you want to load the text labels from a file instead, use the --desc_file option.
$ python3 clip.py --desc_file imagenet_classes.txt
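The format of the label file is not described here. The sketch below assumes a plain UTF-8 text file with one label per line and blank lines ignored. The helper name load_labels is hypothetical, and clip.py may parse the file differently.

```python
import os
import tempfile


def load_labels(path):
    """Read one label per line, skipping blank lines (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demonstrate with a temporary file instead of a real class list.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "labels.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("a human\na dog\n\na cat\n")
    demo_labels = load_labels(demo_path)

print(demo_labels)
```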
By adding the --model_type option, you can select the model type, either "ViTB32" or "RN50". (default is ViTB32)
$ python3 clip.py --model_type ViTB32
PyTorch
ONNX opset=11
ViT-B32-encode_image.onnx.prototxt
ViT-B32-encode_text.onnx.prototxt
RN50-encode_image.onnx.prototxt
RN50-encode_text.onnx.prototxt