Vision Transformer

Input

input image

(from https://pixabay.com/photos/labrador-retriever-dog-pet-labrador-6244939/)


Output

output_image


Usage

The onnx and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that time.
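The download-on-first-run behaviour can be pictured with the following minimal sketch. It is not the code used by vit.py: the function name `ensure_file`, its parameters, and the placeholder URL in the usage note are all hypothetical, and the real script may fetch its files differently.

```python
import os
import urllib.request


def ensure_file(path, url, fetch=urllib.request.urlretrieve):
    """Download `url` to `path` unless `path` already exists.

    Hypothetical helper for illustration only.
    Returns True if a download happened, False if the local copy was reused.
    """
    if os.path.exists(path):
        # Later runs: the file is already on disk, so no network access is needed.
        return False
    # First run: this step needs an Internet connection.
    fetch(url, path)
    return True
```

A call might look like `ensure_file("ViT-B_16-224.onnx", "https://example.invalid/ViT-B_16-224.onnx")`, where the URL is a placeholder rather than the real download location. The `fetch` argument exists only so the logic can be exercised without network access.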

To run on the sample image:

$ python vit.py
(ex on CPU)  $ python vit.py -e 0
(ex on BLAS) $ python vit.py -e 1
(ex on GPU)  $ python vit.py -e 2

To use your own image, pass its path with the --input option.
Use the --savepath option to change the name of the output file.

$ python3 vit.py --input IMAGE_PATH --savepath SAVE_IMAGE_PATH
$ python3 vit.py -i IMAGE_PATH -s SAVE_IMAGE_PATH

To run on a video instead of an image, pass its path with the --video option.

$ python3 vit.py --video VIDEO_PATH --savepath SAVE_VIDEO_PATH
$ python3 vit.py -v VIDEO_PATH -s SAVE_VIDEO_PATH
(ex) $ python3 vit.py --video input.mp4 --savepath output.mp4
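For readers who want to see how the options above fit together, here is a minimal argparse sketch that accepts the same flags. It is an illustration only, not the parser from vit.py: the `build_parser` name, the defaults, the help strings, and the `env_id` destination for -e are assumptions.

```python
import argparse


def build_parser():
    """Build an illustrative parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Vision Transformer sample (sketch)")
    # -i / --input: path of the image to classify
    parser.add_argument("-i", "--input", metavar="IMAGE_PATH", default=None,
                        help="input image path")
    # -v / --video: path of a video to process instead of an image
    parser.add_argument("-v", "--video", metavar="VIDEO_PATH", default=None,
                        help="input video path")
    # -s / --savepath: where to write the result
    parser.add_argument("-s", "--savepath", metavar="SAVE_PATH", default=None,
                        help="output file path")
    # -e: execution environment index as used in the examples (0, 1, 2);
    # the attribute name env_id is an assumption
    parser.add_argument("-e", dest="env_id", type=int, default=None,
                        help="execution environment index")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    mode = "video" if args.video else "image"
    print(mode, args.input, args.video, args.savepath, args.env_id)
```

In this sketch the video path takes precedence when both --input and --video are given; whether vit.py behaves the same way is not stated here.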

Reference

PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
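The "16x16 words" in the paper title refers to cutting the image into fixed-size patches that the transformer treats as tokens. The short sketch below works through that arithmetic, assuming that the "16" and "224" in the model file name ViT-B_16-224 denote the patch size and the input resolution, and that the input has 3 colour channels. The function name is hypothetical.

```python
def vit_sequence_shape(image_size=224, patch_size=16, channels=3):
    """Return (num_patches, patch_dim, seq_len) for a square ViT input.

    Defaults are assumptions read off the model name ViT-B_16-224.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size      # 224 // 16 = 14
    num_patches = patches_per_side ** 2              # 14 * 14 = 196
    patch_dim = patch_size * patch_size * channels   # 16 * 16 * 3 = 768
    seq_len = num_patches + 1                        # plus one class token
    return num_patches, patch_dim, seq_len


print(vit_sequence_shape())  # (196, 768, 197)
```

Under these assumptions each image becomes 196 flattened patches of 768 values, and the class token brings the sequence length to 197.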


Framework

PyTorch


Model Format

ONNX opset = 10


Netron

ViT-B_16-224.onnx.prototxt