Add Convolutional vision Transformer (CvT) #2176
Conversation
Validation for cvt-13 lines up with the paper (81.678 top-1 for me); activations are off from the reference impl by minute amounts (MSE of logits for 1 sample on the order of 1e-10). The initial problems had to do with the norm before attention and the attention residual when there is a cls_token. I'll most likely finish this off later today.
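For anyone reproducing the comparison, here is a minimal sketch of the kind of logit check described above (the model name `cvt_13`, the reference-impl loader, and the checkpoint handling are assumptions, not the final API in this PR):

```python
import torch
import timm

# Both models are assumed to already carry the same (remapped) pretrained weights.
timm_model = timm.create_model('cvt_13', pretrained=True).eval()  # assumed model name
ref_model = load_reference_cvt13().eval()  # placeholder for the reference impl from microsoft/CvT

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    mse = torch.mean((timm_model(x) - ref_model(x)) ** 2)
print(f'logit MSE for one sample: {mse.item():.3e}')
```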
@fffffgggg54 is this still in progress?
Yes. I am a bit busy and traveling right now, but I am still working on getting the validation to line up with the paper. It has been a pain to try and find which part of the model deviates from the reference impl.
Some updates @rwightman @Smartappli: I have added the reference implementation to a branch on my fork and compared validation performance. There are some slight numerical deviations, and the top-1 is off by an insignificant amount. After digging into the reference repo's validation setup, I changed the validation configuration so that the crop settings match what the authors used. The throughput of my implementation sees a ~10% increase over the reference impl on win10/pt2.2.0/fp32; fused attn was not available. There is still a bit of cleanup left (head, torchscript?, stem), but this was the last technical hurdle I was hung up on.
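As a rough illustration of the crop-setting change, the eval transform can be overridden along these lines (the `crop_pct`/interpolation values below are placeholders, not the settings actually used, and `cvt_13` is an assumed model name):

```python
import timm
from timm.data import resolve_data_config, create_transform

model = timm.create_model('cvt_13', pretrained=True).eval()  # assumed model name
cfg = resolve_data_config({}, model=model)
# Override the eval crop settings so they match the reference repo's validation setup.
# The values below are illustrative placeholders only.
cfg['crop_pct'] = 0.95
cfg['interpolation'] = 'bicubic'
transform = create_transform(**cfg, is_training=False)
print(transform)
```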
CvT as described in https://arxiv.org/abs/2103.15808
Swin-era hierarchical transformer. From-scratch reimplementation in sdpa/timm style, cleaner than the original (https://github.com/microsoft/CvT/tree/main), exposing most module configs as kwargs. WIP/barebones test for now; stuck at a successful weight remap but incorrect activations that seem to come, at least in part, from the BatchNorm layers.
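For context on tracking down the activation mismatch, a minimal sketch of a hook-based layer-by-layer comparison (the two models and the module-name pairing are assumptions; running in eval mode matters so the BatchNorm layers use their running stats):

```python
import torch

def compare_activations(model_a, model_b, name_pairs, x):
    """Run one input through two models in eval mode and report per-layer activation diffs.

    name_pairs maps module names in model_a to the corresponding names in model_b;
    the pairing itself has to be worked out from the weight remap.
    """
    acts_a, acts_b = {}, {}

    def make_hook(store, name):
        def hook(_mod, _inp, out):
            store[name] = out.detach()
        return hook

    for name, mod in model_a.named_modules():
        if name in name_pairs:
            mod.register_forward_hook(make_hook(acts_a, name))
    wanted_b = set(name_pairs.values())
    for name, mod in model_b.named_modules():
        if name in wanted_b:
            mod.register_forward_hook(make_hook(acts_b, name))

    model_a.eval(); model_b.eval()  # eval mode so BatchNorm uses running stats
    with torch.no_grad():
        model_a(x); model_b(x)

    for name_a, name_b in name_pairs.items():
        diff = (acts_a[name_a] - acts_b[name_b]).abs().max().item()
        print(f'{name_a} vs {name_b}: max abs diff {diff:.3e}')

# Hypothetical usage; the models and module names are placeholders.
# compare_activations(timm_model, ref_model,
#                     {'stages.0.blocks.0.norm1': 'stage0.block.0.norm1'},
#                     torch.randn(1, 3, 224, 224))
```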