data2vec v2.0 (facebookresearch#4903)
data2vec 2.0
Co-authored-by: Arun Babu <[email protected]>
Co-authored-by: Wei-Ning Hsu <[email protected]>
alexeib authored Dec 12, 2022
1 parent 0f33ccf commit d871f61
Showing 236 changed files with 17,324 additions and 519 deletions.
122 changes: 122 additions & 0 deletions examples/data2vec/README.md
@@ -1,3 +1,125 @@
# data2vec 2.0

data2vec 2.0 improves the training efficiency of the original data2vec algorithm. For efficiency, we make the following changes: we forward only the unmasked timesteps through the encoder, we use a convolutional decoder, and we use multi-masking to amortize the compute overhead of the teacher model. You can find details in [Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language](https://ai.facebook.com/research/xyz).
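To make the amortization argument concrete, the sketch below counts how many timesteps each network encodes per training sample. It assumes (our reading of the description above) that the teacher encodes the full, unmasked sample once and its targets are reused for every masked copy, while the student encodes only the unmasked timesteps of each copy. `T`, the mask ratio, and `M` are hypothetical inputs, not values from the released configs, and `encoder_workload` is an illustrative name.

```shell
#!/usr/bin/env bash
# Per-sample encoder workload under the assumptions stated above (hypothetical inputs):
#   $1 = T, timesteps (patches / frames / tokens) in one sample
#   $2 = mask ratio in percent (integer, 0-100)
#   $3 = M, number of masked copies of the sample
encoder_workload() {
  local T=$1 mask_pct=$2 M=$3
  # Student: only the unmasked timesteps of each of the M copies go through the encoder.
  local student=$(( M * T * (100 - mask_pct) / 100 ))
  # Teacher: a single pass over the full sample; its targets are shared by all M copies.
  local teacher=$T
  echo "student_timesteps=$student teacher_timesteps=$teacher"
}
```

For example, `encoder_workload 100 80 8` reports 160 student timesteps against 100 teacher timesteps: the single teacher pass is shared across eight masked copies instead of being repeated eight times.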

## Pretrained and finetuned models
### Vision
| Model | Finetuning split | Link
|---|---|---
data2vec ViT-B | No fine-tuning | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/base_imagenet.pt)
data2vec ViT-B | ImageNet-1K | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/base_imagenet_ft.pt)
data2vec ViT-L | No fine-tuning | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/large_imagenet.pt)
data2vec ViT-L | ImageNet-1K | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/large_imagenet_ft.pt)
data2vec ViT-H | No fine-tuning | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/huge_imagenet.pt)
data2vec ViT-H | ImageNet-1K | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/huge_imagenet_ft.pt)

Only the vision models are licensed under CC-BY-NC.

### Speech

| Model | Finetuning split | Dataset | Link
|---|---|---|---
data2vec Base | No fine-tuning | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/base_libri.pt)
data2vec Base | 960 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/base_libri_960h.pt)
data2vec Large | No fine-tuning | [Libri-light](https://github.com/facebookresearch/libri-light) | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/large_vox.pt)
data2vec Large | 960 hours | [Libri-light](https://github.com/facebookresearch/libri-light) | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/large_vox_960h.pt)

### NLP

| Model | Fine-tuning data | Dataset | Link
|---|---|---|---
data2vec Base | No fine-tuning | Books + Wiki | [download](https://dl.fbaipublicfiles.com/fairseq/data2vec2/nlp_base.pt)
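All links in the three tables share one base URL and differ only in the file name, so a small helper can fetch any of them. This is a sketch: the helper names `ckpt_url` and `fetch_ckpt` are hypothetical, the list of names is copied from the tables above, and it assumes `curl` is installed.

```shell
#!/usr/bin/env bash
# Base URL shared by every download link in the tables above.
BASE_URL="https://dl.fbaipublicfiles.com/fairseq/data2vec2"
# Checkpoint file names (without ".pt") listed in the Vision, Speech and NLP tables.
CHECKPOINTS="base_imagenet base_imagenet_ft large_imagenet large_imagenet_ft
huge_imagenet huge_imagenet_ft base_libri base_libri_960h large_vox large_vox_960h nlp_base"

# Print the download URL for a known checkpoint name; fail on unknown names.
ckpt_url() {
  local name=$1 known
  for known in $CHECKPOINTS; do
    if [ "$known" = "$name" ]; then
      printf '%s/%s.pt\n' "$BASE_URL" "$name"
      return 0
    fi
  done
  echo "unknown checkpoint: $name" >&2
  return 1
}

# Download a checkpoint into a directory (default: current directory).
# Downloads go to a ".part" file first so an interrupted transfer is never mistaken for a finished one.
fetch_ckpt() {
  local name=$1 dest=${2:-.} url
  url=$(ckpt_url "$name") || return 1
  mkdir -p "$dest"
  if [ -f "$dest/$name.pt" ]; then
    echo "already present: $dest/$name.pt"
    return 0
  fi
  curl -fL --retry 3 -o "$dest/$name.pt.part" "$url" && mv "$dest/$name.pt.part" "$dest/$name.pt"
}
```

For example, `fetch_ckpt base_libri ./checkpoints` downloads the pretrained (not fine-tuned) speech Base model.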

[//]: # (## Data Preparation)

[//]: # ()
[//]: # (### Vision)

[//]: # (add details)

[//]: # (### Speech)

[//]: # (add details)

[//]: # ()
[//]: # (### NLP)

[//]: # (add details)


## Commands to train different models using data2vec 2.0

### Vision

Commands to pretrain the Base, Large, and Huge models:
```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/v2 \
--config-name base_images_only_task task.data=/path/to/dir
```

```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/v2 \
--config-name large_images_only_task task.data=/path/to/dir
```

```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/v2 \
--config-name huge_images14_only_task task.data=/path/to/dir
```

Commands to fine-tune the Base, Large, and Huge models (set `model.model_path` to the corresponding pretrained checkpoint):

```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/vision/finetuning \
--config-name mae_imagenet_clean task.data=/path/to/dir model.model_path=/path/to/pretrained/model
```

```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/vision/finetuning \
--config-name mae_imagenet_large_clean task.data=/path/to/dir model.model_path=/path/to/pretrained/model
```

```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/vision/finetuning \
--config-name mae_imagenet_huge_clean task.data=/path/to/dir model.model_path=/path/to/pretrained/model
```
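The vision pretraining and fine-tuning commands differ only in the config name chosen for each model size. A sketch that keeps that mapping in one place, assuming each size is fine-tuned with the config of the same size (inferred from the config names); the helper names and the `DRY_RUN` switch are hypothetical:

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Model size -> pretraining config (examples/data2vec/config/v2).
vision_pretrain_config() {
  case $1 in
    base)  echo base_images_only_task ;;
    large) echo large_images_only_task ;;
    huge)  echo huge_images14_only_task ;;
    *) echo "unknown size: $1" >&2; return 1 ;;
  esac
}

# Model size -> fine-tuning config (examples/data2vec/config/vision/finetuning).
vision_finetune_config() {
  case $1 in
    base)  echo mae_imagenet_clean ;;
    large) echo mae_imagenet_large_clean ;;
    huge)  echo mae_imagenet_huge_clean ;;
    *) echo "unknown size: $1" >&2; return 1 ;;
  esac
}

vision_pretrain() {
  local size=$1 data=$2 cfg
  cfg=$(vision_pretrain_config "$size") || return 1
  run python fairseq_cli/hydra_train.py -m \
    --config-dir examples/data2vec/config/v2 \
    --config-name "$cfg" task.data="$data"
}

vision_finetune() {
  local size=$1 data=$2 ckpt=$3 cfg
  cfg=$(vision_finetune_config "$size") || return 1
  run python fairseq_cli/hydra_train.py -m \
    --config-dir examples/data2vec/config/vision/finetuning \
    --config-name "$cfg" task.data="$data" model.model_path="$ckpt"
}
```

For example, `DRY_RUN=1 vision_finetune large /path/to/dir /path/to/pretrained/model` prints the fine-tuning command for the Large model without running it.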

### Speech

Commands to pretrain the Base and Large models:
```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/v2 \
--config-name base_audio_only_task task.data=/path/to/manifests
```

```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/v2 \
--config-name large_audio_only_task task.data=/path/to/manifests
```
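The speech commands point `task.data` at a directory of manifests. A minimal sketch for sanity-checking one manifest before launching, assuming the wav2vec 2.0 manifest layout (first line: audio root directory; each further line: `relative/path<TAB>num_samples`) and 16 kHz audio. The function name `check_manifest` and the `train.tsv` file name in the usage line are assumptions, not taken from this README.

```shell
#!/usr/bin/env bash
# Check a wav2vec-style manifest (assumed format, see above).
#   $1 = manifest file, $2 = sample rate in Hz (default 16000, an assumption)
# Prints file count, malformed-line count and total hours; returns non-zero if any line is malformed.
check_manifest() {
  local manifest=$1 rate=${2:-16000}
  awk -F'\t' -v rate="$rate" '
    NR == 1 { next }                                  # skip the audio root directory line
    NF != 2 || $2 !~ /^[0-9]+$/ { bad++; next }       # expect "path<TAB>integer"
    { n++; total += $2 }
    END {
      printf "files=%d bad=%d hours=%.2f\n", n, bad, total / rate / 3600
      exit (bad > 0)
    }' "$manifest"
}
```

For example, `check_manifest /path/to/manifests/train.tsv` prints the number of files, the number of malformed lines, and the total duration in hours.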

Finetuning:

```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/wav2vec/config/finetuning --config-name vox_10h \
task.data=/path/to/manifests model.w2v_path=/path/to/pretrained/model common.user_dir=examples/data2vec
```

Replace `vox_10h` with the config that matches your model size and fine-tuning split.
See `examples/wav2vec/config/finetuning` for all available configs.
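The `vox_10h` name suggests a `<prefix>_<split>` naming scheme. The sketch below assumes `base_` for Base models and `vox_` for Large models; that prefix mapping and the helper names are assumptions, so confirm the result against the files in `examples/wav2vec/config/finetuning`.

```shell
#!/usr/bin/env bash
# Model size + fine-tuning split -> wav2vec fine-tuning config name (assumed naming scheme).
speech_ft_config() {
  local size=$1 split=$2 prefix
  case $size in
    base)  prefix=base ;;
    large) prefix=vox ;;
    *) echo "unknown size: $size" >&2; return 1 ;;
  esac
  echo "${prefix}_${split}"
}

# Succeeds only if the config file exists (run from the fairseq repository root).
speech_ft_config_exists() {
  [ -f "examples/wav2vec/config/finetuning/$1.yaml" ]
}
```

For example, `speech_ft_config large 10h` yields `vox_10h`, the config used in the command above; check it with `speech_ft_config_exists vox_10h` before launching.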

### NLP

Command to pretrain:
```shell script
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/v2 \
--config-name base_text_only_task task.data=/path/to/file
```

Commands to fine-tune on each GLUE task (run once per task and learning rate):
```shell script
$ task=cola # choose from [cola|qnli|mrpc|rte|sst_2|mnli|qqp|sts_b]
$ lr=1e-5 # sweep [1e-5|2e-5|4e-5|6e-5] for each task
$ python fairseq_cli/hydra_train.py -m --config-dir examples/data2vec/config/v2/text_finetuning \
--config-name $task task.data=/path/to/file model.model_path=/path/to/pretrained/model "optimization.lr=[${lr}]"
```
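The comment in the command above asks for a learning-rate sweep for each task. A sketch that runs the full 8 × 4 grid one run after another, assuming a hypothetical data layout with one preprocessed directory per task under a common root (the README itself only shows `/path/to/file`); `glue_sweep` is an illustrative name, and setting `DRY_RUN=1` prints the 32 commands instead of running them.

```shell
#!/usr/bin/env bash
# Sweep every GLUE task over the learning rates listed above.
#   $1 = root directory with one data directory per task (hypothetical layout, e.g. $1/cola)
#   $2 = pretrained model checkpoint
glue_sweep() {
  local glue_root=$1 ckpt=$2 task lr
  for task in cola qnli mrpc rte sst_2 mnli qqp sts_b; do
    for lr in 1e-5 2e-5 4e-5 6e-5; do
      # Build the command as positional parameters so quoting is preserved.
      set -- python fairseq_cli/hydra_train.py -m \
        --config-dir examples/data2vec/config/v2/text_finetuning \
        --config-name "$task" task.data="$glue_root/$task" \
        model.model_path="$ckpt" "optimization.lr=[${lr}]"
      if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
    done
  done
}
```

For example, `DRY_RUN=1 glue_sweep /path/to/glue /path/to/pretrained/model` lists all 32 runs so you can check them before launching.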

# data2vec

Empty file added examples/data2vec/__init__.py
@@ -0,0 +1,70 @@
# @package _group_

common:
  fp16: true
  log_format: json
  log_interval: 200
  all_gather_list_size: 70000
  tensorboard_logdir: tb
  min_loss_scale: 1e-6

checkpoint:
  save_interval: 1
  no_epoch_checkpoints: true
  best_checkpoint_metric: mAP
  maximize_best_checkpoint_metric: true

task:
  _name: audio_classification
  data: ???
  normalize: true
  labels: lbl

dataset:
  num_workers: 6
  max_tokens: 2560000
  skip_invalid_size_inputs_valid_test: true
  valid_subset: eval
  validate_interval: 5

distributed_training:
  ddp_backend: legacy_ddp
  distributed_world_size: 8

criterion:
  _name: model
  can_sum: false
  log_keys:
    - _predictions
    - _targets

optimization:
  max_update: 30000
  lr: [0.00006] # scratch 53-5

optimizer:
  _name: adam
  adam_betas: (0.9,0.98)
  adam_eps: 1e-08

lr_scheduler:
  _name: cosine
  warmup_updates: 5000

model:
  _name: audio_classification
  model_path: ???
  apply_mask: true
  mask_prob: 0.6
  mask_length: 5 # scratch 1
  mask_channel_prob: 0
  mask_channel_length: 64
  layerdrop: 0.1
  dropout: 0.1
  activation_dropout: 0.1
  attention_dropout: 0.2
  feature_grad_mult: 0 # scratch 1
  label_mixup: true
  source_mixup: 0.5
  prediction_mode: lin_softmax # scratch average_sigmoid

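The config above pairs a peak learning rate of 6e-5 with 5,000 warmup updates and 30,000 total updates. The sketch below shows the shape of such a schedule under a simplifying assumption: linear warmup from 0, then a single half-cosine decay to 0 at `max_update`. fairseq's `cosine` scheduler has further options (for example a minimum learning rate and restart periods) that this sketch does not model, so treat it as an approximation; `lr_at` is an illustrative name.

```shell
#!/usr/bin/env bash
# Approximate learning rate at a given update (simplified warmup + cosine, see above).
#   $1 = update step, $2 = peak lr, $3 = warmup updates, $4 = total updates
lr_at() {
  awk -v step="$1" -v peak="$2" -v warmup="$3" -v total="$4" 'BEGIN {
    pi = atan2(0, -1)
    if (step < warmup) {
      lr = peak * step / warmup                      # linear warmup from 0
    } else if (step >= total) {
      lr = 0                                         # schedule finished
    } else {
      progress = (step - warmup) / (total - warmup)  # 0 -> 1 over the decay phase
      lr = 0.5 * peak * (1 + cos(pi * progress))     # half-cosine from peak to 0
    }
    printf "%.3e\n", lr
  }'
}
```

For example, `lr_at 17500 6e-5 5000 30000` (halfway through the decay phase) prints 3.000e-05, half the peak rate.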
@@ -0,0 +1,35 @@
# @package _global_

hydra:
  job:
    config:
      override_dirname:
        kv_sep: ':'
        item_sep: '/'
        exclude_keys:
          - run_config
          - distributed_training.distributed_port
          - distributed_training.distributed_world_size
          - model.pretrained_model_path
          - model.target_network_path
          - next_script
          - task.cache_in_scratch
          - task.data
          - checkpoint.save_interval_updates
          - checkpoint.keep_interval_updates
          - checkpoint.save_on_overflow
  sweep:
    dir: /checkpoint/${env:USER}/${env:PREFIX}/${hydra.job.config_name}_${hydra.launcher.gpus_per_node}/${hydra.job.override_dirname}
    subdir: ''
  launcher:
    submitit_folder: ${hydra.sweep.dir}
    timeout_min: 4320
    cpus_per_task: 10
    gpus_per_node: 8
    tasks_per_node: 8
    mem_gb: 450
    nodes: 1
    name: ${env:PREFIX}_${hydra.job.config_name}
    partition: devlab,learnlab,learnfair,scavenge
    constraint: volta32gb,ib4
    max_num_timeout: 30
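This launcher config names each sweep directory after the job's command-line overrides: `kv_sep` replaces `=`, `item_sep` joins the overrides, and keys in `exclude_keys` are left out. The sketch below approximates how Hydra builds `override_dirname` (drop excluded keys, sort, join, replace `=`) so you can predict where a run's checkpoints land. It is an illustrative reimplementation, not Hydra's code; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Approximate hydra.job.override_dirname with kv_sep=':' and item_sep='/'.
#   $1 = space-separated keys to exclude; remaining args = key=value overrides
override_dirname() {
  local exclude=" $1 " kv key
  shift
  for kv in "$@"; do
    key=${kv%%=*}
    key=${key#+}   # "+run_config=..." is matched by its key, "run_config"
    case "$exclude" in *" $key "*) continue ;; esac
    printf '%s\n' "$kv"
  done | LC_ALL=C sort | sed 's/=/:/g' | paste -s -d / -
}

# Sweep directory following hydra.sweep.dir:
#   /checkpoint/$USER/$PREFIX/<config_name>_<gpus_per_node>/<override_dirname>
sweep_dir() {
  local config_name=$1 gpus=$2 overrides=$3
  printf '/checkpoint/%s/%s/%s_%s/%s\n' "$USER" "$PREFIX" "$config_name" "$gpus" "$overrides"
}
```

For example, a run that overrides `task.data` and `optimization.lr=[1e-5]` gets a directory ending in `optimization.lr:[1e-5]`, because `task.data` is excluded. Since `dir` and `name` both interpolate `${env:PREFIX}`, `PREFIX` should be set in the environment when this launcher config is used.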
@@ -0,0 +1,35 @@
# @package _global_

hydra:
  job:
    config:
      override_dirname:
        kv_sep: ':'
        item_sep: '/'
        exclude_keys:
          - run_config
          - distributed_training.distributed_port
          - distributed_training.distributed_world_size
          - model.pretrained_model_path
          - model.target_network_path
          - next_script
          - task.cache_in_scratch
          - task.data
          - checkpoint.save_interval_updates
          - checkpoint.keep_interval_updates
          - checkpoint.save_on_overflow
  sweep:
    dir: /checkpoint/${env:USER}/${env:PREFIX}/${hydra.job.config_name}_${hydra.launcher.gpus_per_node}/${hydra.job.override_dirname}
    subdir: ''
  launcher:
    submitit_folder: ${hydra.sweep.dir}
    timeout_min: 4320
    cpus_per_task: 10
    gpus_per_node: 1
    tasks_per_node: 1
    mem_gb: 100
    nodes: 1
    name: ${env:PREFIX}_${hydra.job.config_name}
    partition: devlab,learnlab,learnfair,scavenge
    constraint: volta32gb
    max_num_timeout: 30
@@ -0,0 +1,35 @@
# @package _global_

hydra:
  job:
    config:
      override_dirname:
        kv_sep: ':'
        item_sep: '/'
        exclude_keys:
          - run_config
          - distributed_training.distributed_port
          - distributed_training.distributed_world_size
          - model.pretrained_model_path
          - model.target_network_path
          - next_script
          - task.cache_in_scratch
          - task.data
          - checkpoint.save_interval_updates
          - checkpoint.keep_interval_updates
          - checkpoint.save_on_overflow
  sweep:
    dir: /checkpoint/${env:USER}/${env:PREFIX}/${hydra.job.config_name}_${hydra.launcher.gpus_per_node}/${hydra.job.override_dirname}
    subdir: ''
  launcher:
    submitit_folder: ${hydra.sweep.dir}
    timeout_min: 4320
    cpus_per_task: 10
    gpus_per_node: 8
    tasks_per_node: 8
    mem_gb: 450
    nodes: 2
    name: ${env:PREFIX}_${hydra.job.config_name}
    partition: devlab,learnlab,learnfair,scavenge
    constraint: volta32gb,ib4
    max_num_timeout: 30
91 changes: 91 additions & 0 deletions examples/data2vec/config/audio/pretraining/audioset.yaml
@@ -0,0 +1,91 @@
# @package _group_

common:
  fp16: true
  log_format: json
  log_interval: 200
  tensorboard_logdir: tb
  min_loss_scale: 1e-6
  user_dir: /private/home/abaevski/fairseq-py/examples/data2vec

checkpoint:
  save_interval: 1
  save_interval_updates: 25000
  keep_interval_updates: 1
  no_epoch_checkpoints: true

task:
  _name: audio_pretraining
  data: /private/home/abaevski/data/audioset
  max_sample_size: 320000
  min_sample_size: 32000
  normalize: true

dataset:
  num_workers: 6
  max_tokens: 3400000
  skip_invalid_size_inputs_valid_test: true
  validate_interval: 5
  required_batch_size_multiple: 1
  disable_validation: true

distributed_training:
  distributed_world_size: 24
  ddp_backend: legacy_ddp

criterion:
  _name: model
  log_keys:
    - ema_decay
    - target_var
    - pred_var
    # - avg_self_attn
    # - weights

optimization:
  max_update: 200000
  lr: [0.0005]

optimizer:
  _name: adam
  adam_betas: (0.9,0.98)
  adam_eps: 1e-06
  weight_decay: 0.01

lr_scheduler:
  _name: cosine
  warmup_updates: 10000

model:
  _name: data2vec_audio
  extractor_mode: layer_norm
  encoder_layerdrop: 0.05
  dropout_input: 0.0
  dropout_features: 0.0
  feature_grad_mult: 1.0
  encoder_embed_dim: 768

  mask_prob: 0.65
  mask_length: 10

  loss_beta: 0
  loss_scale: null

  instance_norm_target_layer: true
  layer_norm_targets: true
  average_top_k_layers: 12

  self_attn_norm_type: deepnorm
  final_norm_type: deepnorm

  pos_conv_depth: 5
  conv_pos: 95

  ema_decay: 0.999
  ema_end_decay: 0.9999
  ema_anneal_end_step: 30000
  ema_transformer_only: true
  ema_layers_only: false

  require_same_masks: true
  mask_dropout: 0
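The `model` block anneals the teacher's EMA decay from `ema_decay` (0.999) to `ema_end_decay` (0.9999) over `ema_anneal_end_step` (30,000) updates. The sketch below assumes a linear ramp that stays at the end value afterwards, which is our reading of these three parameters rather than something this diff states, and applies the standard EMA update `teacher = d * teacher + (1 - d) * student` to a single scalar weight. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# EMA decay at a given update, assuming a linear ramp from start to end (held at end afterwards).
#   $1 = update step, $2 = start decay, $3 = end decay, $4 = anneal end step
ema_decay_at() {
  awk -v step="$1" -v start="$2" -v end="$3" -v total="$4" 'BEGIN {
    if (step >= total) d = end
    else d = end - (end - start) * (1 - step / total)
    printf "%.6f\n", d
  }'
}

# One EMA update of a single scalar teacher weight towards the student weight.
#   $1 = teacher weight, $2 = student weight, $3 = decay d
ema_update() {
  awk -v teacher="$1" -v student="$2" -v d="$3" 'BEGIN {
    printf "%.6f\n", d * teacher + (1 - d) * student
  }'
}
```

With these settings, `ema_decay_at 15000 0.999 0.9999 30000` prints 0.999450, halfway between the start and end values; the closer the decay is to 1, the more slowly the teacher follows the student.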