Multimodal Pathway for Audio Recognition

This repo builds on the code and models of Audio MAE.

1. Installation

  • This repo follows the MAE repo; installation and preparation follow that repo.
  • Copy files and patch the timm package by running bash timm_patch.sh (please change the path in the script to your own timm package path).
  • Please see mae_env.yml for all the dependencies.
  • You may also download the conda-packed conda env, untar it, and then:
conda env create -f mae_env.yaml
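
The patch script needs the location of your installed timm package. A minimal sketch for finding it from Python, assuming timm is importable in the active environment; the helper name `package_dir` is hypothetical and not part of this repo:

```python
import importlib.util
import os


def package_dir(name):
    """Return the install directory of an importable top-level package, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Prints the timm directory to set in timm_patch.sh, or None if timm is not installed.
    print(package_dir("timm"))
```

Run it inside the environment you created above, so that the path it prints belongs to the timm copy that training will actually import.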

2. Prepare data:

Please download AudioSet here. Due to copyright, we cannot release the data. The data annotation JSON parsed and used in this work is available here. Its format follows the one used in AST. Be sure to modify the paths in the scripts to reflect your own setup.
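
If the audio paths in the annotation JSON do not match where your copy of AudioSet lives, they can be re-rooted in bulk. A minimal sketch, assuming the AST-style layout `{"data": [{"wav": ..., "labels": ...}]}`; the helper name `rewrite_wav_prefix` and the example paths are hypothetical:

```python
import json


def rewrite_wav_prefix(manifest, old_prefix, new_prefix):
    """Return a copy of an AST-style manifest with each "wav" path re-rooted."""
    entries = []
    for item in manifest["data"]:
        wav = item["wav"]
        # Only touch paths that actually start with the old prefix.
        if wav.startswith(old_prefix):
            wav = new_prefix + wav[len(old_prefix):]
        entries.append({**item, "wav": wav})
    return {**manifest, "data": entries}


# Hypothetical example entry; real paths and labels come from the released JSON.
example = {"data": [{"wav": "/old/root/audio/clip0.wav", "labels": "/m/09x0r"}]}
updated = rewrite_wav_prefix(example, "/old/root", "/my/data")
print(json.dumps(updated, indent=2))
```

For a real file, load it with `json.load`, pass the result through the helper, and write it back with `json.dump`. The function returns a new dictionary, so the original manifest stays untouched.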

3. Pre-training on AudioSet-2M

For the brave ones who want to pre-train on AudioSet-2M, run pretrain_audioset2M.sh:

bash pretrain_audioset2M.sh

4. Fine-tuning on AudioSet-2M and AudioSet-20K

bash ft_srun_pathway.sh