This work is taken from AudioSep and a Hugging Face demo of the same model.
NOTE: This is an unofficial implementation of AudioSep, but it works without Miniconda :)
Go to the command prompt and navigate like this:
cd documents
and clone the repository:
git clone https://github.com/Brodvd/Audio-Sep---from-Huggin-Face---.git
Install the dependencies from the file requirements.txt
(I used Python 3.10, but other versions should work the same):
pip install -r requirements.txt
Create the folder /checkpoint
and download the checkpoints into it from here.
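A sketch of the expected layout, assuming the checkpoint filenames from the upstream AudioSep release (the exact names you download may differ):

```
checkpoint/
├── music_speech_audioset_epoch_15_esc_89.98.pt   # text encoder (CLAP) checkpoint (assumed name)
└── audiosep_base_4M_steps.ckpt                   # separation model checkpoint (assumed name)
```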
- run the file `pipiline.py`, changing the file paths and the text query (I recommend using the folder `/audio`)
- the input file should be a 32000 Hz `.wav` (see the resampling sketch after this list)
- the output file will be in mono format
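If your source audio is not already 32000 Hz, a quick way to convert it (a minimal sketch using librosa and soundfile, which are my choice here, not a repo requirement):

```python
# Resample any readable audio file to a 32000 Hz mono .wav.
import librosa
import soundfile as sf

audio, _ = librosa.load('input.wav', sr=32000, mono=True)  # resample to 32 kHz
sf.write('input_32k.wav', audio, 32000)                    # write the converted file
```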
NOTE: this model uses a lot of the computer's memory, so on an old laptop like mine (Windows 10 Home), if you give it an input file > 1 it will only work with chunk-based inference, which has slightly lower quality:
inference(model, audio_file, text, output_file, device, use_chunk=True)
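For context, here is a minimal end-to-end sketch of a `pipiline.py`-style run. It assumes the upstream AudioSep API (`build_audiosep`, `inference` in a `pipeline` module) and config/checkpoint paths; adjust the names to this repo if they differ:

```python
# Minimal sketch of loading AudioSep and separating one source.
# If the file in this repo is named pipiline.py, import from pipiline instead.
import torch
from pipeline import build_audiosep, inference

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',                   # assumed path
    checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt',  # assumed name
    device=device)

audio_file = 'folder/audio/mixture.wav'   # input: 32000 Hz .wav
text = 'acoustic guitar'                  # the text query
output_file = 'separated_audio.wav'       # output: mono .wav

# on machines with little memory, use_chunk=True trades a little
# quality for a much smaller memory footprint
inference(model, audio_file, text, output_file, device, use_chunk=True)
```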
- run the file `app.py`
- copy the link that appears in the console output into a browser
- use the model online (as on Hugging Face)
The same note about chunk-based inference applies here if you want more speed. A minimal sketch of such an app follows.
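A sketch of what a Gradio `app.py` could look like, assuming the same `build_audiosep`/`inference` API as above; the interface details are illustrative, not the repo's exact code:

```python
# Illustrative Gradio app sketch (not the repo's exact app.py).
import gradio as gr
import torch
from pipeline import build_audiosep, inference

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',                   # assumed path
    checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt',  # assumed name
    device=device)

def separate(audio_path, text_query):
    output_file = 'separated_audio.wav'
    # chunk-based inference keeps memory use manageable on small machines
    inference(model, audio_path, text_query, output_file, device, use_chunk=True)
    return output_file

demo = gr.Interface(
    fn=separate,
    inputs=[gr.Audio(type='filepath', label='Mixture (.wav, 32000 Hz)'),
            gr.Textbox(label='Text query')],
    outputs=gr.Audio(type='filepath', label='Separated source'))

demo.launch()  # prints a local URL (e.g. http://127.0.0.1:7860) to open in a browser
```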
Go here to find the instructions for training AudioSep.
AudioSep is a foundation model for open-domain sound separation with natural language queries. It has two key components: a text encoder and a separation model:
The model has two checkpoints, one for the text query and the other for the source separation: the first is a `.pt` file and the second a `.ckpt` file.
If you want to see my evaluation of AudioSep, go --> here.
If you have any problems with or questions about AudioSep, open an issue in this repository.
If you found this tool useful, please consider citing
@article{liu2023separate,
  title={Separate Anything You Describe},
  author={Liu, Xubo and Kong, Qiuqiang and Zhao, Yan and Liu, Haohe and Yuan, Yi and Liu, Yuzhuo and Xia, Rui and Wang, Yuxuan and Plumbley, Mark D and Wang, Wenwu},
  journal={arXiv preprint arXiv:2308.05037},
  year={2023}
}
@inproceedings{liu22w_interspeech,
  title={Separate What You Describe: Language-Queried Audio Source Separation},
  author={Liu, Xubo and Liu, Haohe and Kong, Qiuqiang and Mei, Xinhao and Zhao, Jinzheng and Huang, Qiushi and Plumbley, Mark D and Wang, Wenwu},
  year={2022},
  booktitle={Proc. Interspeech},
  pages={1801--1805},
}