This work is taken from AudioSep and a Hugging Face demo of the same model.
NOTE: This is an unofficial implementation of AudioSep, but it works without Miniconda :)
Go to the command prompt and navigate like this:
cd documents
and clone the repository:
git clone https://github.com/Brodvd/Audio-Sep---from-Huggin-Face---.git
Install the dependencies from the file requirements.txt
(I used Python 3.10, but other versions should work the same):
pip install -r requirements.txt
Create the folder /checkpoint
and download the checkpoints into it from here.
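A sketch of the expected layout, assuming the checkpoint filenames from the upstream AudioSep release (the exact names you download may differ):

```
checkpoint/
├── music_speech_audioset_epoch_15_esc_89.98.pt   # text encoder (CLAP) checkpoint (assumed name)
└── audiosep_base_4M_steps.ckpt                   # separation model checkpoint (assumed name)
```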
- run the file `pipiline.py`, changing the file paths and the text query (I recommend using the folder `/audio`)
- the input file should be a 32000 Hz `.wav` (see the resampling sketch after this list)
- the output file will be in mono format
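If your source audio is not already 32000 Hz, a quick way to convert it (a minimal sketch using librosa and soundfile, which are my choice here, not a repo requirement):

```python
# Resample any readable audio file to a 32000 Hz mono .wav.
import librosa
import soundfile as sf

audio, _ = librosa.load('input.wav', sr=32000, mono=True)  # resample to 32 kHz
sf.write('input_32k.wav', audio, 32000)                    # write the converted file
```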
NOTE: this model uses a lot of the computer's memory, so on an old laptop like mine (Windows 10 Home), if you give it an input file > 1 it will only work with chunk-based inference, which has slightly lower quality:
inference(model, audio_file, text, output_file, device, use_chunk=True)
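For context, here is a minimal end-to-end sketch of a `pipiline.py`-style run. It assumes the upstream AudioSep API (`build_audiosep`, `inference` in a `pipeline` module) and config/checkpoint paths; adjust the names to this repo if they differ:

```python
# Minimal sketch of loading AudioSep and separating one source.
# If the file in this repo is named pipiline.py, import from pipiline instead.
import torch
from pipeline import build_audiosep, inference

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',                   # assumed path
    checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt',  # assumed name
    device=device)

audio_file = 'folder/audio/mixture.wav'   # input: 32000 Hz .wav
text = 'acoustic guitar'                  # the text query
output_file = 'separated_audio.wav'       # output: mono .wav

# on machines with little memory, use_chunk=True trades a little
# quality for a much smaller memory footprint
inference(model, audio_file, text, output_file, device, use_chunk=True)
```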
- run the file `app.py`
- copy the link that appears in the console output into a browser
- use the model online (as on Hugging Face)
The same note about chunk-based inference applies here if you want more speed. A minimal sketch of such an app follows.
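A sketch of what a Gradio `app.py` could look like, assuming the same `build_audiosep`/`inference` API as above; the interface details are illustrative, not the repo's exact code:

```python
# Illustrative Gradio app sketch (not the repo's exact app.py).
import gradio as gr
import torch
from pipeline import build_audiosep, inference

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',                   # assumed path
    checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt',  # assumed name
    device=device)

def separate(audio_path, text_query):
    output_file = 'separated_audio.wav'
    # chunk-based inference keeps memory use manageable on small machines
    inference(model, audio_path, text_query, output_file, device, use_chunk=True)
    return output_file

demo = gr.Interface(
    fn=separate,
    inputs=[gr.Audio(type='filepath', label='Mixture (.wav, 32000 Hz)'),
            gr.Textbox(label='Text query')],
    outputs=gr.Audio(type='filepath', label='Separated source'))

demo.launch()  # prints a local URL (e.g. http://127.0.0.1:7860) to open in a browser
```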
Go here to find the instructions for training AudioSep.
AudioSep is a foundation model for open-domain sound separation with natural language queries. It has two key components: a text encoder and a separation model:
The model has two checkpoints, one for the text query and the other for the source separation: the first is a `.pt` file and the second a `.ckpt` file.
If you want to see my evaluation of AudioSep, go --> here.
If you have any problems with or questions about AudioSep, open an issue in this repository.
If you found this tool useful, please consider citing
@article{liu2023separate,
  title={Separate Anything You Describe},
  author={Liu, Xubo and Kong, Qiuqiang and Zhao, Yan and Liu, Haohe and Yuan, Yi and Liu, Yuzhuo and Xia, Rui and Wang, Yuxuan and Plumbley, Mark D and Wang, Wenwu},
  journal={arXiv preprint arXiv:2308.05037},
  year={2023}
}
@inproceedings{liu22w_interspeech,
  title={Separate What You Describe: Language-Queried Audio Source Separation},
  author={Liu, Xubo and Liu, Haohe and Kong, Qiuqiang and Mei, Xinhao and Zhao, Jinzheng and Huang, Qiushi and Plumbley, Mark D and Wang, Wenwu},
  year={2022},
  booktitle={Proc. Interspeech},
  pages={1801--1805},
}