SoundStream for PyTorch

Unofficial PyTorch implementation of SoundStream, with training code and a 16 kHz pretrained checkpoint.

The 16 kHz pretrained model was trained on LibriSpeech train-clean-100 on an NVIDIA T4 for about 150 epochs (around 50 hours in total). The model is not causal.

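import torchaudio
import torch

# Load the pretrained 16 kHz model through torch.hub.
model = torch.hub.load("kaiidams/soundstream-pytorch", "soundstream_16khz")

# Read the input file and resample it to 16 kHz.
x, sr = torchaudio.load('input.wav')
x, sr = torchaudio.functional.resample(x, sr, 16000), 16000

# Encode to discrete codes, then decode back to a waveform (no gradients needed).
with torch.no_grad():
    y = model.encode(x)
    # y = y[:, :, :4]  # if you want to reduce code size.
    z = model.decode(y)
torchaudio.save('output.wav', z, sr)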
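
The optional slice `y[:, :, :4]` keeps only the first few codes per frame. SoundStream quantizes with a residual vector quantizer, in which each stage encodes what the earlier stages left over, so dropping the later codes should give a coarser reconstruction rather than a broken one. The plain-Python sketch below illustrates that idea under stated assumptions: the toy grid codebooks and the names `make_codebook`, `rvq_encode`, and `rvq_decode` are hypothetical and are not taken from this repository, whose codebooks are learned. It also assumes the last axis of `y` indexes quantizer stages.

```python
import math


def make_codebook(scale):
    """Toy 2-D codebook: a 3x3 grid of vectors at the given scale (includes zero)."""
    steps = (-1.0, 0.0, 1.0)
    return [[a * scale, b * scale] for a in steps for b in steps]


def nearest(codebook, v):
    """Index of the codebook vector closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - a) ** 2 for c, a in zip(codebook[i], v)))


def rvq_encode(codebooks, v):
    """Quantize v stage by stage; each stage encodes the previous stage's residual."""
    residual, codes = list(v), []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Sum the selected vectors. A shortened code list uses only the first stages."""
    out = [0.0] * len(codebooks[0][0])
    for cb, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Four stages, each with a finer grid than the one before it.
codebooks = [make_codebook(0.5 ** s) for s in range(4)]
v = [0.8, -0.3]
codes = rvq_encode(codebooks, v)

# Reconstruction error when decoding with only the first k codes.
errors = []
for k in range(1, len(codes) + 1):
    approx = rvq_decode(codebooks, codes[:k])
    errors.append(math.dist(v, approx))
    print(f"first {k} code(s): error = {errors[-1]:.4f}")
```

Because every toy codebook contains the zero vector, adding a stage can never increase the error here. A learned quantizer has no such guarantee, but the trade-off is the same: fewer codes per frame means a smaller code size and a rougher reconstruction.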

Sample audio

Audio references are sampled from LibriSpeech test-clean.

Reference	SoundStream
audio link	audio link
audio link	audio link
audio link	audio link

