
PercepNet (Still needs to be tuned)

Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

https://www.researchgate.net/publication/343568932_A_Perceptually-Motivated_Approach_for_Low-Complexity_Real-Time_Enhancement_of_Fullband_Speech

Compared with https://github.com/jzi040941/PercepNet, this version is implemented using Keras.

----------------------------------------------------------
Because GitHub's file size limit is 100 MB, rnn_data.c is compressed as rnn_data.c.tgz.
This file needs to be extracted before compiling:
% cd src
% tar -xzvf rnn_data.c.tgz
% cd ..


To compile, just type:
% ./autogen.sh
% ./configure
% make

A simple command-line tool is
provided as an example. It operates on RAW 16-bit (machine endian) mono
PCM files sampled at 48 kHz. It can be used as:

./examples/rnnoise_demo <noisy speech> <output denoised>

The output is also a 16-bit raw PCM file.
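
If the test material is a WAV file rather than raw PCM, a small helper can convert
between the two formats. This sketch is not part of the repository; it assumes numpy
and scipy are installed, and the function names are only illustrative.

# Hedged helper (not in this repo): convert 48 kHz mono WAV <-> headerless
# 16-bit PCM as expected by rnnoise_demo. Assumes numpy and scipy.
import numpy as np
from scipy.io import wavfile

def wav_to_raw(wav_path, raw_path):
    rate, samples = wavfile.read(wav_path)        # int16 samples for a 16-bit WAV
    assert rate == 48000 and samples.ndim == 1, "need 48 kHz mono input"
    samples.astype(np.int16).tofile(raw_path)     # headerless, machine endian

def raw_to_wav(raw_path, wav_path, rate=48000):
    samples = np.fromfile(raw_path, dtype=np.int16)
    wavfile.write(wav_path, rate, samples)

For example, run wav_to_raw() on the noisy recording before calling rnnoise_demo, and
raw_to_wav() on the denoised output to get a playable file.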

------------------------------------------------------------

How to train:
(change to the src subdirectory; this assumes the clean and noise directories are under ~/DNS-Challenge/datasets/rnnoise3/)
cd ~/percepnet/src
./denoise_training ~/DNS-Challenge/datasets/rnnoise3/clean  ~/DNS-Challenge/datasets/rnnoise3/noise 80000000 training.f32

(change to training subdirectory)
cd ../training 
python bin2hdf5.py ../src/training.f32 80000000 138 training.h5
python rnn_train.py
python dump_rnn_float.py weights.hdf5 rnn_data.c rnn_data.h orig
cp rnn_data.c ../src/   
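
For reference, the bin2hdf5.py step essentially packs the float32 feature dump into an
HDF5 file for training. The sketch below follows the rnnoise convention; the 'data'
dataset name and the 138-feature width are assumptions taken from the command line
above, and the actual script may differ.

# Rough sketch of the bin2hdf5.py step (assumes numpy and h5py).
import numpy as np
import h5py

features = np.fromfile('../src/training.f32', dtype='float32')
features = np.reshape(features, (-1, 138))     # 138 values per frame, per the command above
with h5py.File('training.h5', 'w') as f:
    f.create_dataset('data', data=features)    # 'data' follows rnnoise's naming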

(change to percepnet directory)
cd ~/percepnet/
make clean
make

(change to the examples subdirectory)
cd examples 
./rnnoise_demo test2.raw test2_denoised.raw

----------------------------------------------------------------
More:
The performance of this version needs further optimization and tuning.
In some cases, it is worse than RNNoise.
Any comments on how to optimize/tune are welcome.

test_gr in src/ is used to test the classical processing path: it applies the computed g and r directly
(not predicted by the deep learning model) to check whether g and r work as intended.

The overall framework is based on https://github.com/xiph/rnnoise and the
speech signal processing code is from https://github.com/jzi040941/PercepNet.
Compared with RNNoise, the training data is standardized, and float (not quantized)
weights are used when converting to rnn_data.c.
During training, the gradient clipping norm (clip_norm) is set to 0.1; otherwise the loss becomes NaN.
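
As an illustration of those two training details, a minimal Keras sketch might look like
the following. The optimizer choice, learning rate, and loss are assumptions made for
illustration, not the repository's exact settings; see rnn_train.py for the real model.

# Minimal sketch (assumes TensorFlow/Keras 2.x): per-feature standardization
# plus gradient-norm clipping so the loss does not blow up to NaN.
import numpy as np
import tensorflow as tf

x = np.fromfile('../src/training.f32', dtype='float32').reshape(-1, 138)
x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)                 # standardize each feature

opt = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=0.1)  # clip gradient norm at 0.1
# model.compile(optimizer=opt, loss='mse')                        # hypothetical compile call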

The training data are from:
https://github.com/microsoft/DNS-Challenge

The WAV file processing code (wav.h, wav.c) is from https://faculty.fiu.edu/~wgillam/wavfiles.html
