zhanglei1949/federatedSpeechCommands

Convolutional neural networks for Google speech commands data set with PyTorch.

With federated learning

Run federated_train_speech_commands_gpu.py to simulate federated training with multiple clients. For example:

python federated_train_speech_commands.py --model=vgg11 --optim=sgd --lr-scheduler=plateau --learning-rate=0.01 --lr-scheduler-patience=5 --max-epochs=1 --batch-size=156 --clients=2 --matrix-size=500
CUDA_VISIBLE_DEVICES=0,1 python federated_train_speech_commands_cpu_v5_collect_gradient.py --model=conv --optim=sgd --lr-scheduler=plateau --learning-rate=0.01 --lr-scheduler-patience=5 --max-epochs=1 --batch-size=128 --clients=3 --matrix-size=100  --num-threads=10
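This README does not describe how the scripts combine the clients' updates. As a hedged illustration only, the sketch below shows federated averaging (FedAvg), one common aggregation rule, in plain Python. The function name `federated_average` and the choice to weight each client by its local dataset size are assumptions, not taken from this repository. The `_collect_gradient` script's name suggests it may aggregate gradients rather than weights.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameter vectors (FedAvg-style aggregation).

    Hypothetical helper, not from this repository.
    client_weights[k] is the flattened parameter vector of client k after its
    local training round; client_sizes[k] is the number of local training
    examples held by client k, used as that client's averaging weight.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        # Each client contributes in proportion to its share of the data.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two clients (as with --clients=2), the second holding three times as much data.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300]))  # [2.5, 3.5]
```

The server would broadcast the averaged vector back to every client before the next local round.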

General

We, xuyuan and tugstugi, participated in the Kaggle competition TensorFlow Speech Recognition Challenge and reached 10th place. This repository contains a simplified and cleaned-up version of our team's code.

Features

  • 1x32x32 mel-spectrogram as network input
  • a single network implementation for both the CIFAR10 and Google Speech Commands data sets
  • faster audio data augmentation performed on the STFT
  • Kaggle private LB scores evaluated on 150,000+ audio files
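Assuming one of the two 32-length axes of the 1x32x32 input indexes mel frequency bands, the sketch below computes where 32 mel bands would be centred for 16 kHz audio, the sample rate of the Speech Commands clips. It is not this repository's preprocessing: the HTK mel formula, `f_min = 0`, and the function names are assumptions chosen for illustration (libraries such as librosa default to a different mel variant).

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_mels: int = 32, sample_rate: int = 16000,
                     f_min: float = 0.0) -> List[float]:
    """Centre frequencies (Hz) of n_mels filters spaced evenly on the mel scale.

    n_mels + 2 points are placed uniformly between mel(f_min) and mel(Nyquist);
    the first and last points are only the outer edges of the end filters,
    so they are dropped from the returned centres.
    """
    f_max = sample_rate / 2.0
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    points = [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]
    return points[1:-1]


centers = mel_band_centers()
print(len(centers), round(centers[0], 1), round(centers[-1], 1))
```

Because the bands are evenly spaced in mels, they are narrow at low frequencies and wide near the 8 kHz Nyquist limit, which is why 32 bands are enough to summarise a 1-second speech clip.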

Results

Due to the competition's time limit, we trained most of the nets with SGD using ReduceLROnPlateau for 70 epochs. For the training parameters and dependencies, see TRAINING.md. Stopping training earlier sometimes produces a better Kaggle score.
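ReduceLROnPlateau lowers the learning rate once the monitored metric has failed to improve for more than `patience` epochs (the example commands earlier pass `--lr-scheduler-patience=5`). PyTorch provides this as `torch.optim.lr_scheduler.ReduceLROnPlateau`. The class below is only a simplified, hypothetical pure-Python sketch of its core rule in "min" mode (relative threshold, no cooldown), so it runs without PyTorch; it is not PyTorch's code.

```python
class PlateauScheduler:
    """Simplified sketch of reduce-on-plateau logic in "min" mode.

    Hypothetical class, not PyTorch's implementation: it keeps only factor,
    patience, a relative improvement threshold and a floor on the learning
    rate, and omits cooldown.
    """

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 5,
                 threshold: float = 1e-4, min_lr: float = 0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, metric: float) -> float:
        """Record one epoch's validation loss and return the (possibly reduced) lr."""
        # An epoch counts as an improvement only if it beats the best loss
        # by more than the relative threshold.
        if metric < self.best * (1.0 - self.threshold):
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Reduce once patience is exceeded, then start counting again.
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr


sched = PlateauScheduler(lr=0.01, factor=0.1, patience=2)
for loss in [1.0, 0.9, 0.9, 0.9, 0.9]:
    lr = sched.step(loss)
print(lr)  # about 0.001 after three epochs without improvement
```

In an actual PyTorch training loop one would instead construct, e.g., `ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)` and call `scheduler.step(val_loss)` once per epoch.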

| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
|---|---|---|---|---|---|---|
| VGG19 BN | 93.56% | 97.337235% | 97.527432% | 0.87454 | 0.88030 | |
| ResNet32 | - | 96.181419% | 96.196050% | 0.87078 | 0.87419 | |
| WRN-28-10 | - | 97.937089% | 97.922458% | 0.88546 | 0.88699 | |
| WRN-28-10-dropout | 96.22% | 97.702999% | 97.717630% | 0.89580 | 0.89568 | |
| WRN-52-10 | - | 98.039503% | 97.980980% | 0.88159 | 0.88323 | another trained model has 97.52%/0.89322 |
| ResNext29 8x64 | - | 97.190929% | 97.161668% | 0.89533 | 0.89733 | our best model during the competition |
| DPN92 | - | 97.190929% | 97.249451% | 0.89075 | 0.89286 | |
| DenseNet-BC (L=100, k=12) | 95.52% | 97.161668% | 97.147037% | 0.88946 | 0.89134 | |
| DenseNet-BC (L=190, k=40) | - | 97.117776% | 97.147037% | 0.89369 | 0.89521 | |

Results with Mixup

After the competition, some of the networks were retrained using mixup, from "mixup: Beyond Empirical Risk Minimization" by Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin and David Lopez-Paz.
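Mixup trains on convex combinations of pairs of examples and of their one-hot labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. The sketch below shows that rule for a single pair in plain Python; the function name `mixup_pair` and the default `alpha=0.2` are assumptions for illustration, since the alpha used for the retrained models is not stated here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup_pair(x1: Sequence[float], y1: Sequence[float],
               x2: Sequence[float], y2: Sequence[float],
               alpha: float = 0.2,
               rng: Optional[random.Random] = None
               ) -> Tuple[List[float], List[float], float]:
    """Mix two examples with lambda ~ Beta(alpha, alpha) (hypothetical helper).

    x1, x2: flattened inputs (e.g. spectrogram values); y1, y2: one-hot labels.
    Returns the mixed input, the mixed (soft) label and the lambda used.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    # The same lambda mixes inputs and labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


x, y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
                       alpha=0.4, rng=random.Random(0))
print(lam, x, y)
```

In batch training the usual practice is to mix each batch with a shuffled copy of itself using one lambda per batch, and to train against the soft labels (equivalently, a lambda-weighted sum of the two cross-entropy losses).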

| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
|---|---|---|---|---|---|---|
| VGG19 BN | - | 97.483541% | 97.542063% | 0.89521 | 0.89839 | |
| WRN-52-10 | - | 97.454279% | 97.498171% | 0.90273 | 0.90355 | same score as 16th place on Kaggle |

About

Speech recognition with federated learning