✨ Add speech recognition training & other improvements #147

arxyzan · 2024-02-07T14:23:13Z

Pull Request

Description

This PR mainly adds ASR (automatic speech recognition) support: a dataset, a data collator, a metrics handler, and a training script. Several other improvements and bug fixes were also made along the way.

Changes

Add SpeechRecognitionDataset and SpeechRecognitionDataCollator
Add SpeechRecognitionMetricsHandler
Add training script for ASR in examples
Fix bugs in Tokenizer
Fix bugs in AudioFeatureExtractor
Fix bugs in config
Add dataset load script to templates for ASR
Improve tests
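
To make the new collator and metrics pieces concrete, here is a minimal, dependency-free sketch of what an ASR data collator and a word error rate (WER) metric typically do. It is not the code from this PR: `collate_speech_batch`, `word_error_rate`, `pad_to_length`, the `input_features`/`labels` keys, and the `-100` label padding value are all assumptions chosen for illustration, and the actual `SpeechRecognitionDataCollator` and `SpeechRecognitionMetricsHandler` may behave differently.

```python
from typing import Dict, List, Sequence

# Assumed value: label positions set to this are commonly ignored by the loss.
LABEL_PAD_ID = -100


def pad_to_length(seq: Sequence, length: int, pad_value) -> list:
    """Right-pad `seq` with `pad_value` until it has `length` items."""
    return list(seq) + [pad_value] * (length - len(seq))


def collate_speech_batch(samples: List[Dict[str, list]], feature_pad: float = 0.0) -> Dict[str, list]:
    """Pad audio features and label ids to the longest item in the batch."""
    max_feat = max(len(s["input_features"]) for s in samples)
    max_lab = max(len(s["labels"]) for s in samples)
    return {
        "input_features": [pad_to_length(s["input_features"], max_feat, feature_pad) for s in samples],
        # 1 marks real frames, 0 marks padding.
        "attention_mask": [pad_to_length([1] * len(s["input_features"]), max_feat, 0) for s in samples],
        "labels": [pad_to_length(s["labels"], max_lab, LABEL_PAD_ID) for s in samples],
    }


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)
```

In this sketch, a batch of two samples with 3 and 1 feature frames comes back with both rows padded to length 3 and a matching attention mask, and `word_error_rate("a b c d", "a c d")` counts one deletion against four reference words. A real collator would usually return tensors rather than lists.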

Related Issues

Mainly focuses on #72

Checklist

I have read and followed the project's contributing guidelines.
My code follows the project's coding style.
I have tested my changes thoroughly.
I have updated the documentation if necessary.
All existing tests pass.
I have added new tests to cover my changes.
My changes do not introduce any new warnings or errors.

Additional Comments

Reviewer Instructions

Author's Note


arxyzan added 30 commits January 29, 2024 23:19

✨ Implement metrics handler for speech recognition

f5b6ab4

🍱 Add train_speech_recognition.py

fe7ff00

Merge branch 'main' into asr-training

63af54b

🐛 Fix resolve_inputs_length_for_padding bug

7a9a7b6

✏️ Minor

e621612

✏️ Minor

fa3c006

🐛 Fix fields bug in Config

4672331

🐛 Fix bug in AudioFeatureExtractor.pad()

f507b11

✨ Add SpeechRecognitionDataCollator

56632f9

🐛 Fix wrong attributes in WhisperBPEConfig

7068c2f

🐛 Handle bugs in WhisperSpeechRecognition

bbbf902

✨ Add dataset loading script for ASR

35a16a4

✨ Add SpeechRecognitionDataset

d505b09

🐛 Fix tokenizer max_length bug

6b1a7c7

Setting `max_length` in tokenizer call had unexpected behavior

✏️ Improve logging robustness in Trainer

f3b6988

✏️ Update train_speech_recognition.py

bc30588

✏️ Rename padding -> padding_type in `data_utils.resolve_inputs_length_for_padding()`

c4dd743

🧪 Add speech_recognition to tests for datasets and trainer

c95c4f5

🧪 Improve flexibility of tests in test_datasets.py

19574b6

🧪 Ignore errors for rmtree in test_trainer.py

3f41b11

✏️ Fix minor issues for asr

3eb23d5

🧪 Limit max input lengths to prevent crash in CI

fae959d

✨ Add clean_cache function to utils

f3c28d1

🧪 Clean cache after every train process

e8eaa29

🧪 Add CI_MODE to tests.yml

b37cd26

🐛 Fix minor bug in speech_recognition_dataset.py

cdea95d

🧪 Minor change

d9a298e

🧪 Clean cache in test_datasets.py

045b820

🧪 Minor change

d00c30a

🧪 Clean cache after every test

ac6a8db

arxyzan added 6 commits February 6, 2024 12:25

🧪 Limit sizes in test_trainer.py

0e241ee

🧪 Limit sizes in test_trainer.py

3cd1275

🚑 Fix a silent critical bug in Tokenizer.__call__

fc23537

🧪 Minor renamings

ea8ef2a

🧪 Minor

ae61af3

🐛 Fix wrong cache dir in speech_recognition_dataset.py

fb77c6b

arxyzan merged commit d7dadd6 into main Feb 7, 2024
1 check passed

arxyzan mentioned this pull request Feb 7, 2024

FineTune Whisper #72

Closed

arxyzan deleted the asr-training branch February 10, 2024 12:37
