Update faster-whisper based on SYSTRAN fork #7

Open
wants to merge 55 commits into base: master

Conversation

aleksandr-smechov

No description provided.

sanchit-gandhi and others added 30 commits March 26, 2024 14:58
* add distil-large-v3

* Update README.md

* use fp16 weights from Systran
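A hedged usage sketch for the newly added distil-large-v3 checkpoint; the device settings and audio path are placeholders, and `condition_on_previous_text=False` is the setting commonly recommended for distilled checkpoints:

```python
from faster_whisper import WhisperModel

# Load the distil-large-v3 checkpoint added in this PR; fp16 weights come
# from the Systran repositories on the Hugging Face Hub.
model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")

# "audio.mp3" is a placeholder path.
segments, info = model.transcribe("audio.mp3", condition_on_previous_text=False)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```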
* Bugfix: code breaks if audio is empty

Regression introduced by PR #732
* Foolproof: Disable VAD if clip_timestamps is in use

Prevents unexpected behaviour when both the VAD filter and clip_timestamps are set.
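A minimal sketch of the intended behaviour, assuming the `clip_timestamps` and `vad_filter` parameters of `WhisperModel.transcribe`; model size and audio path are placeholders:

```python
from faster_whisper import WhisperModel

model = WhisperModel("base")  # model size chosen only for illustration

# With this change, supplying clip_timestamps disables the VAD filter, so the
# two options no longer interact in surprising ways.
segments, info = model.transcribe(
    "audio.wav",             # placeholder path
    clip_timestamps="0,30",  # transcribe only the 0-30 s clip
    vad_filter=True,         # ignored while clip_timestamps is in use
)
```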
* CUDA version note and updated instructions in README

* ctranslate2 downgrade note, cuDNN v9 consideration

* clearer note on cuDNN v9 package
* add hotword params

---------

Co-authored-by: jax <[email protected]>
* Clarify documentation for hotwords

* Remove redundant type specifications
Spelling correction for copy/pasters
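A hedged sketch of the new `hotwords` parameter on `WhisperModel.transcribe`; the model size, hotword string, and audio path are placeholders:

```python
from faster_whisper import WhisperModel

model = WhisperModel("small")  # model size chosen only for illustration

# hotwords biases decoding toward the given terms when no initial prompt is
# supplied; "audio.wav" is a placeholder path.
segments, info = model.transcribe("audio.wav", hotwords="SYSTRAN CTranslate2 Whisper")
print(" ".join(segment.text for segment in segments))
```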
* Fix #839

Changed the code to update the options object instead of the TranscriptionOptions class, which was likely the cause of the unexpected behaviour
…847)

* chore: add distil models to WhisperModel init docstring and download_model docstring
Dockerfile improvements

Co-authored-by: Fedir Zadniprovskyi <[email protected]>
* Fix window_size_samples to 512

* Update SileroVADModel

* Replace ONNX file with V5 version
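A hedged sketch of enabling the Silero-based VAD filter after the V5 update; the analysis window is now fixed at 512 samples, so only the remaining knobs are passed via `vad_parameters` (values and paths below are placeholders):

```python
from faster_whisper import WhisperModel

model = WhisperModel("base")  # model size chosen only for illustration

# The Silero V5 model uses a fixed 512-sample window, so window_size_samples
# is no longer a user-facing knob; other VAD options still go through
# vad_parameters.
segments, info = model.transcribe(
    "audio.wav",  # placeholder path
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500, speech_pad_ms=400),
)
```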
* Filter out non_speech_tokens in suppressed tokens
Batching Support, Speed Boosts, and Quality Enhancements (#856)

Batching Support, Speed Boosts, and Quality Enhancements

---------

Co-authored-by: Hargun Mujral <[email protected]>
Co-authored-by: MahmoudAshraf97 <[email protected]>
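A hedged usage sketch of the `BatchedInferencePipeline` introduced by this change; the model size, `batch_size`, and audio path are illustrative:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# batch_size controls how many audio chunks are decoded together;
# "audio.mp3" is a placeholder path.
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```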
…eline` and fix word timestamps for batched inference (#921)

* fix word timestamps for batched inference

* remove hf pipeline
* revert to using PyAV instead of torchaudio

* Update audio.py
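A hedged sketch of decoding audio through the PyAV-backed `decode_audio` helper and feeding the resulting array to `transcribe`; the path and model size are placeholders:

```python
from faster_whisper import WhisperModel, decode_audio

# decode_audio is backed by PyAV again after this change; it returns a
# float32 NumPy array resampled to the requested rate.
audio = decode_audio("audio.wav", sampling_rate=16000)  # placeholder path
print(audio.dtype, audio.shape)

# transcribe accepts the decoded array directly as well as a file path.
model = WhisperModel("base")
segments, info = model.transcribe(audio)
```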
Replace Pyannote VAD with Silero to reduce code duplication and requirements
MahmoudAshraf97 and others added 25 commits October 25, 2024 15:50
* pad to 3000 instead of `feature_extractor.nb_max_frames`

* correct trimming for batched features
* replace `NamedTuple` with `dataclass`

* add deprecation warnings
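A minimal illustration of the general pattern described above, not the actual faster-whisper code: a `dataclass` replaces the `NamedTuple` while a deprecated helper keeps the old-style dict conversion working:

```python
import warnings
from dataclasses import asdict, dataclass


@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float

    def _asdict(self):
        # Kept only for backward compatibility with the old NamedTuple API.
        warnings.warn(
            "Word._asdict() is deprecated, use dataclasses.asdict(word) instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return asdict(self)
```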
* Update README.md

* Update README.md

* Update version.py

* Update README.md

* Update README.md

* Update README.md
… initial timestamp is not zero (#1141)

Co-authored-by: Mahmoud Ashraf <[email protected]>
* Added support for new options for batched transcription (a usage sketch follows this list):
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated the `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it is now used inside the `transcribe` function.
* Removed the following functions as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audio inputs
* Added a test for the `multilingual` option with English-German audio
* Removed the `output_language` argument as it is redundant; the same behaviour is available with `task="translate"`
* Use the correct `encoder_output` for language detection in sequential transcription
* Enabled `multilingual` functionality for batched inference
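A hedged sketch of the new batched language-detection options; parameter values and the audio path are illustrative:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3")
batched_model = BatchedInferencePipeline(model=model)

# The language-detection options are now honoured for batched transcription;
# "audio.wav" is a placeholder path.
segments, info = batched_model.transcribe(
    "audio.wav",
    language_detection_threshold=0.5,
    language_detection_segments=4,
    multilingual=True,
)
print(info.language, info.language_probability)
```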
* update version

* Update CPU benchmarks

* Updated GPU benchmarks

* ..

* more gpu benchmarks
* Add Open-dubbing into community projects

* Update URL
Co-authored-by: Mahmoud Ashraf <[email protected]>