Update faster-whisper based on SYSTRAN fork #7

Open
wants to merge 55 commits into base: master

Conversation

aleksandr-smechov

No description provided.

sanchit-gandhi and others added 30 commits March 26, 2024 14:58
* add distil-large-v3

* Update README.md

* use fp16 weights from Systran
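A hedged usage sketch for the newly added distil-large-v3 checkpoint; the device settings and audio path are placeholders, and `condition_on_previous_text=False` is the setting commonly recommended for distilled checkpoints:

```python
from faster_whisper import WhisperModel

# Load the distil-large-v3 checkpoint added in this PR; fp16 weights come
# from the Systran repositories on the Hugging Face Hub.
model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")

# "audio.mp3" is a placeholder path.
segments, info = model.transcribe("audio.mp3", condition_on_previous_text=False)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```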
* Bugfix: code breaks if audio is empty

Regression introduced by PR #732
* Foolproof: Disable VAD if clip_timestamps is in use

Prevents unexpected behaviour when both the VAD filter and clip_timestamps are set.
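A minimal sketch of the intended behaviour, assuming the `clip_timestamps` and `vad_filter` parameters of `WhisperModel.transcribe`; model size and audio path are placeholders:

```python
from faster_whisper import WhisperModel

model = WhisperModel("base")  # model size chosen only for illustration

# With this change, supplying clip_timestamps disables the VAD filter, so the
# two options no longer interact in surprising ways.
segments, info = model.transcribe(
    "audio.wav",             # placeholder path
    clip_timestamps="0,30",  # transcribe only the 0-30 s clip
    vad_filter=True,         # ignored while clip_timestamps is in use
)
```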
* CUDA version note and updated instructions in README

* ctranslate2 downgrade note, cuDNN v9 consideration

* clearer note on cuDNN v9 package
* add hotword params

---------

Co-authored-by: jax <[email protected]>
* Clarify documentation for hotwords

* Remove redundant type specifications
Spelling correction for copy/pasters
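A hedged sketch of the new `hotwords` parameter on `WhisperModel.transcribe`; the model size, hotword string, and audio path are placeholders:

```python
from faster_whisper import WhisperModel

model = WhisperModel("small")  # model size chosen only for illustration

# hotwords biases decoding toward the given terms when no initial prompt is
# supplied; "audio.wav" is a placeholder path.
segments, info = model.transcribe("audio.wav", hotwords="SYSTRAN CTranslate2 Whisper")
print(" ".join(segment.text for segment in segments))
```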
* Fix #839

Changed the code to update the options object instead of the TranscriptionOptions class, which was likely the cause of the unexpected behaviour
…847)

* chore: add distil models to WhisperModel init docstring and download_model docstring
Dockerfile improvements

Co-authored-by: Fedir Zadniprovskyi <[email protected]>
* Fix window_size_samples to 512

* Update SileroVADModel

* Replace ONNX file with V5 version
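A hedged sketch of enabling the Silero-based VAD filter after the V5 update; the analysis window is now fixed at 512 samples, so only the remaining knobs are passed via `vad_parameters` (values and paths below are placeholders):

```python
from faster_whisper import WhisperModel

model = WhisperModel("base")  # model size chosen only for illustration

# The Silero V5 model uses a fixed 512-sample window, so window_size_samples
# is no longer a user-facing knob; other VAD options still go through
# vad_parameters.
segments, info = model.transcribe(
    "audio.wav",  # placeholder path
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500, speech_pad_ms=400),
)
```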
* Filter out non_speech_tokens in suppressed tokens
Batching Support, Speed Boosts, and Quality Enhancements (#856)

Batching Support, Speed Boosts, and Quality Enhancements

---------

Co-authored-by: Hargun Mujral <[email protected]>
Co-authored-by: MahmoudAshraf97 <[email protected]>
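A hedged usage sketch of the `BatchedInferencePipeline` introduced by this change; the model size, `batch_size`, and audio path are illustrative:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# batch_size controls how many audio chunks are decoded together;
# "audio.mp3" is a placeholder path.
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```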
…eline` and fix word timestamps for batched inference (#921)

* fix word timestamps for batched inference

* remove hf pipeline
* revert to using PyAV instead of torchaudio

* Update audio.py
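A hedged sketch of decoding audio through the PyAV-backed `decode_audio` helper and feeding the resulting array to `transcribe`; the path and model size are placeholders:

```python
from faster_whisper import WhisperModel, decode_audio

# decode_audio is backed by PyAV again after this change; it returns a
# float32 NumPy array resampled to the requested rate.
audio = decode_audio("audio.wav", sampling_rate=16000)  # placeholder path
print(audio.dtype, audio.shape)

# transcribe accepts the decoded array directly as well as a file path.
model = WhisperModel("base")
segments, info = model.transcribe(audio)
```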
Replace Pyannote VAD with Silero to reduce code duplication and requirements
MahmoudAshraf97 and others added 25 commits October 25, 2024 15:50
* pad to 3000 instead of `feature_extractor.nb_max_frames`

* correct trimming for batched features
* replace `NamedTuple` with `dataclass`

* add deprecation warnings
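A minimal illustration of the general pattern described above, not the actual faster-whisper code: a `dataclass` replaces the `NamedTuple` while a deprecated helper keeps the old-style dict conversion working:

```python
import warnings
from dataclasses import asdict, dataclass


@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float

    def _asdict(self):
        # Kept only for backward compatibility with the old NamedTuple API.
        warnings.warn(
            "Word._asdict() is deprecated, use dataclasses.asdict(word) instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return asdict(self)
```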
* Update README.md

* Update README.md

* Update version.py

* Update README.md

* Update README.md

* Update README.md
… initial timestamp is not zero (#1141)

Co-authored-by: Mahmoud Ashraf <[email protected]>
* Added support for new options for batched transcription (a usage sketch follows this list):
  * `language_detection_threshold`
  * `language_detection_segments`
* Updated the `WhisperModel.detect_language` function to include the improved language detection from #732 and added docstrings; it is now used inside the `transcribe` function.
* Removed the following functions as they are no longer needed:
  * `WhisperModel.detect_language_multi_segment` and its test
  * `BatchedInferencePipeline.get_language_and_tokenizer`
* Added tests for empty audio inputs
* Added a test for the `multilingual` option with English-German audio
* Removed the `output_language` argument as it is redundant; the same behaviour is available with `task="translate"`
* Use the correct `encoder_output` for language detection in sequential transcription
* Enabled `multilingual` functionality for batched inference
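A hedged sketch of the new batched language-detection options; parameter values and the audio path are illustrative:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("large-v3")
batched_model = BatchedInferencePipeline(model=model)

# The language-detection options are now honoured for batched transcription;
# "audio.wav" is a placeholder path.
segments, info = batched_model.transcribe(
    "audio.wav",
    language_detection_threshold=0.5,
    language_detection_segments=4,
    multilingual=True,
)
print(info.language, info.language_probability)
```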
* update version

* Update CPU benchmarks

* Updated GPU benchmarks

* ..

* more gpu benchmarks
* Add Open-dubbing into community projects

* Update URL
Co-authored-by: Mahmoud Ashraf <[email protected]>