tikikun/voiceover

VoiceOver CLI Tool

The VoiceOver CLI tool converts blog posts written in Markdown into spoken audio. It first rewrites the post into a script that reads naturally aloud, then synthesizes speech from that script, which makes it suited to producing podcast-like audio from, for example, your Substack blog posts.

Installation

You can install the VoiceOver CLI tool with pip, directly from the GitHub repository:

pip install git+https://github.com/tikikun/voiceover.git

This command installs all necessary dependencies automatically.

Usage

The VoiceOver CLI reads a Markdown file containing your blog post, rewrites it into a verbally readable format, and generates speech audio using a pre-trained Text-to-Speech (TTS) model.

Basic Usage

To generate audio from a Markdown file:

voiceover --input-file path/to/blog_post.md --output-dir ./outputs/

This command reads the content of blog_post.md, rewrites it for verbal readability, uses a default reference audio, and saves the generated speech audio as output.wav in the specified output directory.
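
For scripting, the basic command above can be assembled and run from Python. The following is a minimal sketch, not part of the tool: the helper names `build_voiceover_cmd` and `convert_all` are hypothetical, and the code only uses the `--input-file` and `--output-dir` flags shown above, passing any extra flags through verbatim. It assumes `voiceover` is on your PATH when the commands are actually run.

```python
import subprocess
from pathlib import Path


def build_voiceover_cmd(input_file, output_dir, extra_args=()):
    """Return the argv list for one voiceover run (hypothetical helper)."""
    return [
        "voiceover",
        "--input-file", str(input_file),
        "--output-dir", str(output_dir),
        *extra_args,
    ]


def convert_all(posts_dir, outputs_dir, extra_args=(), dry_run=True):
    """Build (and optionally run) one command per Markdown file in posts_dir."""
    commands = []
    for post in sorted(Path(posts_dir).glob("*.md")):
        # One output directory per post, since each run writes output.wav.
        out_dir = Path(outputs_dir) / post.stem
        cmd = build_voiceover_cmd(post, out_dir, extra_args)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default here) the function only returns the commands, so you can inspect them before generating any audio.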

Switching Between Models

You can switch between 4-bit and 8-bit models by specifying the appropriate pretrained model name:

  • 4-bit Model:

    voiceover --input-file path/to/blog_post.md --pretrained-model alandao/f5-tts-mlx-4bit --output-dir ./outputs/
  • 8-bit Model:

    voiceover --input-file path/to/blog_post.md --pretrained-model alandao/f5-tts-mlx-8bit --output-dir ./outputs/

Advanced Options

Reference Audio

Provide a custom reference audio file using the --ref-audio option (short form -ra):

voiceover --input-file path/to/blog_post.md -ra path/to/reference_audio.wav --output-dir ./outputs/

Sampling Parameters

Adjust sampling parameters such as steps, method, speed, cfg-strength, and sway-sampling-coef:

voiceover --input-file path/to/blog_post.md --steps 48 --method midpoint --speed 1.2 --cfg-strength 2.5 --sway-sampling-coef -1.5 --output-dir ./outputs/
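
The `--steps` and `--method` options read like settings for a fixed-step numerical ODE solver, which is how flow-based TTS models typically turn noise into audio features. Assuming that interpretation, the sketch below compares the two named methods on a toy equation, dy/dt = y. It does not call the tool or its model, and the function names are illustrative only.

```python
import math


def integrate(f, y0, t0, t1, steps, method="euler"):
    """Integrate dy/dt = f(t, y) from t0 to t1 with a fixed number of steps."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        if method == "euler":
            # One derivative evaluation per step.
            y = y + h * f(t, y)
        elif method == "midpoint":
            # Two evaluations per step: probe the slope at the half step.
            y_half = y + 0.5 * h * f(t, y)
            y = y + h * f(t + 0.5 * h, y_half)
        else:
            raise ValueError(f"unknown method: {method}")
        t += h
    return y


def growth(t, y):
    return y  # exact solution: y(t) = y0 * exp(t)


exact = math.e
for method in ("euler", "midpoint"):
    approx = integrate(growth, 1.0, 0.0, 1.0, steps=8, method=method)
    print(f"{method:8s} error = {abs(approx - exact):.4f}")
```

On this toy problem, midpoint is more accurate than Euler for the same step count, at the cost of two derivative evaluations per step. Whether the same trade-off carries over to audio quality in the tool is best checked by listening to the output.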

Transcript Generation Parameters

Configure additional parameters used during transcript chunking:

voiceover --input-file path/to/blog_post.md --repo my-custom-repo --guide-prompt "Custom guide prompt." --verbose --max-tokens 15000 --top-p 0.9 --temp 0.5 --output-dir ./outputs/
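
`--temp` and `--top-p` are the usual names for temperature scaling and nucleus (top-p) sampling in language-model generation. Assuming the tool uses them in that standard sense, the sketch below shows how the two settings reshape a toy token distribution. The logits are made up and the function is illustrative; it is not the tool's code.

```python
import math


def top_p_filter(logits, temp=0.5, top_p=0.9):
    """Return renormalised probabilities after temperature and top-p filtering."""
    # Temperature scaling: a lower temp sharpens the distribution.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Zero out the dropped tokens and renormalise the survivors.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens
print(top_p_filter(toy_logits, temp=0.5, top_p=0.9))
```

Lower values of either setting concentrate probability on the most likely tokens, which tends to make the rewritten transcript more predictable; higher values allow more variation.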

Arguments

| Argument | Type | Description |
| --- | --- | --- |
| --input-file | string | Path to the input Markdown file containing the blog post. |
| --ref-audio, -ra | string | Optional path to the reference audio file. |
| --output-dir | string | Directory where the output audio will be saved. |
| --steps | int | Number of sampling steps for generating audio. |
| --method | string | Sampling method (euler or midpoint). |
| --speed | float | Speed factor for audio generation. |
| --cfg-strength | float | Strength of classifier-free guidance (CFG) during sampling. |
| --sway-sampling-coef | float | Coefficient for sway sampling. |
| --repo | string | Model repository name used for chunking transcripts. |
| --guide-prompt | string | Guide prompt for model generation. |
| --verbose | boolean | Enable verbose mode for detailed logging. |
| --max-tokens | int | Maximum number of tokens for each chunked transcript. |
| --top-p | float | Top-p value for token selection during transcript generation. |
| --temp | float | Temperature value controlling randomness during transcript generation. |
| --pretrained-model | string | Pre-trained TTS model name to load. Choose between alandao/f5-tts-mlx-4bit and alandao/f5-tts-mlx-8bit. |

Examples

  • Basic Example: Generate audio from a Markdown file without any additional options.

    voiceover --input-file example_blog_post.md --output-dir results/
  • Using 8-bit Model: Specify the 8-bit model for higher quality, at the cost of potentially higher resource consumption.

    voiceover --input-file example_blog_post.md --pretrained-model alandao/f5-tts-mlx-8bit --output-dir results/
  • Custom Reference Audio: Use a specific WAV file as the reference audio.

    voiceover --input-file example_blog_post.md --ref-audio my_reference_audio.wav --output-dir results/
  • Advanced Configuration: Customize various parameters for better control over the audio generation process.

    voiceover --input-file example_blog_post.md --steps 64 --method midpoint --speed 1.5 --cfg-strength 2.8 --sway-sampling-coef -2.0 --output-dir results/

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License. See LICENSE for more information.
