Merge pull request #180 from souzatharsis/feat/newTTS
v0.4.0 - add Google's TTS models
souzatharsis authored Nov 16, 2024
2 parents b4c5d4b + e6f2b5a commit a5b707c
Showing 58 changed files with 1,727 additions and 448 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog


## [0.4.0] - 2024-11-16

### Added
- Add Google Singlespeaker (Journey) and Multispeaker TTS models
- Fixed limitations of Google Multispeaker TTS model: 5000 bytes input limit and 500 bytes per turn limit.
- Updated tests and docs accordingly
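
The two limits named in the changelog entry above (5000 bytes per input, 500 bytes per turn) can be respected by splitting long turns and grouping them into several requests. The following is a minimal sketch under stated assumptions, not podcastfy's actual implementation: the names `split_turn` and `pack_turns` are hypothetical, only the UTF-8 size of the spoken text is counted (speaker labels and markup overhead are ignored), and no single word is assumed to exceed the per-turn limit.

```python
from typing import List, Tuple

MAX_REQUEST_BYTES = 5000  # per-input limit quoted in the changelog
MAX_TURN_BYTES = 500      # per-turn limit quoted in the changelog

Turn = Tuple[str, str]  # (speaker, text)


def split_turn(text: str, limit: int = MAX_TURN_BYTES) -> List[str]:
    """Split text on whitespace into pieces of at most `limit` UTF-8 bytes."""
    pieces: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate.encode("utf-8")) <= limit:
            current = candidate
        else:
            if current:
                pieces.append(current)
            current = word  # assumption: a single word never exceeds the limit
    if current:
        pieces.append(current)
    return pieces


def pack_turns(turns: List[Turn], limit: int = MAX_REQUEST_BYTES) -> List[List[Turn]]:
    """Group (possibly split) turns into requests of at most `limit` text bytes."""
    requests: List[List[Turn]] = []
    batch: List[Turn] = []
    batch_bytes = 0
    for speaker, text in turns:
        for piece in split_turn(text):
            size = len(piece.encode("utf-8"))
            # Start a new request when adding this piece would overflow the current one.
            if batch and batch_bytes + size > limit:
                requests.append(batch)
                batch, batch_bytes = [], 0
            batch.append((speaker, piece))
            batch_bytes += size
    if batch:
        requests.append(batch)
    return requests
```

Sizes are measured in bytes rather than characters because the limits are stated in bytes, and non-ASCII text takes more than one byte per character in UTF-8.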

## [0.3.6] - 2024-11-13

### Added
40 changes: 20 additions & 20 deletions README.md
@@ -31,24 +31,23 @@ Unlike closed-source UI-based tools focused primarily on research synthesis (e.g
[![Star History Chart](https://api.star-history.com/svg?repos=souzatharsis/podcastfy&type=Date&theme=dark)](https://api.star-history.com/svg?repos=souzatharsis/podcastfy&type=Date&theme=dark)

## Audio Examples 🔊
This sample collection is also [available at audio.com](https://audio.com/thatupiso/collections/podcastfy).
This sample collection was generated using this [Python Notebook](usage/examples.ipynb).

### Images

| Image Set | Description | Audio |
| Audio | Description | Image Set |
|:--|:--|:--|
| <img src="data/images/Senecio.jpeg" alt="Senecio, 1922 (Paul Klee)" width="20%" height="auto"> <img src="data/images/connection.jpg" alt="Connection of Civilizations (2017) by Gheorghe Virtosu " width="21.5%" height="auto"> | Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu | [<span style="font-size: 25px;">🔊</span>](https://audio.com/thatupiso/audio/output-file-abstract-art) |
| <img src="data/images/japan_1.jpg" alt="The Great Wave off Kanagawa, 1831 (Hokusai)" width="20%" height="auto"> <img src="data/images/japan2.jpg" alt="Takiyasha the Witch and the Skeleton Spectre, c. 1844 (Kuniyoshi)" width="21.5%" height="auto"> | The Great Wave off Kanagawa, 1831 (Hokusai) and Takiyasha the Witch and the Skeleton Spectre, c. 1844 (Kuniyoshi) | [<span style="font-size: 25px;">🔊</span>](https://audio.com/thatupiso/audio/output-file-japan) |
| <img src="data/images/taylor.png" alt="Taylor Swift" width="28%" height="auto"> <img src="data/images/monalisa.jpeg" alt="Mona Lisa" width="10.5%" height="auto"> | Pop culture icon Taylor Swift and Mona Lisa, 1503 (Leonardo da Vinci) | [<span style="font-size: 25px;">🔊</span>](https://audio.com/thatupiso/audio/taylor-monalisa) |
| <video src="usage/video/senecio.mp4"></video> | Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu | <img src="data/images/Senecio.jpeg" alt="Senecio, 1922 (Paul Klee)" width="20%" height="auto"> <img src="data/images/connection.jpg" alt="Connection of Civilizations (2017) by Gheorghe Virtosu " width="21.5%" height="auto"> |
| <video src="usage/video/japan.mp4"></video> | The Great Wave off Kanagawa, 1831 (Hokusai) and Takiyasha the Witch and the Skeleton Spectre, c. 1844 (Kuniyoshi) | <img src="data/images/japan_1.jpg" alt="The Great Wave off Kanagawa, 1831 (Hokusai)" width="20%" height="auto"> <img src="data/images/japan2.jpg" alt="Takiyasha the Witch and the Skeleton Spectre, c. 1844 (Kuniyoshi)" width="21.5%" height="auto"> |
| <video src="usage/video/taylor.mp4"></video> | Pop culture icon Taylor Swift and Mona Lisa, 1503 (Leonardo da Vinci) | <img src="data/images/taylor.png" alt="Taylor Swift" width="28%" height="auto"> <img src="data/images/monalisa.jpeg" alt="Mona Lisa" width="10.5%" height="auto"> |


### Text
| Content Type | Description | Audio | Source |
|--------------|-------------|-------|--------|
| Youtube Video | YCombinator on LLMs | [Audio](https://audio.com/thatupiso/audio/ycombinator-llms) | [YouTube](https://www.youtube.com/watch?v=eBVi_sLaYsc) |
| PDF | Book: Networks, Crowds, and Markets | [Audio](https://audio.com/thatupiso/audio/networks) | book pdf |
| Research Paper | Climate Change in France | [Audio](https://audio.com/thatupiso/audio/agro-paper) | [PDF](./data/pdf/s41598-024-58826-w.pdf) |
| Website | My Personal Website | [Audio](https://audio.com/thatupiso/audio/tharsis) | [Website](https://www.souzatharsis.com) |
| Website + YouTube | My Personal Website + YouTube Video on AI | [Audio](https://audio.com/thatupiso/audio/tharsis-ai) | [Website](https://www.souzatharsis.com), [YouTube](https://www.youtube.com/watch?v=sJE1dE2dulg) |
| Audio | Description | Content Type | Source |
|-------|-------------|--------------|--------|
| <video src="usage/video/taylor.mp4"></video> | Personal Website | Website | [Website](https://www.souzatharsis.com) |
| [Audio](https://soundcloud.com/high-lander123/amodei?in=high-lander123/sets/podcastfy-sample-audio-longform&si=b8dfaf4e3ddc4651835e277500384156) | Lex Fridman Podcast: Dario Amodei Anthropic's CEO | Youtube | [Youtube](https://www.youtube.com/watch?v=ugvHCXCOmm4) |
| [Audio](https://soundcloud.com/high-lander123/benjamin?in=high-lander123/sets/podcastfy-sample-audio-longform&si=dca7e2eec1c94252be18b8794499959a&utm_source=clipboard&utm_medium=text&utm_campaign=social_sharing) | Benjamin Franklin's Autobiography | Youtube | [Book](https://www.youtube.com/watch?v=ugvHCXCOmm4) |

### Multi-Lingual Text
| Language | Content Type | Description | Audio | Source |
@@ -58,7 +57,7 @@ This sample collection is also [available at audio.com](https://audio.com/thatup

## Features ✨

- Generate conversational content from multiple sources and formats (images, websites, YouTube, and PDFs).
- Generate conversational content from multiple sources and formats (images, text, websites, YouTube, and PDFs).
- Generate shorts (2-5 minutes) or longform (30+ minutes) podcasts.
- Customize transcript and audio generation (e.g., style, language, structure).
- Generate transcripts using 100+ LLM models (OpenAI, Anthropic, Google etc).
@@ -75,13 +74,13 @@ This sample collection is also [available at audio.com](https://audio.com/thatup
- [Podcastfy-HuggingFace App](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo)
- [Podcastfy-UI](https://github.com/giulioco/podcastfy-ui)

## Updates 🚀
## Updates 🚀🚀

### v0.3.6+ release
- Generate shorts or longform podcasts!
- Generate podcasts from input topic using real-time internet search
### v0.4.0+ release
- Released new Multi-Speaker TTS model (is it the one NotebookLM uses?!?)
- Generate short or longform podcasts
- Generate podcasts from input topic using grounded real-time web search
- Integrate with 100+ LLM models (OpenAI, Anthropic, Google etc) for transcript generation
- Integrate with Google's Multispeaker TTS model for high-quality audio generation

See [CHANGELOG](CHANGELOG.md) for more details.

@@ -112,13 +111,14 @@ python -m podcastfy.client --url <url1> --url <url2>

- [Python Package Quickstart](podcastfy.ipynb)

- [How to](usage/how-to.md)

- [Python Package Reference Manual](https://podcastfy.readthedocs.io/en/latest/podcastfy.html)

- [REST API Reference Manual](usage/api.md)

- [CLI](usage/cli.md)

- [How to](usage/how-to.md)

Experience Podcastfy with our [HuggingFace](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo) 🤗 Spaces app. (Note: This UI app is less extensively tested than the Python package.)

@@ -132,7 +132,7 @@ Podcastfy offers a range of customization options to tailor your AI-generated po

## License

This software is licensed under [Apache 2.0](LICENSE). [Here](usage/license-guide.md) are a few instructions if you would like to use podcastfy in your software.
This software is licensed under [Apache 2.0](LICENSE). See [instructions](usage/license-guide.md) if you would like to use podcastfy in your software.

## Contributing 🤝

25 changes: 0 additions & 25 deletions data/transcripts/Tharsis.txt

This file was deleted.

2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -9,7 +9,7 @@
project = 'podcastfy'
copyright = '2024, Tharsis T. P. Souza'
author = 'Tharsis T. P. Souza'
release = 'v0.3.1'
release = 'v0.4.0'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
9 changes: 1 addition & 8 deletions paper/paper.md
@@ -43,7 +43,6 @@ The rapid expansion of digital content across various formats has intensified th

See [audio samples](https://github.com/souzatharsis/podcastfy?tab=readme-ov-file#audio-examples-).

<!--
# Use Cases

`Podcastfy` is designed to serve a wide range of applications, including:
@@ -55,7 +54,6 @@ See [audio samples](https://github.com/souzatharsis/podcastfy?tab=readme-ov-file
- **Researchers** can convert research papers, visual data, and technical content into conversational audio. This makes it easier for a wider audience, including those with disabilities, to consume and understand complex scientific information. Researchers can also create audio summaries of their work to enhance accessibility.

- **Accessibility Advocates** can use `Podcastfy` to promote digital accessibility by providing a tool that converts multimodal content into auditory formats. This helps individuals with visual impairments, dyslexia, or other disabilities that make it challenging to consume written or visual content.
-->


# Implementation and Architecture
@@ -190,7 +188,6 @@ generate_podcast(
The roles are set to "expert developer" and "learning developer" to create a natural teaching dynamic. The dialogue structure follows a logical progression from concept introduction through implementation and best practices. The engagement_techniques parameter ensures the content remains practical and applicable by incorporating code examples, real-world applications, and troubleshooting guidance. A moderate creativity setting (0.4) maintains technical accuracy while allowing for engaging explanations and examples.


<!--
## Storytelling Adventure

The following Python code demonstrates how to generate a storytelling podcast:
@@ -359,7 +356,6 @@ This example demonstrates how to use the `TextToSpeech` class to convert generat
- May require additional processing for users with specific accessibility needs.

These limitations highlight areas for future development and improvement of the framework. Users should carefully consider these constraints when implementing `Podcastfy` for their specific use cases and requirements.
-->

# Limitations

@@ -372,11 +368,8 @@ These limitations highlight areas for future development and improvement of the

`Podcastfy` contributes to multimodal content accessibility by enabling the programmatic transformation of digital content into conversational audio. The framework addresses accessibility needs through automated content summarization and natural-sounding speech synthesis. Its modular design and configurable options allow for flexible content processing and audio generation workflows that can be adapted for different use cases and requirements.

<!--
As an open-source project, `Podcastfy` benefits from continuous community-driven improvements and adaptations, helping support its long-term value and relevance in meeting evolving user requirements and accessibility standards.
We invite contributions from the community to further enhance the capabilities of `Podcastfy`. Whether it's by adding support for new input modalities, improving the quality of conversation generation, or optimizing the TTS synthesis, we welcome collaboration to make `Podcastfy` more powerful and versatile.
-->


# Acknowledgements

508 changes: 343 additions & 165 deletions podcastfy.ipynb

Large diffs are not rendered by default.

25 changes: 19 additions & 6 deletions podcastfy/client.py
@@ -19,12 +19,24 @@
from typing import List, Optional, Dict, Any
import copy

import logging

# Configure logging to show all levels and write to both file and console
""" logging.basicConfig(
    level=logging.DEBUG,  # Show all levels of logs
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('podcastfy.log'),  # Save to file
        logging.StreamHandler()  # Print to console
    ]
) """


logger = setup_logger(__name__)

app = typer.Typer()

os.environ["LANGCHAIN_TRACING_V2"] = "false"
os.environ["LANGCHAIN_TRACING_V2"] = "False"


def process_content(
@@ -70,7 +82,10 @@ def process_content(
content_extractor = ContentExtractor()

content_generator = ContentGenerator(
api_key=config.GEMINI_API_KEY, conversation_config=conv_config.to_dict()
is_local=is_local,
model_name=model_name,
api_key_label=api_key_label,
conversation_config=conv_config.to_dict()
)

combined_content = ""
@@ -102,16 +117,13 @@ def process_content(
combined_content,
image_file_paths=image_paths or [],
output_filepath=transcript_filepath,
is_local=is_local,
model_name=model_name,
api_key_label=api_key_label,
longform=longform
)

if generate_audio:
api_key = None
if tts_model != "edge":
api_key = getattr(config, f"{tts_model.upper()}_API_KEY")
api_key = getattr(config, f"{tts_model.upper().replace('MULTI', '')}_API_KEY")

text_to_speech = TextToSpeech(
model=tts_model,
@@ -300,6 +312,7 @@ def generate_podcast(
Optional[str]: Path to the final podcast audio file, or None if only generating a transcript.
"""
try:
print("Generating podcast...")
# Load default config
default_config = load_config()
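
The changed `api_key` lookup in `process_content` strips `MULTI` from the upper-cased model name, so a multi-speaker variant reuses its base provider's key. The sketch below isolates that expression to show the effect. The `config` object is a hypothetical stand-in for podcastfy's configuration, and `geminimulti` is an assumed example model name, since this diff does not list the model identifiers.

```python
from types import SimpleNamespace
from typing import Optional


def api_key_attr(tts_model: str) -> str:
    """Return the config attribute name that holds the key for `tts_model`."""
    # Same expression as in the diff: upper-case the name and drop "MULTI".
    return f"{tts_model.upper().replace('MULTI', '')}_API_KEY"


# Hypothetical stand-in for the real config object, for illustration only.
config = SimpleNamespace(GEMINI_API_KEY="gemini-key", OPENAI_API_KEY="openai-key")


def resolve_api_key(tts_model: str) -> Optional[str]:
    """Look up the API key; the "edge" model needs none, as in the diff."""
    if tts_model == "edge":
        return None
    return getattr(config, api_key_attr(tts_model))
```

With this stand-in, `resolve_api_key("geminimulti")` and `resolve_api_key("gemini")` both read `GEMINI_API_KEY`. Note that `str.replace` removes `MULTI` anywhere in the name, so a model whose name happened to contain that substring for another reason would also be affected.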

6 changes: 3 additions & 3 deletions podcastfy/config.yaml
@@ -1,15 +1,15 @@
content_generator:
llm_model: "gemini-1.5-pro-latest"
meta_llm_model: "gemini-1.5-flash"
meta_llm_model: "gemini-1.5-pro-latest"
max_output_tokens: 8192
prompt_template: "souzatharsis/podcastfy_multimodal_cleanmarkup"
prompt_commit: "b2365f11"
longform_prompt_template: "souzatharsis/podcastfy_longform"
longform_prompt_commit: "d6ac4601"
longform_prompt_commit: "acfdbc91" #"ff865019"
cleaner_prompt_template: "souzatharsis/podcastfy_longform_clean"
cleaner_prompt_commit: "8c110a0b"
rewriter_prompt_template: "souzatharsis/podcast_rewriter"
rewriter_prompt_commit: "6789eeca"
rewriter_prompt_commit: "8ee296fb"
content_extractor:
youtube_url_patterns:
- "youtube.com"