Commit

Merge pull request #175 from souzatharsis/feat/longform
Feat/longform
souzatharsis authored Nov 13, 2024
2 parents 83edfd8 + b0a33cf commit 20085dd
Showing 17 changed files with 1,374 additions and 257 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
# Changelog

## [0.3.6] - 2024-11-13

### Added
- Add longform podcast generation support
- Users can now generate longer podcasts (20-30+ minutes) using the `--longform` flag in CLI or `longform=True` in Python API
- Implements "Content Chunking with Contextual Linking" technique for coherent long-form content
- Configurable via `max_num_chunks` and `min_chunk_size` parameters in conversation config
- `word_count` parameter removed from conversation config as it's no longer used
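
The "Content Chunking with Contextual Linking" technique and the two new parameters above can be sketched in plain Python. This is an illustrative sketch only, not the package's actual implementation; `chunk_with_context` and `context_chars` are hypothetical names:

```python
# Illustrative sketch of "Content Chunking with Contextual Linking".
# NOT the package's real implementation: chunk_with_context and
# context_chars are hypothetical names used for illustration.
def chunk_with_context(text, max_num_chunks=7, min_chunk_size=600, context_chars=200):
    """Split text into up to max_num_chunks pieces of roughly
    min_chunk_size characters, carrying a tail of the preceding text
    as linking context so each per-chunk LLM call stays coherent."""
    chunks, start = [], 0
    while start < len(text) and len(chunks) < max_num_chunks:
        end = min(start + min_chunk_size, len(text))
        context = text[max(0, start - context_chars):start]
        chunks.append({"context": context, "content": text[start:end]})
        start = end
    return chunks

parts = chunk_with_context("x" * 2000)
assert len(parts) == 4  # 2000 chars at 600 per chunk -> 4 rounds, under the cap of 7
```

Each chunk would then be sent to the LLM together with its `context` tail, and the per-chunk transcripts concatenated into one long-form transcript.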

## [0.3.3] - 2024-11-08

### Breaking Changes
Expand Down
23 changes: 13 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,9 @@ https://github.com/user-attachments/assets/f1559e70-9cf9-4576-b48b-87e7dad1dd0b
![GitHub Repo stars](https://img.shields.io/github/stars/souzatharsis/podcastfy)
</div>

Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, images, YouTube videos, as well as user provided topics.


Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, YouTube videos, as well as images.

Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM ❤️), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources, enabling customization and scale.
Unlike closed-source UI-based tools focused primarily on research synthesis (e.g. NotebookLM ❤️), Podcastfy focuses on open source, programmatic and bespoke generation of engaging, conversational content from a multitude of multi-modal sources, enabling customization and scale.

[![Star History Chart](https://api.star-history.com/svg?repos=souzatharsis/podcastfy&type=Date&theme=dark)](https://api.star-history.com/svg?repos=souzatharsis/podcastfy&type=Date&theme=dark)

Expand Down Expand Up @@ -61,16 +59,26 @@ This sample collection is also [available at audio.com](https://audio.com/thatup
## Features ✨

- Generate conversational content from multiple sources and formats (images, websites, YouTube, and PDFs).
- Customize transcript and audio generation (e.g., style, language, structure, length).
- Generate shorts (2-5 minutes) or longform (30+ minutes) podcasts.
- Customize transcript and audio generation (e.g., style, language, structure).
- Generate transcripts using 100+ LLM models (OpenAI, Anthropic, Google, etc.).
- Leverage local LLMs for transcript generation for increased privacy and control.
- Integrate with advanced text-to-speech models (OpenAI, Google, ElevenLabs, and Microsoft Edge).
- Provide multi-language support for global content creation.
- Integrate seamlessly with CLI and Python packages for automated workflows.
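
The shorts-vs-longform split above is governed by two conversation-config parameters this PR introduces: `max_num_chunks` (default 7) and `min_chunk_size` (default 600). A back-of-envelope sketch of how they bound the number of discussion rounds; `estimate_rounds` is a hypothetical helper, not part of the package:

```python
# Hypothetical back-of-envelope helper (not part of podcastfy): each
# round of discussion consumes at least min_chunk_size characters of
# input, and the total number of rounds is capped at max_num_chunks.
def estimate_rounds(content_chars, max_num_chunks=7, min_chunk_size=600):
    rounds = content_chars // min_chunk_size
    return max(1, min(rounds, max_num_chunks))

assert estimate_rounds(10_000) == 7  # long input: capped by max_num_chunks
assert estimate_rounds(1_500) == 2   # short input: limited by content length
```

Raising `max_num_chunks` or lowering `min_chunk_size` therefore lengthens the podcast, up to the limit the input content itself imposes.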

## Built with Podcastfy 🚀

- [OpenNotebook](https://www.open-notebook.ai/)
- [SurfSense](https://www.surfsense.net/)
- [Podcast-llm](https://github.com/evandempsey/podcast-llm)
- [Podcastfy-HuggingFace App](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo)
- [Podcastfy-UI](https://github.com/giulioco/podcastfy-ui)

## Updates 🚀

### v0.3.0+ release
- Generate longform podcasts
- Generate podcasts from input topic using real-time internet search
- Integrate with 100+ LLM models (OpenAI, Anthropic, Google etc) for transcript generation
- Integrate with Google's Multispeaker TTS model for high-quality audio generation
Expand Down Expand Up @@ -121,11 +129,6 @@ Podcastfy offers a range of customization options to tailor your AI-generated po
- Choose to run [Local LLMs](usage/local_llm.md) (156+ HuggingFace models)
- Set [System Settings](usage/config_custom.md) (e.g. output directory settings)

## Built with Podcastfy 🛠️

- [OpenNotebook](www.open-notebook.ai)
- [Podcastfy-UI](https://github.com/giulioco/podcastfy-ui)
- [Podcastfy-Gradio App](https://huggingface.co/spaces/thatupiso/Podcastfy.ai_demo)

## License

Expand Down
1 change: 1 addition & 0 deletions TESTIMONIALS.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
- "Love that you casually built an open source version of the most popular product Google built in the last decade"
- "Your library was very straightforward to work with. You did Amazing work brother 🙏"
- "I think it's awesome that you were inspired/recognize how hard it is to beat NotebookLM's quality, but you did an *incredible* job with this! It sounds incredible, and it's open-source! Thank you for being amazing!"
65 changes: 62 additions & 3 deletions podcastfy.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,22 @@
"metadata": {},
"source": [
"# Podcastfy \n",
"Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI"
"Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Features\n",
"\n",
"- Support multiple input sources (text, images, websites, YouTube, and PDFs).\n",
"- Generate shorts (2-5 minutes) or longform (30+ minutes) podcasts.\n",
"- Customize transcript and audio generation (e.g., style, language, structure).\n",
"- Generate transcripts using 100+ LLM models (OpenAI, Anthropic, Google etc).\n",
"- Leverage local LLMs for transcript generation for increased privacy and control.\n",
"- Integrate with advanced text-to-speech models (OpenAI, Google, ElevenLabs, and Microsoft Edge).\n",
"- Provide multi-language support for global content creation."
]
},
{
Expand All @@ -19,6 +34,7 @@
"- Generate a podcast from text content\n",
" - Single URL\n",
" - Multiple URLs\n",
" - Generate longform podcasts\n",
" - Generate transcript only\n",
" - Generate audio from transcript\n",
" - Processing PDFs\n",
Expand Down Expand Up @@ -124,8 +140,7 @@
],
"source": [
"from podcastfy.client import generate_podcast\n",
"audio_file = generate_podcast(urls=[\"https://abcnews.go.com/US/water-frost-detected-mars-volcanoes-significant-discovery-study/story?id=110993572\"], \n",
" transcript_only=True)"
"audio_file = generate_podcast(urls=[\"https://abcnews.go.com/US/water-frost-detected-mars-volcanoes-significant-discovery-study/story?id=110993572\"])"
]
},
{
Expand Down Expand Up @@ -271,6 +286,50 @@
"However, this particular transcript did not pick up on Podcastfy's own content, focusing solely on the YouTube video. This can happen because the AI podcast hosts may latch onto a particular concept from one of the provided sources and develop the conversation around it. There is room for improvement in guiding the hosts to strike a better balance of content coverage across the provided input sources."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate longform podcasts\n",
"\n",
"\n",
"By default, Podcastfy generates shortform podcasts. However, users can generate longform podcasts by setting the `longform` parameter to `True`.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"generate_podcast(urls=[\"<website>\"], longform=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"LLMs have a limited ability to produce long text responses; most models cap `max_output_tokens` at around 4096 to 8192 tokens, which makes long-form podcast transcript generation challenging. We have implemented a technique we call \"Content Chunking with Contextual Linking\" that enables long-form generation by breaking the input content into smaller chunks and generating a conversation for each chunk, while ensuring the combined transcript is coherent and linked to the original input.\n",
"\n",
"Shortform podcasts (the default configuration) generate about 2-5 minutes of audio, while longform podcasts may reach 20-30 minutes.\n",
"\n",
"### Adjusting longform podcast length\n",
"\n",
"Users may adjust longform podcast length by setting the following parameters in the customization params (see the later section \"Conversation Customization\"):\n",
"- `max_num_chunks` (default: 7): Sets maximum number of rounds of discussions.\n",
"- `min_chunk_size` (default: 600): Sets minimum number of characters to generate a round of discussion.\n",
"\n",
"A \"round of discussion\" is the output transcript obtained from a single LLM call. The higher the `max_num_chunks` and the lower the `min_chunk_size`, the longer the generated podcast will be.\n",
"Today, this technique allows users to generate long-form podcasts of arbitrary length, provided the input content is long enough. However, conversation quality may degrade, and output length may converge to a maximum, if `max_num_chunks` is set too high or `min_chunk_size` too low, particularly when input content is limited.\n",
"\n",
"Current implementation limitations:\n",
"- Images are not yet supported for longform podcast generation.\n",
"- The base LLM is fixed to Gemini.\n",
"\n",
"The above limitations are relatively straightforward to address; however, we chose to make updates in smaller, quicker iterations rather than all-in changes.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down
2 changes: 1 addition & 1 deletion podcastfy/__init__.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
# This file can be left empty for now
__version__ = "0.3.3" # or whatever version you're on
__version__ = "0.3.5" # or whatever version you're on
30 changes: 25 additions & 5 deletions podcastfy/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ def process_content(
model_name: Optional[str] = None,
api_key_label: Optional[str] = None,
topic: Optional[str] = None,
longform: bool = False
):
"""
Process URLs, a transcript file, image paths, or raw text to generate a podcast or transcript.
Expand All @@ -54,7 +55,6 @@ def process_content(
# Update with provided config if any
if conversation_config:
conv_config.configure(conversation_config)

# Get output directories from conversation config
tts_config = conv_config.get("text_to_speech", {})
output_directories = tts_config.get("output_directories", {})
Expand All @@ -64,21 +64,29 @@ def process_content(
with open(transcript_file, "r") as file:
qa_content = file.read()
else:
# Initialize content_extractor if needed
content_extractor = None
if urls or topic or (text and longform and len(text.strip()) < 100):
content_extractor = ContentExtractor()

content_generator = ContentGenerator(
api_key=config.GEMINI_API_KEY, conversation_config=conv_config.to_dict()
)

combined_content = ""
if urls or topic:
content_extractor = ContentExtractor()


if urls:
logger.info(f"Processing {len(urls)} links")
contents = [content_extractor.extract_content(link) for link in urls]
combined_content += "\n\n".join(contents)

if text:
combined_content += f"\n\n{text}"
if longform and len(text.strip()) < 100:
logger.info("Text too short for direct long-form generation. Extracting context...")
expanded_content = content_extractor.generate_topic_content(text)
combined_content += f"\n\n{expanded_content}"
else:
combined_content += f"\n\n{text}"

if topic:
topic_content = content_extractor.generate_topic_content(topic)
Expand All @@ -97,6 +105,7 @@ def process_content(
is_local=is_local,
model_name=model_name,
api_key_label=api_key_label,
longform=longform
)

if generate_audio:
Expand Down Expand Up @@ -171,6 +180,12 @@ def main(
topic: str = typer.Option(
None, "--topic", "-tp", help="Topic to generate podcast about"
),
longform: bool = typer.Option(
False,
"--longform",
"-lf",
help="Generate long-form content (only available for text input without images)"
),
):
"""
Generate a podcast or transcript from a list of URLs, a file containing URLs, a transcript file, image files, or raw text.
Expand Down Expand Up @@ -204,6 +219,7 @@ def main(
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
longform=longform
)
else:
urls_list = urls or []
Expand All @@ -227,6 +243,7 @@ def main(
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
longform=longform
)

if transcript_only:
Expand Down Expand Up @@ -259,6 +276,7 @@ def generate_podcast(
llm_model_name: Optional[str] = None,
api_key_label: Optional[str] = None,
topic: Optional[str] = None,
longform: bool = False,
) -> Optional[str]:
"""
Generate a podcast or transcript from a list of URLs, a file containing URLs, a transcript file, or image files.
Expand Down Expand Up @@ -324,6 +342,7 @@ def generate_podcast(
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
longform=longform
)
else:
urls_list = urls or []
Expand All @@ -349,6 +368,7 @@ def generate_podcast(
model_name=llm_model_name,
api_key_label=api_key_label,
topic=topic,
longform=longform
)

except Exception as e:
Expand Down
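
The short-text branch this diff adds to `process_content` can be distilled as follows. This is a simplified stub: `build_combined_content` and `expand_topic` are hypothetical stand-ins for the real code paths (`ContentExtractor.generate_topic_content`):

```python
# Simplified stub of the branching added to process_content above.
# build_combined_content and expand_topic are hypothetical stand-ins
# for the real code paths, shown to illustrate the control flow only.
def build_combined_content(text, longform, expand_topic):
    if longform and len(text.strip()) < 100:
        # Text too short for direct long-form generation: expand it first.
        return "\n\n" + expand_topic(text)
    return "\n\n" + text

out = build_combined_content("AI podcasts", True, lambda t: f"Background on {t}...")
assert out == "\n\nBackground on AI podcasts..."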
11 changes: 9 additions & 2 deletions podcastfy/config.yaml
Original file line number Diff line number Diff line change
@@ -1,8 +1,15 @@
content_generator:
gemini_model: "gemini-1.5-pro-latest"
llm_model: "gemini-1.5-pro-latest"
meta_llm_model: "gemini-1.5-flash"
max_output_tokens: 8192
prompt_template: "souzatharsis/podcastfy_multimodal_cleanmarkup"
prompt_commit: "6c74ab51"
prompt_commit: "b2365f11"
longform_prompt_template: "souzatharsis/podcastfy_longform"
longform_prompt_commit: "d6ac4601"
cleaner_prompt_template: "souzatharsis/podcastfy_longform_clean"
cleaner_prompt_commit: "8c110a0b"
rewriter_prompt_template: "souzatharsis/podcast_rewriter"
rewriter_prompt_commit: "6789eeca"
content_extractor:
youtube_url_patterns:
- "youtube.com"
Expand Down
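
The config changes above pin each prompt template to a specific commit. A sketch of how such keys might be resolved; the dict mirrors `config.yaml`, but the lookup helper `prompt_ref` is an assumption, not the package's real loader:

```python
# Hypothetical sketch: the dict mirrors the config.yaml keys above, but
# prompt_ref is an assumed helper, not podcastfy's actual loader.
config = {
    "content_generator": {
        "prompt_template": "souzatharsis/podcastfy_multimodal_cleanmarkup",
        "prompt_commit": "b2365f11",
        "longform_prompt_template": "souzatharsis/podcastfy_longform",
        "longform_prompt_commit": "d6ac4601",
    }
}

def prompt_ref(cfg, longform=False):
    """Return a 'template:commit' reference, pinning the prompt to a commit."""
    gen = cfg["content_generator"]
    key = "longform_prompt_template" if longform else "prompt_template"
    return f"{gen[key]}:{gen[key.replace('template', 'commit')]}"

assert prompt_ref(config, longform=True) == "souzatharsis/podcastfy_longform:d6ac4601"
```

Pinning to a commit hash keeps transcript generation reproducible even if the hosted prompt template is later revised.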