diff --git a/README.md b/README.md
index fec2ab2..821730e 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,8 @@
 This blueprint demonstrates how you can use open-source models & tools to convert input documents into a podcast featuring two speakers.
 It is designed to work on most local setups or with [GitHub Codespaces](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb), meaning no external API calls or GPU access is required.
 This makes it more accessible and privacy-friendly by keeping everything local.
+
+### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/document-to-podcast/).
 
 ### Built with
diff --git a/demo/notebook.ipynb b/demo/notebook.ipynb
new file mode 100644
index 0000000..5a72c61
--- /dev/null
+++ b/demo/notebook.ipynb
@@ -0,0 +1,404 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Document to Podcast"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Source code: https://github.com/mozilla-ai/document-to-podcast"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Docs: https://mozilla-ai.github.io/document-to-podcast/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook walks through the process of transforming documents into engaging podcast episodes, combining pre-processing, LLM-powered transcript generation, and text-to-speech generation.\n",
+    "\n",
+    "For educational purposes, the \"low-level\" API is used.\n",
+    "\n",
+    "You can check the [Command Line Interface](https://mozilla-ai.github.io/document-to-podcast/cli/) for simpler usage."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## GPU check"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First, you'll need to enable GPUs for the notebook:\n",
+    "\n",
+    "- Navigate to `Edit`→`Notebook Settings`\n",
+    "- Select T4 GPU from the Hardware Accelerator section\n",
+    "- Click `Save` and accept.\n",
+    "\n",
+    "Next, we'll confirm that we can connect to the GPU:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "if not torch.cuda.is_available():\n",
+    "    raise RuntimeError(\"GPU not available\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Installing dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install --quiet https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.4-cu122/llama_cpp_python-0.3.4-cp310-cp310-linux_x86_64.whl\n",
+    "%pip install --quiet document-to-podcast"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Uploading data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from google.colab import files\n",
+    "\n",
+    "uploaded = files.upload()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading and cleaning data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "[Docs for this Step](https://mozilla-ai.github.io/document-to-podcast/step-by-step-guide/#step-1-document-pre-processing)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "from document_to_podcast.preprocessing import DATA_CLEANERS, DATA_LOADERS\n",
+    "\n",
+    "input_file = list(uploaded.keys())[0]\n",
+    "suffix = Path(input_file).suffix\n",
+    "\n",
+    "data_loader = DATA_LOADERS[suffix]\n",
+    "data_cleaner = DATA_CLEANERS[suffix]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "raw_text = data_loader(input_file)\n",
+    "print(f\"Number of characters before cleaning: {len(raw_text)}\")\n",
+    "print(raw_text[:200])"
+   ]
+  },
+  {
+   "cell_type": "code",
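+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Purely informational, and a sketch rather than part of the library API:\n",
+    "# DATA_LOADERS and DATA_CLEANERS are used above as dicts keyed by file\n",
+    "# suffix, so listing the keys shows which input formats are supported.\n",
+    "print(f\"Supported formats: {sorted(DATA_LOADERS)}\")"
+   ]
+  },
+  {
+   "cell_type": "code",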
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "clean_text = data_cleaner(raw_text)\n", + "print(f\"Number of characters after cleaning: {len(clean_text)}\")\n", + "print(clean_text[:200])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Downloading and loading models" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Docs for this Step](https://mozilla-ai.github.io/document-to-podcast/step-by-step-guide/#step-2-podcast-script-generation)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this demo, we are using the following models:\n", + " - [OLMoE-1B-7B-0924-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct-GGUF)\n", + " - [OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf](https://huggingface.co/OuteAI/OuteTTS-0.2-500M-GGUF)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can check the [Customization Guide](https://mozilla-ai.github.io/document-to-podcast/customization/) for more information on how to use different models." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from document_to_podcast.inference.model_loaders import (\n", + " load_llama_cpp_model,\n", + " load_outetts_model,\n", + ")\n", + "\n", + "text_model = load_llama_cpp_model(\n", + " \"allenai/OLMoE-1B-7B-0924-Instruct-GGUF/olmoe-1b-7b-0924-instruct-q8_0.gguf\"\n", + ")\n", + "speech_model = load_outetts_model(\n", + " \"OuteAI/OuteTTS-0.2-500M-GGUF/OuteTTS-0.2-500M-FP16.gguf\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "max_characters = text_model.n_ctx() * 4\n", + "if len(clean_text) > max_characters:\n", + " print(\n", + " f\"Input text is too big ({len(clean_text)}).\"\n", + " f\" Using only a subset of it ({max_characters}).\"\n", + " )\n", + " clean_text = clean_text[:max_characters]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Podcast generation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Docs for this Step](https://mozilla-ai.github.io/document-to-podcast/step-by-step-guide/#step-3-audio-podcast-generation)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Speaker configuration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from document_to_podcast.config import Speaker\n", + "\n", + "speakers = [\n", + " {\n", + " \"id\": 1,\n", + " \"name\": \"Laura\",\n", + " \"description\": \"The main host. She explains topics clearly using anecdotes and analogies, teaching in an engaging and captivating way.\",\n", + " \"voice_profile\": \"female_1\",\n", + " },\n", + " {\n", + " \"id\": 2,\n", + " \"name\": \"Jon\",\n", + " \"description\": \"The co-host. 
+    "        \"description\": \"The co-host. He keeps the conversation on track, asks curious follow-up questions, and reacts with excitement or confusion, often using interjections like hmm or umm.\",\n",
+    "        \"voice_profile\": \"male_1\",\n",
+    "    },\n",
+    "]\n",
+    "\n",
+    "speakers_str = \"\\n\".join(\n",
+    "    str(Speaker.model_validate(speaker))\n",
+    "    for speaker in speakers\n",
+    "    if all(speaker.get(x, None) for x in [\"name\", \"description\", \"voice_profile\"])\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Prompt configuration"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "PROMPT = \"\"\"\n",
+    "You are a podcast scriptwriter generating engaging and natural-sounding conversations in JSON format.\n",
+    "The script features the following speakers:\n",
+    "{SPEAKERS}\n",
+    "Instructions:\n",
+    "- Write dynamic, easy-to-follow dialogue.\n",
+    "- Include natural interruptions and interjections.\n",
+    "- Avoid repetitive phrasing between speakers.\n",
+    "- Format output as a JSON conversation.\n",
+    "Example:\n",
+    "{\n",
+    "  \"Speaker 1\": \"Welcome to our podcast! Today, we're exploring...\",\n",
+    "  \"Speaker 2\": \"Hi! I'm excited to hear about this. Can you explain...\",\n",
+    "  \"Speaker 1\": \"Sure! Imagine it like this...\",\n",
+    "  \"Speaker 2\": \"Oh, that's cool! But how does...\"\n",
+    "}\n",
+    "\"\"\"\n",
+    "system_prompt = PROMPT.replace(\"{SPEAKERS}\", speakers_str)\n",
+    "print(system_prompt)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Model inference"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import re\n",
+    "\n",
+    "from document_to_podcast.inference.text_to_speech import text_to_speech\n",
+    "from document_to_podcast.inference.text_to_text import text_to_text_stream\n",
+    "from IPython.display import display, Audio\n",
+    "\n",
+    "podcast_audio = []\n",
+    "podcast_script = \"\"\n",
+    "text = \"\"\n",
+    "# Stream the generated script chunk by chunk, buffering until a full line arrives.\n",
+    "for chunk in text_to_text_stream(\n",
+    "    clean_text, text_model, system_prompt=system_prompt.strip()\n",
+    "):\n",
+    "    text += chunk\n",
+    "    # A complete speaker turn ends with a newline and names a speaker.\n",
+    "    if text.endswith(\"\\n\") and \"Speaker\" in text:\n",
+    "        podcast_script += text\n",
+    "        print(text)\n",
+    "\n",
+    "        # Match the speaker id in the line to its configured voice profile.\n",
+    "        speaker_id = re.search(r\"Speaker (\\d+)\", text).group(1)\n",
+    "        voice_profile = next(\n",
+    "            speaker[\"voice_profile\"]\n",
+    "            for speaker in speakers\n",
+    "            if speaker[\"id\"] == int(speaker_id)\n",
+    "        )\n",
+    "        speech = text_to_speech(\n",
+    "            text.split(f'\"Speaker {speaker_id}\":')[-1],\n",
+    "            speech_model,\n",
+    "            voice_profile,\n",
+    "        )\n",
+    "        podcast_audio.append(speech)\n",
+    "        display(Audio(speech, rate=speech_model.audio_codec.sr))\n",
+    "        text = \"\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Save the results"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can download the results from the file explorer."
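+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Optionally, preview the full episode before saving it. This is a small sketch, assuming each entry of `podcast_audio` is a 1-D NumPy array sampled at `speech_model.audio_codec.sr` (the same values written to disk below)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from IPython.display import Audio, display\n",
+    "\n",
+    "# Concatenate the per-turn clips into a single waveform and play it inline.\n",
+    "full_audio = np.concatenate(podcast_audio)\n",
+    "display(Audio(full_audio, rate=speech_model.audio_codec.sr))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If the playback sounds right, the next cells write the transcript and the audio to disk."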
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open(\"podcast.txt\", \"w\") as f:\n",
+    "    f.write(podcast_script)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import soundfile as sf\n",
+    "\n",
+    "sf.write(\n",
+    "    \"podcast.wav\",\n",
+    "    np.concatenate(podcast_audio),\n",
+    "    samplerate=speech_model.audio_codec.sr,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/docs/getting-started.md b/docs/getting-started.md
index 890cf09..01f52b3 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -1,38 +1,55 @@
-Get started with Document-to-Podcast using one of the two options below: **GitHub Codespaces** for a hassle-free setup or **Local Installation** for running on your own machine.
-
+Get started with Document-to-Podcast using one of the options below:
 ---
-### ☁️ **Option 1: GitHub Codespaces**
+## Setup options
+
+=== "☁️ Google Colab (GPU)"
+
+    The easiest way to play with the code on a GPU, for free.
+
+    Click the button below to launch the project directly in Google Colab:
+
+    [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/document-to-podcast/blob/main/demo/notebook.ipynb)
+
+=== "☁️ GitHub Codespaces"
+
+    Click the button below to launch the project directly in GitHub Codespaces:
+
+    [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)
+
+    Once the Codespaces environment launches, inside the terminal, start the Streamlit demo by running:
+
+    ```bash
+    python -m streamlit run demo/app.py
+    ```
+
+=== "💻 pip Installation"
-The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:
+
+    You can install the project from PyPI:
-[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)
+
+    ```bash
+    pip install document-to-podcast
+    ```
-Once the Codespaces environment launches, inside the terminal, start the Streamlit demo by running:
-```bash
-python -m streamlit run demo/app.py
-```
+
+    Check the [Command Line Interface](./cli.md) guide.
+
+=== "💻 Editable Installation"
-### 💻 **Option 2: Local Installation**
-1.**Clone the Repository**
+
+    1. **Clone the Repository**
-Inside your terminal, run:
-```bash
- git clone https://github.com/mozilla-ai/document-to-podcast.git
- cd document-to-podcast
-```
-2. **Install Dependencies**
+
+        ```bash
+        git clone https://github.com/mozilla-ai/document-to-podcast.git
+        cd document-to-podcast
+        ```
- Inside your terminal, run:
+
+    2. **Install the Project and its Dependencies**
-```bash
-pip install -e .
-```
-3. **Run the Demo**
+
+        ```bash
+        pip install -e .
+        ```
- Inside your terminal, start the Streamlit demo by running:
+
+    3. **Run the Demo**
+
+        ```bash
+        python -m streamlit run demo/app.py
+        ```
diff --git a/mkdocs.yml b/mkdocs.yml
index e4e48bf..2e4efa1 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -30,6 +30,7 @@ theme:
     - assets/custom.css
   features:
     - content.code.copy
+    - content.tabs.link
 
 markdown_extensions:
   - pymdownx.highlight:
@@ -39,7 +40,8 @@ markdown_extensions:
   - pymdownx.inlinehilite
   - pymdownx.snippets
   - pymdownx.superfences
-
+  - pymdownx.tabbed:
+      alternate_style: true
 plugins:
   - search
   - mkdocstrings: