Commit 8067b98 by souzatharsis, committed Nov 6, 2024 (1 parent: c1fd8c8).
Showing 11 changed files with 918 additions and 27 deletions.

@@ -0,0 +1,179 @@

# Podcastfy REST API Documentation

## Overview

The Podcastfy API allows you to programmatically generate AI podcasts from various input sources. This document outlines the API endpoints and their usage.

## Using cURL with Podcastfy API

### Prerequisites
1. Confirm cURL installation:
```bash
curl --version
```

### API Request Flow
Making a prediction requires two sequential requests:
1. A POST request to initiate processing, which returns an `EVENT_ID`
2. A GET request that uses the `EVENT_ID` to fetch the results

Between steps 1 and 2, there is a delay of 1-3 minutes. We are working on reducing this delay and implementing a way to notify the user when the podcast is ready. Thanks for your patience!

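The second request returns a text stream of `event:` and `data:` lines. The sketch below shows one way to pull the JSON payload out of such a stream in Python. The helper name `parse_complete_event` and the shortened sample text are illustrative assumptions, and only the `complete` event shown in this document is handled.

```python
import json


def parse_complete_event(stream_text):
    """Return the decoded `data` payload of the first `event: complete`, or None."""
    current_event = None
    for line in stream_text.splitlines():
        line = line.strip()
        if line.startswith("event:"):
            # Remember which event the following data line belongs to.
            current_event = line[len("event:"):].strip()
        elif line.startswith("data:") and current_event == "complete":
            return json.loads(line[len("data:"):].strip())
    return None


# Hypothetical, shortened sample of the stream returned by the GET request.
sample = (
    "event: complete\n"
    'data: [{"path": "/tmp/gradio/example/podcast.mp3", "orig_name": "podcast.mp3"}]\n'
)
payload = parse_complete_event(sample)
print(payload[0]["path"])
```

Returning `None` when no `complete` event is present lets the caller decide whether to keep waiting or give up.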
### Basic Request Structure
```bash
# Step 1: POST request to initiate processing
# Make sure to include http:// or https:// in the URL
# The "data" array is positional (JSON does not allow inline comments). Order:
# text_input, urls_input, pdf_files, image_files, gemini_key, openai_key,
# elevenlabs_key, word_count, conversation_style, roles_person1, roles_person2,
# dialogue_structure, podcast_name, podcast_tagline, tts_model,
# creativity_level, user_instructions
curl -X POST https://thatupiso-podcastfy-ai-demo.hf.space/gradio_api/call/process_inputs \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "text_input",
      "https://yourwebsite.com",
      [],
      [],
      "gemini_key",
      "openai_key",
      "elevenlabs_key",
      2000,
      "engaging,fast-paced",
      "main summarizer",
      "questioner",
      "Introduction,Content,Conclusion",
      "PODCASTFY",
      "YOUR PODCAST",
      "openai",
      0.7,
      ""
    ]
  }'

# Step 2: GET request to fetch results
# Set EVENT_ID to the value returned by step 1
curl -N https://thatupiso-podcastfy-ai-demo.hf.space/gradio_api/call/process_inputs/$EVENT_ID

# Example output result
# event: complete
# data: [{"path": "/tmp/gradio/bcb143f492b1c9a6dbde512557541e62f090bca083356be0f82c2e12b59af100/podcast_81106b4ca62542f1b209889832a421df.mp3", "url": "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file=/tmp/gradio/bcb143f492b1c9a6dbde512557541e62f090bca083356be0f82c2e12b59af100/podcast_81106b4ca62542f1b209889832a421df.mp3", "size": null, "orig_name": "podcast_81106b4ca62542f1b209889832a421df.mp3", "mime_type": null, "is_stream": false, "meta": {"_type": "gradio.FileData"}}]
```

You can download the file by appending the value of the `path` field to the URL prefix "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file=". (Note: the `url` field in the output above is affected by a bug introduced by Gradio, so please ignore it.)

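As a small illustration of that rule, the following sketch joins the documented prefix with a `path` value. The function names and the example path are hypothetical, and `download_podcast` is defined but not called here because it needs network access.

```python
import urllib.request

# URL prefix copied verbatim from the paragraph above.
FILE_URL_PREFIX = "https://thatupiso-podcastfy-ai-demo.hf.space/gradio_a/gradio_api/file="


def build_download_url(path, prefix=FILE_URL_PREFIX):
    """Append the `path` field from the result to the documented prefix."""
    return prefix + path


def download_podcast(path, destination="podcast.mp3"):
    """Fetch the generated audio to a local file (requires network access)."""
    urllib.request.urlretrieve(build_download_url(path), destination)
    return destination


# Hypothetical path value, shaped like the `path` field in the example output.
print(build_download_url("/tmp/gradio/abc123/podcast_example.mp3"))
```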
### Parameter Details
| Index | Parameter | Type | Description |
|-------|-----------|------|-------------|
| 0 | text_input | string | Direct text input for podcast generation |
| 1 | urls_input | string | URLs to process (include http:// or https://) |
| 2 | pdf_files | array | List of PDF files to process |
| 3 | image_files | array | List of image files to process |
| 4 | gemini_key | string | Google Gemini API key |
| 5 | openai_key | string | OpenAI API key |
| 6 | elevenlabs_key | string | ElevenLabs API key |
| 7 | word_count | number | Target word count for podcast |
| 8 | conversation_style | string | Conversation style descriptors (e.g. "engaging,fast-paced") |
| 9 | roles_person1 | string | Role of first speaker |
| 10 | roles_person2 | string | Role of second speaker |
| 11 | dialogue_structure | string | Structure of dialogue (e.g. "Introduction,Content,Conclusion") |
| 12 | podcast_name | string | Name of the podcast |
| 13 | podcast_tagline | string | Podcast tagline |
| 14 | tts_model | string | Text-to-speech model ("gemini", "openai", "elevenlabs", or "edge") |
| 15 | creativity_level | number | Level of creativity (0-1) |
| 16 | user_instructions | string | Custom instructions for generation |

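Because the request body is positional, it is easy to misplace a value. The sketch below builds the `data` array from named values in the index order of the table above. `build_request_body` and `EXAMPLE_VALUES` are illustrative names, and the values are the placeholders from the cURL example rather than recommended settings.

```python
import json

# Index order taken from the Parameter Details table (0 through 16).
PARAMETER_ORDER = [
    "text_input", "urls_input", "pdf_files", "image_files",
    "gemini_key", "openai_key", "elevenlabs_key", "word_count",
    "conversation_style", "roles_person1", "roles_person2",
    "dialogue_structure", "podcast_name", "podcast_tagline",
    "tts_model", "creativity_level", "user_instructions",
]

# Placeholder values mirroring the cURL example.
EXAMPLE_VALUES = {
    "text_input": "text_input",
    "urls_input": "https://yourwebsite.com",
    "pdf_files": [],
    "image_files": [],
    "gemini_key": "gemini_key",
    "openai_key": "openai_key",
    "elevenlabs_key": "elevenlabs_key",
    "word_count": 2000,
    "conversation_style": "engaging,fast-paced",
    "roles_person1": "main summarizer",
    "roles_person2": "questioner",
    "dialogue_structure": "Introduction,Content,Conclusion",
    "podcast_name": "PODCASTFY",
    "podcast_tagline": "YOUR PODCAST",
    "tts_model": "openai",
    "creativity_level": 0.7,
    "user_instructions": "",
}


def build_request_body(**overrides):
    """Return the JSON body for the POST request, with values in index order."""
    unknown = set(overrides) - set(PARAMETER_ORDER)
    if unknown:
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    values = {**EXAMPLE_VALUES, **overrides}
    return json.dumps({"data": [values[name] for name in PARAMETER_ORDER]})


body = build_request_body(word_count=1500)
print(body)
```

The resulting string can be passed to cURL with `-d`, or sent by any HTTP client with a `Content-Type: application/json` header.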
## Using Python

### Installation

```bash
pip install gradio_client
```

### Quick Start

```python
from gradio_client import Client, handle_file

client = Client("thatupiso/Podcastfy.ai_demo")
```

### API Endpoints

#### Generate Podcast (`/process_inputs`)

Generates a podcast from provided text, URLs, PDFs, or images.

##### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| text_input | str | Yes | - | Raw text input for podcast generation |
| urls_input | str | Yes | - | Comma-separated URLs to process |
| pdf_files | List[filepath] | Yes | None | List of PDF files to process |
| image_files | List[filepath] | Yes | None | List of image files to process |
| gemini_key | str | No | "" | Google Gemini API key |
| openai_key | str | No | "" | OpenAI API key |
| elevenlabs_key | str | No | "" | ElevenLabs API key |
| word_count | float | No | 2000 | Target word count for podcast |
| conversation_style | str | No | "engaging,fast-paced,enthusiastic" | Conversation style descriptors |
| roles_person1 | str | No | "main summarizer" | Role of first speaker |
| roles_person2 | str | No | "questioner/clarifier" | Role of second speaker |
| dialogue_structure | str | No | "Introduction,Main Content Summary,Conclusion" | Structure of dialogue |
| podcast_name | str | No | "PODCASTFY" | Name of the podcast |
| podcast_tagline | str | No | "YOUR PERSONAL GenAI PODCAST" | Podcast tagline |
| tts_model | Literal['openai', 'elevenlabs', 'edge'] | No | "openai" | Text-to-speech model |
| creativity_level | float | No | 0.7 | Level of creativity (0-1) |
| user_instructions | str | No | "" | Custom instructions for generation |

##### Returns

| Type | Description |
|------|-------------|
| filepath | Path to generated audio file |

##### Example Usage

```python
from gradio_client import Client, handle_file

client = Client("thatupiso/Podcastfy.ai_demo")

# Generate podcast from URL
result = client.predict(
    text_input="",
    urls_input="https://example.com/article",
    pdf_files=[],
    image_files=[],
    gemini_key="your-gemini-key",
    openai_key="your-openai-key",
    word_count=1500,
    conversation_style="casual,informative",
    podcast_name="Tech Talk",
    tts_model="openai",
    creativity_level=0.8,
    api_name="/process_inputs",  # endpoint documented above
)

print(f"Generated podcast: {result}")
```

### Error Handling

The API will return appropriate error messages for:
- Invalid API keys
- Malformed input
- Failed file processing
- TTS generation errors

### Rate Limits

Please be aware of the rate limits for the underlying services:
- Gemini API
- OpenAI API
- ElevenLabs API

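This document does not specify a retry policy or the exception types a failed call raises. The sketch below is therefore only one possible way to wrap a call with exponential backoff when an upstream rate limit is hit. `call_with_backoff` is a hypothetical helper, and catching every `Exception` is an assumption made for illustration; narrow it once the real error types are known.

```python
import time


def call_with_backoff(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `func`, retrying with exponentially growing delays on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Assumption: any exception is treated as retryable.
            if attempt == max_attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "podcast.mp3"


recorded_delays = []
result = call_with_backoff(flaky_call, sleep=recorded_delays.append)
print(result, recorded_delays)
```

In real use, `func` could be a `lambda` that wraps the `client.predict(...)` call, with the default `time.sleep` left in place.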
## Notes

- At least one input source (text, URL, PDF, or image) must be provided
- API keys are required for corresponding services
- The generated audio file format is MP3
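
The first note can be checked on the client side before sending a request. The following is a minimal sketch; `has_input_source` is a hypothetical helper that is not part of Podcastfy or `gradio_client`.

```python
def has_input_source(text_input="", urls_input="", pdf_files=None, image_files=None):
    """Return True if at least one of the four input sources is non-empty."""
    return bool(
        text_input.strip()
        or urls_input.strip()
        or pdf_files
        or image_files
    )


print(has_input_source(urls_input="https://example.com/article"))
print(has_input_source())
```

Whitespace-only strings are treated as empty here, which is a design choice of this sketch rather than documented API behavior.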

@@ -0,0 +1,63 @@
# Podcastfy Advanced Configuration Guide

Podcastfy uses a `config.yaml` file to manage various settings and parameters. This guide explains each configuration option available in the file.

## Content Generator

- `gemini_model`: "gemini-1.5-pro-latest"
  - The Gemini AI model used for content generation.
- `max_output_tokens`: 8192
  - Maximum number of tokens for the output generated by the AI model.
- `temperature`: 1
  - Controls randomness in the AI's output. 0 means deterministic responses. Range for gemini-1.5-pro: 0.0 - 2.0 (default: 1.0)
- `langchain_tracing_v2`: false
  - Enables LangChain tracing for debugging and monitoring. If true, a LangSmith API key is required.

## Content Extractor

- `youtube_url_patterns`:
  - Patterns to identify YouTube URLs.
  - Current patterns: "youtube.com", "youtu.be"

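The guide does not say how these patterns are applied. As a rough illustration, assuming plain substring matching, a check could look like the sketch below. `is_youtube_url` is a hypothetical name and is not taken from the Podcastfy code.

```python
# Values listed under `youtube_url_patterns` above.
YOUTUBE_URL_PATTERNS = ["youtube.com", "youtu.be"]


def is_youtube_url(url, patterns=YOUTUBE_URL_PATTERNS):
    """Return True if any configured pattern occurs in the URL (substring match)."""
    return any(pattern in url for pattern in patterns)


print(is_youtube_url("https://youtu.be/abc123"))
print(is_youtube_url("https://example.com/article"))
```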
## Website Extractor

- `markdown_cleaning`:
  - `remove_patterns`:
    - Patterns to remove from extracted markdown content.
    - Current patterns remove image links, hyperlinks, and URLs.

## YouTube Transcriber

- `remove_phrases`:
  - Phrases to remove from YouTube transcriptions.
  - Current phrase: "[music]"

## Logging

- `level`: "INFO"
  - Default logging level.
- `format`: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  - Format string for log messages.

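To show what these two settings produce, here is a minimal sketch that applies them with Python's standard `logging` module. `make_logger` and the logger name `podcastfy_example` are assumptions for illustration; how Podcastfy itself wires up logging is not shown in this guide.

```python
import logging

# Values copied from the Logging section above.
LOG_LEVEL = "INFO"
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def make_logger(name, level=LOG_LEVEL, fmt=LOG_FORMAT):
    """Create a logger that uses the configured level and format string."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level))
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(fmt))
        logger.addHandler(handler)
    return logger


logger = make_logger("podcastfy_example")
logger.info("configuration loaded")
logger.debug("not shown, because DEBUG is below the INFO level")
```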
## Website Extractor

- `markdown_cleaning`:
  - `remove_patterns`:
    - Additional patterns to remove from extracted markdown content:
      - `'\[.*?\]'`: Remove square brackets and their contents
      - `'\(.*?\)'`: Remove parentheses and their contents
      - `'^\s*[-*]\s'`: Remove list item markers
      - `'^\s*\d+\.\s'`: Remove numbered list markers
      - `'^\s*#+'`: Remove markdown headers
- `unwanted_tags`:
  - HTML tags to be removed during extraction:
    - 'script', 'style', 'nav', 'footer', 'header', 'aside', 'noscript'
- `user_agent`: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
  - User agent string to be used for web requests
- `timeout`: 10
  - Request timeout in seconds for web scraping

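The effect of the listed `remove_patterns` can be tried out with Python's `re` module. The sketch below applies them in the order given. Using `re.MULTILINE` so that `^` matches at the start of every line is an assumption, and `clean_markdown` is a hypothetical helper, not the extractor's actual code.

```python
import re

# Patterns copied from the `remove_patterns` list above.
REMOVE_PATTERNS = [
    r"\[.*?\]",       # square brackets and their contents
    r"\(.*?\)",       # parentheses and their contents
    r"^\s*[-*]\s",    # list item markers
    r"^\s*\d+\.\s",   # numbered list markers
    r"^\s*#+",        # markdown header markers
]


def clean_markdown(text, patterns=REMOVE_PATTERNS):
    """Apply each pattern in order, deleting every match."""
    for pattern in patterns:
        # Assumption: MULTILINE, so that `^` anchors at each line start.
        text = re.sub(pattern, "", text, flags=re.MULTILINE)
    return text


sample = "# Title\n- item [link](https://example.com)\n1. first"
cleaned = clean_markdown(sample)
print(repr(cleaned))
```

Note that the header pattern removes only the `#` characters, so the heading text and its leading space remain.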