Commit

Merge branch 'main' into grit-prod
morgante committed Jun 12, 2024
2 parents 4369bcd + fb96f07 commit 2bd453a
Showing 172 changed files with 206,752 additions and 6,511 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -59,3 +59,4 @@ myenv/*
litellm/proxy/_experimental/out/404/index.html
litellm/proxy/_experimental/out/model_hub/index.html
litellm/proxy/_experimental/out/onboarding/index.html
litellm/tests/log.txt
10 changes: 9 additions & 1 deletion docs/my-website/docs/assistants.md
@@ -150,7 +150,7 @@ $ litellm --config /path/to/config.yaml
```bash
curl "http://0.0.0.0:4000/v1/assistants?order=desc&limit=20" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-H "Authorization: Bearer sk-1234"
```

**Create a Thread**
@@ -162,6 +162,14 @@ curl http://0.0.0.0:4000/v1/threads \
-d ''
```

**Get a Thread**

```bash
curl http://0.0.0.0:4000/v1/threads/{thread_id} \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234"
```

**Add Messages to the Thread**

```bash
curl http://0.0.0.0:4000/v1/threads/{thread_id}/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{"role": "user", "content": "How does AI work? Explain it simply."}'
```
88 changes: 88 additions & 0 deletions docs/my-website/docs/caching/all_caches.md
@@ -212,6 +212,94 @@ If you run the code two times, response1 will use the cache from the first run t

</TabItem>

</Tabs>

## Switch Cache On / Off Per LiteLLM Call

LiteLLM supports 4 cache controls:

- `no-cache`: *Optional(bool)* When `True`, will not return a cached response; the actual endpoint is called instead.
- `no-store`: *Optional(bool)* When `True`, will not cache the response.
- `ttl`: *Optional(int)* Will cache the response for the user-defined number of seconds.
- `s-maxage`: *Optional(int)* Will only accept cached responses that are no older than the user-defined number of seconds.

[Let us know if you need more](https://github.com/BerriAI/litellm/issues/1218)
<Tabs>
<TabItem value="no-cache" label="No-Cache">

Example usage of `no-cache` - when `True`, LiteLLM will not return a cached response.

```python
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    cache={"no-cache": True},
)
```

</TabItem>

<TabItem value="no-store" label="No-Store">

Example usage of `no-store` - when `True`, LiteLLM will not cache the response.

```python
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    cache={"no-store": True},
)
```

</TabItem>

<TabItem value="ttl" label="ttl">
Example usage `ttl` - cache the response for 10 seconds

```python
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    cache={"ttl": 10},
)
```

</TabItem>

<TabItem value="s-maxage" label="s-maxage">
Example usage `s-maxage` - Will only accept cached responses for 60 seconds

```python
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    cache={"s-maxage": 60},
)
```

</TabItem>


</Tabs>

## Cache Context Manager - Enable, Disable, Update Cache
Expand Down
46 changes: 46 additions & 0 deletions docs/my-website/docs/observability/raw_request_response.md
@@ -0,0 +1,46 @@
import Image from '@theme/IdealImage';

# Raw Request/Response Logging

See the raw request/response sent by LiteLLM in your logging provider (OTEL/Langfuse/etc.).

**on SDK**
```python
# pip install langfuse
import litellm
import os

# log raw request/response
litellm.log_raw_request_response = True

# from https://cloud.langfuse.com/
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
# Optional, defaults to https://cloud.langfuse.com
os.environ["LANGFUSE_HOST"] # optional

# LLM API Keys
os.environ['OPENAI_API_KEY']=""

# set langfuse as a callback, litellm will send the data to langfuse
litellm.success_callback = ["langfuse"]

# openai call
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hi 👋 - i'm openai"}
    ]
)
```

**on Proxy**

```yaml
litellm_settings:
log_raw_request_response: True
```
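
Once the proxy is restarted with this setting, requests routed through it should show up with their raw payloads in your configured logging provider. A minimal sketch of a test request, assuming the proxy is running locally on the default port 4000 with a virtual key `sk-1234` (both assumptions for this example):

```python
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",               # proxy virtual key (assumed for this example)
    base_url="http://0.0.0.0:4000",  # LiteLLM proxy address (assumed default)
)

# Any call routed through the proxy will be logged with its raw request/response
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}],
)
print(response.choices[0].message.content)
```
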
**Expected Log**
<Image img={require('../../img/raw_request_log.png')}/>
@@ -1,3 +1,5 @@
# llmcord.py

llmcord.py lets you and your friends chat with LLMs directly in your Discord server. It works with practically any LLM, remote or locally hosted.

GitHub: https://github.com/jakobdylanc/discord-llm-chatbot
13 changes: 1 addition & 12 deletions docs/my-website/docs/providers/anthropic.md
@@ -11,7 +11,7 @@ LiteLLM supports

:::info

Anthropic API fails requests when `max_tokens` are not passed. Due to this litellm passes `max_tokens=4096` when no `max_tokens` are passed
Anthropic API fails requests when `max_tokens` are not passed. Due to this litellm passes `max_tokens=4096` when no `max_tokens` are passed.

:::
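
For example, `max_tokens` can be passed explicitly to override that default. A minimal sketch (the model name and token limit are illustrative):

```python
import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = ""  # your Anthropic API key

# Passing max_tokens explicitly avoids relying on the 4096 default LiteLLM injects
response = completion(
    model="anthropic/claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    max_tokens=1024,
)
```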

@@ -229,17 +229,6 @@ assert isinstance(

```

### Setting `anthropic-beta` Header in Requests

Pass the `extra_headers` param to litellm; all headers will be forwarded to the Anthropic API.

```python
response = completion(
    model="anthropic/claude-3-opus-20240229",
    messages=messages,
    tools=tools,
    # illustrative beta value - set whichever beta feature flag you need
    extra_headers={"anthropic-beta": "tools-2024-04-04"},
)
```

### Forcing Anthropic Tool Use
