Commit

Merge branch 'main' into grit-prod
morgante committed Jun 12, 2024
2 parents 4369bcd + fb96f07 commit 2bd453a
Showing 172 changed files with 206,752 additions and 6,511 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -59,3 +59,4 @@ myenv/*
litellm/proxy/_experimental/out/404/index.html
litellm/proxy/_experimental/out/model_hub/index.html
litellm/proxy/_experimental/out/onboarding/index.html
litellm/tests/log.txt
10 changes: 9 additions & 1 deletion docs/my-website/docs/assistants.md
@@ -150,7 +150,7 @@ $ litellm --config /path/to/config.yaml
```bash
curl "http://0.0.0.0:4000/v1/assistants?order=desc&limit=20" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-H "Authorization: Bearer sk-1234"
```

**Create a Thread**
@@ -162,6 +162,14 @@ curl http://0.0.0.0:4000/v1/threads \
-d ''
```

**Get a Thread**

```bash
curl http://0.0.0.0:4000/v1/threads/{thread_id} \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234"
```

**Add Messages to the Thread**

```bash
curl http://0.0.0.0:4000/v1/threads/{thread_id}/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{"role": "user", "content": "How does AI work? Explain it simply."}'
```
88 changes: 88 additions & 0 deletions docs/my-website/docs/caching/all_caches.md
@@ -212,6 +212,94 @@ If you run the code two times, response1 will use the cache from the first run t

</TabItem>

</Tabs>

## Switch Cache On / Off Per LiteLLM Call

LiteLLM supports 4 cache controls:

- `no-cache`: *Optional(bool)* When `True`, will not return a cached response; the actual endpoint is called instead.
- `no-store`: *Optional(bool)* When `True`, will not cache the response.
- `ttl`: *Optional(int)* Will cache the response for the user-defined number of seconds.
- `s-maxage`: *Optional(int)* Will only accept cached responses that are no older than the user-defined number of seconds.

[Let us know if you need more](https://github.com/BerriAI/litellm/issues/1218)
<Tabs>
<TabItem value="no-cache" label="No-Cache">

Example usage of `no-cache` - when `True`, LiteLLM will not return a cached response.

```python
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    cache={"no-cache": True},
)
```

</TabItem>

<TabItem value="no-store" label="No-Store">

Example usage of `no-store` - when `True`, LiteLLM will not cache the response.

```python
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    cache={"no-store": True},
)
```

</TabItem>

<TabItem value="ttl" label="ttl">
Example usage `ttl` - cache the response for 10 seconds

```python
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    cache={"ttl": 10},
)
```

</TabItem>

<TabItem value="s-maxage" label="s-maxage">
Example usage `s-maxage` - Will only accept cached responses for 60 seconds

```python
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    cache={"s-maxage": 60},
)
```

</TabItem>


</Tabs>

## Cache Context Manager - Enable, Disable, Update Cache
Expand Down
46 changes: 46 additions & 0 deletions docs/my-website/docs/observability/raw_request_response.md
@@ -0,0 +1,46 @@
import Image from '@theme/IdealImage';

# Raw Request/Response Logging

See the raw request/response sent by LiteLLM in your logging provider (OTEL/Langfuse/etc.).

**on SDK**
```python
# pip install langfuse
import litellm
import os

# log raw request/response
litellm.log_raw_request_response = True

# from https://cloud.langfuse.com/
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
# Optional, defaults to https://cloud.langfuse.com
os.environ["LANGFUSE_HOST"] # optional

# LLM API Keys
os.environ['OPENAI_API_KEY']=""

# set langfuse as a callback, litellm will send the data to langfuse
litellm.success_callback = ["langfuse"]

# openai call
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hi 👋 - i'm openai"}
    ]
)
```

**on Proxy**

```yaml
litellm_settings:
log_raw_request_response: True
```
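
Once the proxy is restarted with this setting, requests routed through it should show up with their raw payloads in your configured logging provider. A minimal sketch of a test request, assuming the proxy is running locally on the default port 4000 with a virtual key `sk-1234` (both assumptions for this example):

```python
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",               # proxy virtual key (assumed for this example)
    base_url="http://0.0.0.0:4000",  # LiteLLM proxy address (assumed default)
)

# Any call routed through the proxy will be logged with its raw request/response
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}],
)
print(response.choices[0].message.content)
```
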
**Expected Log**
<Image img={require('../../img/raw_request_log.png')}/>
@@ -1,3 +1,5 @@
# llmcord.py

llmcord.py lets you and your friends chat with LLMs directly in your Discord server. It works with practically any LLM, remote or locally hosted.

GitHub: https://github.com/jakobdylanc/discord-llm-chatbot
13 changes: 1 addition & 12 deletions docs/my-website/docs/providers/anthropic.md
@@ -11,7 +11,7 @@ LiteLLM supports

:::info

Anthropic API fails requests when `max_tokens` are not passed. Due to this litellm passes `max_tokens=4096` when no `max_tokens` are passed
Anthropic API fails requests when `max_tokens` are not passed. Due to this litellm passes `max_tokens=4096` when no `max_tokens` are passed.

:::
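
For example, `max_tokens` can be passed explicitly to override that default. A minimal sketch (the model name and token limit are illustrative):

```python
import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = ""  # your Anthropic API key

# Passing max_tokens explicitly avoids relying on the 4096 default LiteLLM injects
response = completion(
    model="anthropic/claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    max_tokens=1024,
)
```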

@@ -229,17 +229,6 @@ assert isinstance(

```

### Setting `anthropic-beta` Header in Requests

Pass the `extra_headers` param to litellm; all headers will be forwarded to the Anthropic API.

```python
response = completion(
    model="anthropic/claude-3-opus-20240229",
    messages=messages,
    tools=tools,
    # illustrative beta value - set whichever beta feature flag you need
    extra_headers={"anthropic-beta": "tools-2024-04-04"},
)
```

### Forcing Anthropic Tool Use
