Update README.md
codelion authored Nov 13, 2024
1 parent 6a3ffa7 commit 3ff58a3
Showing 1 changed file with 19 additions and 0 deletions.
README.md: 19 additions & 0 deletions
@@ -163,6 +163,25 @@ response = client.chat.completions.create(
)
```

You can also use alternate decoding techniques such as `cot_decoding` and `entropy_decoding` directly with the local inference server.

```python
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=messages,
    temperature=0.2,
    extra_body={
        "decoding": "cot_decoding",  # or "entropy_decoding"
        # CoT specific params
        "k": 10,
        "aggregate_paths": True,
        # OR Entropy specific params
        "top_k": 27,
        "min_p": 0.03,
    }
)
```
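For reference, the example above assumes an OpenAI-compatible `client` and a `messages` list defined earlier in the README. A minimal sketch of that setup, assuming the local optillm inference server is listening at `http://localhost:8000/v1` (adjust host and port to your deployment), might look like:

```python
from openai import OpenAI

# Assumption: the local optillm inference server is reachable at this
# base_url; change host/port to match your setup.
client = OpenAI(
    api_key="optillm",  # placeholder; a real key is typically not needed for a local server
    base_url="http://localhost:8000/v1",
)

messages = [
    {"role": "user", "content": "Which is larger, 9.11 or 9.9?"},
]
```

As the comments in the example indicate, pass only the parameter set that matches the technique chosen in `decoding`: `k` and `aggregate_paths` for `cot_decoding`, or `top_k` and `min_p` for `entropy_decoding`.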

### Starting the optillm proxy with an external server (e.g. llama.cpp or ollama)

- Set the `OPENAI_API_KEY` env variable to a placeholder value
