docs: add a snippet explaining the streaming example
wsxiaoys authored Oct 1, 2023
1 parent f7ecab5 commit 1fd3adb
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions website/blog/2023-09-30-stream-laziness-in-tabby/index.md
@@ -60,13 +60,16 @@ async function client() {
// we know our stream is infinite, so there's no need to check `done`.
const { value } = await reader.read();
console.log(`read ${value}`);
await sleep(10);
}
}

server(llm());
client();
```

In this example, we create an async generator to mimic an LLM that produces string tokens. We then create an HTTP endpoint that wraps the generator, as well as a client that reads values from the HTTP stream. Note that our generator logs `producing ${i}` and our client logs `read ${value}`. The LLM inference could take an arbitrary amount of time to complete, simulated here by a 1000ms sleep in the generator.
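
For context, here is a minimal sketch of the generator and server half of the example, which this hunk doesn't show. It is an assumption reconstructed from the surrounding code, not the post's exact implementation: the `llm` and `server` bodies and the Node `http` usage are guesses.

```javascript
const http = require('http');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Async generator mimicking an LLM that emits string tokens forever.
async function* llm() {
  let i = 0;
  while (true) {
    console.log(`producing ${i}`);
    await sleep(1000); // stand-in for inference latency
    yield `${i++}`;
  }
}

// HTTP endpoint that streams the generator's output to the client.
function server(iterator) {
  http
    .createServer(async (req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      for await (const value of iterator) {
        res.write(`${value}\n`);
      }
    })
    .listen(8000);
}
```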

## Stream Laziness

If you were to run this program, you'd notice something interesting: the LLM keeps logging `producing ${i}` even after the client has finished its three reads. This might seem obvious, given that the LLM is generating an infinite stream of integers. However, it represents a problem: our server must maintain an ever-expanding queue of items that have been pushed in but never pulled out.
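
To make the queueing concrete, here is a self-contained sketch of the same shape of problem, with an eager pump standing in for the HTTP plumbing. This is an illustrative assumption, not code from the post:

```javascript
// Illustrative sketch (assumed setup, not from the post): an eager pump
// drains the generator into a buffer regardless of how much is read.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function* producer() {
  let i = 0;
  while (true) {
    console.log(`producing ${i}`);
    await sleep(100); // stand-in for inference latency
    yield i++;
  }
}

async function main() {
  const buffer = [];

  // Fire-and-forget pump: pulls every value the generator produces.
  (async () => {
    for await (const value of producer()) buffer.push(value);
  })();

  // The consumer reads only three items...
  for (let i = 0; i < 3; i++) {
    await sleep(500);
    console.log(`read ${buffer.shift()}`);
  }
  // ...but the pump never stops: `buffer` keeps growing without bound.
  // This is the ever-expanding queue described above.
  setInterval(() => console.log(`${buffer.length} items queued`), 1000);
}

main();
```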
Expand Down
