Update doc for server arguments #2742
Conversation
I love the detailed and educational docs of the parameters. Two suggestions:
- We are documenting the official usage, so we can move the educational part to other unofficial repos, like my ML sys tutorial. 😂
- Keep things concise. If we want to explain a concept, one sentence of educational explanation plus a link to the details would be better.
</details>

## Model and tokenizer
Cool. But for the docs, always keep one first-order title `#` and several second-order titles `##`; do not use fourth-order titles `####`.
Adjusted to include Server Arguments title.
Perfect
docs/backend/server_arguments.md
Outdated
|
||
* `tp_size`: This parameter is important if we have multiple GPUs and our model doesn't fit on a single GPU. *Tensor parallelism* means we distribute our model weights over multiple GPUs. Note that his technique is mainly aimed at *memory efficency* and not at a *higher throughput* as there is inter GPU communication needed to obtain the final output of each layer. For better understanding of the concept you may look for example [here](https://pytorch.org/tutorials/intermediate/TP_tutorial.html#how-tensor-parallel-works). | ||
|
||
* `stream_interval`: If we stream the output to the user this parameter determines at which interval we perform streaming. The interval length is measured in tokens. |
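For illustration, a minimal launch sketch using these two flags. Treat it as an assumption-laden example, not an official invocation: the flag spellings (`--tp-size`, `--stream-interval`) are inferred from the parameter names above, and the model path is a placeholder.

```
# Hypothetical sketch: shard the weights over 2 GPUs and flush streamed
# output to the client every 4 tokens. Flag names are inferred from the
# parameters documented above; the model path is a placeholder.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --tp-size 2 \
  --stream-interval 4
```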
I am not so sure about this one. Could you double-check it and make it clearer?
I will look into this more carefully. For now I have left it as a TODO and will come back to it at the end.
Cool!
docs/backend/server_arguments.md
Outdated
* `random_seed`: Can be used to enforce deterministic behavior.

* `constrained_json_whitespace_pattern`: When using the `Outlines` grammar backend, this can be used to allow JSON with syntactic newlines, tabs, or multiple spaces. (See the sketch below for an example value.)
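As a sketch of how these two arguments might be combined: the flag spellings (`--random-seed`, `--grammar-backend`, `--constrained-json-whitespace-pattern`) are inferred from the parameter names, and both the backend selector and the regex value are illustrative assumptions.

```
# Hypothetical sketch: fixed seed for reproducibility, Outlines grammar
# backend, and a relaxed JSON whitespace pattern (illustrative regex).
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --random-seed 42 \
  --grammar-backend outlines \
  --constrained-json-whitespace-pattern "[\n\t ]*"
```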
I think we can create a `##` section for constrained decoding parameters.
In general I think we could restructure the whole section. I suggest doing that after I have included all the parameters.
Perfect! Thanks so much for the help!
docs/backend/server_arguments.md
Outdated
* `dist_init_addr`: The TCP address used for initializing PyTorch’s distributed backend (e.g. `192.168.0.2:25000`).
* `nnodes`: Total number of nodes in the cluster.
* `node_rank`: Rank (ID) of this node among the `nnodes` in the distributed setup. (A two-node launch sketch follows this list.)
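A hedged sketch of how the three arguments fit together in a two-node deployment. The flag spellings are inferred from the parameter names and the address is a placeholder; the second node would run the same command with `--node-rank 1`.

```
# Hypothetical sketch: node 0 of a 2-node cluster. 192.168.0.2:25000 is a
# placeholder and must be reachable from every node; flag names are
# inferred from the parameters documented above.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --dist-init-addr 192.168.0.2:25000 \
  --nnodes 2 \
  --node-rank 0
```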
## Model override args in JSON
Better to call this `## Constrained Decoding`.
Amazing work! We are close to the end!
</details>

## Model and tokenizer
Perfect
Great, we are close to the end. Are there any parameters left? If not, after fixing these parameters, we can let yineng review.
Great! We made it!
@@ -66,7 +66,7 @@ In this document we aim to give an overview of the possible arguments when deplo

* `watchdog_timeout`: Adjusts the watchdog thread’s timeout before killing the server if batch generation takes too long.
* `download_dir`: Use to override the default Hugging Face cache directory for model weights.
* `base_gpu_id`: Use to adjust the first GPU used to distribute the model across available GPUs.
* `allow_auto_truncate`: Automatically truncate requests that exceed the maximum input length. (A combined launch sketch follows below.)
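For illustration, one launch line combining the four arguments in this hunk. The flag spellings are inferred from the parameter names and all values are placeholders, so treat this as a sketch rather than a verified invocation.

```
# Hypothetical sketch: custom weight cache, start placement at GPU 1,
# a 5-minute watchdog, and auto-truncation of over-long prompts.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --download-dir /data/hf-cache \
  --base-gpu-id 1 \
  --watchdog-timeout 300 \
  --allow-auto-truncate
```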
Great!
@zhyncs Awaiting a final go-over.
Motivation
As explained here, the current documentation of the backend needs an update, which we intend to implement in this PR.
Checklist