Fix for docs on deepspeed inference #482

Open
wants to merge 1 commit into
base: main

josemduarte
Collaborator

The main issue is that the parameter name in the docs did not match the code.

Question: could the deepspeed extra dependency on CUTLASS be baked into the docker image? I could give it a try but I'm wondering if someone has come across issues with that.

@josemduarte
Collaborator Author

> Question: could the deepspeed extra dependency on CUTLASS be baked into the docker image? I could give it a try but I'm wondering if someone has come across issues with that.

This patch would do the basic setup (it would still require compilation at first run): josemduarte@0183be7.

@ljarosch
Collaborator

ljarosch commented Dec 7, 2024

Good catch on the wrong parameter name! CUTLASS should be installed with the `install_third_party_dependencies.sh` script though, which is already in the docs (see Step 3).

You're right, though, that this step is currently missing from the Docker image; we will update that.

@@ -147,7 +147,7 @@ Some commonly used command line flags are here. A full list of flags can be view

The **DeepSpeed DS4Sci_EvoformerAttention kernel** is a memory-efficient attention kernel developed as part of a collaboration between OpenFold and the DeepSpeed4Science initiative.

-If your system supports deepseed, using deepspeed generally leads an inference speedup of 2 - 3x without significant additional memory use. You may specify this option by selecting the `--use_deepspeed_inference` argument.
+If your system supports deepspeed, using deepspeed generally leads an inference speedup of 2 - 3x without significant additional memory use. You may specify this option by selecting the `--use_deepspeed_evoformer_attention` argument. An additional requirement for this option is the [CUTLASS repository](https://github.com/NVIDIA/cutlass). You will need to clone it and set environment variable `CUTLASS_PATH` to point to it, see [instructions](https://www.deepspeed.ai/tutorials/ds4sci_evoformerattention/).
Collaborator

@josemduarte Would you mind removing the line about CUTLASS given that it should be covered by the standard setup already?
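
For anyone running outside the standard setup (for example in the current Docker image), here is a minimal sketch of a pre-flight check to run before passing `--use_deepspeed_evoformer_attention`. The helper name `find_cutlass` is hypothetical and is not part of OpenFold or DeepSpeed. The test for an `include/cutlass` directory is an assumption about what a usable CUTLASS checkout looks like, not something the kernel build is confirmed to check.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_cutlass(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the checkout named by CUTLASS_PATH, or None if it looks unusable.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    raw = env.get("CUTLASS_PATH", "").strip()
    if not raw:
        # Variable is unset or empty.
        return None
    root = Path(raw).expanduser()
    # Assumption: a usable checkout contains the header directory include/cutlass.
    if (root / "include" / "cutlass").is_dir():
        return root
    return None


if __name__ == "__main__":
    found = find_cutlass()
    if found is None:
        print("CUTLASS_PATH is unset or does not point at a CUTLASS checkout")
    else:
        print(f"CUTLASS found at {found}")
```

A check like this only reports whether the environment variable is set up; it does not build or validate the kernel itself.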
