Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add getter and setter methods for compile_backend across accelerators. #5299

Merged

Conversation

vshekhawat-hlab
Copy link
Contributor

@vshekhawat-hlab vshekhawat-hlab commented Mar 19, 2024

Add getter and setter methods for compile_backend across accelerators, which provide a mechanism to retrieve the compile backend. These APIs handle user-defined backend selection and raise a ValueError with informative error messages for unsupported backends.

@vshekhawat-hlab vshekhawat-hlab changed the title Added get_compile_backend API in accelrators. Added get_compile_backend API in accelerator. Mar 19, 2024
@vshekhawat-hlab
Copy link
Contributor Author

Hi @loadams ,

Added support for retrieving the preferred compile backend from the accelerator. However, I'm unsure about the preferred backend for each accelerator. Therefore, I've set the default to 'inductor'. Could you please review this?

@tjruwase
Copy link
Contributor

@vshekhawat-hlab, thanks for adding this API. I think it is a great start to addressing generalization issue raised by @delock. However, there is a bit more work required to make this consistent with

class CompileConfig(DeepSpeedConfigModel):
.
Also, I think there are few questions that need resolving. I will post them below as separate comment to reduce clutter.

@tjruwase
Copy link
Contributor

Since users can specify compile_backend (string or Callable) through either ds_config or set_backend(), which value takes precedence between user specification and preferred/default backend in accelerator class?

I think user specification should have higher precedence.

@tjruwase
Copy link
Contributor

tjruwase commented Mar 19, 2024

Assuming user specification has higher precedence, how can users select the accelerator preferred backend without having to read the code? I see two options:

  1. Users can implicitly select the accelerator preferred backend by leaving backend undefined in ds_config, and by not using set_backend() API, or
  2. Alternatively, users can explicitly use a special backend name, e.g., "accelerator".

@tjruwase
Copy link
Contributor

What is the expected behavior if user specifies backend that is not supported by accelerator? I see three options:

  1. Fail immediately with meaningful error message.
  2. Disable compilation with warning message.
  3. Switch to accelerator preferred backend with warning message.

@tjruwase
Copy link
Contributor

@delock, @tohtana, @umchand, @vshekhawat-hlab, @loadams, I will appreciate your thoughts on the above questions. Thanks!

@vshekhawat-hlab
Copy link
Contributor Author

vshekhawat-hlab commented Mar 20, 2024

Thanks @tjruwase for review.

My views on above:
Query1) Yes, i think user specification should get higher precedence.
Query3) I think option 1, as the user is expected to run with the user-specified backend, that is the reason user passed the backend. However option3 is also good option.

Updated get_compile_backend, let me know your views on this.

def get_compile_backend(self, backend=None):
    supported_backends = torch._dynamo.list_backends()
    if backend is None:
        return preferred_backend
    elif backend in supported_backends:
        return backend
    else:
        raise ValueError(f"{backend} not supported by {self.device_name()}. Supported Backends are {supported_backends }") 

`

@loadams
Copy link
Contributor

loadams commented Mar 20, 2024

Thanks @tjruwase for review.

My views on above: Query1) Yes, i think user specification should get higher precedence. Query3) I think option 1, as the user is expected to run with the user-specified backend, that is the reason user passed the backend. However option3 is also good option.

Updated get_compile_backend, let me know your views on this.

def get_compile_backend(self, backend=None):
    supported_backends = torch._dynamo.list_backends()
    if backend is None:
        return preferred_backend
    elif backend in supported_backends:
        return backend
    else:
        raise ValueError(f"{backend} not supported by {self.device_name()}. Supported Backends are {supported_backends }") 

`

I agree on this, I think the best option if the user specifies a backend that is not supported is to fail immediately with a meaningful error message.

And I think having it be implicit except if they override in ds_config or via set_backend() would then take precedence.

@delock
Copy link
Collaborator

delock commented Mar 21, 2024

Agree that if user explicitly specifiy backend then should fail immedately. Also leaving backend undefined as indication for default value is a sound behavior.

vshekhawat-hlab and others added 3 commits March 21, 2024 11:28
Refactored the get_compile_backend method to handle user backend selection
 more efficiently. If no backend is specified, it now defaults to accelerator
preferred backend. Additionally, improved error handling for unsupported backends
 by providing informative error messages.
@vshekhawat-hlab
Copy link
Contributor Author

@tjruwase , @loadams , @delock ,

Can you please review the change. Is the current change is okay?

@loadams
Copy link
Contributor

loadams commented Apr 15, 2024

@tjruwase , @loadams , @delock ,

Can you please review the change. Is the current change is okay?

Thanks @vshekhawat-hlab, we will work on reviewing this.

@loadams loadams requested review from tohtana and umchand April 15, 2024 20:37
@tjruwase
Copy link
Contributor

@vshekhawat-hlab, apologies for the delay. Based on all the feedback, the main change needed in my mind is to split into a get_compile_backend() and set_compile_backend(), so that the get_ API can be side effect-free.

  1. get_compile_backend() should return the current backend of the accelerator.
  2. Each accelerator can initialize the current backend as appropriate, probably to the default/preferred value.
  3. set_compile_backend() allows users to modify the backend. It should perform the necessary sanity checks, including those performed by current get_ API

@vshekhawat-hlab, @tohtana, @umchand, @delock, @loadams, I will appreciate your thoughts on the above.

@loadams
Copy link
Contributor

loadams commented Apr 23, 2024

@vshekhawat-hlab, apologies for the delay. Based on all the feedback, the main change needed in my mind is to split into a get_compile_backend() and set_compile_backend(), so that the get_ API can be side effect-free.

  1. get_compile_backend() should return the current backend of the accelerator.
  2. Each accelerator can initialize the current backend as appropriate, probably to the default/preferred value.
  3. set_compile_backend() allows users to modify the backend. It should perform the necessary sanity checks, including those performed by current get_ API

@vshekhawat-hlab, @tohtana, @umchand, @delock, @loadams, I will appreciate your thoughts on the above.

I like that approach, I think it makes the most sense to match existing APIs and provide default behavior and a normal way to override it.

@vshekhawat-hlab
Copy link
Contributor Author

vshekhawat-hlab commented Apr 23, 2024

Hi @tjruwase,

Agree with comment.

Couple of questions:

  • In set API, user has to pass backend, None args can't be allowed here. Right?
  • Do you see any more checks that can be added in set API?

Code snippet with get and set API.

    def get_compile_backend(self):
        return self.compile_backend

    def set_compile_backend(self, backend):
        supported_backends = torch._dynamo.list_backends()
        if backend in supported_backends:
            self.compile_backend = backend
        else:
            raise ValueError(
                f"{backend} not supported by {self.device_name()}. Supported Backends are {supported_backends}")

@tjruwase
Copy link
Contributor

  • In set API, user has to pass backend, None args can't be allowed here. Right?

Yes, it makes sense to reject None backend.

@tjruwase
Copy link
Contributor

@vshekhawat-hlab, the proposed snippet looks good to me. I suspect that individual accelerators might later apply optimizations such as caching the supported backends list in the constructor. But your current proposal is a simple and clean start. Thanks for helping.

@vshekhawat-hlab vshekhawat-hlab changed the title Added get_compile_backend API in accelerator. Add getter and setter methods for compile_backend across accelerators. Apr 24, 2024
@vshekhawat-hlab
Copy link
Contributor Author

@vshekhawat-hlab, the proposed snippet looks good to me. I suspect that individual accelerators might later apply optimizations such as caching the supported backends list in the constructor. But your current proposal is a simple and clean start. Thanks for helping.

Updated the PR as discussed. Please review.

@loadams loadams added this pull request to the merge queue Apr 24, 2024
Merged via the queue into microsoft:master with commit fa8458b Apr 24, 2024
15 checks passed
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
microsoft#5299)

Add getter and setter methods for `compile_backend` across accelerators,
which provide a mechanism to retrieve the compile backend. These APIs
handle user-defined backend selection and raise a `ValueError` with
informative error messages for unsupported backends.

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024
microsoft#5299)

Add getter and setter methods for `compile_backend` across accelerators,
which provide a mechanism to retrieve the compile backend. These APIs
handle user-defined backend selection and raise a `ValueError` with
informative error messages for unsupported backends.

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants