
[Quantization/Parameter] WIP: Replace parameter subclasses with raw nn.Parameter with additional attributes #11622

Open
cennn wants to merge 10 commits into main

Conversation

@cennn (Contributor) commented Dec 30, 2024

FIX: issue #10612, PR #10609

Problem:

Parameter subclasses are not fully compatible in the following respects:
torch.compile: parameter subclasses cause compatibility issues when modules are compiled with torch.compile.
offloadedTensor: parameter subclasses also do not compose well with tensor subclasses, such as the ones used for CPU offloading.

Solution:

Remove all parameter subclasses and instead attach the required attributes and functions directly to a raw nn.Parameter, so that quantization parameters keep the same behavior. This mainly means rewriting the code that defines (and inherits from) the parameter subclasses as shown below; the code that constructs these parameters needs only minimal changes.

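To illustrate the mechanism this relies on (a minimal sketch, not code from this PR): a raw nn.Parameter accepts arbitrary Python attributes, and its type remains plain nn.Parameter, which is what torch.compile and tensor wrappers see.

import torch
from torch import nn

# Minimal sketch: metadata attached directly to a plain nn.Parameter.
weight = nn.Parameter(torch.empty(8, 8), requires_grad=False)
weight.packed_dim = 0
weight.packed_factor = 8

assert type(weight) is nn.Parameter   # no custom subclass involved
assert weight.packed_factor == 8      # the metadata is still accessible
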
Example Code Changes:

Original Definition:

class PackedvLLMParameter(ModelWeightParameter):
    def __init__(self,
                 packed_factor: Union[int, Fraction],
                 packed_dim: int,
                 marlin_tile_size: Optional[int] = None,
                 **kwargs):
        self._packed_factor = packed_factor
        self._packed_dim = packed_dim
        self._marlin_tile_size = marlin_tile_size
        super().__init__(**kwargs)

    @property
    def packed_dim(self):
        return self._packed_dim

    @property
    def packed_factor(self):
        return self._packed_factor

    @property
    def marlin_tile_size(self):
        return self._marlin_tile_size

    def adjust_shard_indexes_for_packing(self, shard_size, shard_offset):
        return _adjust_shard_indexes_for_packing(
            shard_size=shard_size,
            shard_offset=shard_offset,
            packed_factor=self.packed_factor,
            marlin_tile_size=self.marlin_tile_size)

New Definition:

def PackedvLLMParameter(data: torch.Tensor, **kwargs) -> Parameter:
    param = Parameter(data, requires_grad=False)
    wrap_base_vllm_parameter(param, **kwargs)
    wrap_column_vllm_parameter(param, **kwargs)
    wrap_row_vllm_parameter(param, **kwargs)
    wrap_packed_vllm_parameter(param, **kwargs)
    return param


def wrap_packed_vllm_parameter(param: Parameter,
                               packed_factor: Union[int, Fraction],
                               packed_dim: int,
                               marlin_tile_size: Optional[int] = None,
                               **kwargs) -> None:
    def adjust_shard_indexes_for_packing(shard_size, shard_offset):
        return _adjust_shard_indexes_for_packing(
            shard_size=shard_size,
            shard_offset=shard_offset,
            packed_factor=packed_factor,
            marlin_tile_size=marlin_tile_size)

    param.packed_factor = packed_factor
    param.packed_dim = packed_dim
    param.marlin_tile_size = marlin_tile_size
    param.adjust_shard_indexes_for_packing = adjust_shard_indexes_for_packing
    add_param_feature(param, Features.Packed)

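The helpers wrap_base_vllm_parameter, wrap_column_vllm_parameter, wrap_row_vllm_parameter, add_param_feature, and Features are defined elsewhere in this PR; the snippet below is only a minimal sketch of how the feature-tagging part could look (the has_param_feature helper and the enum members are assumptions for illustration, not the PR's actual implementation):

from enum import Enum, auto

from torch.nn import Parameter


class Features(Enum):
    Base = auto()
    Column = auto()
    Row = auto()
    Packed = auto()


def add_param_feature(param: Parameter, feature: Features) -> None:
    # Record a capability flag on the parameter so callers can test for
    # features instead of relying on isinstance() with a subclass.
    if not hasattr(param, "param_features"):
        param.param_features = set()
    param.param_features.add(feature)


def has_param_feature(param: Parameter, feature: Features) -> bool:
    return feature in getattr(param, "param_features", set())
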
Unchanged Call Sites:

Code that constructs these parameters does not need to change, because the factory functions keep the original names and keyword arguments. For example:

qweight = PackedvLLMParameter(
    data=torch.empty(
        input_size_per_partition // self.quant_config.pack_factor,
        output_size_per_partition,
        dtype=torch.int32,
    ),
    input_dim=0,
    output_dim=1,
    packed_dim=0,
    packed_factor=self.quant_config.pack_factor,
    weight_loader=weight_loader)

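Code that dispatches on the parameter type (rather than constructing one) is where the "minimal modifications" come in: presumably an isinstance check becomes a feature check. A hedged sketch, reusing the hypothetical has_param_feature/Features helpers from the sketch above:

def adjust_for_packing(param, shard_size, shard_offset):
    # Hypothetical dispatch helper (not taken from the PR): consult the
    # attached feature flag instead of isinstance(param, PackedvLLMParameter).
    if has_param_feature(param, Features.Packed):
        return param.adjust_shard_indexes_for_packing(
            shard_size=shard_size, shard_offset=shard_offset)
    return shard_size, shard_offset
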
Verified Tests:

vllm serve Qwen/Qwen2.5-0.5B-Instruct --quantization fp8
vllm serve Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4
vllm serve Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 --quantization gptq
vllm serve Qwen/Qwen2-1.5B-Instruct-AWQ
vllm serve Qwen/Qwen2-1.5B-Instruct-AWQ --quantization awq
vllm serve nm-testing/tinyllama-oneshot-w4a16-channel-v2
vllm serve nm-testing/llama7b-one-shot-2_4-w4a16-marlin24-t
vllm serve nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run the other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@cennn cennn changed the title Replace parameter subclasses with raw nn.Parameter with additional attributes WIP: Replace parameter subclasses with raw nn.Parameter with additional attributes Dec 30, 2024
@cennn cennn changed the title WIP: Replace parameter subclasses with raw nn.Parameter with additional attributes [Quantization/Parameter] WIP: Replace parameter subclasses with raw nn.Parameter with additional attributes Dec 30, 2024
@youkaichao (Member)

As discussed, please fix the format.

@youkaichao (Member)

@dsikka can you please take a look?

@dsikka (Contributor) left a comment

Please keep the docstrings and typing that were added for all the original parameters.

(Nine inline review threads on vllm/model_executor/parameter.py; several marked outdated or resolved.)

mergify bot commented Jan 9, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @cennn.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 9, 2025
@mergify mergify bot removed the needs-rebase label Jan 13, 2025
@dsikka (Contributor) commented Jan 14, 2025

Will re-review soon.

@youkaichao (Member)

@dsikka Upon reflection, I find this wrap_row_vllm_parameter approach quite confusing and tricky for newcomers. I'm thinking about another approach, where we keep the current class, but it does not inherit from nn.Parameter.

previous code:

weight_g_idx = RowvLLMParameter(
    data=torch.empty(input_size_per_partition, dtype=torch.int32),
    input_dim=0,
    weight_loader=weight_loader)

intended:

weight_g_idx = nn.Parameter(
    torch.empty(input_size_per_partition, dtype=torch.int32))
weight_g_idx.vllm_parameter = RowvLLMParameter(
    data=weight_g_idx, input_dim=0, weight_loader=weight_loader)

This way, the only change is that RowvLLMParameter is no longer a subclass of nn.Parameter.

does this sound better to you?

@dsikka (Contributor) commented Jan 16, 2025

This looks better.
Would RowvLLMParameter inherit from anything, or would it just be standalone?
You would need to update how the weight loader is called within the weight_loader_v2 methods in linear.py.
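
A minimal sketch of that weight-loader change, assuming the existing load_row_parallel_weight API on RowvLLMParameter (illustrative only, not code from either PR):

import torch
from torch import nn

def load_row_parallel(param: nn.Parameter, loaded_weight: torch.Tensor) -> None:
    # With RowvLLMParameter attached as param.vllm_parameter instead of being
    # the Parameter itself, the loading logic is reached through the wrapper.
    param.vllm_parameter.load_row_parallel_weight(loaded_weight=loaded_weight)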

@cennn (Contributor, Author) commented Jan 18, 2025

@dsikka All the quantization class inheritances are kept, except that BasevLLMParameter no longer inherits from nn.Parameter. Here is the PR implemented in this new way (it has already been tested), so the two implementations can be compared directly. Please take a look and see whether this approach is clearer:

#12158

Development

Successfully merging this pull request may close these issues.

tracking torch.compile compatibility with cpu offloading
3 participants