
Allow linear to take a >2D weight and a >1D bias. #3073

Merged: 7 commits merged into main on Oct 4, 2024

Conversation

@wujingyue (Collaborator) commented on Oct 1, 2024

As long as the extra dimensions are DID-parallel.

This allows a distributed transformer layer to use linear (instead of matmul+add) for speed and will simplify the pending #3045.

To avoid ambiguity, this PR also removes support for a 1D weight and a 0D bias; otherwise, it's unclear whether a 2D weight means one device dimension plus one non-device dimension or two non-device dimensions. This support can be added back by changing the thunder-to-nvFuser bridge to convert a 1D/0D linear into an unsqueeze, followed by a 2D/1D linear, followed by a squeeze.
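The unsqueeze/squeeze rewrite described above can be sketched in NumPy. This is a hypothetical illustration of the decomposition, not nvFuser or Thunder API; `linear_1d_weight` is an invented helper name.

```python
import numpy as np

def linear_1d_weight(x, weight_1d, bias_0d=None):
    # Hypothetical bridge-side rewrite: lift the 1D weight (and 0D bias)
    # to the 2D/1D form the fused linear accepts, then squeeze the
    # resulting size-1 output dimension away.
    weight_2d = weight_1d[np.newaxis, :]   # [in] -> [1, in]
    out = x @ weight_2d.T                  # [..., in] -> [..., 1]
    if bias_0d is not None:
        out = out + bias_0d                # 0D bias broadcasts over [..., 1]
    return np.squeeze(out, axis=-1)        # [..., 1] -> [...]

x = np.arange(12, dtype=float).reshape(4, 3)
w = np.array([1.0, 2.0, 3.0])
assert np.allclose(linear_1d_weight(x, w, 0.5), x @ w + 0.5)
```

The same trick applies on the input side in PyTorch-style linears; the point is only that dropping 1D/0D support in the fused op loses no expressiveness, since the bridge can always normalize ranks around the call.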

@wujingyue: !build

Review thread on csrc/ops/composite.cpp (outdated, resolved)
@wujingyue force-pushed the wjy/prefer branch 2 times, most recently from aa7b681 to ad951e0 on October 2, 2024 20:13
Base automatically changed from wjy/prefer to main October 2, 2024 21:29
@wujingyue: !build

@wujingyue: !build

@wujingyue force-pushed the wjy/three branch 3 times, most recently from ee5b81d to 96ed5e8 on October 3, 2024 01:44
@wujingyue: !build

@wujingyue force-pushed the wjy/three branch 2 times, most recently from 3d060ab to c61cdc1 on October 3, 2024 04:56
@wujingyue marked this pull request as ready for review on October 3, 2024 05:04
@wujingyue requested review from Priya2698 and cowanmeg and removed the request for cowanmeg on October 3, 2024 05:04
@wujingyue: !build

@wujingyue: !build

@cowanmeg (Collaborator) left a comment:

Overall LGTM, just some little things about adding comments

Review thread on tests/python/opinfo_input_generators.py (resolved)
Review thread on csrc/ir/nodes.cpp (resolved)
@wujingyue added the enhancement (New feature or request) label on Oct 4, 2024
@wujingyue: !build

@wujingyue: !build

@wujingyue merged commit 9d39b6c into main on Oct 4, 2024; 10 of 11 checks passed.
@wujingyue deleted the wjy/three branch on October 4, 2024 23:05
wujingyue added a commit that referenced this pull request Oct 5, 2024
This PR fixes a bug introduced in #3073. This bug causes
`nvFuser.Tensor` to have a different rank than the corresponding
`TensorView`. This didn't trigger any test failure until I wrote a more
complicated test that `slice`s the output of a linear.

Question for @rdspring1 and/or @kevinstephano: shouldn't this bug have been
caught earlier? I guess when the Python frontend finalizes the definition,
it should check that the output `nvFuser.Tensor`s are consistent with the
output `TensorView`s. Wdyt?
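A definition-time consistency check of the kind proposed above might look like the following sketch. The helper name and the rank-only comparison are assumptions for illustration; the real frontend classes and the full notion of "consistent" differ.

```python
def check_output_ranks(frontend_ranks, tv_ranks):
    """Hypothetical finalize-time check: each frontend output tensor's
    rank must match the rank of the TensorView backing it, so a
    rank-mismatch bug fails fast instead of surfacing later (e.g. when
    slicing the output of a linear)."""
    for i, (t_ndim, tv_ndim) in enumerate(zip(frontend_ranks, tv_ranks)):
        if t_ndim != tv_ndim:
            raise ValueError(
                f"output {i}: frontend rank {t_ndim} != TensorView rank {tv_ndim}")

# Matching ranks pass silently; a mismatch raises at definition time.
check_output_ranks([2, 3], [2, 3])
```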
wujingyue added a commit that referenced this pull request Oct 7, 2024
Similar to #3073, `sdpfa_fwd` shouldn't assume DIDs are available at
definition time. Instead, treat extra preceding dimensions as batch
dimensions at definition time and check that they are device-parallel at
evaluation time.

This is required to land #3115.
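The definition-time vs evaluation-time split described in this commit can be sketched as follows. The function names and the exact shape semantics are assumptions for illustration, not the actual nvFuser rules.

```python
def output_rank_at_definition(x_ndim, w_ndim):
    # Definition time: DIDs are not yet known, so any weight dims beyond
    # the core [out_features, in_features] pair are simply treated as
    # extra batch dims contributing to the output rank.
    extra = w_ndim - 2
    assert extra >= 0, "weight must be at least 2D"
    return x_ndim + extra

def check_extra_dims_at_evaluation(w_shape, mesh_size):
    # Evaluation time: the extra leading dims must actually be
    # device-parallel, i.e. match the device-mesh size; otherwise reject.
    extra = w_shape[:-2]
    if any(d != mesh_size for d in extra):
        raise ValueError("extra weight dims must be DID-parallel")

# A [D, out, in] weight with a 2D input yields a rank-3 output at
# definition time; at evaluation time D must equal the mesh size.
assert output_rank_at_definition(2, 3) == 3
check_extra_dims_at_evaluation((2, 8, 4), mesh_size=2)
```

Deferring the DID check this way keeps the op definition purely shape-based, which is what lets the same definition serve both single-device and distributed runs.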