Add script to generate val consts #2900
Conversation
LGTM
import torch
from datetime import datetime

sizes = [2**i for i in range(2, 22)]  # {4, 2097152}
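For context, a minimal sketch of how such per-size tolerances could be measured: compare a reduced-precision sum against a float64 reference over random inputs and keep the worst absolute error for each size. This is an assumption about the approach, not the actual contents of the script added in this PR; the sample count, dtype, and device handling below are made up.

# Sketch only: worst-case sum error per reduction size against a float64
# reference. Not the script from this PR.
import torch

sizes = [2**i for i in range(2, 22)]  # 4 .. 2097152
n_samples = 100  # assumption: random trials per size
device = "cuda" if torch.cuda.is_available() else "cpu"

for size in sizes:
    max_err = 0.0
    for _ in range(n_samples):
        x = torch.randn(size, device=device)      # float32 input
        ref = x.double().sum().item()             # float64 reference sum
        val = x.sum().item()                      # float32 sum under test
        max_err = max(max_err, abs(val - ref))
    print(f"size={size} max_abs_err={max_err:.3e}")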
Question: IIRC, the reduction size is computed with respect to fusion inputs, so it tends to grow very fast. For a two-layer MLP (#2905), the reduction size of the second linear output is already 4 * hidden_size * hidden_size, close to 2M. I'd imagine the whole transformer block will generate an even larger reduction size, much larger than the max size specified here. What's the implication of getTolerance querying a size larger than the max?
It would use twice the max error in the list:

Fuser/csrc/validator_utils.cpp, lines 134 to 138 in 1158543:

} else {
  // If we hit the end of the list, return twice the max error we
  // measured
  abs_tol = sum_tolerance_entry[sum_tolerance_entry.size() - 1][1] * 2.;
}
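In other words, the lookup walks the (reduction size, measured error) table and, once the queried size is past the last entry, falls back to twice the largest measured error. A rough Python sketch of that behavior follows; the table values are placeholders and the in-range lookup is simplified relative to the real code in validator_utils.cpp:

# Placeholder values, not the real constants; illustrates the end-of-list
# fallback shown in the C++ snippet above.
sum_tolerance_entry = [(4, 1e-6), (1024, 1e-5), (2097152, 1e-3)]

def get_sum_tolerance(reduction_size):
    for size, err in sum_tolerance_entry:
        if reduction_size <= size:
            return err
    # Hit the end of the list: return twice the max error that was measured
    return sum_tolerance_entry[-1][1] * 2.0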
If we have very few examples with larger (~2M) reduction sizes, it would be simpler to set a threshold manually. If we have several such examples, we may benefit from having more cases.
> If we have several such examples, we may benefit from having more cases.

In case you are looking for concrete use cases, I believe https://github.com/NVIDIA/Fuser/blob/main/tests/cpp/test_multidevice_transformer.cpp#L569-L578 will give you reduction sizes much larger than 2M when you change the validation to use testValidate.
The other issue with the MLP is that we have compounding error from running consecutive ops.
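As a standalone illustration of that compounding (not code from this PR), comparing one float32 matmul and two chained float32 matmuls against a float64 reference shows how the error of the first op feeds into the second; the sizes and dtypes here are arbitrary:

# Standalone sketch: error after one op vs. two consecutive ops, measured
# against a float64 reference. Sizes and dtypes are arbitrary choices.
import torch

torch.manual_seed(0)
h = 256
x = torch.randn(h, h, dtype=torch.float64)
w1 = torch.randn(h, h, dtype=torch.float64)
w2 = torch.randn(h, h, dtype=torch.float64)

one_val = (x.float() @ w1.float()).double()
one_ref = x @ w1
print("one matmul, max abs error:", (one_val - one_ref).abs().max().item())

two_val = ((x.float() @ w1.float()) @ w2.float()).double()
two_ref = (x @ w1) @ w2
print("two matmuls, max abs error:", (two_val - two_ref).abs().max().item())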
Adds a script to reproduce the computation that was used when generating the validation tolerances:

Fuser/csrc/validator_utils.h, lines 25 to 46 in 892b7ac