
Add script to generate val consts #2900

Merged: 4 commits merged into main from pm/upload_gen_const_script, Sep 10, 2024
Conversation

Priya2698 (Collaborator):

Adds a script to reproduce the computation that was used to generate the validation tolerances:

struct ValidationConstants {
// Tolerances generated from randn + add + sum fusion
// compared against double precision
std::array<std::array<double, 2>, 20> sum_tolerances_float = {
{{4, 1.68222e-06}, {8, 2.23704e-06}, {16, 2.95788e-06},
{32, 4.4778e-06}, {64, 6.75395e-06}, {128, 8.57934e-06},
{256, 1.30594e-05}, {512, 2.19122e-05}, {1024, 3.3451e-05},
{2048, 5.78476e-05}, {4096, 0.000108292}, {8192, 0.00012207},
{16384, 0.000136882}, {32768, 0.000248561}, {65536, 0.000407594},
{131072, 0.000500901}, {262144, 0.000923019}, {524288, 0.00156909},
{1048576, 0.00223107}, {2097152, 0.00343043}}};
// Tolerances generated from randn + add + sum fusion
// compared against double precision
std::array<std::array<double, 2>, 20> sum_tolerances_half = {
{{4, 0.00390625}, {8, 0.0078125}, {16, 0.0078125},
{32, 0.0155334}, {64, 0.0156269}, {128, 0.0312042},
{256, 0.0312548}, {512, 0.0619979}, {1024, 0.0625103},
{2048, 0.124686}, {4096, 0.12501}, {8192, 0.24945},
{16384, 0.250049}, {32768, 0.498946}, {65536, 0.500071},
{131072, 0.985087}, {262144, 1.00006}, {524288, 1.99234},
{1048576, 2.00032}, {2097152, 3.99073}}};
};
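
For reference, a minimal sketch of how such a table could be measured. Plain PyTorch ops stand in for the nvFuser fusion here, and the trial count is an assumption, not taken from the script:

import torch

sizes = [2**i for i in range(2, 22)]  # 4 .. 2097152, as in the script
trials = 10  # assumed; the actual script may use a different count

for dtype in (torch.float32, torch.float16):
    tolerances = []
    for n in sizes:
        max_err = 0.0
        for _ in range(trials):
            # Generate inputs in float64, downcast for the measured computation.
            a = torch.randn(n, dtype=torch.float64)
            b = torch.randn(n, dtype=torch.float64)
            out = (a.to(dtype) + b.to(dtype)).sum().item()
            ref = (a + b).sum().item()
            max_err = max(max_err, abs(out - ref))
        tolerances.append((n, max_err))
    print(dtype, tolerances)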

naoyam (Collaborator) left a comment:

LGTM

Three review threads on tools/generate_validation_tolerances.py (outdated, resolved). Excerpt under discussion:
import torch
from datetime import datetime

sizes = [2**i for i in range(2, 22)] # {4, 2097152}
Collaborator:

Question: IIRC, the reduction size is computed with respect to fusion inputs, so it tends to grow very fast. For a two-layer MLP (#2905), the reduction size of the second linear output is already 4 * hidden_size * hidden_size, close to 2M. I'd imagine the whole transformer block will generate an even larger reduction size, much larger than the max size specified here. What's the implication of getTolerance querying a size larger than the max?

Priya2698 (Author):

It would use twice the max error in the list:

} else {
// If we hit the end of the list, return twice the max error we
// measured
abs_tol = sum_tolerance_entry[sum_tolerance_entry.size() - 1][1] * 2.;
}
So far, for the unit tests, these existing sizes have been sufficient.
If we have very few examples with larger (~2M) reduction sizes, it would be simpler to set a threshold manually. If we have several such examples, we may benefit from having more cases.
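
For illustration, the lookup behavior described above can be sketched in Python. This mirrors the quoted C++ fallback; it is not the actual getTolerance implementation, which may handle in-range sizes differently (e.g. by interpolating between entries):

def lookup_sum_tolerance(table, reduction_size):
    # table is a list of (size, tolerance) pairs sorted by size.
    for size, tol in table:
        if reduction_size <= size:
            return tol
    # Past the end of the list: twice the max error that was measured.
    return table[-1][1] * 2.0

So a float32 reduction of, say, 4M elements would fall past the last entry {2097152, 0.00343043} and get 2 * 0.00343043 ≈ 0.00686.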

Collaborator:

> If we have several such examples, we may benefit from having more cases.

In case you are looking for concrete use cases, I believe https://github.com/NVIDIA/Fuser/blob/main/tests/cpp/test_multidevice_transformer.cpp#L569-L578 will give you reduction sizes much larger than 2M, when you change validation to use testValidate.

Collaborator:
The other issue with MLP is that we have compounding error from running consecutive ops.
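
A toy illustration of that compounding (not from the PR): compare one matmul and two consecutive matmuls in float32 against a float64 reference, where the second op inherits the first op's error.

import torch

torch.manual_seed(0)
x = torch.randn(1024, 1024)
w1 = torch.randn(1024, 1024) / 32
w2 = torch.randn(1024, 1024) / 32

y1 = x @ w1                        # first op, float32
y1_ref = x.double() @ w1.double()  # float64 reference
y2 = y1 @ w2                       # second op consumes the float32 result
y2_ref = y1_ref @ w2.double()

print((y1 - y1_ref).abs().max().item())  # error after one op
print((y2 - y2_ref).abs().max().item())  # typically larger after two ops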

@Priya2698 merged commit 694ec59 into main on Sep 10, 2024
5 checks passed
@Priya2698 deleted the pm/upload_gen_const_script branch on September 10, 2024 at 20:10