Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host Ir Lowering: foundation #3535

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

samnordmann
Copy link
Collaborator

@samnordmann samnordmann commented Dec 5, 2024

What

Refactor Host Ir lowering. https://jirasw.nvidia.com/browse/NVFUSER-105
Dependent on:

Why

How

  • move files multidevice/lower_communication to host_ir/lower
  • Move most of MultiDeviceExecutor's constructor logic to host_ir/lower

I divided in incremental commits to ease the review. The relevant commits start at move HostIr lowering to host ir folder and special class, the commits before that belong to #3524

@samnordmann
Copy link
Collaborator Author

!test --diff

@samnordmann samnordmann changed the title Host ir lowering first refactor Host Ir Lowering: foundation Dec 5, 2024
@samnordmann
Copy link
Collaborator Author

!test

@samnordmann
Copy link
Collaborator Author

!test

@samnordmann
Copy link
Collaborator Author

@wujingyue the CI failures seen on the previous commits seem to be caused by a too small tolerance rate on single device test. It doesn't seem related to the present PR
https://gitlab-master.nvidia.com/dl/pytorch/fuser-gh-mirror/-/jobs/127287825

Copy link
Collaborator

@wujingyue wujingyue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

csrc/host_ir/lower.cpp Show resolved Hide resolved
csrc/host_ir/lower.cpp Show resolved Hide resolved
auto& mesh = tv->getDeviceMesh().vector();
std::copy(mesh.begin(), mesh.end(), std::inserter(ret, ret.end()));
} else {
ret.insert(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed?

Copy link
Collaborator Author

@samnordmann samnordmann Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

involvedDevices is now called from HostIrEvaluator which can take in single device fusion where TVs don't have device mesh. In a single device Fusion, it is natural that this function returns device 0

tests/cpp/test_resharding.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@samnordmann samnordmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you have further questions or comments

auto& mesh = tv->getDeviceMesh().vector();
std::copy(mesh.begin(), mesh.end(), std::inserter(ret, ret.end()));
} else {
ret.insert(0);
Copy link
Collaborator Author

@samnordmann samnordmann Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

involvedDevices is now called from HostIrEvaluator which can take in single device fusion where TVs don't have device mesh. In a single device Fusion, it is natural that this function returns device 0

csrc/host_ir/lower.cpp Show resolved Hide resolved
csrc/host_ir/lower.cpp Show resolved Hide resolved
tests/cpp/test_multidevice_pipeline.cpp Outdated Show resolved Hide resolved
@samnordmann
Copy link
Collaborator Author

!test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants