Modified successfully, but not displayed on ALLTOALL? #25

Open
2 tasks done
yiguCM opened this issue Nov 15, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@yiguCM

yiguCM commented Nov 15, 2024

NVIDIA Open GPU Kernel Modules Version

550.90.07-p2p

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 22.04.5 LTS

Kernel Release

Linux 6.8.0-47-generic #47~22.04.1-Ubuntu

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 4090 ~ GPU 7: NVIDIA GeForce RTX 4090

Describe the bug

Hi, I successfully modified BAR1 and enabled P2P, which improved P2PTEST performance. However, when I ran the NCCL tests, performance in the ALL TO ALL scenario decreased. Why is this?
Is it because BAR1 is too large? Is it possible to enable only P2P without modifying BAR1?

To Reproduce

/nccl-tests/build/alltoall_perf -b 8 -e 8G -f 2 -g 8
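
For readers comparing results, here is a minimal sketch of the message sizes this command is expected to sweep. It assumes the usual nccl-tests meaning of the flags (`-b` minimum size, `-e` maximum size, `-f` multiplicative step factor, `-g` GPUs per thread) and that `8G` means 8 × 1024³ bytes. The helper name `sweep_sizes` is hypothetical, and the sizes the tool actually reports may be adjusted slightly per collective.

```python
def sweep_sizes(min_bytes: int, max_bytes: int, factor: int) -> list[int]:
    """Sizes visited when stepping from min_bytes to max_bytes by a multiplicative factor."""
    if min_bytes <= 0 or factor < 2:
        raise ValueError("min_bytes must be positive and factor must be at least 2")
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= factor
    return sizes


GIB = 1024 ** 3  # assumption: the 'G' suffix is binary (1024-based)

# Mirrors '-b 8 -e 8G -f 2' from the reproduce command above.
sizes = sweep_sizes(8, 8 * GIB, 2)
print(len(sizes), sizes[0], sizes[-1])
```

Knowing the sweep makes it easier to say at which message sizes the ALL TO ALL regression appears, for example only at large sizes or across the whole range.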

Bug Incidence

Always

nvidia-bug-report.log.gz


More Info

No response

yiguCM added the bug label on Nov 15, 2024
@xiaobuding-cx

@yiguCM Hi, I encountered the same issue as you. Have you resolved it?

@yiguCM
Author

yiguCM commented Dec 23, 2024 via email

@mylesgoose

This seems to be solved for multiple GPUs on an EPYC CPU: https://github.com/aikitoria/open-gpu-kernel-modules
