forked from openucx/ucx
Delete topo map #41
Open: shizhibao wants to merge 162 commits into kunpengcompute:huawei from shizhibao:optimize_topo_module
(cherry picked from commit 48acfc1)
(cherry picked from commit cda835b)
…_check UCT: Disable the cm transport if the ib_ucm.ko module is not loaded - v1.6.x
UCX won't pass host memory to rocm_ipc, so remove the CMA code in IPC which is used to support host memory. Signed-off-by: Qiang Yu <[email protected]>
UCT/ROCM/IPC: remove unused CMA code
dlerror(), which is called from the dynamic module loader, calls asprintf(), which allocates memory. Under valgrind, reloc hooks do not catch intra-glibc calls to malloc, so a special hook for asprintf() is needed.
UCM: Add reloc hooks for (v)asprintf - v1.6.x
Since libucm.so is no longer linked with "nodelete", it may be unloaded before it installs malloc hooks. In this case we would not have any environment strings, so there is nothing to release. If it did install hooks, the library was already reopened with RTLD_NODELETE, so there is no need to release the strings anyway. This fixes a case where libucm.so was unloaded before hooks were installed and called clearenv(), which removed all environment variables from the program, leading to a segfault. Since we are not clearing the environment now, we also cannot release the allocated strings. This is not reported as a memory leak by Valgrind, since the strings are still reachable from the global array of environ strings.
GTEST/JENKINS: Fix CUDA testing - v1.6.x
…-v1.6.x UCM: Avoid releasing environment strings in library destructor - v1.6.x
…et_orig_v1_6_x UCM: Make ucm_reloc_get_orig() a static function - v1.6.x
(cherry picked from commit e56425c)
(cherry picked from commit 086f664)
Also, unite with purge code and add unit test. (cherry picked from commit 5dfef6d)
- In some cases the reply to FC_HARD_REQ cannot be sent immediately due to a lack of HW resources; in this case the request is pushed into the arbiter. If the peer falls into the same situation, this can cause a deadlock. - Fix: add the FC grant request with high priority so that it is sent out of order. (cherry picked from commit 0e3f71a)
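The idea behind this fix can be illustrated with a toy model: a pending queue where ordinary requests are appended at the tail, while a flow-control grant is pushed to the head so it goes out first when resources are scarce. This is a conceptual sketch only; `Group`, `push_tail`, `push_head`, and `dispatch` are hypothetical names and are not the UCX arbiter API.

```python
from collections import deque


class Group:
    """Toy stand-in for an arbiter group: a queue of pending operations."""

    def __init__(self):
        self._queue = deque()

    def push_tail(self, elem):
        # Ordinary requests wait their turn at the back of the queue.
        self._queue.append(elem)

    def push_head(self, elem):
        # High-priority elements (e.g. a flow-control grant) jump the queue.
        self._queue.appendleft(elem)

    def dispatch(self, has_resources):
        """Send queued elements in order for as long as has_resources() is true."""
        sent = []
        while self._queue and has_resources():
            sent.append(self._queue.popleft())
        return sent


def demo():
    group = Group()
    group.push_tail("data-1")
    group.push_tail("data-2")
    group.push_head("fc-grant")  # queued last, but sent first

    budget = [1]  # assume only one send resource is available

    def has_resources():
        if budget[0] > 0:
            budget[0] -= 1
            return True
        return False

    return group.dispatch(has_resources)
```

With a single available resource, `demo()` sends only the grant, which is what lets the peer make progress instead of both sides waiting behind their own queued data.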
(cherry picked from commit fb78062)
(cherry picked from commit c4d448d)
…r-head-v1.6 UCS/ARBITER: Add function to push element to group head, fixed RC FC deadlock - v1.6
…_v1.6.x UCT/DC: Fix OOO support for TM DCIs - v1.6.x
…nd-v1.6.x UCM/TEST: Fix handling of shmat(SHM_REMAP|SHM_RND) - v1.6.x
nvidia-driver-devel replaced xorg-x11-drv-nvidia-devel, and boolean OR expressions are only available with rpm >= v4.13 (rh7.4 has v4.11).
…ild-deps-v1.6.x SPEC: Remove CUDA BuildRequires, since package name is not consistent - v1.6.x
rpmbuild includes them in %postun scriptlet, which causes issues with RPM uninstall step.
fix no memory fault
Fix UD skb reuse on HNS RoCE
solve the init ep lane fault
Fix UT test for discontig datatype
Test case set for HMPI
Create function_test.sh for automatic function test
Add communicator operation test cases
upgrade ucx to 1.9.0
FEAT: UPGRADE UCX TO 1.9.0
Fixed the problem of long running time of some test scripts
fix bug for big messages of TCP
Add hypertest cases
Modify GTEST to match the refactored code
Reduce the memory of the step structure
ChenQiangFYQ force-pushed the huawei branch from 7c1acbb to fc4f5b7 (September 16, 2021 11:17)
No description provided.