support gdr malloc&free #8

Merged
merged 1 commit into from
Dec 24, 2024
8 changes: 4 additions & 4 deletions Makefile
@@ -6,16 +6,16 @@ USE_ILUVATAR_COREX ?= 0
USE_CAMBRICON ?= 0

# set to empty if not provided
-DEVICE_HOME ?=
-CCL_HOME ?=
+DEVICE_HOME ?=
+CCL_HOME ?=

ifeq ($(strip $(DEVICE_HOME)),)
ifeq ($(USE_NVIDIA), 1)
DEVICE_HOME = /usr/local/cuda
else ifeq ($(USE_ILUVATAR_COREX), 1)
DEVICE_HOME = /usr/local/corex
else ifeq ($(USE_CAMBRICON), 1)
-DEVICE_HOME = /torch/neuware_home
+DEVICE_HOME = $(NEUWARE_HOME)

Collaborator: Is NEUWARE_HOME a default system environment variable?

Contributor Author: The MLU Docker images all set this environment variable by default, so users do not need to set it themselves.

else
DEVICE_HOME = /usr/local/cuda
endif
@@ -27,7 +27,7 @@ ifeq ($(strip $(CCL_HOME)),)
else ifeq ($(USE_ILUVATAR_COREX), 1)
CCL_HOME = /usr/local/corex
else ifeq ($(USE_CAMBRICON), 1)
-CCL_HOME = /torch/neuware_home
+CCL_HOME = $(NEUWARE_HOME)
else
CCL_HOME = /usr/local/nccl/build
endif
7 changes: 5 additions & 2 deletions flagcx/adaptor/cncl_adaptor.cc
@@ -58,8 +58,11 @@ flagcxResult_t cnclAdaptorCommInitRank(flagcxHomoComm_t *comm, int nranks, flagc
if (*comm == NULL) {
flagcxCalloc(comm, 1);
}
-return (flagcxResult_t)cnclInitComms(&(*comm)->base, 1/*num_comm*/, &rank/*dev_list*/,
-&rank/*rank_list*/, nranks, (cnclCliqueId *)commId);
+unsigned int device_count = 0;
+DEVCHECK(cnrtGetDeviceCount(&device_count));
+int dev_id = rank % device_count;
+return (flagcxResult_t)c2f_ret_map[cnclInitComms(&(*comm)->base, 1/*num_comm*/, &dev_id/*dev_list*/,
+&rank/*rank_list*/, nranks, (cnclCliqueId *)commId)];
}

//TODO: unsupported
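
The hunk above makes two changes: the device index passed to `cnclInitComms` becomes `rank % device_count` instead of the global rank, and the vendor return code is translated through `c2f_ret_map` instead of being cast directly. The following is a minimal, self-contained sketch of both ideas. Every name in it (`VendorResult`, `CommonResult`, `translate`, `localDeviceForRank`) is a hypothetical stand-in, not a CNCL or FlagCX definition. It assumes ranks are laid out contiguously per node and that every node has the same device count. It also adds a guard for a zero device count and an out-of-range vendor code, which the diff does not show.

```cpp
#include <cassert>

// Hypothetical vendor and common result codes (illustrative only).
enum VendorResult { kVendorOk = 0, kVendorInvalidArg = 1, kVendorInternal = 2 };
enum CommonResult { kSuccess = 0, kInvalidArgument = 1, kInternalError = 2, kUnhandled = 3 };

// Lookup table indexed by vendor code, in the spirit of c2f_ret_map.
static const CommonResult kVendorToCommon[] = {kSuccess, kInvalidArgument, kInternalError};

// Translates a vendor code to a common code; unknown codes map to kUnhandled
// instead of reading past the end of the table.
CommonResult translate(int vendorCode) {
  const int n = static_cast<int>(sizeof(kVendorToCommon) / sizeof(kVendorToCommon[0]));
  if (vendorCode < 0 || vendorCode >= n) {
    return kUnhandled;
  }
  return kVendorToCommon[vendorCode];
}

// Maps a global rank onto a local device index.
// Returns false when the inputs cannot produce a valid index; in particular,
// a device count of zero would otherwise trigger a modulo by zero.
bool localDeviceForRank(int rank, unsigned int deviceCount, int *devId) {
  if (devId == nullptr || rank < 0 || deviceCount == 0) {
    return false;
  }
  *devId = static_cast<int>(static_cast<unsigned int>(rank) % deviceCount);
  return true;
}
```

With 4 devices per node, rank 5 lands on local device 1, which is why passing the global rank as the device index only works on a single node.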
12 changes: 9 additions & 3 deletions flagcx/adaptor/mlu_adaptor.cc
@@ -17,7 +17,7 @@ flagcxResult_t mluAdaptorDeviceMemcpy(void *dst, void *src, size_t size, flagcxM
if (stream == NULL) {
DEVCHECK(cnrtMemcpy(dst, src, size, memcpy_type_map[type]));
} else {
-DEVCHECK(cnrtMemcpyAsync_V3(dst, src, size, stream->base, memcpy_type_map[type]));
+DEVCHECK(cnrtMemcpyAsync_V2(dst, src, size, stream->base, memcpy_type_map[type]));

Collaborator: Could a later upgrade cause compatibility problems here?

Contributor Author: There does not seem to be a better option. We can only pin to a specific Cambricon version and upgrade the interface after the Cambricon Docker image is updated.

}
return flagcxSuccess;
}
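
The review thread above settles on pinning to one Cambricon runtime version. For comparison only, one common alternative is to gate the call site on a build-time version macro so the same source compiles against either generation of the interface. The sketch below is not what this PR does. `HYPO_RUNTIME_MAJOR`, `stubMemcpyAsyncV2`, and `stubMemcpyAsyncV3` are hypothetical names invented for illustration; they are not CNRT macros or functions, and the stubs simply copy host memory so the example runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro a build system could define from the installed runtime
// version. It is NOT a real CNRT macro; it defaults to 2 here.
#ifndef HYPO_RUNTIME_MAJOR
#define HYPO_RUNTIME_MAJOR 2
#endif

// Stubs standing in for two generations of an async-copy entry point.
// Each returns its own generation number so the caller can see which ran.
int stubMemcpyAsyncV2(void *dst, const void *src, std::size_t n) {
  std::memcpy(dst, src, n);
  return 2;
}

int stubMemcpyAsyncV3(void *dst, const void *src, std::size_t n) {
  std::memcpy(dst, src, n);
  return 3;
}

// Single call site: the preprocessor selects the entry point at compile time,
// so the rest of the adaptor never names a versioned symbol directly.
inline int asyncCopy(void *dst, const void *src, std::size_t n) {
  assert(dst != nullptr && src != nullptr);
#if HYPO_RUNTIME_MAJOR >= 3
  return stubMemcpyAsyncV3(dst, src, n);
#else
  return stubMemcpyAsyncV2(dst, src, n);
#endif
}
```

The trade-off is an extra build-time knob to maintain. Pinning, as chosen in the thread, keeps the code simpler at the cost of a manual interface change when the Docker image is updated.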
@@ -73,12 +73,18 @@ flagcxResult_t mluAdaptorGetVendor(char *vendor) {
}

flagcxResult_t mluAdaptorGdrMemAlloc(void **ptr, size_t size, void *memHandle) {
-// TODO: Implement GDR memory allocation
+if (ptr == NULL) {
+return flagcxInvalidArgument;
+}
+DEVCHECK(cnrtMalloc(ptr, size));

Collaborator: Is calling cnrtMalloc directly enough here? Are no additional steps needed?

Contributor Author: Following the CUDA code, I added a defensive null check. MLU currently has no function for setting attributes when allocating GDR memory, so no other steps are needed for now.

return flagcxSuccess;
}

flagcxResult_t mluAdaptorGdrMemFree(void *ptr, void *memHandle) {
-// TODO: Implement GDR memory free
+if (ptr == NULL) {
+return flagcxSuccess;
+}
+DEVCHECK(cnrtFree(ptr));
return flagcxSuccess;
}
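
The two functions above follow an asymmetric null-handling contract: allocating into a null output pointer is an error, while freeing a null pointer is a successful no-op. The sketch below reproduces that contract in a form that runs without a device. It is an illustration under assumptions, not the adaptor itself: `Result`, `stubDeviceMalloc`, `stubDeviceFree`, `gdrMemAlloc`, and `gdrMemFree` are hypothetical names, and the stubs use host `malloc`/`free` in place of `cnrtMalloc`/`cnrtFree`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical result codes mirroring the shape of the adaptor's return values.
enum Result { kSuccess = 0, kInvalidArgument = 1, kUnhandledDeviceError = 2 };

// Host-memory stubs standing in for the device runtime; 0 means success.
int stubDeviceMalloc(void **ptr, std::size_t size) {
  *ptr = std::malloc(size);
  return *ptr != nullptr ? 0 : 1;
}

int stubDeviceFree(void *ptr) {
  std::free(ptr);
  return 0;
}

// Allocation: a null output pointer is a caller error, reported before any
// runtime call is made.
Result gdrMemAlloc(void **ptr, std::size_t size) {
  if (ptr == nullptr) {
    return kInvalidArgument;
  }
  if (stubDeviceMalloc(ptr, size) != 0) {
    return kUnhandledDeviceError;
  }
  return kSuccess;
}

// Free: a null pointer is treated as "nothing to do" and succeeds, matching
// the convention of free(NULL).
Result gdrMemFree(void *ptr) {
  if (ptr == nullptr) {
    return kSuccess;
  }
  if (stubDeviceFree(ptr) != 0) {
    return kUnhandledDeviceError;
  }
  return kSuccess;
}
```

Treating free of a null pointer as success lets cleanup paths call the free function unconditionally, even when the matching allocation never happened.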
