TL/MLX5: various optimizations #1012
ucc_tl_mlx5_alltoall_t *a2a = team->a2a;
int node_size = a2a->node.sbgp->group_size;
int net_size = a2a->net.sbgp->group_size;
int op_msgsize = node_size * a2a->max_msg_size * UCC_TL_TEAM_SIZE(team) *
align
Here git-clang-format reverts to this form, which I also find more readable.
int block_msgsize = block_h * block_w * task->alltoall.msg_size;
ucc_status_t status = UCC_OK;
int node_grid_w = node_size / block_w;
int node_nbr_blocks = (node_size * node_size) / (block_h * block_w);
align
Here git-clang-format reverts to this form, which I also find more readable.
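To make the block arithmetic in the snippet above concrete, here is a minimal standalone sketch. It assumes the node data can be viewed as a node_size x node_size grid tiled by block_h x block_w blocks, that the block dimensions divide node_size evenly, and that msg_size is in bytes. The `block_grid_t` type and `block_grid_compute` helper are hypothetical names for illustration only and are not part of UCC.

```c
#include <assert.h>

/* Hypothetical helper type, not part of UCC: groups the block-grid
 * quantities computed in the reviewed snippet. */
typedef struct block_grid {
    int block_msgsize;   /* payload covered by one block (assumed bytes) */
    int node_grid_w;     /* number of blocks along one row of the grid */
    int node_nbr_blocks; /* total number of blocks in the grid */
} block_grid_t;

/* Mirrors the three expressions from the diff, with plain ints as inputs. */
static block_grid_t block_grid_compute(int node_size, int block_h,
                                       int block_w, int msg_size)
{
    block_grid_t g;

    /* Assumption of this sketch: blocks tile the grid exactly. */
    assert(block_h > 0 && block_w > 0);
    assert(node_size % block_h == 0 && node_size % block_w == 0);

    g.block_msgsize   = block_h * block_w * msg_size;
    g.node_grid_w     = node_size / block_w;
    g.node_nbr_blocks = (node_size * node_size) / (block_h * block_w);
    return g;
}
```

For instance, node_size = 8, block_h = 2, block_w = 4 and msg_size = 16 give block_msgsize = 128, node_grid_w = 2 and node_nbr_blocks = 8.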
Can one of the admins verify this patch?
TL/MLX5: add npolls cfg for FANIN
TL/MLX5: knomial fanin
TL/MLX5: add prints and profile events
TL/MLX5: remove debug prints
tiny bit more robust
print blocks dimensions
fully working configurable batch_size, serialization, and pollings
clean and working
TL/MLX5: add more config for block dimensions
force longer by default
lintrunner cleaning
What
This PR contains various optimizations for TL/MLX5/a2a. In order of importance/relevance:
We might merge this PR as is or split it into several smaller ones. Either way, this branch at least points to a working version that can be used as is for performance experiments.
TODO:
One important optimization yet to be implemented is support for multiple NICs. So far, our algorithm uses only one NIC.
cc @lappazos @x41lakazam