-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast implementation of Select for most cases on CPU #687
base: master
Are you sure you want to change the base?
Conversation
Fixes #684 Measured: enes.student.tiny11 xzcat sources.shuf.xz |head -n 10000 var (Cascade Lake) single core based on intgemm_reintegrated_computestats branch Before Total time: 66.69077s wall After Total time: 61.20206s wall
@snukky Cool, the checks are already doing their job. |
idxBase += idxStride; | ||
} | ||
}); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do an if {} else {}
here instead of the empty return. Could you also mention that the code below is execute if the above conditions are not met and that this code is generic but slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe let's also add a positive and a negative example where each branch is used? There are a couple of them in tests_operators, I think at least one will still go to the default version?
const functional::Shape &inShape, | ||
const functional::Shape &idxShape, | ||
int axisCPU, | ||
Backend backend) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use a different name than Backend
. We use backend for CPU/GPU device things, had me confused for a sec.
Nice change. You mentioned in the issue that this improves things for your tiny models, but not for the WNGT models? Can you briefly say here what the difference is? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change indeed. LGTM!
Ping @kpu |
@kpu if you address the review comments this is ready to merge. |
I know, will fix it this week. |
Ping :P |
@kpu Beep bop boop |
The existing implementation of Select for CPU is very slow.
Reimplemented the
Select
function for CPU usingstd::copy
for the case when stuff to copy after the axis is contiguous, which it always is in practice.Fixes #684
Measured:
enes.student.tiny11
xzcat sources.shuf.xz |head -n 10000
var (Cascade Lake) single core
based on intgemm_reintegrated_computestats branch
Before Total time: 66.69077s wall
After Total time: 61.20206s wall
Testing
Ran Marian unit tests (which already includes a tesf tor Select)
Ran enes student on Linux with
cmake .. -DCMAKE_BUILD_TYPE=Debug -DCOMPILE_CUDA=OFF -DUSE_FBGEMM=ON -DUSE_SENTENCEPIECE=ON -DCOMPILE_TESTS=ON
Checklist