refactor(gpu): avoid synchronizations in the keybundle #1505
Hey @guillermo-oyarzun! This PR looks good to me, thanks a lot! I just have one very minor comment about a possible simplification. Also, I don't fully understand why the __syncthreads() at line 119 is needed: you wrote that it's there to avoid having the monomial degrees overwritten, but I don't see how they could be overwritten at that stage.
bootstrapping_key, g, rev_lwe_iteration, glwe_id, level_id,
grouping_factor, 2 * polynomial_size, glwe_dimension, level_count);
const Torus *bsk_poly = bsk_slice + poly_id * params::degree;
int offset = get_start_ith_ggsw_offset(2 * polynomial_size, glwe_dimension,
Here you call this function with 2 * polynomial_size, and inside the function polynomial_size / 2 is used to compute the result: I think this could be simplified.
get_ith_mask_kth_block is used in both the classical PBS and the multi-bit PBS. With the classical PBS the bootstrap key is passed in the Fourier domain, and our FFT compresses a polynomial of size polynomial_size in the standard domain into a polynomial of size polynomial_size / 2 in the Fourier domain. The multi-bit PBS, on the other hand, takes the bootstrap key in the standard domain as input, which is why we call get_multi_bit_ith_lwe_gth_group_kth_block with 2 * polynomial_size for the multi-bit PBS. This logic could be made simpler, btw. It's quite convoluted at the moment because it's inherited from the time when we only had the classical PBS to deal with, and we added the multi-bit logic on top of it without refactoring. That shouldn't be done in this PR though, it should be something separate.
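To make the redundant multiplication and division concrete, here is a minimal host-side sketch. Both helper names, their signatures, and the GGSW layout of (glwe_dimension + 1)^2 * level_count polynomials are assumptions made for illustration; they are not the actual helpers in the backend. The sketch only shows that passing 2 * polynomial_size to a helper that halves its argument gives the same offset as passing polynomial_size to a helper that uses it directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper written for the Fourier-domain key: it assumes each
// polynomial occupies polynomial_size / 2 elements.
constexpr uint32_t ggsw_offset_halved(uint32_t i, uint32_t polynomial_size,
                                      uint32_t glwe_dimension,
                                      uint32_t level_count) {
  return i * (polynomial_size / 2) * (glwe_dimension + 1) *
         (glwe_dimension + 1) * level_count;
}

// Hypothetical simplified variant: the caller states how many elements one
// polynomial occupies, so no halving happens inside.
constexpr uint32_t ggsw_offset_direct(uint32_t i, uint32_t elems_per_poly,
                                      uint32_t glwe_dimension,
                                      uint32_t level_count) {
  return i * elems_per_poly * (glwe_dimension + 1) * (glwe_dimension + 1) *
         level_count;
}

// Standard-domain caller today: doubles the size so that the internal
// halving cancels out.
constexpr uint32_t standard_domain_offset_today(uint32_t i, uint32_t N,
                                                uint32_t k, uint32_t levels) {
  return ggsw_offset_halved(i, 2 * N, k, levels);
}

// Same offset without the multiply-then-divide round trip.
static_assert(standard_domain_offset_today(5, 2048, 1, 3) ==
                  ggsw_offset_direct(5, 2048, 1, 3),
              "doubling then halving is a no-op");
```

With a variant like ggsw_offset_direct, a Fourier-domain caller would pass polynomial_size / 2 and a standard-domain caller would pass polynomial_size, which keeps the domain-specific factor at the call site.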
I can make the small change of removing the redundant multiplication and division now, and then we can think about the rest in another PR. What do you think?
Yes, sounds good. I'll open an issue to refactor get_ith_mask_kth_block.
backends/tfhe-cuda-backend/cuda/src/pbs/programmable_bootstrap_multibit.cuh
04d2646 to d651f68
d651f68 to 39dd8ad
closes: #667
PR content/description
Check-list: