-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(gpu): improve full propagation in sum and sub #1763
feat(gpu): improve full propagation in sum and sub #1763
Conversation
6b37fe7
to
44cb537
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @guillermo-oyarzun! Thanks a lot for this PR 🙏 Here comes a first review: I don't know the details of the implementation so it's hard for me to go through all the logic though. Maybe if you walk me through it it would help.
multi_gpu_alloc_lwe_async(streams, gpu_indexes, active_gpu_count, | ||
lwe_after_ks_vec, num_radix_blocks, | ||
params.small_lwe_dimension + 1); | ||
multi_gpu_alloc_many_lwe_async(streams, gpu_indexes, active_gpu_count, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this function could be renamed: multi_gpu_alloc_lwe_many_lut_output_async
?
cudaStream_t const *streams, uint32_t const *gpu_indexes, | ||
uint32_t gpu_count, Torus *lwe_array, int_radix_params params, | ||
int_shifted_blocks_and_states_memory<Torus> *mem, void *const *bsks, | ||
Torus *const *ksks, uint32_t num_blocks, uint32_t lut_stride, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe num_blocks -> num_radix_blocks
tfhe/src/core_crypto/gpu/mod.rs
Outdated
message_modulus, | ||
); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to add this to core crypto, do we?
@@ -227,6 +285,18 @@ impl CudaServerKey { | |||
streams.synchronize(); | |||
} | |||
|
|||
pub fn unchecked_add_assign_with_packing<T: CudaIntegerRadixCiphertext>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to have this entry point on the Rust side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is something needed for the signed overflowing add/sub, not actually tested in this PR, I could remove it and just included in the other PR we will have
/// | ||
/// - `streams` __must__ be synchronized to guarantee computation has finished, and inputs must | ||
/// not be dropped until streams is synchronized | ||
pub(crate) unsafe fn new_propagate_single_carry_assign_async<T>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we name this one propagate_single_carry_assign_async
and remove the old one?
where | ||
T: CudaIntegerRadixCiphertext, | ||
{ | ||
self.propagate_fast_single_carry_assign_async(ct, streams, input_carry, requested_flag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we keep only the fast version on the Rust side, and remove the old one?
3742e90
to
e5a75fe
Compare
f1ada0d
to
7d6a51f
Compare
7d6a51f
to
8e2b993
Compare
closes: please link all relevant issues
PR content/description
Check-list: