feat(gpu): improve full propagation in sum and sub #1763

guillermo-oyarzun · 2024-11-08T11:17:35Z

closes: please link all relevant issues

PR content/description

Check-list:

Tests for the changes have been added (for bug fixes / features)
Docs have been added / updated (for bug fixes / features)
Relevant issues are marked as resolved/closed, related issues are linked in the description
Check for breaking changes (including serialization changes) and add them to commit message following the conventional commit specification

agnesLeroy

Hey @guillermo-oyarzun! Thanks a lot for this PR 🙏 Here is a first review. I don't know the details of the implementation, so it's hard for me to follow all the logic. It would probably help if you walked me through it.

backends/tfhe-cuda-backend/cuda/include/integer/integer.h

backends/tfhe-cuda-backend/cuda/include/integer/integer_utilities.h

agnesLeroy · 2024-11-08T14:16:50Z

backends/tfhe-cuda-backend/cuda/include/integer/integer_utilities.h

+      multi_gpu_alloc_lwe_async(streams, gpu_indexes, active_gpu_count,
+                                lwe_after_ks_vec, num_radix_blocks,
+                                params.small_lwe_dimension + 1);
+      multi_gpu_alloc_many_lwe_async(streams, gpu_indexes, active_gpu_count,


Maybe this function could be renamed: multi_gpu_alloc_lwe_many_lut_output_async?
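For context on the sizes involved: the diff allocates num_radix_blocks LWE ciphertexts of params.small_lwe_dimension + 1 Torus elements each (mask plus body) after keyswitching, across active_gpu_count GPUs. The minimal Rust sketch below spells out that arithmetic. Three things in it are assumptions and not taken from the backend: the remainder-to-lowest-GPU split, the num_many_lut multiplier for the many-LUT output buffer, and the use of the big LWE dimension for that output. All function names are hypothetical.

```rust
/// Torus elements in one LWE ciphertext: lwe_dimension mask coefficients plus one body.
fn lwe_size(lwe_dimension: usize) -> usize {
    lwe_dimension + 1
}

/// Hypothetical split: number of blocks handled by GPU `gpu_index` when
/// `num_radix_blocks` are spread over `gpu_count` GPUs, with the remainder
/// going to the lowest indexes (assumed policy).
fn blocks_on_gpu(num_radix_blocks: usize, gpu_index: usize, gpu_count: usize) -> usize {
    let base = num_radix_blocks / gpu_count;
    let extra = if gpu_index < num_radix_blocks % gpu_count { 1 } else { 0 };
    base + extra
}

/// Hypothetical many-LUT output sizing: each input block yields
/// `num_many_lut` output ciphertexts of the given dimension.
fn many_lut_output_len(blocks: usize, num_many_lut: usize, lwe_dimension: usize) -> usize {
    blocks * num_many_lut * lwe_size(lwe_dimension)
}

fn main() {
    // Placeholder values, not taken from any real parameter set.
    let (num_radix_blocks, gpu_count) = (32, 3);
    let (small_lwe_dimension, big_lwe_dimension, num_many_lut) = (742, 2048, 2);
    for gpu in 0..gpu_count {
        let blocks = blocks_on_gpu(num_radix_blocks, gpu, gpu_count);
        println!(
            "GPU {gpu}: {blocks} blocks, {} Torus elements after keyswitch, {} for the many-LUT output",
            blocks * lwe_size(small_lwe_dimension),
            many_lut_output_len(blocks, num_many_lut, big_lwe_dimension)
        );
    }
}
```

A name such as multi_gpu_alloc_lwe_many_lut_output_async would make the extra multiplier visible at the call site, which is the point of the rename suggestion.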

backends/tfhe-cuda-backend/cuda/include/integer/integer_utilities.h

agnesLeroy · 2024-11-08T14:47:58Z

backends/tfhe-cuda-backend/cuda/src/integer/integer.cuh

+    cudaStream_t const *streams, uint32_t const *gpu_indexes,
+    uint32_t gpu_count, Torus *lwe_array, int_radix_params params,
+    int_shifted_blocks_and_states_memory<Torus> *mem, void *const *bsks,
+    Torus *const *ksks, uint32_t num_blocks, uint32_t lut_stride,


Maybe num_blocks -> num_radix_blocks

agnesLeroy · 2024-11-08T14:52:57Z

tfhe/src/core_crypto/gpu/mod.rs

+        message_modulus,
+    );
+}
+


I don't think we need to add this to core crypto, do we?

agnesLeroy · 2024-11-08T14:54:08Z

tfhe/src/integer/gpu/server_key/radix/add.rs

@@ -227,6 +285,18 @@ impl CudaServerKey {
        streams.synchronize();
    }

+    pub fn unchecked_add_assign_with_packing<T: CudaIntegerRadixCiphertext>(


Do we need to have this entry point on the Rust side?

guillermo-oyarzun · 2024-11-12

It is needed for the signed overflowing add/sub, which is not actually tested in this PR. I could remove it here and include it in the follow-up PR instead.
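As a reminder of what the overflow flag in a signed overflowing add/sub has to report, here is a cleartext Rust sketch of the two's-complement sign rule. It is checked exhaustively against the standard library's i8::overflowing_add and i8::overflowing_sub. It does not describe how the GPU backend computes the flag on encrypted radix blocks, and the helper names are hypothetical.

```rust
/// Cleartext model of a signed overflowing add: wrapped result plus flag.
/// Addition overflows exactly when both operands have the same sign and
/// the result's sign differs from it.
fn overflowing_add_i8(a: i8, b: i8) -> (i8, bool) {
    let sum = a.wrapping_add(b);
    let overflow = (a < 0) == (b < 0) && (sum < 0) != (a < 0);
    (sum, overflow)
}

/// Subtraction overflows exactly when the operands have different signs
/// and the result's sign differs from the minuend's.
fn overflowing_sub_i8(a: i8, b: i8) -> (i8, bool) {
    let diff = a.wrapping_sub(b);
    let overflow = (a < 0) != (b < 0) && (diff < 0) != (a < 0);
    (diff, overflow)
}

fn main() {
    // Cross-check the sign rule against the standard library on every i8 pair.
    for a in i8::MIN..=i8::MAX {
        for b in i8::MIN..=i8::MAX {
            assert_eq!(overflowing_add_i8(a, b), a.overflowing_add(b));
            assert_eq!(overflowing_sub_i8(a, b), a.overflowing_sub(b));
        }
    }
    println!("{:?}", overflowing_add_i8(100, 100)); // prints (-56, true)
}
```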

agnesLeroy · 2024-11-08T14:55:02Z

tfhe/src/integer/gpu/server_key/radix/mod.rs

+    ///
+    /// - `streams` __must__ be synchronized to guarantee computation has finished, and inputs must
+    ///   not be dropped until streams is synchronized
+    pub(crate) unsafe fn new_propagate_single_carry_assign_async<T>(


Couldn't we name this one propagate_single_carry_assign_async and remove the old one?

agnesLeroy · 2024-11-08T14:55:23Z

tfhe/src/integer/gpu/server_key/radix/mod.rs

+    where
+        T: CudaIntegerRadixCiphertext,
+    {
+        self.propagate_fast_single_carry_assign_async(ct, streams, input_carry, requested_flag)


Couldn't we keep only the fast version on the Rust side, and remove the old one?
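To make the single-carry propagation discussed here concrete, the cleartext Rust sketch below models a radix integer as little-endian blocks in base m (the message modulus) right after a blockwise addition, so each block holds at most 2(m - 1). It compares a sequential ripple with a state-based variant in which each block is classified as generating, propagating, or absorbing a carry, so that only the carry-resolution step is ordered. Reading the int_shifted_blocks_and_states_memory name as pointing to this kind of scheme is an assumption. The RequestedFlag enum is a hypothetical stand-in for the requested_flag argument, with overflow left out, and all function names are hypothetical.

```rust
/// Hypothetical stand-in for `requested_flag` (overflow omitted in this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RequestedFlag { None, Carry }

/// Sequential reference. Each block is at most 2 * (m - 1) and `input_carry`
/// is 0 or 1, so every carry produced is 0 or 1.
fn propagate_sequential(blocks: &mut [u64], m: u64, input_carry: u64, flag: RequestedFlag) -> Option<u64> {
    let mut carry = input_carry;
    for block in blocks.iter_mut() {
        let v = *block + carry;
        *block = v % m;
        carry = v / m;
    }
    match flag {
        RequestedFlag::None => None,
        RequestedFlag::Carry => Some(carry),
    }
}

/// How a block reacts to an incoming carry of 0 or 1.
enum BlockState {
    Generates,  // v >= m: emits a carry whatever comes in
    Propagates, // v == m - 1: forwards the incoming carry
    Absorbs,    // v < m - 1: never emits a carry
}

/// State-based variant: states and final updates are independent per block;
/// only the carry-resolution loop is ordered (kept sequential here).
fn propagate_with_states(blocks: &mut [u64], m: u64, input_carry: u64) -> u64 {
    let states: Vec<BlockState> = blocks
        .iter()
        .map(|&v| if v >= m { BlockState::Generates } else if v == m - 1 { BlockState::Propagates } else { BlockState::Absorbs })
        .collect();
    let mut incoming = Vec::with_capacity(blocks.len());
    let mut carry = input_carry;
    for state in &states {
        incoming.push(carry);
        carry = match state {
            BlockState::Generates => 1,
            BlockState::Propagates => carry,
            BlockState::Absorbs => 0,
        };
    }
    for (block, c) in blocks.iter_mut().zip(&incoming) {
        *block = (*block + c) % m;
    }
    carry
}

/// Little-endian base-m digits of x, n blocks.
fn to_digits(mut x: u64, m: u64, n: usize) -> Vec<u64> {
    (0..n).map(|_| { let d = x % m; x /= m; d }).collect()
}

fn main() {
    let (m, n) = (4u64, 3usize); // 2-bit message blocks, 3 blocks: values in [0, 64)
    let modulus = m.pow(n as u32);
    for a in 0..modulus {
        for b in 0..modulus {
            for cin in 0..2 {
                let raw: Vec<u64> = to_digits(a, m, n).iter().zip(to_digits(b, m, n)).map(|(x, y)| x + y).collect();
                let mut seq = raw.clone();
                let carry = propagate_sequential(&mut seq, m, cin, RequestedFlag::Carry).unwrap();
                let mut par = raw.clone();
                assert_eq!(propagate_with_states(&mut par, m, cin), carry);
                assert_eq!(seq, par);
                assert_eq!(seq, to_digits((a + b + cin) % modulus, m, n));
                assert_eq!(carry, (a + b + cin) / modulus);
            }
        }
    }
    // Blockwise sum of 57 = [1, 2, 3] and 47 = [3, 3, 2], before propagation.
    let mut blocks = vec![4u64, 5, 5];
    let carry = propagate_sequential(&mut blocks, m, 0, RequestedFlag::Carry);
    println!("{blocks:?} carry={carry:?}"); // prints [0, 2, 2] carry=Some(1)
}
```

Because the per-block state maps (constant 1, identity, constant 0) compose associatively, the carry-resolution loop could in principle become a parallel prefix scan on a GPU; the sketch keeps it sequential for readability.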

guillermo-oyarzun requested review from agnesLeroy, tmontaigu and bbarbakadze November 8, 2024 11:17

guillermo-oyarzun self-assigned this Nov 8, 2024

cla-bot bot added the cla-signed label Nov 8, 2024

guillermo-oyarzun force-pushed the go/refactor/improve-full-propagation-and-sum-algorithms branch from 6b37fe7 to 44cb537 November 8, 2024 11:56

agnesLeroy reviewed Nov 8, 2024


guillermo-oyarzun force-pushed the go/refactor/improve-full-propagation-and-sum-algorithms branch 15 times, most recently from 3742e90 to e5a75fe November 17, 2024 17:08

guillermo-oyarzun force-pushed the go/refactor/improve-full-propagation-and-sum-algorithms branch 3 times, most recently from f1ada0d to 7d6a51f November 22, 2024 09:15

feat(gpu): improve full propagation in sum and sub

8e2b993

guillermo-oyarzun force-pushed the go/refactor/improve-full-propagation-and-sum-algorithms branch from 7d6a51f to 8e2b993 November 22, 2024 17:35

agnesLeroy approved these changes Nov 25, 2024


zama-bot added the approved label Nov 25, 2024

agnesLeroy merged commit 81e11a6 into main Nov 25, 2024
99 of 105 checks passed

agnesLeroy deleted the go/refactor/improve-full-propagation-and-sum-algorithms branch November 25, 2024 12:23


