Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gpu): improve full propagation in sum and sub #1763

Merged

Conversation

guillermo-oyarzun
Copy link
Member

closes: please link all relevant issues

PR content/description

Check-list:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Relevant issues are marked as resolved/closed, related issues are linked in the description
  • Check for breaking changes (including serialization changes) and add them to commit message following the conventional commit specification

@guillermo-oyarzun guillermo-oyarzun self-assigned this Nov 8, 2024
@cla-bot cla-bot bot added the cla-signed label Nov 8, 2024
@guillermo-oyarzun guillermo-oyarzun force-pushed the go/refactor/improve-full-propagation-and-sum-algorithms branch from 6b37fe7 to 44cb537 Compare November 8, 2024 11:56
Copy link
Contributor

@agnesLeroy agnesLeroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @guillermo-oyarzun! Thanks a lot for this PR 🙏 Here comes a first review: I don't know the details of the implementation so it's hard for me to go through all the logic though. Maybe if you walk me through it it would help.

multi_gpu_alloc_lwe_async(streams, gpu_indexes, active_gpu_count,
lwe_after_ks_vec, num_radix_blocks,
params.small_lwe_dimension + 1);
multi_gpu_alloc_many_lwe_async(streams, gpu_indexes, active_gpu_count,
Copy link
Contributor

@agnesLeroy agnesLeroy Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this function could be renamed: multi_gpu_alloc_lwe_many_lut_output_async?

cudaStream_t const *streams, uint32_t const *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array, int_radix_params params,
int_shifted_blocks_and_states_memory<Torus> *mem, void *const *bsks,
Torus *const *ksks, uint32_t num_blocks, uint32_t lut_stride,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe num_blocks -> num_radix_blocks

message_modulus,
);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to add this to core crypto, do we?

@@ -227,6 +285,18 @@ impl CudaServerKey {
streams.synchronize();
}

pub fn unchecked_add_assign_with_packing<T: CudaIntegerRadixCiphertext>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have this entry point on the Rust side?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is something needed for the signed overflowing add/sub, not actually tested in this PR, I could remove it and just included in the other PR we will have

///
/// - `streams` __must__ be synchronized to guarantee computation has finished, and inputs must
/// not be dropped until streams is synchronized
pub(crate) unsafe fn new_propagate_single_carry_assign_async<T>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we name this one propagate_single_carry_assign_async and remove the old one?

where
T: CudaIntegerRadixCiphertext,
{
self.propagate_fast_single_carry_assign_async(ct, streams, input_carry, requested_flag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we keep only the fast version on the Rust side, and remove the old one?

@guillermo-oyarzun guillermo-oyarzun force-pushed the go/refactor/improve-full-propagation-and-sum-algorithms branch 15 times, most recently from 3742e90 to e5a75fe Compare November 17, 2024 17:08
@guillermo-oyarzun guillermo-oyarzun force-pushed the go/refactor/improve-full-propagation-and-sum-algorithms branch 3 times, most recently from f1ada0d to 7d6a51f Compare November 22, 2024 09:15
@guillermo-oyarzun guillermo-oyarzun force-pushed the go/refactor/improve-full-propagation-and-sum-algorithms branch from 7d6a51f to 8e2b993 Compare November 22, 2024 17:35
@agnesLeroy agnesLeroy merged commit 81e11a6 into main Nov 25, 2024
99 of 105 checks passed
@agnesLeroy agnesLeroy deleted the go/refactor/improve-full-propagation-and-sum-algorithms branch November 25, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants