fix: avoid additional array allocation in host to device transfer #2966

Alexandr-Solovev · 2024-11-04T12:03:01Z

Description

This PR fixes the memcpy function in oneDAL. It saves memory, improves the performance of transferring data from host to device and back, and makes it possible to use memcpy directly.

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

Alexandr-Solovev · 2024-11-04T12:03:30Z

/intelci: run

ethanglaser

Looks like #2375 was just missing the wait_and_throw() :(

Alexandr-Solovev · 2024-11-05T16:30:09Z

Please don't merge until the additional checks on local machines are done.

Alexandr-Solovev · 2024-11-05T16:39:43Z

> Looks like #2375 was just missing the wait_and_throw() :(

Not sure. It looks like every place where the function is called already has an additional wait_and_throw().

Alexandr-Solovev · 2024-11-20T10:59:52Z

/intelci: run

cpp/oneapi/dal/table/backend/convert.cpp

Alexandr-Solovev · 2024-11-21T08:10:06Z

/intelci: run

Alexandr-Solovev · 2024-11-21T08:10:43Z

/intelci: run

icfaust

I made comments to help me understand the changes. Would it be worth adding some to the code? Otherwise good to go by me (if my comments in the review are correct).

icfaust · 2024-11-21T09:24:47Z

cpp/oneapi/dal/table/backend/homogen_kernels.cpp

@@ -130,33 +130,20 @@ static void pull_row_major_impl(const Policy& policy,

        auto src_data = origin_data.get_data() + origin_offset * origin_dtype_size;
        auto dst_data = block_data.get_mutable_data();
-
-        if (block_info.get_column_count() > 1) {


I am not familiar with this code, so just to check: it looks like this simply removes the if and relies on the for loop (a simplification here).
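The kind of simplification described in this comment can be sketched in plain C++. This is only an illustration under assumptions, not the oneDAL implementation: `pull_block` and its parameters are hypothetical names, the data is a host-side `std::vector<float>`, and the layout is row-major. The point is that one loop over rows covers both the multi-column and the single-column case, so no separate branch on the column count is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies a rectangular block out of a row-major matrix.
// The same loop handles col_count == 1 and col_count > 1.
std::vector<float> pull_block(const std::vector<float>& origin,
                              std::int64_t origin_col_count,
                              std::int64_t row_offset,
                              std::int64_t col_offset,
                              std::int64_t row_count,
                              std::int64_t col_count) {
    assert(col_offset + col_count <= origin_col_count);
    std::vector<float> block(static_cast<std::size_t>(row_count * col_count));
    for (std::int64_t r = 0; r < row_count; ++r) {
        // Start of the requested columns within the source row.
        const float* src = origin.data() + (row_offset + r) * origin_col_count + col_offset;
        // Rows are packed contiguously in the destination block.
        float* dst = block.data() + r * col_count;
        std::memcpy(dst, src, static_cast<std::size_t>(col_count) * sizeof(float));
    }
    return block;
}
```

With a single column, each iteration copies exactly one element, which is the degenerate case of the same row copy.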

icfaust · 2024-11-21T09:28:02Z

cpp/oneapi/dal/table/backend/convert.cpp

+        dal::detail::check_mul_overflow(src_element_size_in_bytes, element_count);
+    const std::int64_t src_stride_in_bytes =
+        dal::detail::check_mul_overflow(src_element_size_in_bytes, src_stride);
+    if (src_type == dst_type && src_size_in_bytes == dst_size_in_bytes &&


If the allocation exists, skip the vector conversion and use memcpy_host2usm.

The issue is that when we copy from host to device with different data types (float on the host, double allocated on the device), we would still just copy bytes. When the data types differ, it makes sense to use strides.
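The distinction discussed in this thread can be sketched in host-only C++. This is a minimal illustration under stated assumptions, not the code from the PR: `std::memcpy` stands in for the host-to-USM copy, `checked_mul` is a hypothetical stand-in for an overflow-checked multiply, `copy_or_convert` is an invented name, and strides are counted in elements. A direct byte copy is only valid when source and destination types match and both sides are contiguous; otherwise each element is converted individually.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical overflow-checked multiply for non-negative sizes.
inline std::int64_t checked_mul(std::int64_t a, std::int64_t b) {
    assert(a >= 0 && b >= 0);
    if (a != 0 && b > std::numeric_limits<std::int64_t>::max() / a) {
        throw std::overflow_error("size computation overflows int64");
    }
    return a * b;
}

// Copies `count` elements from `src` to `dst`.
// Returns true if the direct byte-copy fast path was taken.
template <typename Src, typename Dst>
bool copy_or_convert(const Src* src, std::int64_t src_stride,
                     Dst* dst, std::int64_t dst_stride,
                     std::int64_t count) {
    assert(count >= 0);
    if constexpr (std::is_same_v<Src, Dst>) {
        if (src_stride == 1 && dst_stride == 1) {
            // Same type and contiguous on both sides: raw bytes are enough.
            const std::int64_t bytes =
                checked_mul(static_cast<std::int64_t>(sizeof(Src)), count);
            std::memcpy(dst, src, static_cast<std::size_t>(bytes));
            return true;
        }
    }
    // Different types or non-unit strides: convert element by element.
    for (std::int64_t i = 0; i < count; ++i) {
        dst[i * dst_stride] = static_cast<Dst>(src[i * src_stride]);
    }
    return false;
}
```

Copying float to float with unit strides takes the fast path, while float to double falls back to the converting loop, since copying raw bytes between different element sizes would produce wrong values.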

icfaust · 2024-11-21T09:29:34Z

cpp/oneapi/dal/table/backend/convert.cpp

@@ -227,33 +227,50 @@ sycl::event convert_vector_device2host(sycl::queue& q,
    // To perform conversion, we gather data from device to host in temporary
    // contigious array and then run host conversion function

-    const std::int64_t element_size_in_bytes = dal::detail::get_data_type_size(src_type);
+    const std::int64_t dst_element_size_in_bytes = dal::detail::get_data_type_size(dst_type);


Collect more information about the destination to see whether a shortcut can be used.

samir-nasibli

Indeed good suggestion! Thank you @Alexandr-Solovev
Could you please share perf improvements?

Alexandr-Solovev · 2024-11-21T13:21:16Z

/intelci: run

ethanglaser · 2024-11-22T16:54:04Z

Are you still seeing seg faults when experimenting locally or has this issue been resolved?

Alexandr-Solovev · 2024-11-26T08:47:42Z

/intelci: run

avoid additional array

b8dce34

Alexandr-Solovev added the bug label Nov 4, 2024

Alexandr-Solovev changed the title ~~avoid additional array~~ fix: avoid additional array allocation in host to device transfer Nov 5, 2024

Alexandr-Solovev marked this pull request as ready for review November 5, 2024 07:39

Alexandr-Solovev requested review from Alexsandruss and samir-nasibli as code owners November 5, 2024 07:39

Alexandr-Solovev requested review from Vika-F, ethanglaser and icfaust November 5, 2024 07:39

ethanglaser approved these changes Nov 5, 2024


Alexandr-Solovev changed the title ~~fix: avoid additional array allocation in host to device transfer~~ DO NOT MERGE fix: avoid additional array allocation in host to device transfer Nov 5, 2024

Merge branch 'oneapi-src:main' into dev/asolovev_memory_opt

21db557

Alexandr-Solovev mentioned this pull request Nov 19, 2024

Replace internally developed CSR GEMM with a call to MKL #2959

Merged

9 tasks

Alexandr-Solovev added 3 commits November 19, 2024 19:34

Merge branch 'oneapi-src:main' into dev/asolovev_memory_opt

86915c4

minor fix

00d315f

fix

7c5ac63

Vika-F reviewed Nov 20, 2024


cpp/oneapi/dal/table/backend/convert.cpp (outdated)

Vika-F reviewed Nov 20, 2024


cpp/oneapi/dal/table/backend/convert.cpp (outdated)

Alexandr-Solovev marked this pull request as draft November 20, 2024 12:42

minor fixes

74c2eba

Alexandr-Solovev requested a review from ethanglaser November 21, 2024 08:10

Alexandr-Solovev marked this pull request as ready for review November 21, 2024 08:10

Alexandr-Solovev changed the title ~~DO NOT MERGE fix: avoid additional array allocation in host to device transfer~~ fix: avoid additional array allocation in host to device transfer Nov 21, 2024

icfaust approved these changes Nov 21, 2024


samir-nasibli reviewed Nov 21, 2024


fixes for ndarray

baee29c
