diff --git a/P0009/wg21/data/index.yaml b/P0009/wg21/data/index.yaml index f321068a..cf88f9c7 100644 --- a/P0009/wg21/data/index.yaml +++ b/P0009/wg21/data/index.yaml @@ -83181,6 +83181,14 @@ references: issued: year: 2019 URL: https://wg21.link/p0009r9 + - id: P0009R18 + citation-label: P0009R18 + title: "mdspan" + author: + - family: Christian Trott, D.S. Hollman, Damien Lebrun-Grandie, Mark Hoemmen, Daniel Sunderland, H. Carter Edwards, Bryce Adelstein Lelbach, Mauro Bianco, Ben Sander, Athanasios Iliopoulos, John Michopoulos, Nevin Liber + issued: + year: 2022 + URL: https://wg21.link/p0009r18 - id: P0010R0 citation-label: P0010R0 title: "Adding a subsection for concurrent random number generation in C++17" @@ -104753,6 +104761,14 @@ references: issued: year: 2019 URL: https://wg21.link/p1673r1 + - id: P1673R13 + citation-label: P1673R13 + title: "A free function linear algebra interface based on the BLAS" + author: + - family: Mark Hoemmen, Daisy Hollman, Christian Trott, Daniel Sunderland, Nevin Liber, Alicia Klinvex, Li-Ta Lo, Damien Lebrun-Grandie, Graham Lopez, Peter Caday, Sarah Knepper, Piotr Luszczek, Timothy Costa + issued: + year: 2023 + URL: https://wg21.link/p1673r13 - id: P1674R0 citation-label: P1674R0 title: "Evolving a Standard C++ Linear Algebra Library from the BLAS" @@ -104889,6 +104905,14 @@ references: issued: year: 2019 URL: https://wg21.link/p1684r0 + - id: P1684R5 + citation-label: P1684R5 + title: "mdarray: An Owning Multidimensional Array Analog of mdspan" + author: + - family: Christian Trott, Daisy Hollman, Mark Hoemmen, Daniel Sunderland, Damien Lebrun-Grandie + issued: + year: 2023 + URL: https://wg21.link/p1684r5 - id: P1685R0 citation-label: P1685R0 title: "Make get/set_default_resource replaceable" @@ -107369,6 +107393,14 @@ references: issued: year: 2019 URL: https://wg21.link/p1999r0 + - id: P2630R4 + citation-label: P2630R4 + title: "Submdspan" + author: + - family: Christian Trott, Damien Lebrun-Grandie, Mark Hoemmen, Nevin Liber + issued: + year: 2023 + URL: https://wg21.link/p2630r4 - id: P3141 citation-label: P3141 title: "std::terminates()" diff --git a/P3242/Makefile b/P3242/Makefile new file mode 100644 index 00000000..655cdec6 --- /dev/null +++ b/P3242/Makefile @@ -0,0 +1,3 @@ +include ../P0009/wg21/Makefile + +.DEFAULT_GOAL := $(HTML) diff --git a/P3242/P3242.html b/P3242/P3242.html new file mode 100644 index 00000000..cf727dd4 --- /dev/null +++ b/P3242/P3242.html @@ -0,0 +1,702 @@ + + + + + + + + Copy and fill for mdspan + + + + + + + + +
+
+

Copy and fill for +mdspan

+ + + + + + + + + + + + + + + + + + +
Document #: P3242R0
Date: 2024-04-15
Project: Programming Language C++
+ LEWG
+
Reply-to: + Nicolas Morales
<>
+ Christian Trott
<>
+ Mark Hoemmen
<>
+ Damien Lebrun-Grandie
<>
+
+ +
+
+ +

1 Motivation

+

C++23 introduced mdspan ([P0009R18]), a non-owning +multidimensional array abstraction that has a customizable layout. +Layout customization was originally motivated in [P0009R18] with considerations for +interoperability and performance, particularly on different +architectures. Moreover, [P2630R4] introduced +submdspan, a slicing function that can yield arbitrarily +strided layouts. However, without standard library support, copying +efficiently between mdspans with mixes of complex layouts +is challenging for users.

+

Many applications, including high-performance computing (HPC), image +processing, computer graphics, etc that benefit from mdspan +also would benefit from basic memory operations provided in standard +algorithms such as copy and fill. Indeed, the authors found that a copy +algorithm would have been quite useful in their implementation of the +copying mdarray ([P1684R5]) constructor. A more +constrained form of copy is also included in the standard +linear algebra library ([P1673R13]).

+

However, existing standard library facilities are not sufficient +here. Currently, mdspan does not have iterators or ranges +that represent the span of the mdspan. Additionally, it’s +not entirely clear what this would entail. +std::linalg::copy ([P1673R13]) is limited to +mdspans of rank 2 or lower.

+

Moreover, the manner in which an mdspan is copied (or +filled) is highly performance sensitive, particularly in regards to +caching behavior when traversing mdspan memory. A naive +user implementation is easy to get wrong in addition to being tedious +for higher rank mdspans. Ideally, an implementation would +be free to use information about the layout of the mdspan +known at compile time to perform optimizations; e.g. a continuous span +mdspan copy for trivial types could be implemented with a +memcpy.

+

Finally, providing these generic algorithms would also enable these +operations for types that are representable by mdspan. For +example, this would naturally include mdarray, which is +convertible to mdspan, or for user-defined types whose view +of memory corresponds to mdspans (e.g. an image class or +something similar).

+

1.1 Safety

+

Due to the closed nature of mdspan extents, copy +operations can be checked by the implementation to prevent past-the-end +writes. This is an advantage the proposed copy operation has over the +existing operations in the standard.

+

2 Design

+

The main design direction of this proposal is to provide methods for +copying and filling mdspans that may have differing layouts +and accessors, while allowing implementations to provide efficient +implementations for special cases. For example, if a copy occurs between +two mdspans with the same layout mapping type that is +contiguous and both use default_accessor, the intention is +that this could be implemented by a single memcpy.

+

Furthermore, accessors as a customization point should be enabled, as +with any other mdspan operation. For example, a custom +accessor that checks a condition inside of the access +method should still work and check that condition. It’s worth noting +that there may be a high sensitivity of how much implementations able to +optimize if provided custom accessors. For example, optimizations could +be disabled if using a custom accessor that is identical to the default +accessor.

+

Finally, there is some question as to whether copy and +fill should return a value when applied to +mdspan, as the iterator and ranged-based algorithms do. We +believe that mdspan copy and fill should return void, as +there is no past-the-end iterator that they could reasonably return.

+ +

Currently, we are proposing adding copy and +fill algorithms on mdspan to header +<mdspan>. We considered other options, namely:

+ +

We settled on <mdspan> because as proposed this is +a relatively light-weight addition that reflects operations that are +commonly desired with mdspans. However, the authors are +open to changing this.

+

2.2 Existing copy in +std::linalg

+

[P1673R13] introduced several linear +algebra operations including std::linalg::copy. This +operation only applies to mdspans with rank ≤ 2. +This paper is proposing a version of copy that is not +constrained by the number of ranks and differs from +std::linalg::copy in some important ways outline below.

+

Note that right now the strict addition of copy would +potentially cause the following code to be ambiguous, due to ADL-finding +std::copy:

+
using std::linalg::copy;
+copy(mds1, mds2);
+

One possibility would be to remove std::linalg::copy, as +it is a subset of the proposed std::copy. This was rejected +by the paper authors because of certain requirements in +[linalg.reqs.alg] – that is:

+
+

The function may make arbitrarily many objects of any linear algebra +value type, value-initializing or direct-initializing them with any +existing object of that type.

+
+

This requirement is likely undesirable for a generalized copy +algorithm.

+

There is a similar argument against simply generalizing +std::linalg::copy. In addition to the freedom of +std::linalg::copy to arbitrarily value or +direct-initializing values, using the linear algebra version of copy +would require the use of unnecessary includes and namespaces. It seems +not very ergonomic for a user to have to use +std::linalg::copy and include <linalg> +even if the mdspan operations they are performing are +unrelated to linear algebra.

+

2.3 What the proposal does not +include

+

There are a few additions that are analogous to existing standard +algorithms that are not included in this proposal, both to keep the +proposal small and because some of these algorithms do not make sense in +the context of mdspans:

+ +

3 Wording

+
template<class SrcElementType, class SrcExtents, class SrcLayoutPolicy, class SrcAccessorPolicy,
+         class DstElementType, class DstExtents, class DstLayoutPolicy, class DstAccessorPolicy>
+void copy(mdspan<SrcElementType, SrcExtents, SrcLayoutPolicy, SrcAccessorPolicy> src,
+          mdspan<DstElementType, DstExtents, DstLayoutPolicy, DstAccessorPolicy> dst);
+
+template<class ExecutionPolicy,
+         class SrcElementType, class SrcExtents, class SrcLayoutPolicy, class SrcAccessorPolicy,
+         class DstElementType, class DstExtents, class DstLayoutPolicy, class DstAccessorPolicy>
+void copy(ExecutionPolicy&& policy,
+          mdspan<SrcElementType, SrcExtents, SrcLayoutPolicy, SrcAccessorPolicy> src,
+          mdspan<DstElementType, DstExtents, DstLayoutPolicy, DstAccessorPolicy> dst);
+

1 +Constraints:

+ +

2 +Preconditions:

+ +

3 +Effects: for all unique multidimensional indices +i... in src.extents(), assigns +src[i...] to dst[i...]

+
template<class ElementType, class Extents, class LayoutPolicy, class AccessorPolicy, class T>
+void fill(mdspan<ElementType, Extents, LayoutPolicy, AccessorPolicy> dst, const T& value);
+
+template<class ExecutionPolicy,
+         class ElementType, class Extents, class LayoutPolicy, class AccessorPolicy, class T>
+void fill(ExecutionPolicy&& policy,
+          mdspan<ElementType, Extents, LayoutPolicy, AccessorPolicy> dst, const T& value);
+

4 +Constraints: +std::is_assignable_v<typename mdspan<ElementType, Extents, LayoutPolicy, AccessorPolicy>::reference, const T&>

+

5 +Effects: for all unique multidimensional indices +i... in dst.extents(), assigns +value to dst[i...]

+

4 Acknowledgments

+

Sandia National Laboratories is a multimission laboratory managed and +operated by National Technology & Engineering Solutions of Sandia, +LLC, a wholly owned subsidiary of Honeywell International Inc., for the +U.S. Department of Energy’s National Nuclear Security Administration +under contract DE-NA0003525. This paper describes objective technical +results and analysis. Any subjective views or opinions that might be +expressed in the paper do not necessarily represent the views of the +U.S. Department of Energy or the United States Government.

+

5 References

+
+
+
[P0009R18]
Christian Trott, D.S. Hollman, Damien +Lebrun-Grandie, Mark Hoemmen, Daniel Sunderland, H. Carter Edwards, +Bryce Adelstein Lelbach, Mauro Bianco, Ben Sander, Athanasios +Iliopoulos, John Michopoulos, Nevin Liber. 2022. mdspan.
https://wg21.link/p0009r18
+
+
+
[P1673R13]
Mark Hoemmen, Daisy Hollman, Christian Trott, +Daniel Sunderland, Nevin Liber, Alicia Klinvex, Li-Ta Lo, Damien +Lebrun-Grandie, Graham Lopez, Peter Caday, Sarah Knepper, Piotr +Luszczek, Timothy Costa. 2023. A free function linear algebra interface +based on the BLAS.
https://wg21.link/p1673r13
+
+
+
[P1684R5]
Christian Trott, Daisy Hollman, Mark Hoemmen, +Daniel Sunderland, Damien Lebrun-Grandie. 2023. mdarray: An Owning +Multidimensional Array Analog of mdspan.
https://wg21.link/p1684r5
+
+
+
[P2630R4]
Christian Trott, Damien Lebrun-Grandie, Mark +Hoemmen, Nevin Liber. 2023. Submdspan.
https://wg21.link/p2630r4
+
+
+
+
+ + diff --git a/P3242/P3242.md b/P3242/P3242.md new file mode 100644 index 00000000..88c49e2a --- /dev/null +++ b/P3242/P3242.md @@ -0,0 +1,127 @@ +--- +title: "Copy and fill for `mdspan`" +document: P3242R0 +date: today +audience: LEWG +author: + - name: Nicolas Morales + email: + - name: Christian Trott + email: + - name: Mark Hoemmen + email: + - name: Damien Lebrun-Grandie + email: +--- + +# Motivation + +C++23 introduced `mdspan` ([@P0009R18]), a non-owning multidimensional array abstraction that has a customizable layout. Layout customization was originally motivated in [@P0009R18] with considerations for interoperability and performance, particularly on different architectures. Moreover, [@P2630R4] introduced `submdspan`, a slicing function that can yield arbitrarily strided layouts. However, without standard library support, copying efficiently between `mdspan`s with mixes of complex layouts is challenging for users. + +Many applications, including high-performance computing (HPC), image processing, computer graphics, etc that benefit from `mdspan` also would benefit from basic memory operations provided in standard algorithms such as copy and fill. Indeed, the authors found that a copy algorithm would have been quite useful in their implementation of the copying `mdarray` ([@P1684R5]) constructor. A more constrained form of `copy` is also included in the standard linear algebra library ([@P1673R13]). + +However, existing standard library facilities are not sufficient here. Currently, `mdspan` does not have iterators or ranges that represent the span of the `mdspan`. Additionally, it's not entirely clear what this would entail. `std::linalg::copy` ([@P1673R13]) is limited to `mdspan`s of rank 2 or lower. + +Moreover, the manner in which an `mdspan` is copied (or filled) is highly performance sensitive, particularly in regards to caching behavior when traversing `mdspan` memory. A naive user implementation is easy to get wrong in addition to being tedious for higher rank `mdspan`s. Ideally, an implementation would be free to use information about the layout of the `mdspan` known at compile time to perform optimizations; e.g. a continuous span `mdspan` copy for trivial types could be implemented with a `memcpy`. + +Finally, providing these generic algorithms would also enable these operations for types that are representable by `mdspan`. For example, this would naturally include `mdarray`, which is convertible to `mdspan`, or for user-defined types whose view of memory corresponds to `mdspan`s (e.g. an image class or something similar). + +## Safety + +Due to the closed nature of `mdspan` extents, copy operations can be checked by the implementation to prevent past-the-end writes. This is an advantage the proposed copy operation has over the existing operations in the standard. + +# Design + +The main design direction of this proposal is to provide methods for copying and filling `mdspan`s that may have differing layouts and accessors, while allowing implementations to provide efficient implementations for special cases. For example, if a copy occurs between two `mdspan`s with the same layout mapping type that is contiguous and both use `default_accessor`, the intention is that this could be implemented by a single `memcpy`. + +Furthermore, accessors as a customization point should be enabled, as with any other `mdspan` operation. For example, a custom accessor that checks a condition inside of the `access` method should still work and check that condition. It's worth noting that there may be a high sensitivity of how much implementations able to optimize if provided custom accessors. For example, optimizations could be disabled if using a custom accessor that is identical to the default accessor. + +Finally, there is some question as to whether `copy` and `fill` should return a value when applied to `mdspan`, as the iterator and ranged-based algorithms do. We believe that `mdspan` copy and fill should return void, as there is no past-the-end iterator that they could reasonably return. + +## Header + +Currently, we are proposing adding `copy` and `fill` algorithms on `mdspan` to header ``. We considered other options, namely: + +* ``: This would mean that users of iterator-based algorithms would need to pull in ``. On the other hand, this is where iterator-based `copy` and `fill` live so may be preferable in that sense. +* `` (or similarly any other new header): This seems like overkill for two functions. However, in the future, we may want to add new algorithms for `mdspan` that are not strictly covered by existing algorithms in ``, so this option may be more future proof. + +We settled on `` because as proposed this is a relatively light-weight addition that reflects operations that are commonly desired with `mdspan`s. However, the authors are open to changing this. + +## Existing `copy` in `std::linalg` + +[@P1673R13] introduced several linear algebra operations including `std::linalg::copy`. This operation only applies to `mdspan`s with $rank \le 2$. This paper is proposing a version of `copy` that is not constrained by the number of ranks and differs from `std::linalg::copy` in some important ways outline below. + +Note that right now the strict addition of `copy` would potentially cause the following code to be ambiguous, due to ADL-finding `std::copy`: + +```c++ +using std::linalg::copy; +copy(mds1, mds2); +``` + +One possibility would be to remove `std::linalg::copy`, as it is a subset of the proposed `std::copy`. This was rejected by the paper authors because of certain requirements in \[linalg.reqs.alg\] -- that is: + +> The function may make arbitrarily many objects of any linear algebra value type, value-initializing or direct-initializing them with any existing object of that type. + +This requirement is likely undesirable for a generalized copy algorithm. + +There is a similar argument against simply generalizing `std::linalg::copy`. In addition to the freedom of `std::linalg::copy` to arbitrarily value or direct-initializing values, using the linear algebra version of copy would require the use of unnecessary includes and namespaces. It seems not very ergonomic for a user to have to use `std::linalg::copy` and include `` even if the `mdspan` operations they are performing are unrelated to linear algebra. + +## What the proposal does not include + +There are a few additions that are analogous to existing standard algorithms that are not included in this proposal, both to keep the proposal small and because some of these algorithms do not make sense in the context of `mdspan`s: + +* `std::move`: Perhaps this should be included for completeness's sake. However, it doesn't seem applicable to the typical usage of `mdspan`. +* `(copy|fill)_n`: As a multidimensional view `mdspan` does not in general follow a specific ordering. Memory ordering may not be obvious to calling code, so it's not even clear how these would work. Any applications intending to copy a subset of `mdspan` should use call `copy` on the result of `submdspan`. +* `copy_backward`: As above, there is no specific ordering. A similar effect could be achieved via transformations with a custom layout, similar to `layout_transpose` in [@P1673R13]. +* Other algorithms, include `std::for_each`. `for_each` in particular is a desirable but brings in many unanswered questions that should be addressed in a different paper. + +# Wording + +```c++ +template +void copy(mdspan src, + mdspan dst); + +template +void copy(ExecutionPolicy&& policy, + mdspan src, + mdspan dst); +``` + +[1]{.pnum} *Constraints:* + + * [1.1]{.pnum} `std::is_assignable_v::reference, typename mdspan::reference>` is `true`. + + * [1.2]{.pnum} `is_constructible_v`. + +[2]{.pnum} *Preconditions:* + + * [2.1]{.pnum} `src.extents() == dst.extents()` is `true`. + + * [2.2]{.pnum} `dst.is_unique()` is `true`. + + * [2.3]{.pnum} there is no unique multidimensional index `i...` in `src.extents()` where there exists a multidimensional index `j...` in `dst.extents()` such that `src[i...]` and `dst[j...]` refer to the same element. + +[3]{.pnum} *Effects:* for all unique multidimensional indices `i...` in `src.extents()`, assigns `src[i...]` to `dst[i...]` + + +```c++ +template +void fill(mdspan dst, const T& value); + +template +void fill(ExecutionPolicy&& policy, + mdspan dst, const T& value); +``` + +[4]{.pnum} *Constraints:* `std::is_assignable_v::reference, const T&>` + +[5]{.pnum} *Effects:* for all unique multidimensional indices `i...` in `dst.extents()`, assigns `value` to `dst[i...]` + +# Acknowledgments + +Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.