Skip to content

Commit

Permalink
Integrated to the main unified specification and other updates.
Browse files Browse the repository at this point in the history
* Moved the functionality to clCreateBufferWithProperties,
  thus now requiring 3.0+.
* Single memobj query for fetching the address(es).
* Also other smaller improvements pointed by Kevin.
* Candidate for 1.0.0.
  • Loading branch information
pjaaskel committed Dec 6, 2024
1 parent e9f1bbc commit 9ad04fb
Show file tree
Hide file tree
Showing 5 changed files with 232 additions and 337 deletions.
96 changes: 96 additions & 0 deletions api/cl_ext_buffer_device_address.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
// Copyright 2024 The Khronos Group Inc.
// SPDX-License-Identifier: CC-BY-4.0

include::{generated}/meta/{refprefix}cl_ext_buffer_device_address.txt[]

=== Other Extension Metadata

*Last Modified Date*::
2024-12-06
*IP Status*::
No known IP claims.
*Contributors*::
- Pekka Jääskeläinen, Intel +
- Karol Herbst, Red Hat +
- Henry Linjamäki, Intel +
- Kevin Petit, Arm +

=== Description

This extension provides access to raw device pointers for cl_mem buffers
without requiring a shared virtual address space between the host and
the device.

==== Background

Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
that enables raw pointers in the OpenCL standard. Its coarse-grain
variant is relatively simple to implement on various platforms in terms of
coherency requirements, but it requires mapping the buffer's address range
to the host virtual address space.
However, various higher-level heterogeneous APIs present a memory allocation
routine which can allocate device-only memory and provide raw addresses to
it without guarentees of system-wide uniqueness. For example, minimal
implementations of OpenMP's omp_target_alloc() and CUDA/HIP's
cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device.

Host-device unified addressing might not be a major implementation issue in
systems which can provide virtual memory across the platform, but might
bring challenges in cases where the device presents a global memory with
a disjoint address space (that can also be a physical memory address space) or,
for example, when a barebone embedded system lacks virtual memory support altogether.
This extension is targeted to complement the OpenCL SVM extension by providing
an additional lower-end step in the spectrum of type of pointers/buffers OpenCL
can allocate.

=== New Command

* {clSetKernelArgDevicePointerEXT}

=== New Types

* {cl_mem_device_address_EXT}

=== New Enums

* {cl_mem_properties_TYPE}
** {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
** {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
* {cl_mem_info_TYPE}
** {CL_MEM_DEVICE_ADDRESS_EXT}
* {cl_kernel_exec_info_TYPE}
** {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT}

=== Version History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|====
| *Version* | *Date* | *Author* | *Changes*
| 0.9.0 | 2024-12-06 | Pekka Jääskeläinen, Kevin Petit |
Integrated to the main unified specification.
Moved the functionality to clCreateBufferWithProperties,
thus requiring 3.0+. Single memobj query for fetching the
address(es). Also other smaller improvements pointed by Kevin.
Candidate for final 1.0.0.
| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
Made the allocation flags independent from each other and
renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
same address across all devices in the context, whereas the latter
allows per-device addresses.
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
cases "all", not "any", covering a case where not all devices
can ensure the same address across the context. In that case
CL_INVALID_VALUE can be returned. Defined sub-buffer address
computation to be 'base_addr + origin'. Added error conditions
for clSetKernelExecInfo when the device doesn't support
device pointers.
| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback.
This version describes the first API version that was prototyped
in PoCL and RustiCL using temporary placeholder flag/enum values.
The PoCL implementation and initial discussion on the extension
can be found https://github.com/pocl/pocl/pull/1441[in this PR].
|====
126 changes: 125 additions & 1 deletion api/opencl_runtime_layer.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -594,6 +594,39 @@ include::{generated}/api/version-notes/CL_MEM_DEVICE_HANDLE_LIST_KHR.asciidoc[]
{CL_MEM_DEVICE_HANDLE_LIST_END_KHR_anchor}) to associate with the
external memory handle.
endif::cl_khr_external_memory[]

ifdef::cl_ext_buffer_device_address[]

| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}

include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
| {cl_bool_TYPE}
| When set to CL_TRUE, specifies that the buffer must have a single fixed
device-side address for its lifetime, and the address can be queried via {clGetMemObjectInfo}.

Each device in the context can have their own (fixed) device-side address and
a copy of the created buffer which are synchronized
implicitly by the runtime.

The flag might imply that the buffer will be "pinned" permanently to
a device's memory, but might not be necessarily so, as long as the address
range of the buffer remains constant.

The device addresses of sub-buffers derived from {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT}
allocated buffers can be computed by adding the sub-buffer origin to the
device-specific start address.

| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor}

include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[]
| {cl_bool_TYPE}
| When set to CL_TRUE, the buffer has otherwise the same properties as
when allocated using the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} flag,
but with an additional property that the buffer's address is the same across
all the devices in the context.

endif::cl_ext_buffer_device_address[]

|====

ifdef::cl_khr_external_memory[]
Expand Down Expand Up @@ -660,6 +693,15 @@ ifdef::cl_khr_external_memory[]
** if _properties_ does not include a supported external memory handle and
{CL_MEM_DEVICE_HANDLE_LIST_KHR} is specified as part of _properties_.
endif::cl_khr_external_memory[]
ifdef::cl_ext_buffer_device_address[]
* {CL_INVALID_DEVICE}
** If _properties_ includes either {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} or
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} and there is at least one device in
the context that doesn't support such allocation.
* {CL_INVALID_VALUE}
** If _properties_ includes both {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} and
{CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} at the same time.
endif::cl_ext_buffer_device_address[]

[[memory-flags-table]]
.List of supported memory flag values
Expand Down Expand Up @@ -6149,6 +6191,21 @@ include::{generated}/api/version-notes/CL_MEM_D3D11_RESOURCE_KHR.asciidoc[]
returns the _resource_ argument specified when _memobj_ was created.
endif::cl_khr_d3d11_sharing[]

ifdef::cl_ext_buffer_device_address[]
| {CL_MEM_DEVICE_ADDRESS_EXT_anchor}

include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[]
| {cl_mem_device_address_EXT_TYPE}[]
| If _memobj_ was created using {clCreateBufferWithProperties} with
the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} property set to CL_TRUE,
returns a list of device addresses for the buffer, one for each
device in the context. If the buffer was allocated
with the {CL_MEM_DEVICE_SHARED_ADDRESS_EXT} property,
only one device address is returned.

endif::cl_ext_buffer_device_address[]


|====

// refError
Expand All @@ -6158,6 +6215,12 @@ successfully.
Otherwise, it returns one of the following errors:

* {CL_INVALID_MEM_OBJECT} if _memobj_ is a not a valid memory object.
ifdef::cl_ext_buffer_device_address[]
** Returned for the {CL_MEM_DEVICE_ADDRESS_EXT} query if
the {cl_ext_buffer_device_address_EXT} is not supported or if the
buffer was not allocated with neither {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or
{CL_MEM_DEVICE_SHARED_ADDRESS_EXT}.
endif::cl_ext_buffer_device_address[]
* {CL_INVALID_VALUE} if _param_name_ is not valid, or if size in bytes
specified by _param_value_size_ is < size of return type as described in
the <<mem-info-table,Memory Object Info>> table and _param_value_ is not
Expand Down Expand Up @@ -10454,6 +10517,48 @@ Otherwise, it returns one of the following errors:
required by the OpenCL implementation on the host.
--

ifdef::cl_ext_buffer_device_address[]
[open,refpage='clSetKernelArgDevicePointerEXT',desc='Set a device pointer as the argument value for a specific argument of a kernel.',type='protos']
--
To set a device pointer as the argument value for a specific argument of a
kernel, call the function

include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]

* _kernel_ is a valid kernel object.
* _arg_index_ is the argument index.
Arguments to the kernel are referred by indices that go from 0 for the
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
declared by a kernel.
* _arg_value_ is the device pointer that should be used as the argument value for
argument specified by _arg_index_.
The device pointer specified is the value used by all API calls that enqueue
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
value is changed by a call to {clSetKernelArgDevicePointerEXT} for _kernel_.
The device pointer can only be used for arguments that are declared to be a
pointer to `global` memory allocated with {clCreateBufferWithProperties} with
either the {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT} or {CL_MEM_DEVICE_SHARED_ADDRESS_EXT}
property. The pointer value specified as the argument value
can be the pointer to the beginning of the buffer or any offset into
the buffer region. The device pointer value must be naturally aligned according to
the argument's type.

{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the argument was set
successfully. Otherwise, it returns one of the following errors:

* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
the device pointer.
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
* {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value.
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
by the OpenCL implementation on the device.
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
required by the OpenCL implementation on the host.
--
endif::cl_ext_buffer_device_address[]

[open,refpage='clSetKernelExecInfo',desc='Pass additional information other than argument values to a kernel.',type='protos']
--
To pass additional information other than argument values to a kernel, call
Expand Down Expand Up @@ -10497,6 +10602,19 @@ include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
grain system SVM allocations.
These fine grain system SVM pointers may be passed as arguments or
defined in SVM buffers that are passed as arguments to _kernel_.

ifdef::cl_ext_buffer_device_address[]
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}

include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
| {cl_mem_device_address_EXT_TYPE}[]
| Device pointers must reference locations contained entirely within
buffers that are passed to kernel as arguments, or that are passed
through the execution information. Non-argument device pointers accessed
by the kernel must be specified by passing pointers to those buffers
via this {clSetKernelExecInfo} option.
endif::cl_ext_buffer_device_address[]

|====

// refError
Expand All @@ -10506,7 +10624,13 @@ successfully.
Otherwise, it returns one of the following errors:

* {CL_INVALID_KERNEL} if _kernel_ is a not a valid kernel object.
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support SVM.
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_SVM_PTRS} if no devices in
the context associated with _kernel_ support SVM.
ifdef::cl_ext_buffer_device_address[]
* {CL_INVALID_OPERATION} for {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} if no
device in the context associated with _kernel_ support device pointers.
endif::cl_ext_buffer_device_address[]

* {CL_INVALID_VALUE} if _param_name_ is not valid, if _param_value_ is
`NULL` or if the size specified by _param_value_size_ is not valid.
* {CL_INVALID_OPERATION} if _param_name_ is
Expand Down
Loading

0 comments on commit 9ad04fb

Please sign in to comment.