A SYCL-powered Python package is said to use the "computation follows data" paradigm when its functions/methods infer the queue to which they submit kernels for execution from the (sycl::device, sycl::context) pair associated with the input USM data, encapsulated in a sycl::queue.
Any ambiguity in such an inference process raises an error.
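For illustration only, a minimal sketch of such inference inside a library function; the .queue attribute on USM inputs follows the usm_ary.queue usage later in this ticket, and _submit_kernels is a hypothetical placeholder:

def _submit_kernels(queue, X, y):
    # placeholder for the actual offloaded computation
    raise NotImplementedError

def fit(X, y):
    # "computation follows data": the execution queue is inferred from the
    # USM inputs; the queue encapsulates the (sycl::device, sycl::context)
    # pair the data is associated with
    if X.queue != y.queue:
        # ambiguity in the inference raises an error
        raise ValueError("inputs are associated with different queues")
    return _submit_kernels(X.queue, X, y)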
This ticket is to discuss a possible resolution for the scenario where offloaded computation is desired for host data (on the C++ side this would correspond to using sycl::buffer to wrap host data for use by SYCL kernels; a real-world case for this workflow is the need to work on host data too large to fit into GPU memory in its entirety).
This scenario raises the question of how to specify which sycl::queue kernels will be submitted to.
The proposed solution is to introduce a target_offload(obj, queue=q) wrapper, so that the semantics become:
cls.fit(target_offload(X_host, queue=q), y_host)
The target_offload function will create an instance of a data-only class that associates the specified queue with the X_host object.
The responsibility is on authors of cls.fit to recognize such inputs, and infer the intent to offload from the arguments.
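A minimal sketch of what such a data-only class and its recognition inside cls.fit could look like; all names here are illustrative assumptions, not an agreed API:

class _OffloadTarget:
    # data-only container associating a queue with a host object
    def __init__(self, obj, queue):
        self.obj = obj
        self.queue = queue

def target_offload(obj, queue=None):
    return _OffloadTarget(obj, queue)

class Estimator:
    def fit(self, X, y=None):
        queue = None
        if isinstance(X, _OffloadTarget):
            # recognize the wrapper and unpack the intended target queue
            queue, X = X.queue, X.obj
        # kernels for X would then be submitted to `queue`, or the queue
        # would be inferred from X ("computation follows data") when None
        return self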
target_offload called on a USMArray usm_ary with the queue keyword argument equal to usm_ary.queue simply returns usm_ary itself.
target_offload called on usm_ary with a queue different from usm_ary.queue raises an error, unless both queues share the same sycl::context, in which case the proposed interpretation is that the usm_ary pointer is to be used in kernels submitted to the specified queue (no explicit copy is needed).
# the following raises hard error
cls.fit(target_offload(X_cpu_usm_array, queue=gpu_queue), Y_cpu_usm_array)

# the next line is equivalent to cls.fit(X_cpu_usm_array, Y_cpu_usm_array)
cls.fit(target_offload(X_cpu_usm_array, queue=cpu_queue), Y_cpu_usm_array)

# here queues X_usm_tile1.queue and q_tile2 have common multi-device context
target_offload(X_usm_tile1, queue=q_tile2)
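A sketch of the checks behind these examples, extending the illustrative _OffloadTarget wrapper above; the usm_ary.queue attribute and the sycl_context comparison are assumptions made only for this sketch:

def target_offload(usm_ary, queue):
    if queue == usm_ary.queue:
        # same queue: the array itself is returned unchanged
        return usm_ary
    if queue.sycl_context == usm_ary.queue.sycl_context:
        # same context: the USM pointer may be used in kernels submitted to
        # the specified queue, so no explicit copy is needed
        return _OffloadTarget(usm_ary, queue)
    # queues with different contexts: hard error
    raise ValueError("specified queue is incompatible with usm_ary.queue")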
Is the result supposed to be on the device associated with gpu_queue?
Which package will contain this function? dpctl?
What about restrictions on the objects passed as the first argument to that function? Are they expected to be data containers, or can they be of arbitrary type?
Is this way of specifying the target device aligned with the other APIs that allocate data on a device or transfer data to it? Do all of those functions accept a queue parameter?
I would also add a few cents on user ramp-up and the transformation of code from host-based to GPU-enabled.
It should at the same time be simple to understand and simple to convert the code, so let's look at it from an end-to-end perspective.
Here is host-only code:
import numpy as np
from sklearn.cluster import DBSCAN
X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
Here is the current implementation, which uses a context to offload the operation to the GPU. This assumes that X is host data and that clustering would be results residing on the GPU:
import numpy as np
from daal4py.sklearn import patch_sklearn
from daal4py.oneapi import sycl_context
patch_sklearn()
from sklearn.cluster import DBSCAN
X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with sycl_context("gpu"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)
What would the full code example for the new semantics look like? We would have to not only use target_offload but also create a queue, and we would have to explain to the user what he/she is doing.
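For illustration, a hedged sketch of what the end-to-end GPU example could look like under this proposal, assuming the queue is created with dpctl.SyclQueue and that the patched DBSCAN.fit recognizes the wrapper (neither of which is settled here):

import numpy as np
import dpctl
from daal4py.sklearn import patch_sklearn
patch_sklearn()
from sklearn.cluster import DBSCAN
# target_offload would be imported from whichever package ends up providing it
# (one of the open questions above); here it is assumed to be available

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)

q = dpctl.SyclQueue("gpu")  # the user explicitly constructs the target queue
clustering = DBSCAN(eps=3, min_samples=2).fit(target_offload(X, queue=q))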