
Discussion: Offload semantics on host data #2

oleksandr-pavlyk opened this issue Mar 23, 2021 · 2 comments

Comments

@oleksandr-pavlyk (Contributor) commented Mar 23, 2021

A SYCL-powered Python package is said to follow the "computation follows data" paradigm when its functions/methods infer the queue to which they submit kernels for execution from the (sycl::device, sycl::context) pairs associated with the input USM data, encapsulated in a sycl::queue.

Any ambiguity in such an inference process raises an error.
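For concreteness, here is a minimal sketch of that inference step. It is not an existing dpctl API; the function name infer_execution_queue is hypothetical, and it assumes inputs expose a .sycl_queue attribute (as dpctl's USM-backed arrays do) and that SyclDevice/SyclContext objects support equality comparison.

def infer_execution_queue(*arrays):
    # collect the queue associated with each USM-backed input
    queues = [a.sycl_queue for a in arrays]
    q0 = queues[0]
    for q in queues[1:]:
        # all inputs must agree on the (sycl::device, sycl::context) pair
        if (q.sycl_device, q.sycl_context) != (q0.sycl_device, q0.sycl_context):
            raise ValueError("ambiguous inputs: no common (device, context) pair")
    return q0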

This ticket is to discuss a possible resolution for the scenario where offloaded computation is desired for host data (on the C++ side this would correspond to wrapping host data in a sycl::buffer for use by SYCL kernels; a realistic use case for this workflow is the need to work on host data too large to fit into GPU memory in its entirety).

This scenario raises the question of how to specify which sycl::queue kernels will be submitted to.

The proposed solution is to introduce a target_offload(obj, queue=q) wrapper, so that the semantics become

cls.fit(target_offload(X_host, queue=q), y_host)

The target_offload function creates an instance of a data-only class that associates the specified queue with the X_host object.

# Cython sketch; SyclQueue refers to dpctl's Cython-level queue class
cdef class DataWithQueue:
    cdef readonly object base      # the wrapped host object
    cdef readonly SyclQueue queue  # queue kernels should be submitted to
    def __cinit__(self, base, SyclQueue queue):
        self.base = base
        self.queue = queue

def target_offload(host_obj, queue):
    return DataWithQueue(host_obj, queue=queue)

The responsibility is on the authors of cls.fit to recognize such inputs and infer the intent to offload from the arguments.
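For illustration only, a helper an estimator's fit could use to unwrap such inputs might look like the following; the name _unwrap_offload_target is hypothetical and not part of any existing API, and USM arrays are assumed to expose a .sycl_queue attribute.

def _unwrap_offload_target(arg):
    # returns (data, queue-or-None) for both wrapped host data and USM arrays
    if isinstance(arg, DataWithQueue):
        return arg.base, arg.queue
    return arg, getattr(arg, "sycl_queue", None)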

target_offload called on a USMArray usm_ary with the same queue keyword argument as usm_ary.queue simply returns usm_ary itself.

target_offload called on usm_ary with a queue different from usm_ary.queue raises an error, unless both queues share the same sycl::context, in which case the proposed interpretation is that the usm_ary pointer is used directly in kernels submitted to the specified queue (no explicit copy is needed).

# the following raises a hard error
cls.fit(target_offload(X_cpu_usm_array, queue=gpu_queue), Y_cpu_usm_array)

# the next line is equivalent to cls.fit(X_cpu_usm_array, Y_cpu_usm_array)
cls.fit(target_offload(X_cpu_usm_array, queue=cpu_queue), Y_cpu_usm_array)
# here queues X_usm_tile1.queue and q_tile2 have common multi-device context
target_offload(X_usm_tile1, queue=q_tile2)
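The dispatch inside target_offload could then be extended roughly as sketched below. This is hypothetical: the .sycl_queue attribute name and equality comparison of queue/context objects are assumptions, not confirmed API behavior.

def target_offload(obj, queue):
    obj_queue = getattr(obj, "sycl_queue", None)
    if obj_queue is None:
        # host data: associate it with the queue to offload to
        return DataWithQueue(obj, queue=queue)
    if obj_queue == queue:
        # same queue: return the USM array unchanged
        return obj
    if obj_queue.sycl_context == queue.sycl_context:
        # same context: the existing USM allocation is usable from `queue`
        return DataWithQueue(obj, queue=queue)
    raise ValueError("USM data is bound to a different sycl::context "
                     "than the requested queue")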
@michael-smirnov commented:

  1. What about the data location of results for such a call:

     result = foo(target_offload(host_data, queue=gpu_queue))

     Is the result supposed to be on the device associated with gpu_queue?

  2. Which package will contain this function? dpctl?
  3. What about restrictions on the objects passed as the first argument to that function? Are they expected to be data containers, or can they be of arbitrary type?
  4. Is this way of specifying the target device aligned with the other APIs that allocate data on the device or transfer data to it? Do all of those functions accept a queue parameter?

@napetrov commented Mar 25, 2021

I would also add a few cents on user ramp-up and the transformation of code from host-based to GPU-enabled.

It should be simple both to understand and to convert the code, so let's look at it from an end-to-end perspective.
Here is the host-only code:

import numpy as np

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
    

Here is the current implementation, which uses a sycl_context to offload the operation to the GPU. It assumes that X is host data and that clustering holds results residing on the GPU:

import numpy as np
from daal4py.sklearn import patch_sklearn
from daal4py.oneapi import sycl_context
patch_sklearn()

from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with sycl_context("gpu"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)

What would the full code example be for the new semantics? We would have to not only use target_offload but also create a queue, and we would have to explain to the user what he/she is doing.
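For discussion purposes, a hypothetical end-to-end version under the proposed semantics might look like the sketch below. Where target_offload would be imported from is exactly question 2 above, and dpctl.SyclQueue("gpu") is assumed for explicit queue creation.

import numpy as np
import dpctl
from daal4py.sklearn import patch_sklearn
# from <package TBD> import target_offload  -- its home package is question 2 above
patch_sklearn()

from sklearn.cluster import DBSCAN

# the user now creates the queue explicitly instead of entering sycl_context("gpu")
gpu_queue = dpctl.SyclQueue("gpu")

X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
# target_offload marks the host array for offload to gpu_queue
clustering = DBSCAN(eps=3, min_samples=2).fit(target_offload(X, queue=gpu_queue))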
