
Example code producing memory access fault for array size 1024*8, works for 1023*8 #510

Open
chopikus opened this issue Jul 25, 2024 · 5 comments
Labels: bug (Something isn't working), OpenCL

chopikus commented Jul 25, 2024

Describe the bug

Running the example program below produces a GPU memory access fault once the array is big enough.

Program:

import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.common.TornadoDevice;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.collections.VectorFloat8;
import uk.ac.manchester.tornado.api.types.vectors.Float8;

public class App
{
    public static void parallelInitialization(VectorFloat8 data) {
        for (@Parallel int i = 0; i < data.size(); i++) {
            int j = i * 8;
            data.set(i, new Float8(j, j + 1, j + 2, j + 3, j + 4, j + 5, j + 6, j + 7));
        }
    }

    public static void computeSquare(VectorFloat8 data) {
        for (@Parallel int i = 0; i < data.size(); i++) {
            Float8 item = data.get(i);
            Float8 result = Float8.mult(item, item);
            data.set(i, result);
        }
    }

    public static void main( String[] args ) {
        VectorFloat8 array = new VectorFloat8(1024 * 8);
        TaskGraph taskGraph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.EVERY_EXECUTION, array)
                .task("t0", App::parallelInitialization, array)
                .task("t1", App::computeSquare, array)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, array);

        TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(taskGraph.snapshot());

        // Obtain a device from the list
        TornadoDevice device = TornadoExecutionPlan.getDevice(0, 0);
        executionPlan.withDevice(device);

        // Put in a loop to analyze hotspots with Intel VTune (as a demo)
        for (int i = 0; i < 1000; i++) {
            // Execute the application
            executionPlan.execute();
        }
    }
}

Building with mvn package and running tornado -jar [JARFILE] produces an error:

Memory access fault by GPU node-1 (Agent handle: 0x7fe80076f230) on address 0x7fe614c00000. Reason: Page not present or supervisor privilege.

However, if I change the array size from 1024*8 to 1023*8, the error goes away.
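For reference, a plain-Java, host-side sketch (no TornadoVM types; a hypothetical helper, not part of the reproducer) of what the two kernels should compute: after parallelInitialization the flattened element at index j holds the value j, and after computeSquare it holds j*j. This can be used to validate results copied back from the device.

```java
public class ReferenceCheck {
    // Hypothetical reference implementation of the two kernels on a flat array.
    static float[] reference(int vectors) {
        float[] data = new float[vectors * 8];
        // t0: parallelInitialization -- element at flat index j gets value j
        for (int i = 0; i < vectors; i++) {
            for (int lane = 0; lane < 8; lane++) {
                data[i * 8 + lane] = i * 8 + lane;
            }
        }
        // t1: computeSquare -- each element is multiplied by itself
        for (int j = 0; j < data.length; j++) {
            data[j] = data[j] * data[j];
        }
        return data;
    }

    public static void main(String[] args) {
        float[] expected = reference(1024);
        System.out.println(expected[3]);      // 9.0
        System.out.println(expected[100]);    // 10000.0
    }
}
```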

How To Reproduce

I put my code into a repository: https://github.com/chopikus/my-tornado-app.

Steps:

  1. git clone https://github.com/chopikus/my-tornado-app.git
  2. cd my-tornado-app
  3. ./run.sh

Expected behavior

No errors should be produced.

Computing system setup (please complete the following information):

  • Fedora 40
  • ROCm runtime version 1.13
  • Radeon 680M GPU on Ryzen 7 PRO 6850U
  • tornado --version: version=1.0.7-dev, branch=master, commit=96b3040; Backends installed: opencl
  • tornado -version: java version "21.0.4" 2024-07-16 LTS; Java(TM) SE Runtime Environment (build 21.0.4+8-LTS-274); Java HotSpot(TM) 64-Bit Server VM (build 21.0.4+8-LTS-274, mixed mode)

Additional context

tornado --devices:

WARNING: Using incubator modules: jdk.incubator.vector

Number of Tornado drivers: 1
Driver: OpenCL
  Total number of OpenCL devices  : 1
  Tornado device=0:0  (DEFAULT)
	OPENCL --  [AMD Accelerated Parallel Processing] -- gfx1035
		Global Memory Size: 4.0 GB
		Local Memory Size: 64.0 KB
		Workgroup Dimensions: 3
		Total Number of Block Threads: [256]
		Max WorkGroup Configuration: [1024, 1024, 1024]
		Device OpenCL C version: OpenCL C 2.0
chopikus changed the title from "Example code not working for array size 1024*8, works for 1023*8" to "Example code producing memory access fault for array size 1024*8, works for 1023*8" on Jul 25, 2024
jjfumero (Member)

Hi @chopikus . Thank you for the detailed report.

I can reproduce the error on NVIDIA GPUs as well. The problem occurs in clBuildProgram, once the code has been generated. However, running on Intel GPUs with OpenCL works, and it also works with the SPIR-V and PTX backends. We will take a look and analyze why NVIDIA and AMD report a clBuildProgram failure.

jjfumero (Member)

For reference, I added this test in this branch: 87e6080

jjfumero (Member)

jjfumero commented Jul 26, 2024 via email

andrii0lomakin (Contributor)

Good day.

Excuse me if I add my five cents by asking a question.

Recently, AMD released a new NPU, which will be supported by the Xilinx runtime and, in turn, will work over OpenCL. If I understand correctly, oneAPI supports OpenCL too, so wouldn't making SPIR-V the default approach narrow down TornadoVM's usage possibilities?

jjfumero (Member)

Hi @andrii0lomakin ,
OpenCL >= 2.1 can dispatch SPIR-V binary kernels. In fact, TornadoVM currently dispatches SPIR-V through both the OpenCL runtime and the Level Zero API from oneAPI. We hope vendors will adopt SPIR-V more widely in the future. In my view, the way to go is SPIR-V and PTX; however, debugging the compiler gets increasingly complex.

Given the current vendor/accelerator landscape, it is difficult to deprecate our OpenCL C backend. A few examples: FPGA vendors support OpenCL 1.0 - 1.2, and Apple supports OpenCL 1.2. Thus, if TornadoVM wants to run on those platforms as well, the OpenCL C backend is still needed, unless, of course, new backends appear (e.g., for VHDL directly, Apple Metal, etc.).


4 participants