Nuclei AI Library

Nuclei AI Library is a set of ONNX AI operators optimized for Nuclei RISC-V processors that support the RISC-V Vector Instruction Set.

We implemented the ONNX operators in pure C and also provide RISC-V Vector optimized implementations; see the source code in src for details.

We also provide test code for evaluating the ONNX operator implementations, which can be built and run with Nuclei SDK.

Supported ONNX Operators

Some operator implementations support only a subset of the corresponding ONNX operator.

VPU Lite: VPU Lite is a lightweight VPU implementation that does not support the following features, in whole or in part:

  • segment load/store
  • vslide/vgather/vcompress
  • ELEN=64

In the chart below, the VPU Lite compatibility column shows how compatible each operator is with VPU Lite. The symbol indicates that the operator is fully compatible with VPU Lite. Where an operator is not compatible, the column states the reason.

Operator VPU Lite compatibility FP32 FP16 BF16 FP8 INT32 INT8 INT4 Boolean
Abs × × ×
Add × × × ×
BatchNormalization × × × × ×
Clamp × × ×
Concat × × ×
ConvInteger invoke segment load × × × × × ×
Cos × × × × ×
Div × × × × ×
Elu × × × × ×
Erf × × × × × × ×
Flip × × ×
GatherElements × × ×
Gelu × × × × × × ×
LayerNormalization × × × × ×
Log × × × × ×
MatMul × × × ×
Mul × × × ×
Negate × × ×
Pad × × ×
Pow × × × × ×
Reciprocal × × × × ×
ReduceAll × × × × × × ×
ReduceAny × × × × × × ×
ReduceMax × × ×
ReduceMin × × ×
ReduceProd invoke vslide × × ×
ReduceSum × × ×
Relu × × ×
RMSNormalization × × × × ×
Rsqrt × × × × ×
ScatterElements × × ×
Silu × × × × ×
Sin × × × × ×
Slice × × ×
Softmax × × × × ×
Sqrt × × × × ×
Sub × × × ×
Tile × × ×
TopK invoke vslide × × × ×

File Structure

Directory Description
src Source files with the operator implementations; each file corresponds to one operator
inc Header files with the operator declarations
test Test files; each file corresponds to one kind of operator (except main.c)

How to Use

Prerequisites

We recommend utilizing the latest version of the Nuclei SDK and associated toolchain for optimal performance and compatibility. For this project we use the following versions:

Follow the instructions in the Setup Tools and Environment section to prepare your Nuclei SDK and toolchain. Both Linux and Windows are supported; the examples here use Ubuntu 20.04.

We recommend setting the NUCLEI_SDK_ROOT environment variable to point to /path/to/nuclei-sdk.

export NUCLEI_SDK_ROOT=/path/to/nuclei-sdk
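
Before building, it can help to confirm that the variable is actually set in the current shell. The following is only a minimal sketch: check_sdk_root is a hypothetical helper name and is not part of Nuclei SDK or this project. It checks nothing beyond the variable being non-empty and pointing to an existing directory.

```shell
# Hypothetical helper (not part of Nuclei SDK or this project):
# report whether NUCLEI_SDK_ROOT is set and points to a directory.
check_sdk_root() {
    if [ -z "${NUCLEI_SDK_ROOT:-}" ]; then
        echo "NUCLEI_SDK_ROOT is not set"
        return 1
    fi
    if [ ! -d "$NUCLEI_SDK_ROOT" ]; then
        echo "NUCLEI_SDK_ROOT is not a directory: $NUCLEI_SDK_ROOT"
        return 1
    fi
    echo "Using Nuclei SDK at $NUCLEI_SDK_ROOT"
}
```

For example, check_sdk_root && make all runs the build only when the check succeeds.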

Once the variable is set, you can run make to build and run the test program from wherever this project is located.

Otherwise, place this project in the application/baremetal directory of the Nuclei SDK:

# if you have cloned this project to your local directory
mv /path/to/nuclei-ai-library /path/to/nuclei-sdk/application/baremetal
# if you haven't cloned this project to your local directory
git clone -b develop https://github.com/Nuclei-Software/nuclei-ai-library.git /path/to/nuclei-sdk/application/baremetal/nuclei-ai-library

After that, the files should be organized as follows:

$NUCLEI_SDK_ROOT
├── application
│   ├── baremetal
│   │   ├── nuclei-ai-library
│   │   │   ├── ci
│   │   │   ├── evalsoc.ld
│   │   │   ├── inc
│   │   │   ├── Makefile
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   └── test
│   │   │   ...

Build

To build the test program for rv64, run the following command:

cd /path/to/nuclei-ai-library
make CORE=nx900fd ARCH_EXT=v_zfh_zvfh all

If CORE and ARCH_EXT are not specified, CORE=nx900fd and ARCH_EXT=v_zfh_zvfh are used by default.

To build for rv32, specify CORE and ARCH_EXT explicitly:

make CORE=n900f ARCH_EXT=_zfh_zvfh_zve32f all

After the build, the binary file ailib_bench.elf is generated in the root directory of this project.

Run Test

Test on QEMU

To run the test program with QEMU, run the following command:

# run test on qemu for rv64
make CORE=nx900fd ARCH_EXT=v_zfh_zvfh SIMU=qemu clean all run_qemu
# run test on qemu for rv32
make CORE=n900f ARCH_EXT=_zfh_zvfh_zve32f SIMU=qemu clean all run_qemu

These commands rebuild the test program with SIMU=qemu and run it on QEMU after the build. When SIMU=qemu is specified, QEMU terminates automatically when the test completes. Otherwise, you need to press CTRL+C to exit QEMU manually once the test is done.

Test on Hardware

Check Binary. A binary built with SIMU=qemu must not be run on hardware. Run make clean and rebuild the binary without SIMU=qemu before running it.

Check Hardware. The hardware should meet the following requirements:

  • 1024 KB ILM and 1024 KB DLM
  • Support for the v extension (rv64) or the _zve32f extension (rv32)
  • Support for the _zfh extension
  • Support for the _zvfh extension

Once the hardware is connected to your host, you can run the following command:

# when the hardware is rv64
make CORE=nx900fd ARCH_EXT=v_zfh_zvfh clean all upload
# when the hardware is rv32
make CORE=n900f ARCH_EXT=_zfh_zvfh_zve32f clean all upload

For more details about running applications on hardware, refer to the Build, Run and Debug Sample Application section in the Nuclei SDK documentation.

Test Results

No matter how you run the test program, the test results will be shown in the terminal like this:

...
CSV, Tile_float32_axis0, 5064
CSV, Tile_float32_rvv_axis0, 1548
CSV, Tile_float32_axis1, 7066
CSV, Tile_float32_rvv_axis1, 1286
CSV, Tile_float32_bothaxes, 9638
CSV, Tile_float32_rvv_bothaxes, 2380
CSV, Topk_int32, 116255
CSV, Topk_int32_rvv, 84587
All test done!
-------------
All tests passed!

Each line starting with CSV corresponds to one test case and follows the CSV format. The second field is the name of the test case, and the last field is the number of cycles that test consumed.
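
Because the output is line-oriented CSV, it is easy to post-process. The sketch below is an illustration only: rvv_speedup is a hypothetical helper name, not a script shipped with this project. It reads the terminal output on standard input, pairs each test case with its _rvv counterpart, and prints the cycle counts together with the ratio of the two. It assumes that both lines of a pair differ only by the _rvv marker in the name.

```shell
# Hypothetical helper: read benchmark output on stdin and print
# "<case> <scalar cycles> <rvv cycles> <speedup>" for every test case
# that has both a plain result and an _rvv result.
rvv_speedup() {
    awk -F', *' '
        $1 == "CSV" {
            name = $2
            cycles = $3 + 0          # force a numeric value
            if (name ~ /_rvv(_|$)/) {
                # Drop the _rvv marker to recover the scalar case name.
                sub(/_rvv/, "", name)
                rvv[name] = cycles
            } else {
                scalar[name] = cycles
                order[++n] = name    # remember the order of appearance
            }
        }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                if ((name in rvv) && rvv[name] > 0)
                    printf "%s %d %d %.2f\n", name, scalar[name], rvv[name], scalar[name] / rvv[name]
            }
        }
    '
}
```

If the terminal output has been saved to a file, for example by piping the make command through tee run.log (run.log is an assumed file name), then rvv_speedup < run.log prints one line per paired test case. For the sample output above, the first line would be Tile_float32_axis0 5064 1548 3.27.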

Test Case Naming Rules: <Operator>_<DataType>[_rvv][_CaseName]

  • Operator(required): The name of the ONNX operator.
  • DataType(required): The data type of the input and output.
  • _rvv(optional): Present if the test case uses the RISC-V Vector optimized implementation.
  • _CaseName(optional): The name of the subdivided test cases.
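
The naming rule can be applied mechanically. The following sketch splits a test case name into its four parts; parse_case_name is a hypothetical helper, and it assumes that neither the operator name nor the data type contains an underscore, which holds for the sample output shown above.

```shell
# Hypothetical helper: split a test case name into its parts, following
# <Operator>_<DataType>[_rvv][_CaseName].
# Prints "<operator> <datatype> <rvv|scalar> <casename>", with "-" when
# there is no case name. Assumes operator and data type contain no "_".
parse_case_name() (
    name=$1
    operator=${name%%_*}            # text before the first underscore
    rest=${name#*_}                 # everything after the operator
    datatype=${rest%%_*}
    case $rest in
        *_*) rest=${rest#*_} ;;     # drop the data type
        *)   rest= ;;
    esac
    variant=scalar
    case $rest in
        rvv)   variant=rvv; rest= ;;
        rvv_*) variant=rvv; rest=${rest#rvv_} ;;
    esac
    echo "$operator $datatype $variant ${rest:--}"
)
```

For example, parse_case_name Tile_float32_rvv_axis0 prints Tile float32 rvv axis0, and parse_case_name Topk_int32 prints Topk int32 scalar -.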

Reference