- Creating an InferenceSession from an on-disk model file and a set of SessionOptions.
- Registering customized loggers.
- Registering customized allocators.
- Registering predefined providers and setting the priority order. ONNX Runtime has a set of predefined execution providers, such as CUDA and MKLDNN. Users can register providers with their InferenceSession. The order of registration also determines the preference order.
- Running a model with inputs. These inputs must be in CPU memory, not GPU memory. If the model has multiple outputs, users can specify which outputs they want.
- Converting an in-memory ONNX Tensor encoded in protobuf format to a pointer that can be used as model input.
- Setting the thread pool size for each session.
- Setting graph optimization level for each session.
- Dynamically loading custom ops. Instructions
- Ability to load a model from a byte array. See OrtCreateSessionFromArray in onnxruntime_c_api.h.
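Loading from a byte array requires the model bytes to be in memory first. The following is a minimal sketch of that preparatory step only, assuming the model lives in an ordinary file. `read_file_bytes` is a hypothetical helper, not part of ONNX Runtime, and the exact parameter list of OrtCreateSessionFromArray should be taken from onnxruntime_c_api.h rather than from this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not an ONNX Runtime function): reads a whole file
 * into a heap buffer. Returns NULL on failure. On success, stores the byte
 * count in *out_len; the caller must free() the returned buffer. */
void* read_file_bytes(const char* path, size_t* out_len) {
  FILE* f = fopen(path, "rb");
  if (f == NULL) return NULL;

  /* Find the file size by seeking to the end, then rewind. */
  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long size = ftell(f);
  if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

  /* Allocate at least one byte so an empty file does not call malloc(0). */
  void* buf = malloc(size > 0 ? (size_t)size : 1);
  if (buf == NULL) { fclose(f); return NULL; }

  /* Read everything in one call and treat a short read as an error. */
  size_t got = fread(buf, 1, (size_t)size, f);
  fclose(f);
  if (got != (size_t)size) { free(buf); return NULL; }

  *out_len = (size_t)size;
  return buf;
}
```

The returned pointer and length are the two pieces a byte-array session constructor would need. The buffer can be freed once the caller no longer needs it.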
- Include onnxruntime_c_api.h.
- Call OrtCreateEnv
- Create Session: OrtCreateSession(env, model_uri, nullptr,...)
- Optionally add more execution providers (e.g. for CUDA use OrtSessionOptionsAppendExecutionProvider_CUDA)
- Create Tensor
  - OrtCreateAllocatorInfo
  - OrtCreateTensorWithDataAsOrtValue
- OrtRun
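For the Create Tensor step, the caller supplies a data buffer together with its shape, so the buffer has to be sized to match. The sketch below shows only that arithmetic and does not call ONNX Runtime. `tensor_element_count` is a hypothetical helper, and the 1x3x224x224 float input used in the usage note is an assumption about a typical SqueezeNet input, not something stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ONNX Runtime function): multiplies the
 * dimensions of a shape to get the number of elements.
 * Returns 0 on success, or -1 if a dimension is negative (for example an
 * unknown dimension reported as -1) or the product would overflow size_t. */
int tensor_element_count(const int64_t* shape, size_t rank, size_t* out_count) {
  size_t count = 1; /* a rank-0 shape describes a single scalar element */
  for (size_t i = 0; i < rank; ++i) {
    if (shape[i] < 0) return -1;
    if ((uint64_t)shape[i] > SIZE_MAX) return -1;
    size_t dim = (size_t)shape[i];
    if (dim != 0 && count > SIZE_MAX / dim) return -1; /* would overflow */
    count *= dim;
  }
  *out_count = count;
  return 0;
}
```

For a shape of {1, 3, 224, 224} this yields 150528 elements, so a float buffer would occupy 150528 * sizeof(float) bytes (602112 when sizeof(float) is 4). Whether the tensor-creation call expects a length in bytes or in elements should be confirmed against onnxruntime_c_api.h.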
The example below shows a sample run using the SqueezeNet model from the ONNX model zoo. It dynamically reads the model's inputs and outputs, including their shape and type information, then runs a sample vector and fetches the resulting class probabilities for inspection.
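Separately from the full sample, the final inspection step can be sketched on its own. Assuming the output has already been copied into a plain array of float scores (one per class), the hypothetical helper `top_k_indices` below picks the highest-scoring classes. It is an illustration only and does not use any ONNX Runtime calls.

```c
#include <stddef.h>

/* Hypothetical helper (not an ONNX Runtime function): writes the indices of
 * the k largest scores into out_idx, highest score first. Ties go to the
 * lower index. Returns the number of indices written, which is min(k, n).
 * out_idx must have room for at least min(k, n) entries. */
size_t top_k_indices(const float* scores, size_t n, size_t k, size_t* out_idx) {
  size_t found = 0;
  if (k > n) k = n;
  for (size_t r = 0; r < k; ++r) {
    size_t best = n; /* n is used as the "nothing selected yet" marker */
    for (size_t i = 0; i < n; ++i) {
      /* Skip indices that were already selected in an earlier round. */
      int taken = 0;
      for (size_t j = 0; j < found; ++j) {
        if (out_idx[j] == i) { taken = 1; break; }
      }
      if (taken) continue;
      if (best == n || scores[i] > scores[best]) best = i;
    }
    out_idx[found++] = best;
  }
  return found;
}
```

This simple selection loop is quadratic in k, which is fine for a handful of top classes. For large k, sorting an index array would be the more usual choice.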