🧩 ORP: a Lightweight Framework for Building ONNX Runtime Pipelines with ORT
💬 Introduction
`orp` is a lightweight framework designed to simplify the creation and execution of ONNX Runtime pipelines. Built on top of `ort`, it provides a simple way to handle data pre- and post-processing and to chain multiple ONNX models together, while encouraging code reuse and clarity.
🔨 Sample Use-Cases
- `gline-rs`: an inference engine for GLiNER models
- more to come...
⚡️ GPU/NPU Inferences
The execution providers available in `ort` can be leveraged to run considerably faster inference on GPU/NPU hardware.
The first step is to pass the appropriate execution providers in `RuntimeParameters`. For example:

```rust
use orp::params::RuntimeParameters;                   // assumed module path
use ort::execution_providers::CUDAExecutionProvider;  // assumed module path

let rtp = RuntimeParameters::default().with_execution_providers([
    CUDAExecutionProvider::default().build()
]);
```
The second step is to activate the appropriate features (see the related section below); otherwise it may silently fall back to the CPU. For example:
```console
$ cargo run --features=cuda ...
```
Please refer to `doc/ORT.md` for details about execution providers.
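Since providers are tried in the order given, a conservative setup can list a plain CPU fallback explicitly rather than relying on a silent one. A minimal sketch, assuming `ort`'s `CPUExecutionProvider` alongside the CUDA one (import paths as above):

```rust
use orp::params::RuntimeParameters;  // assumed module path
use ort::execution_providers::{CPUExecutionProvider, CUDAExecutionProvider};

// Providers are tried in order: CUDA first, then the CPU as a fallback.
let rtp = RuntimeParameters::default().with_execution_providers([
    CUDAExecutionProvider::default().build(),
    CPUExecutionProvider::default().build(),
]);
```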
📦 Crate Features
This crate mirrors the following `ort` features (see the `Cargo.toml` sketch after this list):

- To allow for dynamic loading of ONNX Runtime libraries: `load-dynamic`
- To allow for activation of execution providers: `cuda`, `tensorrt`, `directml`, `coreml`, `rocm`, `openvino`, `onednn`, `xnnpack`, `qnn`, `cann`, `nnapi`, `tvm`, `acl`, `armnn`, `migraphx`, `vitis`, and `rknpu`
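For instance, to enable CUDA support, the feature can be declared on the dependency. A minimal sketch (the version number is indicative):

```toml
[dependencies]
orp = { version = "0.9", features = ["cuda"] }
```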
⚙️ Dependencies
- `ort`: the ONNX Runtime wrapper
- `composable`: this crate is used to actually define the pre- and post-processing pipelines by composition of elementary steps, and can in turn be used to combine multiple pipelines (see the sketch below)
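As a purely illustrative sketch of that composition idea (the `Step` and `Chain` names below are hypothetical, not the actual `composable` API), chaining works by feeding the output of one step into the input of the next:

```rust
/// Hypothetical sketch, not the actual `composable` API: an elementary
/// processing step turns an input into an output.
trait Step {
    type Input;
    type Output;
    fn run(&self, input: Self::Input) -> Self::Output;
}

/// Two chained steps are themselves a step: the output of `first`
/// feeds the input of `second`, so whole pipelines compose the same way.
struct Chain<A, B> {
    first: A,
    second: B,
}

impl<A, B> Step for Chain<A, B>
where
    A: Step,
    B: Step<Input = A::Output>,
{
    type Input = A::Input;
    type Output = B::Output;
    fn run(&self, input: Self::Input) -> Self::Output {
        self.second.run(self.first.run(input))
    }
}
```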