Kneron NPUs require models in NEF (NPU Executable Format), a binary format. This guide explains how to convert a standard ONNX model into NEF.
The Conversion Pipeline
ONNX Model → Optimize → Quantize → Compile → NEF File
1. Prepare Your ONNX Model
Ensure your model uses only supported ONNX operators.
Warning: Dynamic axes are not supported. Please export your model with fixed input dimensions (e.g., 1x3x224x224).
PyTorch Export Example
import torch

# `model` is your trained torch.nn.Module; put it in eval mode before export
dummy_input = torch.randn(1, 3, 224, 224)  # fixed NCHW shape, no dynamic axes
torch.onnx.export(model, dummy_input, "model.onnx",
                  opset_version=11,
                  input_names=['input'],
                  output_names=['output'])

2. Run the Optimizer
The onnx2onnx tool simplifies the graph and folds constants.
kneron optimize input_model.onnx -o optimized.onnx

3. Quantization (Calibration)
Kneron NPUs run inference in INT8 precision, so the FP32 model must be quantized. Supply a calibration dataset of roughly 100 representative images so the toolchain can determine the quantization ranges.
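One way to assemble the calibration folder is to randomly sample ~100 representative images from your training set. A minimal stdlib sketch; the directory names and the `build_calibration_set` helper are illustrative, not part of the Kneron toolchain (the demo runs on a throwaway directory of stub files):

```python
import random
import shutil
import tempfile
from pathlib import Path

def build_calibration_set(src: Path, dst: Path, n: int = 100, seed: int = 0) -> int:
    """Copy a random sample of up to n images from src into dst; return the count."""
    images = sorted(p for p in src.iterdir()
                    if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
    picked = random.Random(seed).sample(images, min(n, len(images)))
    dst.mkdir(parents=True, exist_ok=True)
    for p in picked:
        shutil.copy2(p, dst / p.name)
    return len(picked)

# Demo on a throwaway directory of stub files; in practice, point src at your
# real dataset and dst at ./calibration_images for the quantize step below.
tmp = Path(tempfile.mkdtemp())
src = tmp / "train"
src.mkdir()
for i in range(250):
    (src / f"img_{i:03d}.jpg").write_bytes(b"\xff\xd8")  # fake JPEG stub
print(build_calibration_set(src, tmp / "calibration_images"))  # 100
```

A fixed seed keeps the sample reproducible, so reruns of the pipeline calibrate against the same images.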
kneron quantize optimized.onnx --dataset ./calibration_images --out quantized.onnx

4. Compile to NEF
Finally, compile the quantized model for your specific target device (e.g., KL720).
kneron compile quantized.onnx --target KL720 --out model.nef
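The three toolchain invocations above can be chained into a single script so a model rebuild is one command. A sketch using only the commands shown in this guide, with the example file names from each step:

```shell
#!/bin/sh
set -e  # stop at the first failing stage

kneron optimize input_model.onnx -o optimized.onnx
kneron quantize optimized.onnx --dataset ./calibration_images --out quantized.onnx
kneron compile quantized.onnx --target KL720 --out model.nef

echo "Wrote model.nef for KL720"
```

With `set -e`, a failure in any stage (for example, an unsupported operator caught by the optimizer) aborts the run instead of compiling a stale intermediate file.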