Kneron

K-Surgeon: Visual Model Editor

Open Source

K-Surgeon

Eliminate the guesswork of model deployment. Visualize, debug, and optimize neural networks for Kneron NPUs in a unified graphical interface.

GitHub Repo
Supported Frameworks
PyTorch, TensorFlow, ONNX
Quick Stats
  • Latest Version: v2.4.0
  • Supported Operators: 142
  • License: Apache 2.0
The "Black Box" Problem

Before K-Surgeon, optimizing a model for NPU deployment meant wrestling with cryptic error logs and hand-maintained Python scripts like `editor.py`.

Developers had to guess which layer was causing a compilation failure, often leading to a trial-and-error loop that could take days.

Error: Op type 'GridSample' not supported in domain 'ai.onnx'

The Visual Solution

K-Surgeon brings your model graph to life. Based on the industry-standard Netron viewer, it adds a Kneron-specific intelligence layer.

  • Instant Validation: Unsupported nodes light up in red immediately upon import.
  • Smart Remediation: Right-click a broken node to see available fixes (Decompose, CPU Offload).
  • NPU Partitioning: Visualize exactly which parts of your graph run on the NPU vs the Host CPU.
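
The validation and partitioning ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not K-Surgeon's implementation: the `SUPPORTED_OPS` set, the dictionary-based node format, and both function names are assumptions made for the example, and a real graph with branches needs more than a linear scan.

```python
from itertools import groupby

# Hypothetical supported-operator set; the real list ships with the toolchain.
SUPPORTED_OPS = {"Conv", "Relu", "BatchNormalization", "Add", "MaxPool", "Gemm"}


def find_unsupported(nodes, supported=SUPPORTED_OPS):
    """Return (name, op_type) for every node whose op_type is not supported."""
    return [(n["name"], n["op_type"]) for n in nodes if n["op_type"] not in supported]


def partition(nodes, supported=SUPPORTED_OPS):
    """Group a topologically ordered node list into contiguous NPU/CPU segments."""
    def device(node):
        return "NPU" if node["op_type"] in supported else "CPU"

    return [(dev, [n["name"] for n in group]) for dev, group in groupby(nodes, key=device)]


# Toy graph: a linear chain with one unsupported operator in the middle.
nodes = [
    {"name": "conv1", "op_type": "Conv"},
    {"name": "relu1", "op_type": "Relu"},
    {"name": "warp", "op_type": "GridSample"},
    {"name": "fc", "op_type": "Gemm"},
]

flagged = find_unsupported(nodes)   # nodes a viewer would highlight
segments = partition(nodes)         # where execution would switch devices
```

Here `flagged` holds the single `GridSample` node, and `segments` splits the chain into an NPU segment, a CPU segment for the offloaded node, and a second NPU segment.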

Advanced Capabilities

Tools designed for the KL520, KL720, and the new KL530.

Quantization Preview

Crucial for the KL530's INT4 support.

Simulate the effects of quantization before you compile. K-Surgeon runs a "shadow inference" with FP32 and INT8 or INT4 weights side by side, computing the KL-divergence drift between the two for every layer.


  • Layer-wise Accuracy Heatmap
  • Mixed Precision Recommendations
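
To make the drift metric concrete, here is a rough sketch of how a per-layer KL divergence between FP32 values and their quantized counterparts could be computed. It assumes symmetric per-tensor quantization and a fixed 16-bin histogram; both choices, and every function name below, are illustrative assumptions rather than K-Surgeon's actual method.

```python
import math


def fake_quantize(values, bits):
    """Symmetric per-tensor quantize-dequantize onto a signed `bits`-bit grid."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def histogram(values, lo, hi, bins=16, eps=1e-9):
    """Normalized histogram over [lo, hi], smoothed so no bin is exactly zero."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    probs = [c / len(values) + eps for c in counts]
    norm = sum(probs)
    return [p / norm for p in probs]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def layer_drift(fp32_values, bits):
    """KL divergence between the FP32 distribution and its quantized version."""
    quantized = fake_quantize(fp32_values, bits)
    lo, hi = min(fp32_values), max(fp32_values)
    return kl_divergence(histogram(fp32_values, lo, hi), histogram(quantized, lo, hi))


# Synthetic stand-in for one layer's output values.
activations = [math.sin(i * 0.1) for i in range(1000)]
drift_int8 = layer_drift(activations, bits=8)
drift_int4 = layer_drift(activations, bits=4)
```

Running this per layer and sorting by drift gives the kind of ranking a layer-wise heatmap would visualize; layers whose INT4 drift is much larger than their INT8 drift are natural candidates for higher precision.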

Graph Surgery

Replace `editor.py` with drag-and-drop.

Modify your ONNX graph structure without writing a single line of code. Fuse redundant layers, cut out debug heads, or inject custom NPU operations directly in the canvas.


  • Fuse BN/ReLU
  • Channel Pruning Helper
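
As an example of what a fusion such as "Fuse BN" does under the hood, the sketch below folds batch-normalization parameters into the weights and bias of the preceding layer using the standard inference-time identity. It uses plain lists and a linear layer for brevity; the function names are hypothetical and this is not taken from K-Surgeon's source.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BN parameters into the preceding layer.

    weights: one flat list of weights per output channel.
    Returns (folded_weights, folded_bias).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)            # per-channel scale
        folded_w.append([w * s for w in w_c])
        folded_b.append((b_c - mu) * s + bt)
    return folded_w, folded_b


def linear(weights, bias, x):
    """y_c = sum_i w_c[i] * x[i] + b_c for each output channel c."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b_c for w_c, b_c in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization applied per channel."""
    return [g * (yc - mu) / math.sqrt(v + eps) + bt
            for yc, g, bt, mu, v in zip(y, gamma, beta, mean, var)]


# Toy two-channel layer followed by BN.
weights = [[0.2, -0.1, 0.4], [1.0, 0.5, -0.3]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.2]
x = [1.0, -2.0, 0.5]

reference = batchnorm(linear(weights, bias, x), gamma, beta, mean, var)
fused_w, fused_b = fold_batchnorm(weights, bias, gamma, beta, mean, var)
fused = linear(fused_w, fused_b, x)   # one layer instead of two
```

The fused layer reproduces the two-layer output up to floating-point rounding, which is why the BN node can be removed from the graph without changing results.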

Hardware Constraints

Know your limits.

Every chip has hard constraints, such as SRAM size and maximum channel count. K-Surgeon checks your model against the hardware specs of your target device (e.g., KL520 vs. KL720) to prevent out-of-memory (OOM) errors at runtime.


  • SRAM Usage Estimator
  • MAC Efficiency Score
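
A constraint check of this kind can be pictured as comparing each layer's output tensor against a device profile. The sketch below is deliberately simplified: the profile names and numbers are placeholders, not real KL520 or KL720 specifications, and a realistic SRAM estimate would also account for weights, tiling, and buffer reuse.

```python
from math import prod

# Placeholder profiles for illustration only; NOT real chip specifications.
HYPOTHETICAL_PROFILES = {
    "small_device": {"sram_bytes": 512 * 1024, "max_channels": 512},
    "large_device": {"sram_bytes": 2 * 1024 * 1024, "max_channels": 2048},
}


def check_layers(layers, profile, bytes_per_value=1):
    """Return a list of human-readable violations for the given profile.

    Each layer is a dict with a "name" and an "out_shape" of (channels, H, W).
    bytes_per_value=1 models INT8 activations.
    """
    issues = []
    for layer in layers:
        channels = layer["out_shape"][0]
        size = prod(layer["out_shape"]) * bytes_per_value
        if channels > profile["max_channels"]:
            issues.append(f"{layer['name']}: {channels} channels exceeds "
                          f"limit of {profile['max_channels']}")
        if size > profile["sram_bytes"]:
            issues.append(f"{layer['name']}: {size} bytes exceeds "
                          f"SRAM budget of {profile['sram_bytes']}")
    return issues


layers = [
    {"name": "stage1", "out_shape": (64, 56, 56)},
    {"name": "stage4", "out_shape": (1024, 28, 28)},
]

small_issues = check_layers(layers, HYPOTHETICAL_PROFILES["small_device"])
large_issues = check_layers(layers, HYPOTHETICAL_PROFILES["large_device"])
```

With these placeholder numbers, `stage4` violates both limits on the small profile and passes on the large one, which is the kind of per-target difference the check is meant to surface before compilation.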

From Training to Deployment

1. Import

Load an ONNX or TFLite model exported from PyTorch or TensorFlow.

2. Analyze

Validate operators and check quantization drift.

3. Optimize

Apply automated fixes and fuse layers.

4. Export

Generate a compiled .nef binary.

Interactive Demo

Experience K-Surgeon in Browser

See how easy it is to fix an unsupported "GridSample" layer in a ResNet model.

Launch Live Mockup