How to Run AI on a Microcontroller or Microprocessor – A Beginner’s Guide to Edge AI
In the last few years, Edge AI has rapidly transformed how devices understand and interact with the world — without needing constant cloud access. From detecting voice commands on smartwatches to classifying images on drones, AI at the edge is revolutionizing embedded systems.
But how exactly do you run AI on a microcontroller (MCU) or microprocessor (MPU)? As a beginner, this can seem overwhelming. Don’t worry — this blog breaks it down into simple, digestible steps.
What is Edge AI?
Edge AI means running artificial intelligence algorithms locally on a device like an MCU or MPU, rather than sending data to a remote cloud server. This enables:
- Faster responses (no internet delay)
- Lower power consumption
- Increased privacy (data stays on device)
Common applications include:
- Wake-word detection (“Hey Alexa”)
- Image classification (e.g., person vs object)
- Anomaly detection in machines
- Gesture recognition, etc.
Why Edge AI Matters?
Imagine a construction site. Dust, noise, humans in helmets — and a camera watching for safety violations.
If that video had to go to the cloud for AI processing, it would take time, cost money, and raise privacy concerns.
So instead, we ask:
“Can we do the thinking right there, on the device?”
That’s Edge AI.
The i.MX93, with its efficient NPU, was made for exactly this.
Microcontroller vs Microprocessor: Which Brain to Choose?
Before you start building, you need to pick the right kind of hardware. Here’s how I learned the difference:
I started with microcontrollers (MCUs) — they’re simple, low-power, and perfect for turning on LEDs or reading sensor data. But when I tried to run an image classifier? Not a chance.
That’s when I moved to microprocessors (MPUs) like the i.MX93, which brought the power of Linux, better memory, and even a dedicated Neural Processing Unit (NPU).
| Feature | MCU (e.g., Cortex-M) | MPU (e.g., Cortex-A / i.MX93) |
|---|---|---|
| Processing Power | Low (MHz range) | High (GHz range) |
| Memory | Limited (KBs) | Larger (MBs or GBs) |
| OS Support | Bare metal / RTOS | Full Linux support |
| AI Support | TFLite Micro | TFLite, ONNX, NPU via eIQ SDK |
| Use Case | Low-power IoT, sensors | Edge AI, vision, industrial control |
How to Deploy a Machine Learning Model on Any MPU (Generalized Method)
Deploying a machine learning (ML) model on a Microprocessor Unit (MPU) — such as Raspberry Pi, NXP i.MX93, or NVIDIA Jetson — typically involves four key stages:
1. Train the Model
- Use standard ML frameworks like:
  - TensorFlow / Keras
  - PyTorch
  - Scikit-learn (for simpler models)
- Train on a PC or cloud platform (e.g., Colab, AWS, Azure)
- Export the model in a format such as:
  - `.h5`, `.pb` (TensorFlow)
  - `.pt`, `.onnx` (PyTorch)
  - `.tflite` (Lite model for edge)
- This step is compute-heavy and is never done on the MPU (a training sketch follows this list).
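To make step 1 concrete, here is a minimal sketch in TensorFlow/Keras. The tiny network, the 96×96 RGB input size, and the two-class "helmet vs. no helmet" setup are illustrative assumptions, and dataset loading is assumed to happen elsewhere.

```python
import tensorflow as tf

# Hypothetical two-class image model (e.g., helmet vs. no helmet).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train on the PC or cloud; train_ds / val_ds are assumed tf.data pipelines.
# model.fit(train_ds, validation_data=val_ds, epochs=10)

# Export in the formats listed above.
model.save("model.h5")                     # Keras HDF5 (.h5)
tf.saved_model.save(model, "saved_model")  # TensorFlow SavedModel (.pb inside)
```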
2. Optimize & Convert the Model
- MPUs are resource-constrained (limited RAM/compute), so you must optimize the model:
  - Convert to TensorFlow Lite, ONNX, or custom quantized formats (a conversion sketch follows this list)
  - Reduce model size via:
    - Quantization (e.g., FP32 → INT8)
    - Pruning (removing unnecessary weights)
    - Knowledge distillation (if needed)
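Here is a minimal convert-and-quantize sketch, assuming the SavedModel exported in step 1 and full-integer (INT8) post-training quantization, which is what NPUs such as the Ethos-U65 expect. The input shape and the random calibration samples are placeholders; use real preprocessed data in practice.

```python
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")

# A representative dataset lets the converter calibrate INT8 ranges.
def representative_data():
    for _ in range(100):
        # Placeholder samples; shape assumed to match the model input.
        yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```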
More detail on Steps 3 and 4:
https://chatgpt.com/share/688272e8-7a98-8000-8994-2bbba2f5fd74
3. Deploy the Model to the MPU
- Copy the model to the MPU using:
  - SD card
  - USB
  - SSH / SCP
- Install necessary runtime libraries (e.g., TensorFlow Lite runtime, ONNX Runtime, OpenCV)
- Configure OS packages or GPIO (e.g., access to the camera, relays, buzzer)
4. Run Inference on the MPU
- Write an inference script (Python, C++, etc.) that will:
  - Load the model
  - Capture input (e.g., camera, sensor)
  - Preprocess the data
  - Run inference
  - Postprocess and act on the output (e.g., turn on a buzzer, display an alert)

A minimal inference sketch follows this list.
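As a first smoke test of this flow, the sketch below loads the quantized model with tflite-runtime and runs it on a synthetic input; swap the dummy tensor for real camera or sensor data once this works. The model filename is an assumption carried over from the conversion sketch above.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter

# 1. Load the model
interpreter = Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# 2. Capture + preprocess input (dummy tensor here; use a camera frame later)
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

# 3. Run inference
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]

# 4. Postprocess and act on the output
print("Predicted class:", int(np.argmax(scores)))
```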
Congratulations: your model is now running offline, at the edge, with real-time inference.
How to Deploy an ML Model on the NXP i.MX93 MPU
The i.MX93 is an edge AI-enabled MPU from NXP with:
- Arm Cortex-A55 (dual-core)
  - Runs a Linux (Yocto) OS
  - Handles AI workloads, networking, file systems
- Arm Ethos-U65 NPU
  - Specialized Neural Processing Unit
  - Accelerates INT8 AI inference at the edge
- Arm Cortex-M33 MCU
  - Dedicated real-time controller
  - Runs bare-metal or an RTOS (FreeRTOS, Zephyr)
  - Handles deterministic tasks like sensor sampling, GPIO control, motor control, or low-latency signal processing
- Supports TensorFlow Lite, ONNX, and the eIQ Toolkit
- Runs Yocto Linux (custom embedded OS)
There are two primary deployment methods:
Method 1: Using TensorFlow → TensorFlow Lite
This method is framework-agnostic and lets you use open-source tools.
Steps:
1. Train the model on a PC (e.g., to detect helmets):
   - Use TensorFlow/Keras
   - Save the model in `.h5` or `.pb` format
2. Convert to TensorFlow Lite:
   - Use `TFLiteConverter` to convert to `.tflite`
   - Apply post-training quantization to reduce size:

   ```python
   import tensorflow as tf

   converter = tf.lite.TFLiteConverter.from_keras_model(model)  # 'model' from step 1
   converter.optimizations = [tf.lite.Optimize.DEFAULT]
   tflite_model = converter.convert()
   ```
3. Transfer to i.MX93:
   - Via SD card or SCP
   - Place the `.tflite` model in a known directory
4. Write an inference script on the i.MX93 (see the sketch after this list):
   - Use `tflite-runtime` or the TensorFlow Lite `Interpreter`
   - Process camera frames
   - Run inference
   - Control GPIOs (like buzzers) via Linux
5. Install dependencies:
   - Manually install:
     - `tflite-runtime`
     - `opencv-python`
     - `numpy`
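Below is a hedged sketch of steps 4 and 5 combined: a loop that classifies USB-camera frames on the Cortex-A55 cores with tflite-runtime and drives a buzzer. The model path, input normalization, "violation" class index, and sysfs GPIO number are all illustrative assumptions; on newer BSPs you would typically use libgpiod instead of sysfs.

```python
import numpy as np
import cv2
from tflite_runtime.interpreter import Interpreter

VIOLATION_CLASS = 1  # hypothetical "no helmet" class index

def set_buzzer(on, gpio_value="/sys/class/gpio/gpio13/value"):
    """Placeholder GPIO write; assumes the pin was already exported via sysfs."""
    try:
        with open(gpio_value, "w") as f:
            f.write("1" if on else "0")
    except OSError:
        pass  # GPIO not configured on this board

interpreter = Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
height, width = inp["shape"][1], inp["shape"][2]
scale, zero_point = inp["quantization"]

cap = cv2.VideoCapture(0)  # USB camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Resize, normalize to [0, 1] (assumed training preprocessing), then quantize
    img = cv2.resize(frame, (width, height)).astype(np.float32) / 255.0
    img = np.expand_dims((img / scale + zero_point).astype(inp["dtype"]), 0)
    interpreter.set_tensor(inp["index"], img)
    interpreter.invoke()
    pred = int(np.argmax(interpreter.get_tensor(out["index"])[0]))
    set_buzzer(pred == VIOLATION_CLASS)  # buzz only on a violation
cap.release()
```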
Pros and Cons
| Pros | Cons |
|---|---|
| Open-source & flexible | Requires manual setup |
| Large community support | Performance may be lower |
| Framework-agnostic | Less optimized for NXP hardware |
Method 2: Using NXP’s eIQ Toolkit (Recommended for i.MX)
This is NXP’s official AI deployment framework for i.MX devices. It simplifies optimization and deployment by leveraging hardware-aware tools.
Steps:
1. Train your model:
   - Train the model using NXP’s eIQ Toolkit, which provides:
     - GUI- and CLI-based model training and management
     - Support for importing custom datasets
     - Built-in training pipelines using a TensorFlow backend
   - You can also use:
     - Pre-trained models
     - Models imported in formats like `.tflite`, `.onnx`, or `.pb` directly into eIQ
2. Import into eIQ Toolkit (GUI or CLI):
   Once training is complete, optimize the model specifically for the Ethos-U65 NPU in i.MX93:
   - Apply INT8 quantization (reduces model size and speeds up inference)
   - Perform compiler optimizations for the NPU
   - Run model validation within eIQ to verify accuracy and compatibility
3. Deploy to i.MX93:
   Once your model is optimized using eIQ Toolkit, you have two deployment options:

   🔹 Option A: Direct Deployment via eIQ Toolkit
   - Use NXP’s Yocto Linux image with the eIQ SDK pre-installed
   - Transfer the optimized `.eiq` model file to the i.MX93 using:
     - SCP (Secure Copy Protocol over SSH), or
     - USB connection / Ethernet, depending on the setup
   - This is a faster and more integrated method, requiring no intermediate storage like SD cards

   🔹 Option B: TFLite Export & SD Card Transfer
   - Export your trained model in TensorFlow Lite (`.tflite`) format
   - Store the `.tflite` file on an SD card
   - Insert the SD card into the i.MX93’s SD card reader
   - From there, you can load and run the model using the TensorFlow Lite interpreter, or convert it on the device
   This gives flexibility based on development preference:
   - Use Option A for seamless integration with the eIQ flow
   - Use Option B if you’re working in a TensorFlow-first environment or prototyping offline

4. Run Inference with eIQ Runtime:
   Use the eIQ Runtime APIs (C++ or Python) on the i.MX93 to:
   - Capture real-time images from a USB or MIPI camera
   - Run inference on the Ethos-U65 NPU
   - Trigger GPIO outputs (buzzer, LEDs, etc.) using built-in libraries

   A hedged Python sketch follows this list.
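NXP’s documentation covers the exact eIQ Runtime APIs; as one concrete possibility, the sketch below shows how NPU inference is commonly driven from Python on eIQ-enabled BSPs, where TensorFlow Lite ships with an Ethos-U external delegate. The delegate library path, the assumption that your BSP provides it, and the requirement that the model was pre-compiled for the Ethos-U65 (e.g., with the Vela compiler) are all assumptions to verify against your eIQ release.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Path is an assumption; check where your eIQ BSP installs the delegate.
ethosu = load_delegate("/usr/lib/libethosu_delegate.so")

interpreter = Interpreter(model_path="model_int8_vela.tflite",
                          experimental_delegates=[ethosu])
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input just to confirm the NPU-accelerated path runs end to end.
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
print("Output shape from NPU inference:", interpreter.get_tensor(out["index"]).shape)
```

From there, camera capture and GPIO control work the same way as in the Method 1 sketch above.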
Pros and Cons
| Pros | Cons |
|---|---|
| Highly optimized for i.MX93 | Limited to NXP boards |
| Easy deployment pipeline | May lock you into the eIQ ecosystem |
| GUI tools for optimization | Smaller community than TensorFlow |
Summary Table: TF Lite vs. eIQ Toolkit on i.MX93
| Feature | TensorFlow → TFLite | NXP eIQ Toolkit |
|---|---|---|
| Model Format | `.tflite` | `.eiq` |
| Optimization | Manual (quantization/pruning) | Automatic, hardware-aware |
| GPIO & Camera | Manual via Linux APIs | Integrated in eIQ runtime |
| Hardware Acceleration | Limited (CPU) | Uses Ethos-U65 NPU |
| Difficulty | Medium to high | Beginner-friendly |
| Best For | Flexibility, custom code | Fast deployment on NXP MPUs |
If you’re deploying specifically on the NXP i.MX93, the eIQ Toolkit is the faster, more efficient, and production-ready option, especially when targeting real-time, low-power inference.
Author