Edge AI Technology

How to Run AI on a Microcontroller or Microprocessor – A Beginner’s Guide to Edge AI

In the last few years, Edge AI has rapidly transformed how devices understand and interact with the world — without needing constant cloud access. From detecting voice commands on smartwatches to classifying images on drones, AI at the edge is revolutionizing embedded systems.

But how exactly do you run AI on a microcontroller (MCU) or microprocessor (MPU)? As a beginner, this can seem overwhelming. Don’t worry — this blog breaks it down into simple, digestible steps.

What is Edge AI?

Edge AI means running artificial intelligence algorithms locally on a device like an MCU or MPU, rather than sending data to a remote cloud server. This enables:

  • Faster responses (no internet delay)
  • Lower power consumption
  • Increased privacy (data stays on device)

Common applications include:

  • Wake-word detection (“Hey Alexa”)
  • Image classification (e.g., person vs object)
  • Anomaly detection in machines
  • Gesture recognition, etc.

Why Edge AI Matters

Imagine a construction site. Dust, noise, humans in helmets — and a camera watching for safety violations.
If that video had to go to the cloud for AI processing, it would take time, cost money, and raise privacy concerns.

So instead, we ask:

“Can we do the thinking right there, on the device?”
That’s Edge AI.

The i.MX93, with its efficient NPU, was made for exactly this.

Microcontroller vs Microprocessor: Which Brain to Choose?

Before you start building, you need to pick the right kind of hardware. Here’s how I learned the difference:

I started with microcontrollers (MCUs) — they’re simple, low-power, and perfect for turning on LEDs or reading sensor data. But when I tried to run an image classifier? Not a chance.

That’s when I moved to microprocessors (MPUs) like the i.MX93, which brought the power of Linux, better memory, and even a dedicated Neural Processing Unit (NPU).

Feature               | MCU (e.g., Cortex-M)    | MPU (e.g., Cortex-A / i.MX93)
Processing Power      | Low (MHz range)         | High (GHz range)
Memory                | Limited (KBs)           | Larger (MBs or GBs)
OS Support            | Bare metal / RTOS       | Full Linux support
AI Support            | TFLite Micro            | TFLite, ONNX, NPU via eIQ SDK
Use Case              | Low-power IoT, sensors  | Edge AI, vision, industrial control

How to Deploy a Machine Learning Model on Any MPU (Generalized Method)

Deploying a machine learning (ML) model on a Microprocessor Unit (MPU) — such as Raspberry Pi, NXP i.MX93, or NVIDIA Jetson — typically involves four key stages:


 1. Train the Model

  • Use standard ML frameworks like:

    • TensorFlow / Keras

    • PyTorch

    • Scikit-learn (for simpler models)

  • Train on a PC or cloud platform (e.g., Colab, AWS, Azure)

  • Export the model in a format such as:

    • .h5, .pb (TensorFlow)

    • .pt, .onnx (PyTorch)

    • .tflite (Lite model for edge)

This step is compute-heavy and is always done on a PC or in the cloud, never on the MPU itself. A minimal training sketch is shown below.
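
To make this concrete, here is a minimal, hedged training sketch in TensorFlow/Keras. The architecture, the 96x96 input size, the random placeholder data, and the file name helmet_classifier.h5 are all illustrative assumptions, not a prescribed setup.

    # Minimal training sketch (runs on a PC or in the cloud, not on the MPU).
    # The architecture, input size, random data, and file name are placeholders.
    import numpy as np
    import tensorflow as tf

    # Stand-in data: replace with your own labeled images (e.g., helmet / no helmet)
    train_images = np.random.rand(100, 96, 96, 3).astype("float32")
    train_labels = np.random.randint(0, 2, size=100)

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(96, 96, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_images, train_labels, epochs=5, validation_split=0.2)

    # Export in a format the TFLite converter understands
    model.save("helmet_classifier.h5")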

 2. Optimize & Convert the Model

  • MPUs are resource-constrained (limited RAM/compute), so you must optimize the model (a quantization sketch follows this list):

    • Convert to TensorFlow Lite, ONNX, or custom quantized formats

    • Reduce model size via:

      • Quantization (e.g., FP32 → INT8)

      • Pruning (removing unnecessary weights)

      • Knowledge distillation (if needed)
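
As a sketch of post-training INT8 quantization, assuming the Keras model saved in the previous step, the snippet below uses random arrays as a placeholder calibration set; in practice you would feed real preprocessed training samples.

    # Post-training INT8 quantization sketch.
    # "helmet_classifier.h5" and the random calibration data are placeholders.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.models.load_model("helmet_classifier.h5")
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # The representative dataset lets the converter calibrate INT8 ranges;
    # replace the random arrays with real preprocessed samples.
    def representative_data_gen():
        for _ in range(100):
            yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

    converter.representative_dataset = representative_data_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open("helmet_classifier_int8.tflite", "wb") as f:
        f.write(tflite_model)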

 3. Deploy the Model to the MPU

  • Copy the model to the MPU using:

    • SD card

    • USB

    • SSH / SCP

  • Install necessary runtime libraries (e.g., TensorFlow Lite runtime, ONNX Runtime, OpenCV)

  • Configure OS packages or GPIO (e.g., access camera, relays, buzzer)


 4. Run Inference on the MPU

  • Write an inference script (Python, C++, etc.); a minimal Python sketch follows this list

    • Load the model

    • Capture input (e.g., camera, sensor)

    • Preprocess the data

    • Run inference

    • Postprocess and act on output (e.g., turn on buzzer, display alert)
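
Here is a minimal, hedged Python sketch of that flow using the TensorFlow Lite runtime. The model path, input shape, and the random stand-in frame are placeholders; camera capture and GPIO handling are only indicated in comments.

    # Generic inference sketch for a Linux MPU; model path and shapes are placeholders.
    import numpy as np
    import tflite_runtime.interpreter as tflite  # on a PC: from tensorflow import lite as tflite

    interpreter = tflite.Interpreter(model_path="helmet_classifier_int8.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Capture + preprocess: a random frame stands in for a real camera image here
    frame = np.random.randint(0, 255, size=tuple(inp["shape"]), dtype=np.uint8).astype(inp["dtype"])

    interpreter.set_tensor(inp["index"], frame)
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    print("Predicted class:", int(np.argmax(scores)))
    # Postprocess and act: e.g., drive a buzzer GPIO when a violation class is detected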

Congratulations: your model is now running offline, at the edge, with real-time inference.

How to Deploy an ML Model on the NXP i.MX93 MPU

The i.MX93 is an edge AI-enabled MPU from NXP with:

  • Arm Cortex-A55 (Dual-core)

    • Runs Linux (Yocto) OS

    • Handles AI workloads, networking, file systems

  • Arm Ethos-U65 NPU

    • Specialized Neural Processing Unit

    • Accelerates INT8 AI inference at the edge

  • Arm Cortex-M33 MCU

    • Dedicated real-time controller

    • Runs bare-metal or RTOS (FreeRTOS, Zephyr)

    • Handles deterministic tasks like sensor sampling, GPIO control, motor control, or low-latency signal processing

  • Supports TensorFlow Lite, ONNX, and eIQ Toolkit

  • Runs Yocto Linux (custom embedded OS)


There are two primary deployment methods:


Method 1: Using TensorFlow → TensorFlow Lite

This method relies on standard open-source tools and is not tied to NXP’s tooling.

 Steps:

  1. Train the model on a PC (e.g., a helmet detector):

    • Use TensorFlow/Keras

    • Save model in .h5 or .pb format

  2. Convert to TensorFlow Lite:

    • Use TFLiteConverter to convert to .tflite

    • Apply post-training quantization to reduce size:

      # 'model' is the Keras model trained in step 1
      converter = tf.lite.TFLiteConverter.from_keras_model(model)
      converter.optimizations = [tf.lite.Optimize.DEFAULT]
      tflite_model = converter.convert()
  3. Transfer to i.MX93:

    • Via SD card or SCP

    • Place .tflite model in a known directory

  4. Write an inference script on i.MX93 (a tflite-runtime sketch follows this list):

    • Use tflite-runtime or TensorFlow Lite Interpreter

    • Process camera frames

    • Run inference

    • Control GPIOs (like buzzers) via Linux

  5. Install Dependencies:

    • Manually install:

      • tflite-runtime

      • opencv-python

      • numpy
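
To tie these steps together, here is a hedged sketch of such a script using tflite-runtime and OpenCV. The camera index, model path, 96x96 input size, and class meaning are assumptions, and GPIO control is only indicated as a comment because the exact interface (sysfs vs. libgpiod) depends on your Yocto image.

    # Inference sketch for the i.MX93 CPU path (Method 1).
    # Camera index, model path, input size, and class meaning are assumptions.
    import cv2
    import numpy as np
    import tflite_runtime.interpreter as tflite

    interpreter = tflite.Interpreter(model_path="/home/root/helmet_classifier_int8.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    cap = cv2.VideoCapture(0)  # USB camera; a MIPI camera may need a GStreamer pipeline string
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        resized = cv2.resize(frame, (96, 96))
        tensor = np.expand_dims(resized, axis=0).astype(inp["dtype"])
        interpreter.set_tensor(inp["index"], tensor)
        interpreter.invoke()
        scores = interpreter.get_tensor(out["index"])[0]
        if int(np.argmax(scores)) == 1:   # class 1 = "no helmet" in this sketch
            print("Violation detected")   # here you would toggle a GPIO (buzzer/LED)
    cap.release()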

Pros and Cons

Pros                     | Cons
Open-source & flexible   | Requires manual setup
Large community support  | Performance may be lower
Vendor-agnostic          | Less optimized for NXP hardware

 Method 2: Using NXP’s eIQ Toolkit (Recommended for i.MX)

This is NXP’s official AI deployment framework for i.MX devices. It simplifies optimization and deployment by leveraging hardware-aware tools.

 Steps:

  1. Train your model:

    • Train the model using NXP’s eIQ Toolkit, which provides:

      • GUI- and CLI-based model training and management

      • Support for importing custom datasets

      • Built-in training pipelines using TensorFlow backend

    • You can also use:

      • Pre-trained models

      • Import models in formats like .tflite, .onnx, or .pb directly into eIQ

  2. Import into eIQ Toolkit (GUI or CLI):

    Once training is complete, optimize the model specifically for the Ethos-U65 NPU in the i.MX93:

    • Apply INT8 quantization (reduces model size and speeds up inference)

    • Perform compiler optimizations for the NPU

    • Run model validation within eIQ to verify accuracy and compatibility

  3. Deploy to i.MX93:

    Once your model is optimized using eIQ Toolkit, you have two deployment options:


    🔹 Option A: Direct Deployment via eIQ Toolkit

    • Use NXP’s Yocto Linux image with eIQ SDK pre-installed

    • Transfer the optimized .eiq model file to the i.MX93 using:

      • SCP (Secure Copy Protocol over SSH), or

      • USB connection / Ethernet, depending on setup

    • This is a faster and more integrated method, requiring no intermediate storage like SD cards


    🔹 Option B: TFLite Export & SD Card Transfer

    • Export your trained model in TensorFlow Lite (.tflite) format

    • Store the .tflite file on an SD card

    • Insert the SD card into the i.MX93’s SD card reader

    • From there, load and run the model with the TensorFlow Lite interpreter, or convert it further on the device

  4. Run Inference with eIQ Runtime:

    Use the eIQ runtime APIs (C++ or Python) on the i.MX93 to:

    • Capture real-time images from a USB or MIPI camera

    • Run inference on the Ethos-U65 NPU

    • Trigger GPIO outputs (buzzer, LEDs, etc.) using built-in libraries

    A hedged Python sketch of NPU inference is shown below, after the deployment notes.

This gives flexibility based on development preference:

  • Use Option A for seamless integration with the eIQ flow

  • Use Option B if you’re working in a TensorFlow-first environment or prototyping offline
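
The exact eIQ runtime API surface varies between BSP releases, so the following is only a hedged sketch of one common route: NXP’s TensorFlow Lite build can offload a Vela-compiled model to the Ethos-U65 through an external delegate. The delegate path /usr/lib/libethosu_delegate.so and the model file name are assumptions; check the eIQ user guide for your Yocto image.

    # Hedged sketch: offloading a Vela-compiled .tflite model to the Ethos-U65 NPU
    # through TFLite's external-delegate mechanism. The delegate path below is an
    # assumption based on typical NXP Yocto images; verify it on your board.
    import numpy as np
    import tflite_runtime.interpreter as tflite

    delegate = tflite.load_delegate("/usr/lib/libethosu_delegate.so")   # assumed path
    interpreter = tflite.Interpreter(
        model_path="helmet_classifier_int8_vela.tflite",                # assumed file name
        experimental_delegates=[delegate],
    )
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    dummy = np.zeros(tuple(inp["shape"]), dtype=inp["dtype"])  # stand-in for a camera frame
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"]))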

Pros and Cons

Pros                         | Cons
Highly optimized for i.MX93  | Limited to NXP boards
Easy deployment pipeline     | May lock you into the eIQ ecosystem
GUI tools for optimization   | Smaller community than TensorFlow

Summary Table: TF Lite vs. eIQ Toolkit on i.MX93

Feature               | TensorFlow → TFLite            | NXP eIQ Toolkit
Model Format          | .tflite                        | .eiq
Optimization          | Manual (quantization/pruning)  | Automatic, hardware-aware
GPIO & Camera         | Manual via Linux APIs          | Integrated in eIQ runtime
Hardware Acceleration | Limited (CPU)                  | Uses Ethos-U65 NPU
Difficulty            | Medium to High                 | Beginner-friendly
Best For              | Flexibility, custom code       | Fast deployment on NXP MPUs

If you’re deploying specifically on the NXP i.MX93, the eIQ Toolkit is the faster, more efficient, and more production-ready option, especially when targeting real-time inference on low-power hardware.

Author

Mridul Bajaj
