Industry: Semiconductors / AI Hardware / Consumer Electronics
Client: Consumer Electronics Semiconductor Company
Duration: 24 months
Team: 18 engineers

RISC-V Custom Silicon: 10x Energy-Efficient AI Accelerator Enabling Always-On Intelligence in Battery-Powered Devices

Design and tape-out of a custom RISC-V based AI accelerator SoC achieving 10 TOPS/W efficiency for TinyML workloads, enabling always-on voice, vision, and sensor fusion in battery-powered wearables and IoT devices.

  • 10x energy efficiency vs competitors
  • 10 TOPS/W AI performance per watt
  • <5mW always-on power consumption
  • 12nm process node

The Challenge

A consumer electronics company developing next-generation smart wearables and hearables needed a custom AI accelerator that could run sophisticated ML models continuously while maintaining week-long battery life on a coin cell battery.

Power Budget Constraints

Existing AI accelerators consumed 50-200mW during inference, far exceeding the 5mW budget that always-on operation from a small battery allows. Duty cycling to stay within budget degraded the user experience.

Impact: target of <5mW continuous inference
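The size of this gap is easy to quantify: a part that draws 50mW while active can stay within a 5mW average budget only by being active a small fraction of the time. The sketch below is illustrative arithmetic only, and assumes ideal gating (zero sleep power, zero wake-up cost).

```python
# Illustrative duty-cycle arithmetic for the power budget above.
# Assumes ideal power gating: zero sleep current, zero wake-up cost.

def max_duty_cycle(active_mw: float, budget_mw: float) -> float:
    """Largest fraction of time the accelerator may be active
    while keeping average power within the budget."""
    return min(1.0, budget_mw / active_mw)

# A 50 mW accelerator against the 5 mW always-on budget:
print(max_duty_cycle(50.0, 5.0))   # 0.1 -> active only 10% of the time
# A 200 mW part fares worse:
print(max_duty_cycle(200.0, 5.0))  # 0.025 -> 2.5%
```

In practice wake-up energy and non-zero sleep current shrink these fractions further, which is why duty cycling alone could not deliver continuous inference.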

Vendor Lock-in

Available solutions from major vendors came with restrictive licensing, high royalties per unit, and limited customization options. The client needed full IP ownership for differentiation.

Impact: 15-20% of unit cost going to licensing fees

Model Flexibility

Fixed-function accelerators couldn't adapt to evolving ML models. The client needed to update algorithms post-deployment, without hardware changes, to maintain a competitive edge.

Impact: 6-month hardware refresh cycles

Integration Complexity

Off-the-shelf solutions required external memory, PMICs, and supporting chips, increasing BOM cost, PCB area, and power consumption for their compact form factor.

Impact: 4-chip solution increasing size by 3x

Our Solution

We designed a fully custom RISC-V based SoC with an integrated neural processing unit (NPU) optimized for TinyML workloads, featuring aggressive power gating, in-memory computing elements, and a flexible dataflow architecture.

System Architecture

Heterogeneous architecture combining a RISC-V application processor with custom neural accelerator blocks and comprehensive power management.

Application Processor

  • Dual-core RISC-V RV32IMC (custom microarchitecture)
  • 16KB I-cache, 16KB D-cache per core
  • Hardware floating-point unit
  • Custom DSP extensions for signal processing
  • Secure boot and hardware root of trust

Neural Processing Unit

  • 256 MAC units in systolic array
  • Support for INT4/INT8/INT16 precision
  • On-chip SRAM with in-memory computing
  • Flexible dataflow (weight/output stationary)
  • Hardware activation functions (ReLU, Sigmoid, Softmax)
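The NPU's datapath can be pictured with a small behavioral model: INT8 operands, wide (INT32) accumulation in the MAC array, and a hardware ReLU on the output path. This is a NumPy sketch of the dataflow's arithmetic, not the actual RTL, and the tile dimensions are illustrative.

```python
import numpy as np

# Behavioral sketch of one NPU tile: INT8 multiply, INT32 accumulate,
# hardware ReLU on the way out. Illustrative only -- not the actual RTL.

def npu_matmul_relu(acts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    assert acts.dtype == np.int8 and weights.dtype == np.int8
    # Widen before multiplying so products accumulate in 32 bits,
    # as the MAC array does, avoiding INT8 overflow.
    acc = acts.astype(np.int32) @ weights.astype(np.int32)
    return np.maximum(acc, 0)  # hardware ReLU activation unit

rng = np.random.default_rng(0)
a = rng.integers(-128, 127, size=(4, 256), dtype=np.int8)  # activations
w = rng.integers(-128, 127, size=(256, 8), dtype=np.int8)  # weights
out = npu_matmul_relu(a, w)
print(out.shape, out.dtype)  # (4, 8) int32
```

The 256-wide inner dimension mirrors the 256 MAC units: one output element corresponds to one pass of a stationary weight column through the array.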

Memory Subsystem

  • 1MB unified on-chip SRAM
  • Intelligent memory controller with compression
  • 4MB external QSPI flash interface
  • DMA engine for zero-copy data movement
  • Memory protection unit for security

Sensor Hub & I/O

  • Always-on sensor processor (separate power domain)
  • PDM microphone interface (up to 4 channels)
  • I2S for audio codec
  • SPI/I2C/UART for sensors
  • 12-bit ADC for analog sensors
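The PDM microphone interface delivers a 1-bit oversampled stream that must be low-pass filtered and decimated to PCM before inference. As a minimal sketch, the boxcar average below shows the idea; real designs use CIC and FIR filter chains, and the 64x decimation factor here is an assumption, not the chip's actual ratio.

```python
import numpy as np

# Minimal PDM -> PCM sketch: average blocks of 1-bit samples to recover
# a PCM value. Real hardware uses CIC + FIR chains; this shows only the
# principle. The 64x decimation factor is an illustrative assumption.

def pdm_to_pcm(bits: np.ndarray, decimation: int = 64) -> np.ndarray:
    n = (len(bits) // decimation) * decimation
    blocks = bits[:n].reshape(-1, decimation)
    # Fraction of ones in each block, recentred to [-1, 1].
    return blocks.mean(axis=1) * 2.0 - 1.0

# Crude PDM source: threshold a sine against uniform noise
# (a stand-in for a real sigma-delta modulator).
t = np.linspace(0, 1, 64_000, endpoint=False)
sine = 0.5 * np.sin(2 * np.pi * 100 * t)
pdm = (np.random.default_rng(1).uniform(-1, 1, t.size) < sine).astype(np.uint8)
pcm = pdm_to_pcm(pdm)
print(pcm.size)  # 1000 PCM samples from 64,000 PDM bits
```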

Power Management

  • Integrated PMIC with multiple LDOs
  • Dynamic voltage and frequency scaling
  • Power gating for 8 independent domains
  • Ultra-low-power RTC and wake-up controller
  • Battery fuel gauge integration
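The leverage behind DVFS comes from the first-order dynamic-power relation P ≈ C·V²·f: lowering voltage and frequency together gives a roughly cubic power reduction. The numbers below are illustrative, not characterized values for this silicon.

```python
# First-order DVFS model: dynamic power scales with C * V^2 * f.
# All numbers are illustrative assumptions, not measured silicon.

def dynamic_power(c_eff_f: float, v: float, f_hz: float) -> float:
    """c_eff_f: effective switched capacitance in farads."""
    return c_eff_f * v**2 * f_hz

C = 100e-12  # assumed 100 pF effective switched capacitance
p_hi = dynamic_power(C, 0.9, 100e6)   # nominal: 0.9 V @ 100 MHz
p_lo = dynamic_power(C, 0.6, 25e6)    # scaled:  0.6 V @ 25 MHz
print(p_hi * 1e3, "mW vs", p_lo * 1e3, "mW")
print(p_hi / p_lo)  # 9x lower power at the reduced operating point
```

Power gating then removes the remaining leakage in idle domains, which DVFS alone cannot touch.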

Chip Specifications

Process Node: 12nm FinFET (TSMC)
Die Size: 9mm² (3x3mm)
Package: WLCSP 4x4mm, 81 balls
NPU Performance: 1 TOPS @ 100MHz
Power Efficiency: 10 TOPS/W (INT8)
Always-On Power: <5mW (voice wake + basic inference)
Deep Sleep: <1µA with RTC
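These figures line up with the week-long battery-life goal. As a back-of-envelope check, assuming a CR2032-class coin cell (about 225mAh at 3V, an assumption, with derating and regulator losses ignored):

```python
# Back-of-envelope battery-life check. The 225 mAh / 3 V CR2032-class
# cell is an assumption; derating and regulator losses are ignored.

def runtime_hours(capacity_mah: float, volts: float, load_mw: float) -> float:
    return capacity_mah * volts / load_mw

h = runtime_hours(225, 3.0, 5.0)  # at the 5 mW budget ceiling
print(h, "hours =", h / 24, "days")          # 135 h, ~5.6 days
print(runtime_hours(225, 3.0, 3.2) / 24)     # ~8.8 days at 3.2 mW
```

At the measured 3.2mW (reported under Results below), the same cell clears the one-week target with margin.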

Software Stack

  • Custom LLVM toolchain with RISC-V extensions
  • Lightweight RTOS optimized for power management
  • TensorFlow Lite Micro with custom kernels
  • Model compiler with quantization support
  • Power-aware scheduling runtime
  • Secure OTA update mechanism
  • HAL with power state management APIs
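The model compiler's quantization step maps float tensors onto the NPU's integer datatypes. A minimal sketch of the standard affine INT8 scheme (q = round(x/scale) + zero_point) is shown below; this is the textbook approach such compilers typically use, not the client's actual tooling.

```python
import numpy as np

# Textbook affine INT8 quantization, the kind of mapping a model
# compiler applies before NPU deployment. Not the actual tooling.

def quantize_int8(x: np.ndarray):
    lo, hi = float(x.min()), float(x.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # range must include zero
    scale = (hi - lo) / 255.0
    zero_point = int(round(-lo / scale)) - 128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, zp = quantize_int8(x)
print(q, s, zp)
print(dequantize(q, s, zp))  # close to x, within a quantization step
```

Keeping weights and activations in INT8 (or INT4) is what lets the MAC array hit its quoted efficiency; the scale and zero point travel with the model as metadata.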

TinyML Model Support

The NPU architecture was optimized for common TinyML workloads while maintaining flexibility for model updates.

Keyword Spotting

  • Model: DS-CNN (depthwise separable CNN)
  • Accuracy: 96% on a custom vocabulary
  • Cost: 8ms inference at <3mW

Voice Activity Detection

  • Model: RNN-based classifier
  • Accuracy: 98% detection accuracy
  • Cost: always-on at 0.8mW

Person Detection

  • Model: MobileNetV3-Small variant
  • Accuracy: 92% at 96x96 resolution
  • Cost: 45ms inference at <15mW

Gesture Recognition

  • Model: 1D CNN on accelerometer data
  • Accuracy: 94% on 12 gesture classes
  • Cost: 5ms inference at <1mW

Sensor Fusion

  • Model: multi-input neural network
  • Use: activity recognition, context awareness
  • Cost: continuous at 2mW
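The per-model figures above fold into energy per inference via E = P × t, using the case study's own numbers and treating the quoted "<" bounds as worst-case values:

```python
# Energy per inference from the figures above (E = P * t), taking the
# quoted "<" bounds as worst-case values.

workloads = {
    # name: (inference time in ms, power in mW)
    "keyword spotting":    (8, 3),
    "person detection":    (45, 15),
    "gesture recognition": (5, 1),
}

energy_uj = {name: t_ms * p_mw for name, (t_ms, p_mw) in workloads.items()}
for name, e in energy_uj.items():
    print(f"{name}: <= {e} uJ per inference")  # mW * ms = microjoules
```

At roughly 24µJ per keyword-spotting inference, even tens of inferences per second remain well inside a milliwatt-class budget.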

Implementation Timeline

Phase 1: Architecture & Specification

12 weeks
  • Workload analysis and benchmarking
  • Architecture exploration and trade-off studies
  • RTL microarchitecture specification
  • Power and performance modeling

Phase 2: RTL Design & Verification

32 weeks
  • RTL implementation (Verilog)
  • Comprehensive UVM testbench development
  • Formal verification for critical paths
  • Power intent specification (UPF)

Phase 3: Physical Design & Tape-out

24 weeks
  • Synthesis and floorplanning
  • Place and route optimization
  • Sign-off (DRC, LVS, timing, power)
  • Tape-out to foundry

Phase 4: Silicon Bring-up & Productization

16 weeks
  • First silicon validation
  • Characterization across PVT corners
  • SDK and documentation completion
  • Production test development

Results & Impact

The custom RISC-V AI accelerator exceeded all specifications, enabling a new category of always-on intelligent devices with week-long battery life and sophisticated on-device AI capabilities.

Energy Efficiency

Before: 1 TOPS/W (competitor baseline)
After: 10 TOPS/W achieved
Improvement: 10x

Always-On Power

Before: 50mW minimum (duty cycled)
After: 3.2mW continuous inference
Improvement: 94% reduction

Inference Latency

Before: 100ms+ (cloud offload)
After: <10ms on-device
Improvement: 10x faster response

BOM Cost

Before: multi-chip solution
After: single-chip integration
Improvement: 45% BOM reduction

PCB Area

Before: 120mm² for the AI subsystem
After: 25mm² total solution
Improvement: 79% area reduction

Licensing Costs

Before: 15-20% per-unit royalty
After: full IP ownership
Improvement: 100% cost elimination

Return on Investment

Implementation Cost

Multi-year silicon development investment

Annual Savings

Payback Period

5-Year ROI

Rapid Circuitry delivered exactly what we needed: a custom AI chip that lets us differentiate in a crowded market. The 10x efficiency improvement enabled features our competitors simply cannot match. Our devices now have always-on AI with week-long battery life, and we own the IP completely.

CTO, Consumer Electronics Company (client)

Technologies Used

RISC-V, Custom NPU, 12nm FinFET, TensorFlow Lite Micro, Verilog, UVM, Synopsys Design Compiler, Cadence Innovus, LLVM, FreeRTOS, Hardware Security Module

Awards & Recognition

RISC-V Summit Innovation Award 2025

Best Commercial RISC-V Implementation

Embedded Computing Design Award

Most Innovative AI Processor

IEEE Solid-State Circuits Best Demo

Ultra-Low-Power AI Accelerator

Ready to Build Your Success Story?

Let's discuss how our expertise can help bring your vision to life with measurable results like this project.