Project Summary (Case Study 1 — Basketball HAR)

Using the HANG Time-HAR basketball dataset, the task is to classify 19 basketball-related activities from wrist-worn 3-axis accelerometer signals. The Bangle.js 2 records accelerometer data at 50 Hz (±8g). I built the full pipeline: preprocessing → training multiple neural networks → model selection → compression/quantization → emulator-based deployment and evaluation.

  • Device: Bangle.js 2 (256KB RAM, 1024KB Flash)
  • Signal: 3-axis accelerometer (wrist), time-series HAR
  • Task: 19-class basketball activity classification
  • Models: LSTM, CNN, Hybrid CNN–LSTM, DeepConvLSTM
  • Edge focus: TensorFlow Lite conversion + int8 quantization
  • Supervisor: Prof. Kristof Van Laerhoven (University of Siegen)
Main engineering challenge: balancing accuracy vs memory footprint vs inference latency for a watch-class device.

Step 1 — Data Preprocessing (Memory-Aware)

The raw accelerometer streams (CSV per player) were cleaned, downsampled, and segmented into compact fixed-length windows so the model input stays small and consistent during training and on-device inference.

  • Cleaning: remove missing rows; filter extreme samples outside sensor range (|acc| > 8g)
  • Downsampling: 50 Hz → 10 Hz (keep every 5th sample) to reduce compute + memory
  • Windowing: 2-second sliding windows (20 samples at 10 Hz) with 50% overlap
  • Labeling: window label = predominant activity inside the window
  • Balancing: cap at 5000 windows per class (random sampling) to control dataset size
  • Split: 60% train / 20% val / 20% test; normalize with MinMax scaling
Why this matters: fixed windows + downsampling make inference predictable and feasible under tight RAM limits.
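The downsampling, windowing, and majority-labeling steps above can be sketched in a few lines of NumPy. This is an illustrative re-implementation of the described pipeline, not the project code; the function name and signature are my own.

```python
import numpy as np

def preprocess(acc, labels, factor=5, win=20, overlap=0.5):
    """Downsample, window, and label a 3-axis accelerometer stream.

    acc    : (N, 3) float array of x/y/z samples at 50 Hz
    labels : (N,) int array of per-sample activity IDs
    Illustrative sketch of the steps described above.
    """
    # Downsampling: 50 Hz -> 10 Hz by keeping every 5th sample
    acc, labels = acc[::factor], labels[::factor]

    # Sliding windows: 2 s = 20 samples at 10 Hz, 50% overlap
    step = int(win * (1 - overlap))
    windows, window_labels = [], []
    for start in range(0, len(acc) - win + 1, step):
        windows.append(acc[start:start + win])
        # Window label = predominant (majority) activity in the window
        lab = labels[start:start + win]
        window_labels.append(np.bincount(lab).argmax())
    return np.stack(windows), np.array(window_labels)

# 10 s of synthetic 50 Hz data: 500 samples -> 100 after downsampling
acc = np.random.randn(500, 3)
labels = np.zeros(500, dtype=int)
labels[250:] = 1
X, y = preprocess(acc, labels)
print(X.shape)  # (9, 20, 3): 100 samples, window 20, step 10 -> 9 windows
```

With these parameters every model sees a fixed (20, 3) input, which is what makes the on-device memory budget predictable.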
Preprocessing: windowing and downsampling

Step 2 — Models Evaluated

I evaluated multiple time-series neural network architectures and selected candidates based on deployment feasibility (model size + supported ops) in addition to accuracy. The final comparison was driven by edge constraints, not only leaderboard accuracy.

  • LSTM: temporal modeling baseline
  • CNN: lightweight feature extractor; best fit for strict constraints
  • Hybrid CNN–LSTM: combines local feature extraction + temporal modeling
  • DeepConvLSTM: highest accuracy, highest complexity
Best test accuracy (Keras): DeepConvLSTM ≈ 79.33% on basketball HAR.

Model 0 — LSTM (Sequence Baseline)

The LSTM baseline models temporal dependencies directly from the accelerometer sequence. It is a useful reference point for comparing against convolution-based approaches, especially under edge deployment constraints.

  • Strength: learns time dependencies directly from sequences
  • Limitation: recurrent ops can be harder to deploy efficiently on constrained runtimes

LSTM Architecture

LSTM model architecture

Model A — CNN (Edge-Friendly Baseline)

The CNN uses stacked Conv1D blocks to capture local motion patterns (peaks, rhythm, short bursts), followed by compact dense layers for classification. This architecture is typically more deployment-friendly than recurrent models on constrained hardware.

  • Strength: efficient inference and smaller footprint
  • Test accuracy (Keras): ≈ 75.68%
  • Quantized TFLite size: ~17,944 bytes (smallest among tested models)
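A small Conv1D stack of this kind can be written in a few lines of Keras. The layer counts and filter sizes below are illustrative assumptions (the report does not list them); only the input shape (20, 3) and the 19-class output follow from the pipeline above.

```python
import tensorflow as tf

# Illustrative edge-friendly Conv1D classifier for (20, 3) windows
# and 19 classes; the actual layer sizes in the project may differ.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 3)),
    tf.keras.layers.Conv1D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(19, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

A model in this size range has only a few thousand parameters, which is why the quantized CNN fits comfortably in the watch's flash budget.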

CNN Architecture

CNN architecture

Model B — Hybrid CNN–LSTM

The Hybrid model first extracts short-term features with convolution layers and then uses LSTM layers to model temporal dependencies. It performed competitively, but recurrent layers complicate deployment.

  • Test accuracy (Keras): ≈ 76.58%
  • TFLite size: ~39,840 bytes
  • Note: LSTM-based ops required extra conversion handling in the edge toolchain
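The "extra conversion handling" for recurrent layers typically means enabling the TensorFlow-op fallback in the TFLite converter. The sketch below uses a tiny stand-in model (the real hybrid is larger) to show the relevant converter configuration.

```python
import tensorflow as tf

# Minimal CNN-LSTM stand-in; the project's hybrid model is larger.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 3)),
    tf.keras.layers.Conv1D(16, 3, activation="relu"),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(19, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Allow falling back to full TensorFlow ops when a layer (often the
# recurrent ones) has no TFLite builtin equivalent.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
print(len(tflite_model), "bytes")
```

The trade-off is that SELECT_TF_OPS pulls in the flex delegate, which inflates the runtime footprint and is exactly what a watch-class device cannot afford.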

Hybrid CNN–LSTM Architecture

Hybrid CNN-LSTM architecture

Model C — DeepConvLSTM (Best Accuracy)

DeepConvLSTM combines deeper convolution blocks for robust feature extraction with LSTM layers for temporal modeling. It achieved the best basketball HAR accuracy, but comes with higher parameter count and a heavier deployment footprint.

  • Best test accuracy (Keras): ≈ 79.33%
  • TFLite size: ~101,088 bytes
  • TFLite test accuracy (500 samples): ≈ 68.20% (conversion + runtime constraints matter)
Takeaway: the “best” model depends on whether you optimize for accuracy only, or for real deployability on the target runtime.

DeepConvLSTM Architecture

DeepConvLSTM architecture

Step 3 — Deployment on the Watch (and Emulator Reality)

To make the models runnable under smartwatch constraints, I applied post-training int8 quantization with TensorFlow Lite. However, during development I relied on the Espruino Web IDE emulator, which does not directly execute standard TFLite binaries. To test end-to-end behavior in the emulator, the trained models were adapted into a JSON-based representation compatible with the JavaScript runtime.

  • Compression: int8 quantization to reduce model size and improve speed
  • Compatibility issue: LSTM-based models require fallback ops (SELECT_TF_OPS), increasing complexity
  • Emulator constraint: models executed via JSON conversion + simulated sensor values
  • Next step: validate on real Bangle.js 2 hardware for true latency + RAM behavior
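At its core, int8 quantization maps each float tensor onto 8-bit integers via a scale and zero-point, x ≈ scale · (q − zero_point). The NumPy sketch below shows the per-tensor affine scheme in simplified form; the real TFLite converter additionally calibrates ranges from a representative dataset.

```python
import numpy as np

def quantize_int8(x):
    """Affine int8 quantization: x ~= scale * (q - zero_point).

    Simplified per-tensor sketch; real post-training quantization
    calibrates min/max from representative inputs.
    """
    lo, hi = float(x.min()), float(x.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # range must contain zero
    scale = (hi - lo) / 255.0 or 1.0      # map range onto 256 int8 levels
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(64).astype(np.float32)   # fake weight tensor
q, s, z = quantize_int8(w)
err = np.abs(dequantize(q, s, z) - w).max()
print(q.dtype, err)  # int8 weights; error bounded by the scale
```

Storing weights as int8 instead of float32 is what cuts model size roughly 4x, which is how the CNN reaches its ~18 KB footprint.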
Bangle.js 2 emulator outputs

Conclusion

This work demonstrates an end-to-end TinyML workflow for basketball activity recognition under strict memory limits. The main result is not only the accuracy (DeepConvLSTM ≈ 79.33% in Keras), but the engineering trade-offs needed to make models deployable: reducing sampling rate, fixed windowing, controlling dataset size, and quantizing models. Emulator-based testing enabled rapid iteration, but final validation on physical hardware is essential for real-world latency, memory, and sensor-noise robustness.


References

Supervisor: Prof. Dr. Kristof Van Laerhoven (University of Siegen)
Dataset: HANG Time-HAR (Basketball Activity Recognition)
Project report: “Efficient Activity Recognition on Memory-Constrained Smart Watches” (Case Study 1)