Project Summary (Case Study 1 — Basketball HAR)

Using the HANG Time-HAR basketball dataset, the task is to classify 19 basketball-related activities from wrist-worn 3-axis accelerometer signals. The Bangle.js 2 records accelerometer data at 50 Hz (±8g). I built the full pipeline: preprocessing → training multiple neural networks → model selection → compression/quantization → emulator-based deployment and evaluation.

  • Device: Bangle.js 2 (256KB RAM, 1024KB Flash)
  • Signal: 3-axis accelerometer (wrist), time-series HAR
  • Task: 19-class basketball activity classification
  • Models: LSTM, CNN, Hybrid CNN–LSTM, DeepConvLSTM
  • Edge focus: TensorFlow Lite conversion + int8 quantization
  • Supervisor: Prof. Kristof Van Laerhoven (University of Siegen)
Main engineering challenge: balancing accuracy vs memory footprint vs inference latency for a watch-class device.

Step 1 — Data Preprocessing (Memory-Aware)

The raw accelerometer streams (CSV per player) were cleaned, downsampled, and segmented into compact fixed-length windows so the model input stays small and consistent during training and on-device inference.

  • Cleaning: remove missing rows; filter extreme samples outside sensor range (|acc| > 8g)
  • Downsampling: 50 Hz → 10 Hz (keep every 5th sample) to reduce compute + memory
  • Windowing: 2-second sliding windows (20 samples at 10 Hz) with 50% overlap
  • Labeling: window label = predominant activity inside the window
  • Balancing: cap at 5000 windows per class (random sampling) to control dataset size
  • Split: 60% train / 20% val / 20% test; normalize with MinMax scaling
Why this matters: fixed windows + downsampling make inference predictable and feasible under tight RAM limits.
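The downsampling, windowing, and majority-labeling steps above can be sketched in a few lines of NumPy. This is an illustrative re-implementation of the described pipeline, not the project code; the function name and signature are my own.

```python
import numpy as np

def preprocess(acc, labels, factor=5, win=20, overlap=0.5):
    """Downsample, window, and label a 3-axis accelerometer stream.

    acc    : (N, 3) float array of x/y/z samples at 50 Hz
    labels : (N,) int array of per-sample activity IDs
    Illustrative sketch of the steps described above.
    """
    # Downsampling: 50 Hz -> 10 Hz by keeping every 5th sample
    acc, labels = acc[::factor], labels[::factor]

    # Sliding windows: 2 s = 20 samples at 10 Hz, 50% overlap
    step = int(win * (1 - overlap))
    windows, window_labels = [], []
    for start in range(0, len(acc) - win + 1, step):
        windows.append(acc[start:start + win])
        # Window label = predominant (majority) activity in the window
        lab = labels[start:start + win]
        window_labels.append(np.bincount(lab).argmax())
    return np.stack(windows), np.array(window_labels)

# 10 s of synthetic 50 Hz data: 500 samples -> 100 after downsampling
acc = np.random.randn(500, 3)
labels = np.zeros(500, dtype=int)
labels[250:] = 1
X, y = preprocess(acc, labels)
print(X.shape)  # (9, 20, 3): 100 samples, window 20, step 10 -> 9 windows
```

With these parameters every model sees a fixed (20, 3) input, which is what makes the on-device memory budget predictable.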
Preprocessing: windowing and downsampling

Step 2 — Models Evaluated

I evaluated multiple time-series neural network architectures and selected candidates based on deployment feasibility (model size + supported ops) in addition to accuracy. The final comparison was driven by edge constraints, not only leaderboard accuracy.

  • LSTM: temporal modeling baseline
  • CNN: lightweight feature extractor; best fit for strict constraints
  • Hybrid CNN–LSTM: combines local feature extraction + temporal modeling
  • DeepConvLSTM: highest accuracy, highest complexity
Best test accuracy (Keras): DeepConvLSTM ≈ 79.33% on basketball HAR.

Model 0 — LSTM (Sequence Baseline)

The LSTM baseline models temporal dependencies directly from the accelerometer sequence. It is a useful reference point for comparing against convolution-based approaches, especially under edge deployment constraints.

  • Strength: learns time dependencies directly from sequences
  • Limitation: recurrent ops can be harder to deploy efficiently on constrained runtimes

LSTM Architecture

LSTM model architecture

Model A — CNN (Edge-Friendly Baseline)

The CNN uses stacked Conv1D blocks to capture local motion patterns (peaks, rhythm, short bursts), followed by compact dense layers for classification. This architecture is typically more deployment-friendly than recurrent models on constrained hardware.

  • Strength: efficient inference and smaller footprint
  • Test accuracy (Keras): ≈ 75.68%
  • Quantized TFLite size: ~17,944 bytes (smallest among tested models)
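A small Conv1D stack of this kind can be written in a few lines of Keras. The layer counts and filter sizes below are illustrative assumptions (the report does not list them); only the input shape (20, 3) and the 19-class output follow from the pipeline above.

```python
import tensorflow as tf

# Illustrative edge-friendly Conv1D classifier for (20, 3) windows
# and 19 classes; the actual layer sizes in the project may differ.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 3)),
    tf.keras.layers.Conv1D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(19, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

A model in this size range has only a few thousand parameters, which is why the quantized CNN fits comfortably in the watch's flash budget.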

CNN Architecture

CNN architecture

Model B — Hybrid CNN–LSTM

The Hybrid model first extracts short-term features with convolution layers and then uses LSTM layers to model temporal dependencies. It performed competitively, but recurrent layers complicate deployment.

  • Test accuracy (Keras): ≈ 76.58%
  • TFLite size: ~39,840 bytes
  • Note: LSTM-based ops required extra conversion handling in the edge toolchain
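The "extra conversion handling" for recurrent layers typically means enabling the TensorFlow-op fallback in the TFLite converter. The sketch below uses a tiny stand-in model (the real hybrid is larger) to show the relevant converter configuration.

```python
import tensorflow as tf

# Minimal CNN-LSTM stand-in; the project's hybrid model is larger.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 3)),
    tf.keras.layers.Conv1D(16, 3, activation="relu"),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(19, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Allow falling back to full TensorFlow ops when a layer (often the
# recurrent ones) has no TFLite builtin equivalent.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
print(len(tflite_model), "bytes")
```

The trade-off is that SELECT_TF_OPS pulls in the flex delegate, which inflates the runtime footprint and is exactly what a watch-class device cannot afford.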

Hybrid CNN–LSTM Architecture

Hybrid CNN-LSTM architecture

Model C — DeepConvLSTM (Best Accuracy)

DeepConvLSTM combines deeper convolution blocks for robust feature extraction with LSTM layers for temporal modeling. It achieved the best basketball HAR accuracy, but comes with higher parameter count and a heavier deployment footprint.

  • Best test accuracy (Keras): ≈ 79.33%
  • TFLite size: ~101,088 bytes
  • TFLite test accuracy (500 samples): ≈ 68.20% (conversion + runtime constraints matter)
Takeaway: the “best” model depends on whether you optimize for accuracy only, or for real deployability on the target runtime.

DeepConvLSTM Architecture

DeepConvLSTM architecture

Step 3 — Deployment on the Watch (and Emulator Reality)

To make the models runnable under smartwatch constraints, I applied post-training int8 quantization with TensorFlow Lite. However, during development I relied on the Espruino Web IDE emulator, which does not directly execute standard TFLite binaries. To test end-to-end behavior in the emulator, the trained models were adapted into a JSON-based representation compatible with the JavaScript runtime.

  • Compression: int8 quantization to reduce model size and improve speed
  • Compatibility issue: LSTM-based models require fallback ops (SELECT_TF_OPS), increasing complexity
  • Emulator constraint: models executed via JSON conversion + simulated sensor values
  • Next step: validate on real Bangle.js 2 hardware for true latency + RAM behavior
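At its core, int8 quantization maps each float tensor onto 8-bit integers via a scale and zero-point, x ≈ scale · (q − zero_point). The NumPy sketch below shows the per-tensor affine scheme in simplified form; the real TFLite converter additionally calibrates ranges from a representative dataset.

```python
import numpy as np

def quantize_int8(x):
    """Affine int8 quantization: x ~= scale * (q - zero_point).

    Simplified per-tensor sketch; real post-training quantization
    calibrates min/max from representative inputs.
    """
    lo, hi = float(x.min()), float(x.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # range must contain zero
    scale = (hi - lo) / 255.0 or 1.0      # map range onto 256 int8 levels
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(64).astype(np.float32)   # fake weight tensor
q, s, z = quantize_int8(w)
err = np.abs(dequantize(q, s, z) - w).max()
print(q.dtype, err)  # int8 weights; error bounded by the scale
```

Storing weights as int8 instead of float32 is what cuts model size roughly 4x, which is how the CNN reaches its ~18 KB footprint.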
Bangle.js 2 emulator outputs

Conclusion

This work demonstrates an end-to-end TinyML workflow for basketball activity recognition under strict memory limits. The main result is not only the accuracy (DeepConvLSTM ≈ 79.33% in Keras), but the engineering trade-offs needed to make models deployable: reducing sampling rate, fixed windowing, controlling dataset size, and quantizing models. Emulator-based testing enabled rapid iteration, but final validation on physical hardware is essential for real-world latency, memory, and sensor-noise robustness.


References

Supervisor: Prof. Dr. Kristof Van Laerhoven (University of Siegen)
Dataset: HANG Time-HAR (Basketball Activity Recognition)
Project report: “Efficient Activity Recognition on Memory-Constrained Smart Watches” (Case Study 1)