Model Performance Evaluation
In the eFabric™ ecosystem, a model’s success is not determined by its performance on a clean, curated dataset in a laboratory setting. Instead, success is defined by its resilience and reliability when deployed in the "wild"—where background noise is unpredictable, sensor data is noisy and battery life is non-negotiable. Evaluating an NDP-based model requires a multi-dimensional approach that balances statistical accuracy with hardware-specific constraints.
Evaluating performance for eFabric™ requires a shift from standard data science metrics to "System-Level" metrics. In cloud-based AI, a slight delay or a false positive might cost only a fraction of a cent in compute; at the edge, the same errors have physical consequences. A false positive can drain the battery or alert the user for nothing, while a missed event defeats the purpose of the device. We must therefore balance the model's ability to catch true events against the need to preserve battery life and avoid "notification fatigue" for the user.
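To make this trade-off concrete, the following minimal sketch converts a false-acceptance rate into user-visible false alarms and battery energy per day. The `DeploymentProfile` class, the `daily_false_alarm_cost` helper, and every parameter value (false accepts per hour, energy per wake-up, battery capacity) are hypothetical placeholders for illustration, not Syntiant® or eFabric™ specifications.

```python
from dataclasses import dataclass


@dataclass
class DeploymentProfile:
    """Hypothetical deployment parameters; replace with measured values."""
    false_accepts_per_hour: float  # spurious detections while no target event occurs
    wake_energy_mj: float          # energy (millijoules) spent each time a false alarm wakes the system
    battery_capacity_mwh: float    # usable battery capacity in milliwatt-hours


def daily_false_alarm_cost(profile: DeploymentProfile) -> tuple[float, float]:
    """Return (false alarms per day, fraction of the battery spent on them per day)."""
    alarms_per_day = profile.false_accepts_per_hour * 24
    energy_mj = alarms_per_day * profile.wake_energy_mj
    # 1 mWh = 1e-3 W * 3600 s = 3.6 J = 3600 mJ
    battery_mj = profile.battery_capacity_mwh * 3600
    return alarms_per_day, energy_mj / battery_mj


# Illustrative (made-up) numbers only.
profile = DeploymentProfile(false_accepts_per_hour=0.5, wake_energy_mj=50.0, battery_capacity_mwh=1000.0)
alarms, battery_fraction = daily_false_alarm_cost(profile)
print(f"{alarms:.0f} false alarms/day, {battery_fraction:.4%} of battery/day")
```

Even when the energy term looks small, a dozen spurious alerts per day is exactly the kind of notification fatigue described above, which is why both outputs are worth tracking.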
The Three Dimensions of Evaluation
To provide a comprehensive view of how a model will behave on the Syntiant® NDP, we evaluate it across three distinct planes:
- Statistical Integrity: Using traditional metrics like Precision, Recall, and F1-Score to ensure the model has "learned" the correct patterns.
- Operational Reliability: Testing specifically for the False Acceptance Rate (FAR) and False Rejection Rate (FRR) to understand how the model handles the imbalanced nature of the real world, where target events are rare and background activity is constant.
- Hardware Efficiency: Measuring how the model uses the At-Memory architecture, ensuring it stays within the defined microwatt-level power budget without sacrificing inference speed.
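The statistical and operational metrics above can all be derived from the same four confusion counts. The sketch below uses the common textbook definitions: FAR as the share of negative trials wrongly accepted, and FRR as the share of positive trials wrongly rejected (which equals 1 minus recall). The class and function names are hypothetical, and the counts in the usage line are invented; always-on audio evaluations often report FAR as false accepts per hour instead, so that convention is included as a separate helper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # target events correctly detected
    fp: int  # non-events wrongly detected (false acceptances)
    tn: int  # non-events correctly ignored
    fn: int  # target events missed (false rejections)


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(c: ConfusionCounts) -> dict[str, float]:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    f1 = safe_div(2 * precision * recall, precision + recall)  # harmonic mean of precision and recall
    far = safe_div(c.fp, c.fp + c.tn)  # fraction of negative trials wrongly accepted
    frr = safe_div(c.fn, c.fn + c.tp)  # fraction of positive trials wrongly rejected (1 - recall)
    return {"precision": precision, "recall": recall, "f1": f1, "far": far, "frr": frr}


def false_accepts_per_hour(fp: int, negative_audio_hours: float) -> float:
    """Alternative FAR convention: false accepts per hour of non-target audio."""
    return safe_div(fp, negative_audio_hours)


# Invented counts for illustration only.
metrics = evaluate(ConfusionCounts(tp=90, fp=5, tn=9_900, fn=10))
print(metrics)
```

Reporting FAR and FRR side by side, rather than folding them into one score, keeps the two failure modes visible: FAR drives false alarms and wasted energy, while FRR measures missed events.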
"Traditional 'Accuracy' (the percentage of correct guesses) is often a misleading metric for Always-On devices. If a device is listening for a 'Glass Break' that happens only once a year, a model that always guesses 'No Glass Break' would be more than 99.99% accurate but completely useless."
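The glass-break example is easy to reproduce. This minimal sketch assumes the detector is scored on one-second windows over a full year containing a single real event; the window length is an arbitrary illustrative choice, and shorter windows only push accuracy closer to 100%. The function names are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all windows classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Fraction of real events that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical test stream: one-second windows for a year, one real glass break.
windows_per_year = 365 * 24 * 60 * 60  # 31,536,000 windows
positives = 1
negatives = windows_per_year - positives

# A "model" that always answers "No Glass Break":
# every negative window is a true negative; the single positive is a false negative.
tp, fn = 0, positives
tn, fp = negatives, 0

print(f"accuracy = {accuracy(tp, tn, fp, fn):.8f}")  # just shy of 1.0
print(f"recall   = {recall(tp, fn):.1f}")            # the one event that matters is missed
```

Accuracy rounds to essentially 100% while recall is zero, which is why the metrics in the three dimensions above, rather than raw accuracy, are used to judge always-on models.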