Audio Classification
Chip
| Chip Value | Description |
|---|---|
| ndp120_b0 | NDP120 Series B0 — primary target for all standard audio deployments |
Model Architecture
| Architecture | Recommended Use Case |
|---|---|
| mlpnet | Simple keyword spotting, small vocabulary, very low latency (default) |
| convnet | Phrase detection in noisy environments, multi-word commands |
| expandedconvnet | Higher accuracy or larger vocabularies |
| edgenet | Ultra-low-power, always-on listening |
| recurrent | Sequential or time-dependent speech patterns |
| temporal_convolution_resnet | Long phrases and robust temporal modelling |
| custom | Full control over network design |
Network Layers (Read-Only)
Auto-generated from the selected chip and architecture. Displays Input → Conv2D → Flatten → Dense → Softmax, with output shapes at each layer.
NOTE
The network topology is calculated automatically and cannot be edited directly. To change it, select a different architecture in the Model Architecture dropdown.
Audio Feature Configuration — User Inputs
| Parameter | Description |
|---|---|
| Input Matrix (Features) | Number of filterbank frequency bins (nfilters) |
| Input Matrix (Time) | Number of successive time frames (wincount) |
| Window Duration (s) | Total time span in seconds covered by the time frames |
| Window Step | Number of audio samples between the starts of successive short-time windows (hop size) |
| Preemphasis Coefficient | High-frequency emphasis filter coefficient. Typical: 0.96875 |
| Power Offset | Logarithmic offset before computing log-filterbank energies. Typical: 52 |
| Enable Data Augmentation | Generates additional synthetic filterbank samples per class |
| Augmented Filterbanks / Class | Number of augmented samples to generate per class |
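To make the Preemphasis Coefficient concrete, here is a minimal sketch of the conventional first-order pre-emphasis filter, y[n] = x[n] - a·x[n-1], using the typical value a = 0.96875 (which equals 1 - 1/32). This assumes the standard textbook form; the exact filter the tool or chip applies is not stated here, and the function name `preemphasize` is hypothetical.

```python
def preemphasize(samples, coeff=0.96875):
    """Conventional first-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].

    Assumption: textbook form of the filter, not necessarily the
    exact implementation used by the feature extractor.
    """
    if not samples:
        return []
    # The first sample has no predecessor, so it is passed through unchanged.
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coeff * prev)
    return out


# A constant (low-frequency) signal is strongly attenuated after the first sample.
flat = preemphasize([1.0, 1.0, 1.0])
```

A constant input collapses to a small residual (1 - 0.96875 = 0.03125) after the first sample, while rapid sample-to-sample changes pass through largely intact, which is the high-frequency emphasis the parameter describes.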
Recommended Parameters for ndp120_b0
| Parameter | Value |
|---|---|
| Input Matrix (Features) | 40 |
| Input Matrix (Time) | 40 |
| Window Duration (s) | 1.000 |
| Window Step | 384 |
| Preemphasis Coefficient | 0.96875 |
| Power Offset | 52 |
| Window Length (calculated) | 512 |
| Feature Extractor (calculated) | log-bin |
| Num Samples to NN (calculated) | 1600 |
| Sampling Rate (calculated) | 16000.00 |
NOTE
Use these defaults unless you have a specific reason to change them.
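The relationship between the user inputs and the calculated fields can be sanity-checked with a short sketch. It assumes that Num Samples to NN is the product of the two Input Matrix dimensions (40 × 40 = 1600, which matches the table) and that frames follow the usual framing arithmetic, (frames - 1) × step + window length. The class and property names are hypothetical and not part of the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFeatureConfig:
    """Recommended ndp120_b0 values from the table above (names are illustrative)."""
    nfilters: int = 40          # Input Matrix (Features)
    wincount: int = 40          # Input Matrix (Time)
    window_step: int = 384      # hop size in samples
    window_length: int = 512    # samples per short-time window
    sampling_rate: float = 16000.0

    @property
    def num_samples_to_nn(self) -> int:
        # Assumption: one value per (filterbank bin, time frame) pair.
        return self.nfilters * self.wincount

    @property
    def covered_samples(self) -> int:
        # Standard framing arithmetic: the last frame starts at
        # (wincount - 1) * step and extends one full window length.
        return (self.wincount - 1) * self.window_step + self.window_length

    @property
    def covered_seconds(self) -> float:
        return self.covered_samples / self.sampling_rate


cfg = AudioFeatureConfig()
print(cfg.num_samples_to_nn)   # 1600
print(cfg.covered_samples)     # 15488
print(cfg.covered_seconds)     # 0.968
```

Under these assumptions the 40 frames span about 0.97 s of audio, slightly under the 1.000 s Window Duration, so the tool may round or define the duration differently. Treat the sketch as a consistency check when changing Window Step or Input Matrix (Time), not as the tool's own formula.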
Label Selection
| Field | Description |
|---|---|
| Target Words | Keywords the model should detect — each is an individual class |
| Open-set Words | Words that must NOT trigger detection — all merged into one non-target class |
| Number of Classes | Read-only, auto-calculated: number of target words + 1 (the single merged open-set class) |
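The class-count rule can be illustrated with a small sketch: each target word gets its own class, and every open-set word maps to one shared class. The function name, the `open_set` label, and the index ordering (targets first, open-set last) are assumptions for illustration; the tool's actual label encoding is not documented here.

```python
def build_label_map(target_words, open_set_words, open_set_label="open_set"):
    """Return (class names, word -> class index) for a keyword-spotting label set.

    Assumption: target classes are indexed in the order given and the
    merged open-set class takes the last index.
    """
    overlap = set(target_words) & set(open_set_words)
    if overlap:
        raise ValueError(f"words listed as both target and open-set: {sorted(overlap)}")

    classes = list(target_words) + [open_set_label]
    class_index = {name: i for i, name in enumerate(classes)}

    # Each target word is its own class.
    word_to_class = {word: class_index[word] for word in target_words}
    # All open-set words collapse into the single non-target class.
    for word in open_set_words:
        word_to_class[word] = class_index[open_set_label]
    return classes, word_to_class


classes, mapping = build_label_map(["yes", "no"], ["maybe", "hello", "stop"])
print(len(classes))  # 3: two target words + 1 open-set class
```

However many open-set words are supplied, the class count grows only with the number of target words.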
Feature Generation Output
- Select Chip and Model Architecture
- Configure User Input Components
- Verify Calculated Components (Num Samples to NN, Sampling Rate, matrix dimensions)
- Optionally enable Data Augmentation and set Augmented Filterbanks / Class
- Assign Target Words and Open-set items in Label Selection
- Click Generate Features to produce the output files: X_train.npy, X_test.npy, y_train.npy, y_test.npy
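Once the four output files exist, a quick check that each feature array lines up with its label array can catch export problems early. The sketch below loads the files with NumPy and verifies that sample counts match per split. The helper name `load_features` and the directory argument are hypothetical, and the array shapes are not documented here, so only the first dimension is checked.

```python
from pathlib import Path

import numpy as np


def load_features(directory):
    """Load the generated .npy files and check that X and y lengths agree."""
    directory = Path(directory)
    names = ("X_train", "X_test", "y_train", "y_test")
    data = {name: np.load(directory / f"{name}.npy") for name in names}

    for split in ("train", "test"):
        n_x = len(data[f"X_{split}"])
        n_y = len(data[f"y_{split}"])
        if n_x != n_y:
            raise ValueError(
                f"{split}: {n_x} feature rows but {n_y} labels"
            )
    return data
```

A typical call would be `data = load_features("path/to/output")`, followed by inspecting `data["X_train"].shape` to confirm it agrees with the configured Input Matrix dimensions.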
Data Explorer Tab
Interactive visualisation of audio sample similarity using unsupervised clustering. Helps identify poorly labelled data, class overlap, and outliers. Supports KNN and SVM classifiers.