Audio Classification

Chip

Chip Value	Description
ndp120_b0	NDP120 Series B0 — primary target for all standard audio deployments

Model Architecture

Architecture	Recommended Use Case
mlpnet	Simple keyword spotting, small vocabulary, very low latency (default)
convnet	Phrase detection in noisy environments, multi-word commands
expandedconvnet	Higher accuracy or larger vocabularies
edgenet	Ultra-low-power, always-on listening
recurrent	Sequential or time-dependent speech patterns
temporal_convolution_resnet	Long phrases and robust temporal modelling
custom	Full control over network design

Network Layers (Read-Only)

Auto-generated from the selected chip and architecture. The panel displays the layer sequence Input → Conv2D → Flatten → Dense → Softmax, with the output shape of each layer.

NOTE

The network topology is auto-calculated and read-only. Change the architecture via the Model Architecture dropdown.
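To illustrate how output shapes propagate through a sequence such as Input → Conv2D → Flatten → Dense → Softmax, the following sketch computes them for a 40 × 40 input. The kernel size, stride, filter count, and class count are hypothetical values chosen for illustration, and the function names are not part of the tool; the real values are derived from the selected chip and architecture.

```python
def conv2d_output_shape(height, width, kernel, stride, filters):
    """Output shape of an unpadded ('valid') 2-D convolution."""
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    return (out_h, out_w, filters)


def layer_shapes(height, width, kernel, stride, filters, num_classes):
    """Return (layer name, output shape) pairs for
    Input -> Conv2D -> Flatten -> Dense -> Softmax."""
    conv = conv2d_output_shape(height, width, kernel, stride, filters)
    flat = conv[0] * conv[1] * conv[2]
    return [
        ("Input", (height, width, 1)),
        ("Conv2D", conv),
        ("Flatten", (flat,)),
        ("Dense", (num_classes,)),
        ("Softmax", (num_classes,)),
    ]


# Hypothetical settings: 40 x 40 input, 3 x 3 kernel, stride 1,
# 8 filters, 3 output classes.
for name, shape in layer_shapes(40, 40, kernel=3, stride=1, filters=8, num_classes=3):
    print(f"{name:8s} {shape}")
```

Softmax only normalises the Dense outputs into probabilities, so its shape matches the Dense layer.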

Audio Feature Configuration — User Inputs

Parameter	Description
Input Matrix (Features)	Number of filterbank frequency bins (nfilters)
Input Matrix (Time)	Number of successive time frames (wincount)
Window Duration (s)	Total time span in seconds covered by the time frames
Window Step	Number of audio samples between the starts of successive short-time windows (hop size)
Preemphasis Coefficient	High-frequency emphasis filter coefficient. Typical: 0.96875
Power Offset	Logarithmic offset before computing log-filterbank energies. Typical: 52
Enable Data Augmentation	Generates additional synthetic filterbank samples per class
Augmented Filterbanks / Class	Number of augmented samples to generate per class
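As a rough illustration of what these parameters control, the sketch below applies pre-emphasis, slices the signal into overlapping windows, and computes log band energies with NumPy. It is not the tool's feature extractor: the equal-width band pooling, the Hann window, and all function names are assumptions, and the Power Offset is left out because its exact form is not described here. The window length of 512 samples is taken from the recommended parameters for ndp120_b0.

```python
import numpy as np


def preemphasize(signal, coeff=0.96875):
    """First-order high-frequency emphasis: y[n] = x[n] - coeff * x[n-1]."""
    signal = np.asarray(signal, dtype=np.float64)
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])


def frame_signal(signal, window_length=512, window_step=384):
    """Slice a 1-D signal into overlapping windows, one window per row."""
    if len(signal) < window_length:
        raise ValueError("signal is shorter than one window")
    count = 1 + (len(signal) - window_length) // window_step
    starts = window_step * np.arange(count)
    return np.stack([signal[s:s + window_length] for s in starts])


def log_band_energies(frames, nfilters=40, eps=1e-10):
    """Assumed filterbank: pool the power spectrum into equal-width bands,
    then take the natural log of each band's energy."""
    windowed = frames * np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    bands = np.array_split(power, nfilters, axis=1)
    energies = np.stack([band.sum(axis=1) for band in bands], axis=1)
    return np.log(energies + eps)


# Synthetic noise standing in for audio; the length is chosen so that
# exactly 40 windows of 512 samples fit with a step of 384 samples.
rng = np.random.default_rng(0)
audio = rng.standard_normal((40 - 1) * 384 + 512)
features = log_band_energies(frame_signal(preemphasize(audio)))
print(features.shape)  # (time frames, filterbank bins)
```

The resulting matrix has one row per time frame and one column per filterbank bin, which corresponds to the Input Matrix (Time) and Input Matrix (Features) settings.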

Recommended Parameters for ndp120_b0

Parameter	Value
Input Matrix (Features)	40
Input Matrix (Time)	40
Window Duration (s)	1.000
Window Step	384 samples
Preemphasis Coefficient	0.96875
Power Offset	52
Window Length (calculated)	512 samples
Feature Extractor (calculated)	log-bin
Num Samples to NN (calculated)	1600
Sampling Rate (calculated)	16000.00 Hz

NOTE

Use these defaults unless you have a specific reason to change them.
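The recommended values can be sanity-checked with simple arithmetic. The sketch below assumes that Num Samples to NN is the product of the two Input Matrix dimensions (which matches 40 × 40 = 1600) and uses the standard framing relation to estimate how much audio 40 windows cover; how the tool itself derives or rounds these figures is not stated here.

```python
FEATURES = 40          # Input Matrix (Features)
TIME_FRAMES = 40       # Input Matrix (Time)
WINDOW_STEP = 384      # samples between successive windows
WINDOW_LENGTH = 512    # samples per short-time window
SAMPLING_RATE = 16000  # Hz

# Values fed to the network per inference (assumed: features x time).
num_samples_to_nn = FEATURES * TIME_FRAMES

# Audio samples spanned by TIME_FRAMES overlapping windows.
span_samples = (TIME_FRAMES - 1) * WINDOW_STEP + WINDOW_LENGTH
span_seconds = span_samples / SAMPLING_RATE

# Samples shared by two consecutive windows.
overlap = WINDOW_LENGTH - WINDOW_STEP

print(num_samples_to_nn)           # 1600
print(span_samples, span_seconds)  # 15488 0.968
print(overlap)                     # 128
```

Under these assumptions the 40 frames span about 0.968 s of audio, close to the 1.000 s Window Duration.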

Label Selection

Field	Description
Target Words	Keywords the model should detect — each is an individual class
Open-set Words	Words that must NOT trigger detection — all merged into one non-target class
Number of Classes	Auto-calculated and read-only: number of target words + 1 (the merged open-set class)
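The class count rule can be sketched as follows. The function name, the example words, and the choice of giving the open-set class the last index are all hypothetical; the tool's actual label ordering is not documented here.

```python
def build_label_map(target_words, open_set_words):
    """Give each target word its own class index and map every
    open-set word to one shared non-target class."""
    label_map = {word: index for index, word in enumerate(target_words)}
    open_set_class = len(target_words)  # assumed: open-set class comes last
    for word in open_set_words:
        if word in label_map:
            raise ValueError(f"'{word}' is listed as both target and open-set")
        label_map[word] = open_set_class
    num_classes = len(target_words) + 1
    return label_map, num_classes


# Hypothetical example: two keywords and two words that must not trigger.
label_map, num_classes = build_label_map(["yes", "no"], ["maybe", "hello"])
print(label_map)    # {'yes': 0, 'no': 1, 'maybe': 2, 'hello': 2}
print(num_classes)  # 3
```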

Feature Generation Output

1. Select Chip and Model Architecture
2. Configure User Input Components
3. Verify Calculated Components (Num Samples to NN, Sampling Rate, matrix dimensions)
4. Optionally enable Data Augmentation and set Augmented Filterbanks / Class
5. Assign Target Words and Open-set items in Label Selection
6. Click Generate Features. Output files: X_train.npy, X_test.npy, y_train.npy, y_test.npy
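Once the four output files exist, they can be inspected with NumPy. The helper below is a minimal sketch: `load_features` is a hypothetical name, the directory is whatever location the files were saved to, and it assumes the first axis of each array indexes samples, which is not confirmed here.

```python
from pathlib import Path

import numpy as np


def load_features(directory):
    """Load the four generated .npy files and check that each feature
    array has as many entries as its label array."""
    directory = Path(directory)
    names = ("X_train", "X_test", "y_train", "y_test")
    data = {name: np.load(directory / f"{name}.npy") for name in names}
    for split in ("train", "test"):
        n_features = len(data[f"X_{split}"])
        n_labels = len(data[f"y_{split}"])
        if n_features != n_labels:
            raise ValueError(
                f"{split}: {n_features} feature arrays but {n_labels} labels"
            )
    return data
```

Calling `load_features` with the output directory returns a dictionary keyed by file name, so `data["X_train"].shape` and `np.unique(data["y_train"])` give a quick view of the feature dimensions and the classes present.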

Data Explorer Tab

Interactive visualisation of audio sample similarity using unsupervised clustering. Use it to identify poorly labelled data, class overlap, and outliers. Supports KNN and SVM classifiers.
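To give a feel for how a KNN classifier can surface poorly labelled samples, the sketch below flags every sample whose label disagrees with the majority label of its nearest neighbours. It is an illustration only, not the Data Explorer's implementation; the function name, the Euclidean distance, and the toy data are assumptions.

```python
import numpy as np


def knn_label_disagreements(features, labels, k=5):
    """Return indices of samples whose label differs from the majority
    label of their k nearest neighbours (Euclidean, leave-one-out)."""
    X = np.asarray(features, dtype=np.float64).reshape(len(features), -1)
    y = np.asarray(labels)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    suspects = []
    for i, idx in enumerate(neighbours):
        values, counts = np.unique(y[idx], return_counts=True)
        if values[np.argmax(counts)] != y[i]:
            suspects.append(i)
    return suspects


# Toy data: two well-separated clusters, with sample 2 deliberately
# given the wrong label.
base = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8]])
points = np.vstack([base, base + 10.0])
labels = np.array([0] * 6 + [1] * 6)
labels[2] = 1
print(knn_label_disagreements(points, labels, k=3))  # [2]
```

Samples flagged this way are candidates for review rather than confirmed errors, since genuine class overlap produces the same signal.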
