
Audio Classification

Chip

| Chip Value | Description |
|---|---|
| ndp120_b0 | NDP120 Series B0: primary target for all standard audio deployments |

Model Architecture

| Architecture | Recommended Use Case |
|---|---|
| mlpnet | Simple keyword spotting, small vocabulary, very low latency (default) |
| convnet | Phrase detection in noisy environments, multi-word commands |
| expandedconvnet | Higher accuracy or larger vocabularies |
| edgenet | Ultra-low-power, always-on listening |
| recurrent | Sequential or time-dependent speech patterns |
| temporal_convolution_resnet | Long phrases and robust temporal modelling |
| custom | Full control over network design |

Network Layers (Read-Only)

Auto-generated from the selected chip and architecture. Lists each layer in order with its output shape, for example Input → Conv2D → Flatten → Dense → Softmax.

NOTE

The network topology is auto-calculated and read-only. Change the architecture via the Model Architecture dropdown.
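The output shapes shown in this view follow from standard layer arithmetic. The sketch below is a hypothetical illustration of how such shapes can be derived for an Input → Conv2D → Flatten → Dense → Softmax stack on a 40 × 40 feature matrix. The kernel size, filter count, stride, padding and class count are assumptions chosen for the example, not the values the tool uses, and the helper names are hypothetical.

```python
# Hypothetical sketch: derive per-layer output shapes for
# Input -> Conv2D -> Flatten -> Dense -> Softmax.
# Kernel size, filters, stride, padding and n_classes are ASSUMED values
# for illustration only; the tool computes its own topology.


def conv2d_output_shape(height, width, kernel, stride=1, padding=0, filters=1):
    """Return (height, width, channels) after a square-kernel 2D convolution."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return out_h, out_w, filters


def layer_shapes(n_features=40, n_frames=40, kernel=3, filters=8, n_classes=3):
    """List (layer name, output shape) pairs for the assumed stack."""
    shapes = [("Input", (n_features, n_frames, 1))]
    conv = conv2d_output_shape(n_features, n_frames, kernel, filters=filters)
    shapes.append(("Conv2D", conv))
    # Flatten collapses height x width x channels into one vector.
    flat = conv[0] * conv[1] * conv[2]
    shapes.append(("Flatten", (flat,)))
    # Dense maps to one unit per class; Softmax keeps the same shape.
    shapes.append(("Dense", (n_classes,)))
    shapes.append(("Softmax", (n_classes,)))
    return shapes


if __name__ == "__main__":
    for name, shape in layer_shapes():
        print(f"{name:8s} {shape}")
```

With these assumed settings, a 3 × 3 unpadded convolution shrinks 40 × 40 to 38 × 38, so Flatten produces 38 × 38 × 8 = 11552 values.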

Audio Feature Configuration — User Inputs

| Parameter | Description |
|---|---|
| Input Matrix (Features) | Number of filterbank frequency bins (nfilters) |
| Input Matrix (Time) | Number of successive time frames (wincount) |
| Window Duration (s) | Total time span in seconds covered by the time frames |
| Window Step | Audio samples between successive short-time windows (hop size) |
| Preemphasis Coefficient | High-frequency emphasis filter coefficient. Typical: 0.96875 |
| Power Offset | Logarithmic offset applied before computing log-filterbank energies. Typical: 52 |
| Enable Data Augmentation | Generates additional synthetic filterbank samples per class |
| Augmented Filterbanks / Class | Number of augmented samples to generate per class |
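Pre-emphasis is conventionally a first-order filter, y[n] = x[n] − a · x[n−1], that boosts high frequencies before filterbank analysis. The sketch below assumes the Preemphasis Coefficient plays the role of a in that standard form; this documentation does not confirm the exact formula the tool applies, and the function name is hypothetical.

```python
import numpy as np


def preemphasis(signal, coeff=0.96875):
    """Apply the standard pre-emphasis filter y[n] = x[n] - coeff * x[n-1].

    ASSUMPTION: the tool's Preemphasis Coefficient corresponds to `coeff`
    in this conventional form. The first sample is passed through unchanged.
    """
    x = np.asarray(signal, dtype=np.float64)
    if x.size == 0:
        return x
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - coeff * x[:-1]
    return y


if __name__ == "__main__":
    # A constant (DC) signal is strongly attenuated after the first sample.
    print(preemphasis([1.0, 1.0, 1.0, 1.0]))
```

A coefficient close to 1 suppresses slowly varying (low-frequency) content almost entirely, which is why a constant input collapses to a small residual after the first sample.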

| Parameter | Value |
|---|---|
| Input Matrix (Features) | 40 |
| Input Matrix (Time) | 40 |
| Window Duration (s) | 1.000 |
| Window Step | 384 samples |
| Preemphasis Coefficient | 0.96875 |
| Power Offset | 52 |
| Window Length (calculated) | 512 |
| Feature Extractor (calculated) | log-bin |
| Num Samples to NN (calculated) | 1600 |
| Sampling Rate (calculated) | 16000.00 Hz |

NOTE

Use these defaults unless you have a specific reason to change them.
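The calculated values can be sanity-checked with simple framing arithmetic. The sketch below assumes standard short-time framing without padding, where wincount frames of length Window Length, spaced Window Step samples apart, cover (wincount − 1) × step + length samples. It also assumes Num Samples to NN counts feature values (features × time frames), which matches 40 × 40 = 1600 but is an inference, not something the tool documents. Function names are hypothetical.

```python
# Hypothetical sanity checks for the calculated audio feature parameters.
# ASSUMPTIONS: unpadded framing; "Num Samples to NN" = features x frames.


def frames_span(wincount, window_step, window_length, sample_rate):
    """Return (samples, seconds) covered by `wincount` overlapping frames."""
    total_samples = (wincount - 1) * window_step + window_length
    return total_samples, total_samples / sample_rate


def nn_input_size(n_features, wincount):
    """Number of values in one feature matrix fed to the network."""
    return n_features * wincount


if __name__ == "__main__":
    samples, seconds = frames_span(40, 384, 512, 16000.0)
    print(f"{samples} samples ~= {seconds:.3f} s")  # 15488 samples ~= 0.968 s
    print(nn_input_size(40, 40))  # 1600
```

Under these assumptions the 40 frames cover about 0.968 s of audio, close to the 1.000 s Window Duration; the tool may round or pad differently.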

Label Selection

| Field | Description |
|---|---|
| Target Words | Keywords the model should detect; each becomes its own class |
| Open-set Words | Words that must NOT trigger detection; all are merged into a single non-target class |
| Number of Classes | Auto-calculated as the number of target words + 1 (for the open-set class); read-only |
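The class-count rule above can be expressed as a small label mapping. This is a hypothetical sketch: it assumes each target word gets its own index starting at 0 and that the merged open-set class takes the last index. The tool's actual index ordering is not documented here.

```python
# Hypothetical sketch of the Label Selection rule: one class per target
# word plus one shared open-set class. ASSUMPTION: the open-set class is
# assigned the last index; the tool may order classes differently.


def build_label_map(target_words, open_set_words):
    """Return (word -> class index mapping, number of classes)."""
    if len(set(target_words)) != len(target_words):
        raise ValueError("target words must be unique")
    overlap = set(target_words) & set(open_set_words)
    if overlap:
        raise ValueError(f"words cannot be both target and open-set: {sorted(overlap)}")

    label_map = {word: idx for idx, word in enumerate(target_words)}
    open_set_class = len(target_words)
    for word in open_set_words:
        label_map[word] = open_set_class  # all open-set words share one class
    return label_map, len(target_words) + 1


if __name__ == "__main__":
    mapping, n_classes = build_label_map(["yes", "no"], ["stop", "go", "up"])
    print(mapping, n_classes)
```

For two target words and any number of open-set words, this yields three classes, matching the "target words + 1" rule.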

Feature Generation Output

  1. Select Chip and Model Architecture
  2. Configure User Input Components
  3. Verify Calculated Components (Num Samples to NN, Sampling Rate, matrix dimensions)
  4. Optionally enable Data Augmentation and set Augmented Filterbanks / Class
  5. Assign Target Words and Open-set items in Label Selection
  6. Click Generate Features to produce the output files X_train.npy, X_test.npy, y_train.npy and y_test.npy
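After generation, the four .npy files can be loaded with NumPy and checked for consistency. This sketch assumes the first axis of each array indexes samples and that each X sample holds 1600 values (40 features × 40 frames), whether stored flat or as a 40 × 40 matrix. The array layout is not specified in this documentation, and the helper names are hypothetical.

```python
from pathlib import Path

import numpy as np

# Hypothetical loader for the generated feature files.
# ASSUMPTIONS: axis 0 indexes samples; each X sample contains
# expected_values = features x frames values (1600 with the defaults).

FILE_NAMES = ("X_train", "X_test", "y_train", "y_test")


def validate_features(arrays, expected_values=1600):
    """Check X/y pairs line up; return the number of samples per split."""
    sizes = {}
    for split in ("train", "test"):
        x, y = arrays[f"X_{split}"], arrays[f"y_{split}"]
        if len(x) != len(y):
            raise ValueError(f"{split}: {len(x)} feature rows but {len(y)} labels")
        per_sample = int(np.prod(x.shape[1:]))
        if per_sample != expected_values:
            raise ValueError(f"{split}: {per_sample} values per sample, expected {expected_values}")
        sizes[split] = len(x)
    return sizes


def load_features(directory, expected_values=1600):
    """Load the four .npy files from `directory` and validate them."""
    directory = Path(directory)
    arrays = {name: np.load(directory / f"{name}.npy") for name in FILE_NAMES}
    validate_features(arrays, expected_values)
    return arrays
```

Usage, assuming the files were written to a local `features/` folder: `arrays = load_features("features")`.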

Data Explorer Tab

Interactive visualisation of audio sample similarity using unsupervised clustering. Helps identify poorly labelled data, class overlap, and outliers. Supports KNN and SVM classifiers.
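One common way a KNN classifier can surface poorly labelled data is neighbour disagreement: for each sample, measure the fraction of its k nearest neighbours that carry a different label. The sketch below shows that general heuristic on flattened feature vectors. It is not the Data Explorer's implementation, and the function name is hypothetical.

```python
import numpy as np

# Hypothetical kNN neighbour-disagreement score for spotting possibly
# mislabelled samples. This is a generic heuristic, NOT the Data
# Explorer's actual algorithm.


def knn_label_disagreement(features, labels, k=5):
    """Return, per sample, the fraction of its k nearest neighbours
    (Euclidean distance) whose label differs from the sample's own."""
    x = np.asarray(features, dtype=np.float64).reshape(len(features), -1)
    labels = np.asarray(labels)
    if not 0 < k < len(x):
        raise ValueError("k must be between 1 and n_samples - 1")

    # Pairwise squared Euclidean distances via |a|^2 + |b|^2 - 2ab.
    sq = np.sum(x ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour

    neighbours = np.argsort(dists, axis=1)[:, :k]
    return np.mean(labels[neighbours] != labels[:, None], axis=1)


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    lbl = [0, 0, 0, 1, 1, 1, 1, 1]  # [1, 1] sits among class 0 but is labelled 1
    print(knn_label_disagreement(pts, lbl, k=3))
```

Samples with scores near 1 sit among a different class and are worth re-checking by ear; moderate scores across a whole class can indicate class overlap.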