Purpose and Model Architecture

Purpose of the Model Config Tab

The Model Config tab defines how the machine learning model is built and trained, and how its predictions are interpreted for keyword or phrase detection.

Model Architecture

Each architecture offers a different trade-off between accuracy, computational complexity, memory usage, and power consumption. The appropriate choice depends on application requirements such as vocabulary size, noise conditions, and hardware constraints.

Available architectures:

mlpnet
convnet
expandedconvnet
edgenet
mobilenet
recurrent
temporal_convolution_resnet
vgg
custom

Recommended Use

mlpnet → Simple keyword spotting, small vocabulary, very low latency
convnet → Phrase detection, noisy environments, multi-word commands
expandedconvnet → Higher accuracy requirements, larger vocabularies
edgenet → Ultra-low-power, always-on listening applications
mobilenet → Balanced accuracy and efficiency for constrained devices
recurrent → Sequential or time-dependent speech patterns
temporal_convolution_resnet → Long phrases and robust temporal modeling
vgg → Research, experimentation, high-capacity models
custom → Advanced users requiring full control over network design
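
The recommendations above can be read as a lookup from an application requirement to candidate architectures. The following is a minimal Python sketch of that idea. The RECOMMENDED_USE table only mirrors the list above; the suggest_architectures helper and its substring matching are illustrative assumptions and are not part of the tool.

```python
# Illustrative only: this table mirrors the "Recommended Use" list above.
# The helper name and its matching logic are assumptions, not a tool API.
RECOMMENDED_USE = {
    "mlpnet": "Simple keyword spotting, small vocabulary, very low latency",
    "convnet": "Phrase detection, noisy environments, multi-word commands",
    "expandedconvnet": "Higher accuracy requirements, larger vocabularies",
    "edgenet": "Ultra-low-power, always-on listening applications",
    "mobilenet": "Balanced accuracy and efficiency for constrained devices",
    "recurrent": "Sequential or time-dependent speech patterns",
    "temporal_convolution_resnet": "Long phrases and robust temporal modeling",
    "vgg": "Research, experimentation, high-capacity models",
    "custom": "Advanced users requiring full control over network design",
}


def suggest_architectures(requirement: str) -> list[str]:
    """Return architectures whose recommended use mentions the requirement.

    Matching is a case-insensitive substring search, in list order.
    """
    needle = requirement.strip().lower()
    if not needle:
        return []
    return [
        name
        for name, use in RECOMMENDED_USE.items()
        if needle in use.lower()
    ]
```

For example, suggest_architectures("noisy") returns ["convnet"], and suggest_architectures("phrase") returns ["convnet", "temporal_convolution_resnet"]. A plain substring match is used only to keep the sketch short; a real selection would also weigh the hardware and power constraints described earlier.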

General Guidance:

Lightweight architectures such as mlpnet are suitable for simple keywords and small vocabularies
Convolution-based architectures such as convnet are preferred for phrases and noisy environments
The custom architecture gives the user full control over network design

Important Notes:

The input layer is generated automatically from the preprocessing settings
Changing the architecture requires retraining the model, because trained weights are specific to the architecture that produced them
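
To illustrate why the input layer can be derived from preprocessing settings, here is a minimal Python sketch. It assumes the preprocessing step splits a fixed-length audio clip into overlapping windows and extracts a fixed number of features per window. The PreprocessingSettings fields, their default values, and the input_shape helper are all hypothetical; the tool's actual settings and calculation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreprocessingSettings:
    # All field names and defaults are hypothetical examples,
    # not the tool's actual preprocessing options.
    clip_length_ms: int = 1000   # length of one audio clip
    window_size_ms: int = 30     # length of one analysis window
    window_step_ms: int = 10     # hop between consecutive windows
    num_features: int = 40       # features extracted per window


def input_shape(settings: PreprocessingSettings) -> tuple[int, int]:
    """Return (frames, features) for one clip under the assumed framing."""
    if settings.window_step_ms <= 0:
        raise ValueError("window_step_ms must be positive")
    if settings.window_size_ms > settings.clip_length_ms:
        raise ValueError("window_size_ms must not exceed clip_length_ms")
    # Number of full windows that fit in the clip at the given step.
    span = settings.clip_length_ms - settings.window_size_ms
    frames = 1 + span // settings.window_step_ms
    return (frames, settings.num_features)
```

With the hypothetical defaults, input_shape(PreprocessingSettings()) returns (98, 40). Doubling the step to 20 ms gives (49, 40), which shows how a change in preprocessing alone can change the shape the model's first layer must accept.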