Purpose and Model Architecture

Purpose of the Model Config Tab

The Model Config tab defines how the machine learning model is built and trained, and how its predictions are interpreted for keyword or phrase detection.

Model Architecture

Each architecture offers a different trade-off between accuracy, computational complexity, memory usage, and power consumption. Selecting the appropriate architecture depends on the application requirements, such as vocabulary size, noise conditions, and hardware constraints.

Available architectures:

  • mlpnet
  • convnet
  • expandedconvnet
  • edgenet
  • mobilenet
  • recurrent
  • temporal_convolution_resnet
  • vgg
  • custom

Recommended Use

  • mlpnet → Simple keyword spotting, small vocabulary, very low latency
  • convnet → Phrase detection, noisy environments, multi-word commands
  • expandedconvnet → Higher accuracy requirements, larger vocabularies
  • edgenet → Ultra-low-power, always-on listening applications
  • mobilenet → Balanced accuracy and efficiency for constrained devices
  • recurrent → Sequential or time-dependent speech patterns
  • temporal_convolution_resnet → Long phrases and robust temporal modeling
  • vgg → Research, experimentation, high-capacity models
  • custom → Advanced users requiring full control over network design
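
The recommendations above can be captured as a small lookup table, which may be handy when scripting a first-pass architecture choice. This is only a sketch: the architecture identifiers and descriptions come from the list above, while the `RECOMMENDED_USE` dictionary and the `describe_architecture` helper are hypothetical names introduced for illustration and are not part of the tool.

```python
# Illustrative lookup of the recommendations listed above.
# RECOMMENDED_USE and describe_architecture are hypothetical helpers,
# not part of the tool's API.
RECOMMENDED_USE = {
    "mlpnet": "Simple keyword spotting, small vocabulary, very low latency",
    "convnet": "Phrase detection, noisy environments, multi-word commands",
    "expandedconvnet": "Higher accuracy requirements, larger vocabularies",
    "edgenet": "Ultra-low-power, always-on listening applications",
    "mobilenet": "Balanced accuracy and efficiency for constrained devices",
    "recurrent": "Sequential or time-dependent speech patterns",
    "temporal_convolution_resnet": "Long phrases and robust temporal modeling",
    "vgg": "Research, experimentation, high-capacity models",
    "custom": "Advanced users requiring full control over network design",
}


def describe_architecture(name: str) -> str:
    """Return the recommended use for an architecture identifier."""
    key = name.strip().lower()
    if key not in RECOMMENDED_USE:
        valid = ", ".join(sorted(RECOMMENDED_USE))
        raise ValueError(f"Unknown architecture {name!r}; expected one of: {valid}")
    return RECOMMENDED_USE[key]


if __name__ == "__main__":
    print(describe_architecture("edgenet"))
```

Rejecting unknown names with an explicit error, rather than falling back to a default, keeps a typo in the architecture name from silently selecting the wrong model.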

General Guidance:

  • Lightweight architectures such as mlpnet are suitable for simple keywords
  • Convolution-based architectures such as convnet are preferred for phrases and noisy environments
  • The custom architecture gives the user full control over network design

Important Notes:

  • The input layer is automatically generated based on preprocessing settings
  • Changing the architecture requires retraining the model
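
The two notes above can be sketched in code. This is a minimal illustration under stated assumptions, not the tool's implementation: the `PreprocessingSettings` fields (`num_frames`, `features_per_frame`), the `ModelConfig` class, and the `needs_retraining` helper are all hypothetical, and the real preprocessing options may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PreprocessingSettings:
    # Hypothetical fields; the real tool's preprocessing options may differ.
    num_frames: int
    features_per_frame: int


@dataclass(frozen=True)
class ModelConfig:
    architecture: str
    preprocessing: PreprocessingSettings

    @property
    def input_shape(self) -> Tuple[int, int]:
        # The input layer is derived from preprocessing, never set by hand.
        return (self.preprocessing.num_frames,
                self.preprocessing.features_per_frame)


def needs_retraining(trained: ModelConfig, current: ModelConfig) -> bool:
    # Only the architecture check is stated in the notes above; other
    # retraining triggers are deliberately left out of this sketch.
    return trained.architecture != current.architecture


if __name__ == "__main__":
    pre = PreprocessingSettings(num_frames=49, features_per_frame=40)
    trained = ModelConfig("convnet", pre)
    current = ModelConfig("mobilenet", pre)
    print(current.input_shape, needs_retraining(trained, current))
```

Exposing the input shape as a read-only property mirrors the note that the input layer is generated rather than configured directly.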