Your Profile
A voice2json profile contains everything necessary to recognize voice commands, including:
- profile.yml - A YAML file with settings that override the defaults
- sentences.ini - A template file describing all of your voice commands
- Speech/intent models
  - acoustic_model - a directory with the speech model artifacts
    - Kaldi profiles typically have a large pre-trained HCLG.fst in acoustic_model/model/graph
    - DeepSpeech profiles have an output graph in model
  - intent.pickle.gz - a directed graph generated during training that is converted to a finite state transducer
    - See the whitepaper for more details
- Pronunciation dictionaries
  - How voice2json expects words to be pronounced. You can customize any word (see the sketch after this list).
  - base_dictionary.txt - large, pre-built pronunciations for most words
  - custom_words.txt - small, custom pronunciation dictionary for words that voice2json doesn’t know
  - sounds_like.txt - small, custom pronunciation dictionary using known words or word segments
  - dictionary.txt - pronunciation dictionary generated during training containing all needed words
- Language models
  - Captures statistics about which words follow others in your voice commands
  - base_language_model.txt - large, pre-built ARPA language model for the profile language
    - Used in open transcription and language model mixing
    - DeepSpeech-based profiles may have an lm.binary file instead
    - Julius-based profiles may have a pre-compiled base_language_model.bin file instead
  - language_model.txt - custom language model generated during training
- Grapheme to phoneme models
  - Used to guess how unknown words should be pronounced
  - g2p.fst - a finite state transducer created using phonetisaurus
  - g2p.corpus - an alignment corpus created using phonetisaurus and base_dictionary.txt for sounds-like pronunciations
- Phoneme Maps
  - Used to relate speech-to-text and text-to-speech phonemes
  - espeak_phonemes.txt - map to eSpeak phonemes
  - marytts_phonemes.txt - map to MaryTTS phonemes
  - ipa_phonemes.txt - map to the International Phonetic Alphabet
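As a concrete sketch of the two customizable dictionaries: custom_words.txt uses the same word-plus-phonemes format as base_dictionary.txt, while sounds_like.txt spells out a new word using words the profile already knows. The entries below are illustrative, and the phonemes assume a U.S. English profile.

A custom_words.txt entry:

moogle M UW G AH L

A sounds_like.txt entry, pronouncing a new word with known words:

beyoncé bee yawn say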
profile.yml
A YAML file with settings that override the defaults. This file typically contains the profile language’s name and locale code, as well as an eSpeak voice to use for word pronunciations.
For Kaldi-based profiles, kaldi.model-type must be set to either gmm or nnet3.
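For example, a minimal profile.yml for a Kaldi profile might set the model type under both the speech-to-text and training sections, matching the nested keys in the defaults listing below (a sketch, not a complete profile):

speech-to-text:
    kaldi:
        model-type: "nnet3"
training:
    kaldi:
        model-type: "nnet3"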
Environment Variables
You can use the !env constructor in your profile.yml file to expand environment variables inside string values. voice2json makes the following environment variables available when profiles are loaded:
- profile_dir - directory where profile.yml was loaded from
- voice2json_dir - directory where voice2json is installed
- machine - CPU architecture as reported by Python’s platform.machine()
  - Typically x86_64, armv7l, armv6l, etc.
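For example, a profile could point the custom words setting at a different file relative to the profile directory (the file name here is hypothetical):

training:
    custom-words-file: !env "${profile_dir}/my_words.txt"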
Default Settings
The default values for all profile settings are stored in etc/profile.defaults.yml.
speech-to-text:
    # Path to pre-built acoustic model directory
    # For deepspeech: $profile_dir/model/output_graph.pbmm
    acoustic-model: !env "${profile_dir}/acoustic_model"
    # Path to custom ARPA language model
    # For deepspeech: $profile_dir/lm.binary
    language-model: !env "${profile_dir}/language_model.txt"
    # Path to custom pronunciation dictionary
    dictionary: !env "${profile_dir}/dictionary.txt"
    # Pocketsphinx-specific settings
    pocketsphinx:
        # Path to tuned acoustic model matrix (pocketsphinx only)
        mllr-matrix: !env "${profile_dir}/mllr_matrix"
    # Text file with word examples for each phoneme in the pronunciation dictionary
    phoneme-examples-file: !env "${profile_dir}/phoneme_examples.txt"
    # True if acoustic model uses phonemes for pronunciations
    phoneme-pronunciations: true
    # Kaldi-specific settings
    kaldi:
        # Type of Kaldi model (either nnet3 or gmm)
        model-type: ""
        # Path to directory with custom HCLG.fst
        graph-directory: !env "${profile_dir}/acoustic_model/graph"
        # Path to directory with pre-built HCLG.fst (open transcription)
        base-graph-directory: !env "${profile_dir}/acoustic_model/model/graph"
    # Mozilla DeepSpeech-specific settings
    deepspeech:
        # Path to trie generated from sentences.ini
        trie: !env "${profile_dir}/trie"
        # Path to large, pre-built binary language model (open transcription)
        base-language-model: !env "${profile_dir}/model/lm.binary"
        # Path to large, pre-built trie (open transcription)
        base-trie: !env "${profile_dir}/model/trie"
# -----------------------------------------------------------------------------
intent-recognition:
    # Path to custom intent graph (stored as a gzipped networkx pickle)
    intent-graph: !env "${profile_dir}/intent.pickle.gz"
    # True if text should not be strictly matched
    fuzzy: true
    # Path to text file with common words to ignore (fuzzy matching only)
    stop-words: !env "${profile_dir}/stop_words.txt"
# -----------------------------------------------------------------------------
training:
    # Type of acoustic model.
    # One of: pocketsphinx, kaldi, julius, deepspeech.
    acoustic-model-type: "pocketsphinx"
    # Path to pre-built acoustic model directory
    acoustic-model: !env "${profile_dir}/acoustic_model"
    # Path to file with custom intents and sentences
    sentences-file: !env "${profile_dir}/sentences.ini"
    # Path to text file with intents that will be considered for training.
    # All intents will be considered if this file is missing.
    intent-whitelist: !env "${profile_dir}/intent_whitelist"
    # Directory containing text files, one for each $slot referenced in sentences.ini
    slots-directory: !env "${profile_dir}/slots"
    # Directory containing programs, one for each $slot referenced in sentences.ini
    slot-programs-directory: !env "${profile_dir}/slot_programs"
    # Directory containing programs, one for each !converter referenced in sentences.ini
    converters-directory: !env "${profile_dir}/converters"
    # Path to write custom intent finite state transducer
    intent-fst: !env "${profile_dir}/intent.fst"
    # Path to write custom ARPA language model
    language-model: !env "${profile_dir}/language_model.txt"
    # Path to write custom pronunciation dictionary
    dictionary: !env "${profile_dir}/dictionary.txt"
    # Path to extra word pronunciations outside of base dictionary
    custom-words-file: !env "${profile_dir}/custom_words.txt"
    # Action to take when multiple pronunciations for a word are available.
    # One of: "append", "overwrite_once", "overwrite_always".
    custom-words-action: "append"
    # Path to write words without any known pronunciation
    unknown-words-file: !env "${profile_dir}/unknown_words.txt"
    # Path to extra word pronunciations based on existing words instead of phonemes
    sounds-like-file: !env "${profile_dir}/sounds_like.txt"
    # Action to take when multiple pronunciations for a "sounds like" word are available.
    # One of: "append", "overwrite_once", "overwrite_always".
    sounds-like-action: "append"
    # Path to pre-built ARPA language model (open transcription)
    base-language-model: !env "${profile_dir}/base_language_model.txt"
    # Path to save compiled finite state transducer for base language model
    base-language-model-fst: !env "${profile_dir}/base_language_model.fst"
    # Amount of base language model to mix into custom language model
    base-language-model-weight: 0.0
    # Path to pre-built pronunciation dictionary
    base-dictionary: !env "${profile_dir}/base_dictionary.txt"
    # Path to model used to guess unknown word pronunciations
    grapheme-to-phoneme-model: !env "${profile_dir}/g2p.fst"
    # Path to Phonetisaurus alignment corpus for base dictionary
    grapheme-to-phoneme-corpus: !env "${profile_dir}/g2p.corpus"
    # Force word case during dictionary lookup/g2p.
    # One of: ignore, upper, lower.
    word-casing: "ignore"
    # Force word case during dictionary g2p.
    # One of: default (use word-casing), upper, lower.
    g2p-word-casing: "default"
    # When true, numbers (e.g. 100) are replaced with words (one hundred).
    # This is supported for most, but not all, voice2json languages.
    replace-numbers: true
    # Kaldi-specific settings
    kaldi:
        # Type of Kaldi model (either nnet3 or gmm)
        model-type: ""
        # Path to directory where custom HCLG.fst will be written
        graph-directory: !env "${profile_dir}/acoustic_model/graph"
# -----------------------------------------------------------------------------
wake-word:
    # Sensitivity (0-1)
    sensitivity: 0.5
    # Mycroft Precise settings
    precise:
        # Path to precise-engine executable.
        # Use $PATH if empty.
        engine-executable: ""
        # Path to model .pb file
        model-file: !env "${voice2json_dir}/etc/precise/hey-mycroft-2.pb"
# -----------------------------------------------------------------------------
voice-command:
    # Minimum number of seconds a voice command must last (ignored otherwise)
    minimum-seconds: 2
    # Maximum number of seconds a voice command can last (timeout otherwise)
    maximum-seconds: 30
    # Seconds of speech detected before voice command is considered started
    speech-seconds: 0.3
    # Seconds of silence before voice command is considered ended
    silence-seconds: 0.5
    # Seconds of audio before voice command starts to retain in output
    before-seconds: 0.5
    # Seconds of audio to ignore before voice command
    skip-seconds: 0.0
    # webrtcvad specific settings
    webrtcvad:
        # Speech filtering aggressiveness 0-3 (3 is most aggressive)
        vad-mode: 3
# -----------------------------------------------------------------------------
text-to-speech:
    # espeak specific settings
    espeak:
        # Path to map between dictionary and espeak phonemes
        phoneme-map: !env "${profile_dir}/espeak_phonemes.txt"
        # Command to execute to pronounce some espeak phonemes
        pronounce-command: "espeak-ng -s 80 --stdout [[{phonemes}]]"
        # Command to speak sentence
        speak-command: "espeak-ng --stdout \"{sentence}\""
    marytts:
        # URL to do GET requests
        process-url: "http://localhost:59125/process"
        # Path to map between dictionary and MaryTTS phonemes
        phoneme-map: !env "${profile_dir}/marytts_phonemes.txt"
        # Token used to end a MaryTTS sentence during pronunciation.
        # I get weird pronunciations when punctuation is missing.
        sentence-end: "."
        # prosody rate used for pronunciation
        pronounce-rate: "5%"
# -----------------------------------------------------------------------------
audio:
    # Command to execute to record raw 16-bit 16Khz mono audio
    record-command: "arecord -q -r 16000 -c 1 -f S16_LE -t raw"
    # Command to convert WAV data to 16-bit 16Khz mono (stdin -> stdout)
    convert-command: "sox -t wav - -r 16000 -e signed-integer -b 16 -c 1 -t wav -"
    # Command to play a WAV file (stdin)
    play-command: "aplay -q -t wav"
    # Expected audio format.
    # Convert command is run if a different format is given.
    format:
        sample-rate-hertz: 16000
        sample-width-bits: 16
        channel-count: 1
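Any of these settings can be overridden by copying the corresponding keys into your profile.yml. As an illustrative sketch, the override below mixes a small amount of the pre-built language model into the custom one and shortens the voice command timeout (the values are examples, not recommendations):

training:
    # Mix 5% of the base language model into the custom model
    base-language-model-weight: 0.05
voice-command:
    # Time out voice commands after 15 seconds instead of 30
    maximum-seconds: 15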