Home • Install

Installing voice2json

voice2json has been tested on Ubuntu 18.04. It should be able to run on most any flavor of Linux using the Docker image. It may even run on Mac OSX, but I don’t have a Mac to test this out.

Installation options:

After installation:


Supported Hardware

voice2json is supported on typical desktops/laptops as well as the Raspberry Pi, including the Pi Zero (armel).

Category Name amd64 armv7 arm64 armel
Wake Word Mycroft Precise  
Speech to Text Pocketsphinx
  Kaldi  
  DeepSpeech    
  Julius

Debian Package

Pre-compiled packages are available for Debian-based distributions (Ubuntu, Linux Mint, etc.) on amd64, armhf, armel, and arm64 (aarch64) architectures. These packages are built using Docker and dpkg.

Next, download the appropriate .deb file for your CPU architecture:

If you’re unsure about your architecture, run:

$ dpkg-architecture | grep DEB_BUILD_ARCH=

which will output something like:

DEB_BUILD_ARCH=amd64

Next, install the .deb file:

$ sudo apt install /path/to/voice2json_<VERSION>_<ARCH>.deb

where where <VERSION> is voice2json’s version (probably 2.0) and <ARCH> is your build architecture.

NOTE: If you run sudo apt install in the same directory as the .deb file, make sure to prefix the filename with ./ like this:

$ sudo apt install ./voice2json_<VERSION>_<ARCH>.deb

After downloading a profile, you should now be able to run any of the example voice2json commands in the documentation.


Docker Image

The easiest way to try out voice2json is with Docker. Pre-built images are available for amd64, armhf, armel, and arm64 (aarch64) CPU architectures. To get started, make sure you have Docker installed:

$ curl -sSL https://get.docker.com | sh

and that your user is part of the docker group:

$ sudo usermod -a -G docker $USER

Be sure to reboot after adding yourself to the docker group!

Shell Script

Create a Bash script named voice2json somewhere in your $PATH and add the following content:

#!/usr/bin/env bash
docker run -i \
       --init \
       -v "${HOME}:${HOME}" \
       -v "/dev/shm/:/dev/shm/" \
       -w "$(pwd)" \
       -e "HOME=${HOME}" \
       --user "$(id -u):$(id -g)" \
       synesthesiam/voice2json "$@"

Mark it as executable with chmod +x /path/to/voice2json and try it out:

$ voice2json --help

After downloading a profile, you should now be able to run any of the example voice2json commands in the documentation.

Raspberry Pi Zero

Docker on the Raspberry Pi Zero appears to be broken, and will pull the wrong Docker image by default. To fix this, you must enable “experimental” features in your Docker daemon and explicitly specify the platform.

First, edit your /etc/docker/daemon.json file (create it if it doesn’t exist using sudo) and add the following content:

{
  "experimental": true
}

Next, restart your Docker daemon by running:

$ sudo systemctl restart docker

Finally, pull the correct Docker image:

$ docker pull --platform linux/arm/v6 synesthesiam/voice2json

Updating

To update your voice2json Docker image, simply run:

$ docker pull synesthesiam/voice2json

If you’re using a Raspberry Pi Zero, make sure to specify the platform:

$ docker pull --platform linux/arm/v6 synesthesiam/voice2json

From Source

voice2json uses autoconf to facilitate building from source. You will need Python 3.7 and some common build tools like gcc.

Once you’ve cloned the the repository, the build steps should be familiar:

$ git clone https://github.com/synesthesiam/voice2json
$ cd voice2json
$ ./configure
$ make
$ make install

This will install voice2json inside a virtual environment at $PWD/.venv by default with all of the supported speech to text engines and supporting tools. When installation is finished, copy voice2json.sh somewhere in your PATH and rename it to voice2json.

Customizing Installation

You can pass additional information to configure to avoid installing parts of voice2json that you won’t use. For example, if you only plan to use the French language profiles, set the VOICE2JSON_LANGUAGE environment variable to fr when configuring your installation:

$ ./configure VOICE2JSON_LANGUAGE=fr

The installation will now be configured to install only Kaldi (if supported). If instead you want a specific speech to text system, use VOICE2JSON_SPEECH_SYSTEM like:

$ ./configure VOICE2JSON_SPEECH_SYSTEM=deepspeech

which will only enable DeepSpeech.

To force the supporting tools to be built from source instead of downloading pre-compiled binaries, use --disable-precompiled-binaries. Dependencies will be compiled in a build directory (override with $BUILD_DIR during make), and bundled for installation in download (override with $DOWNLOAD_DIR).

See ./configure --help for additional options.


Download Profile

voice2json must have a profile in order to do speech/intent recognition. Because the artifacts for each language/locale can be quite large (100’s of MB or more), voice2json does not include them in its Debian package, Docker image, or source repository.

Profiles for each of the supported languages/locales are available for download on Github. You should download the appropriate .tar.gz and extract it to $HOME/.config/voice2json (any other directory will require a --profile argument to be passed to voice2json). If everything is in the right place, $HOME/.config/voice2json/profile.yml will exist.

The print-downloads command can be used to avoid downloading unnecessary files if you know how you plan to use voice2json.

English Example

For English, there are five available profiles:

  1. en-us_pocketsphinx-cmu
  2. en-us_kaldi-zamia
  3. en-us_deepspeech-mozilla
  4. en-us_julius-github
  5. en-in_pocketsphinx-cmu

The first four profiles are for U.S. English (en-us), while the third is for Indian English (en-in). For U.S. English, you will probably want to start with the en-us_pocketsphinx-cmu profile, which is based on pocketsphinx. This profile provides a good balance of accuracy and speed. For open transcription, however en-us_kaldi-zamia and en-us_deepspeech-mozilla are much better options.

Downloading and installing the en-us_pocketsphinx-cmu is straightforward from the command-line:

$ mkdir -p "${HOME}/.config/voice2json"
$ curl -SL \
      https://github.com/synesthesiam/en-us_pocketsphinx-cmu/archive/v1.0.tar.gz | \
      tar -C "${HOME}/.config/voice2json" --skip-old-files --strip-components=1 -xzvf -

Note: The --skip-old-files argument to tar will ensure that your sentences.ini and custom_words.txt files are not overwritten. Remove this argument to completely overwrite your profile.

Now you should be able to train your profile:

$ voice2json --debug train-profile

If you extracted the profile files to a directory other than $HOME/.config/voice2json, you will need to pass a --profile argument to voice2json:

$ voice2json --profile /path/to/profile/files/ --debug train-profile

Note: The first time you train your profile may take a long time, especially on Raspberry Pi (SD card). This is because voice2json is decompressing and re-combining split files from GitHub.

Test Your Profile

Once you’ve trained your profile, you can quickly test it out with:

$ voice2json transcribe-stream

The transcribe-stream will record from your microphone (using arecord and the default device), wait for you to speak a voice command, and then output a transcription (hit CTRL + C to exit).

If you’re using the default English sentences, try saying “turn on the living room lamp” and wait for the output. Getting intents out is as easy as:

$ voice2json transcribe-stream | \
    voice2json recognize-intent

Speaking a voice command should now output a line of JSON with the recognized intent. For example, “what time is it” outputs something like:

{
  "text": "what time is it",
  "likelihood": 0.025608657540496446,
  "transcribe_seconds": 1.4270143630001257,
  "wav_seconds": 0.0043125,
  "tokens": [
    "what",
    "time",
    "is",
    "it"
  ],
  "timeout": false,
  "intent": {
    "name": "GetTime",
    "confidence": 1
  },
  "entities": [],
  "raw_text": "what time is it",
  "recognize_seconds": 0.00019677899945236277,
  "raw_tokens": [
    "what",
    "time",
    "is",
    "it"
  ],
  "speech_confidence": null,
  "wav_name": null,
  "slots": {}
}

Back Up Your Profile

If you have an existing voice2json profile, it is highly recommended you regularly back up the following files:

See the print-files for an easy way to automate backups.