5. Speech Recognition Software

5.1. Free Software

Much of the free software listed here is available for download at: http://sunsite.uio.no/pub/Linux/sound/apps/speech/

5.1.1. XVoice

XVoice is a dictation/continuous speech recognizer that can be used with a variety of X Window applications. It allows user-defined macros. This is a fine program with a definite future. Once set up, it performs with adequate accuracy.

XVoice requires that you download and install IBM's (free) ViaVoice for Linux (see the Commercial Software section), and ViaVoice must be configured correctly for XVoice to work. Additionally, Lesstif/Motif (libXm) is required. It is also important to note that because this program interacts with X Window, you must leave X resources open on your machine, so use caution if you run it on a networked or multi-user machine.

This software is primarily for users. An RPM is available.

HomePage: http://www.compapp.dcu.ie/~tdoris/Xvoice/ http://www.zachary.com/creemer/xvoice.html

Project: http://xvoice.sourceforge.net

Community: http://www.onelist.com/community/xvoice

5.1.2. CVoiceControl/KVoiceControl

CVoiceControl (which stands for Console Voice Control) started its life as KVoiceControl (KDE Voice Control). It is a basic speech recognition system that allows a user to execute Linux commands by using spoken commands. CVoiceControl replaces KVoiceControl.

The software includes a microphone level configuration utility, a vocabulary "model editor" for adding new commands and utterances, and the speech recognition system.

CVoiceControl is an excellent starting point for experienced users looking to get started in ASR. It is not the most user-friendly program, but once it has been trained correctly, it can be very helpful. Be sure to read the documentation while setting it up.

This software is primarily for users.

Homepage: http://www.kiecza.de/daniel/linux/index.html

Documents: http://www.kiecza.de/daniel/linux/cvoicecontrol/index.html

5.1.3. Open Mind Speech

Started in late 1999, Open Mind Speech has changed names several times (it was VoiceControl, then SpeechInput, and then FreeSpeech) and is now part of the "Open Mind Initiative". It is an open source project and is not yet fully operational.

This software is primarily for developers.

Homepage: http://freespeech.sourceforge.net

5.1.4. GVoice

GVoice is an ASR library that uses IBM's (free) ViaVoice SDK to control Gtk/GNOME applications. It includes libraries for initialization, the recognition engine, vocabulary manipulation, and panel control. Development on this has been idle for over a year.

This software is primarily for developers.

Homepage: http://www.cse.ogi.edu/~omega/gnome/gvoice/

5.1.5. ISIP

The Institute for Signal and Information Processing at Mississippi State University has made its speech recognition engine available. The toolkit includes a front-end, a decoder, and a training module. It's a functional toolkit.

This software is primarily for developers.

The toolkit (and more information about ISIP) is available at: http://www.isip.msstate.edu/project/speech/

5.1.6. CMU Sphinx

Sphinx originally started at CMU and has recently been released as open source. This is a fairly large program that includes a lot of tools and information. It is still "in development", but includes trainers, recognizers, acoustic models, language models, and some limited documentation.

This software is primarily for developers.

Homepage: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html

Source: http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz

5.1.7. Ears

Although Ears isn't fully developed, it is a good starting point for programmers wishing to start in ASR.

This software is primarily for developers.

FTP site: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/

5.1.8. NICO ANN Toolkit

The NICO Artificial Neural Network toolkit is a flexible back propagation neural network toolkit optimized for speech recognition applications.

This software is primarily for developers.

Its homepage: http://www.speech.kth.se/NICO/index.html

5.1.9. Myers' Hidden Markov Model Software

This software by Richard Myers implements HMM algorithms in C++. It provides an example and learning tool for the HMM models described in L. Rabiner's book "Fundamentals of Speech Recognition".

This software is primarily for developers.

Information is available at: http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html
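
For readers new to HMMs, the forward algorithm (one of the standard HMM algorithms) can be sketched in a few lines. This is only an illustration in Python, not Myers' C++ code; the function name, the two-state model, and all probabilities below are made up for the example.

```python
def forward(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    obs      -- list of observation symbol indices
    start_p  -- start_p[i] is the probability of starting in state i
    trans_p  -- trans_p[i][j] is the probability of moving from state i to j
    emit_p   -- emit_p[i][k] is the probability of state i emitting symbol k
    """
    if not obs:
        return 1.0  # the empty sequence has probability 1
    n_states = len(start_p)
    # alpha[i] = probability of the observations so far, ending in state i
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans_p[i][j] for i in range(n_states))
            * emit_p[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model (numbers are illustrative only).
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]

if __name__ == "__main__":
    print(forward([0, 1], START, TRANS, EMIT))
```

In a recognizer, one such model would typically be trained per word or phoneme, and the model giving the highest probability for the observed sequence would be chosen.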

5.1.10. Jialong He's Speech Recognition Research Tool

Although not originally written for Linux, this research tool can be compiled on Linux. It contains three different types of recognizers: DTW (dynamic time warping), Dynamic Hidden Markov Model, and Continuous Density Hidden Markov Model. It is intended for research and development, as it is not a fully functional ASR system. The toolkit contains some very useful tools.

This software is primarily for developers.

More information is available at: http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html
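
As a rough idea of what a DTW recognizer does, here is a minimal sketch in Python. It is not taken from Jialong He's tool; the function names, the one-dimensional "feature" sequences, and the templates are all invented for illustration. Real systems compare frames of acoustic feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical word templates (illustrative numbers only).
TEMPLATES = {"yes": [1, 3, 4, 3, 1], "no": [4, 2, 1, 2, 4]}

if __name__ == "__main__":
    # A slower "yes": same shape, some frames repeated.
    print(recognize([1, 1, 3, 4, 4, 3, 1], TEMPLATES))
```

Because DTW lets either sequence repeat frames, an utterance spoken faster or slower than the stored template can still match it closely.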

5.1.11. More Free Software?

If you know of free software that isn't included in the above list, please send me a note at: scook@gear21.com. If you're in the mood, you can also send me where to get a copy of the software, and any impressions you may have about it. Thanks!

5.2. Commercial Software

5.2.1. IBM ViaVoice

IBM has made good on their promise to support Linux with their series of ViaVoice products for Linux, though the future of their SDKs isn't set in stone (their licensing agreement for developers isn't officially released as of this date - more to come).

Their commercial (non-free) product, IBM ViaVoice Dictation for Linux (available at http://www-4.ibm.com/software/speech/linux/dictation.html) performs very well, but has some sizeable system requirements compared to the more basic ASR systems (64 MB RAM and a 233 MHz Pentium). For the US$59.95 price tag you also get an Andrea NC-8 microphone. It also allows multiple users (but I haven't tried it with multiple users, so if anyone has any experience please give me a shout). The package includes: documentation (PDF), Trainer, dictation system, and installation scripts. Support for additional Linux distributions based on 2.2 kernels is also available in the latest release.

The ASR SDK is available for free, and includes IBM's SMAPI, grammar API, documentation, and a variety of sample programs. The ViaVoice Run Time Kit provides an ASR engine and data files for dictation functions, and user utilities. The ViaVoice Command & Control Run Time Kit includes the ASR engine and data files for command and control functions, and user utilities. The SDK and Kits require 128 MB RAM and a Linux 2.2 or later kernel.

The SDKs and Kits are available for free at: http://www-4.ibm.com/software/speech/dev/sdk_linux.html

5.2.2. Vocalis Speechware

More information on Vocalis and Vocalis Speechware is available at: http://www.vocalisspeechware.com and http://www.vocalis.com.

5.2.3. Babel Technologies

Babel Technologies has a Linux SDK available called Babear. It is a speaker-independent system based on hybrid Hidden Markov Model/Artificial Neural Network technology. They also have a variety of products for text-to-speech, speaker verification, and phoneme analysis. More information is available at: http://www.babeltech.com.

5.2.4. SpeechWorks

I didn't see anything on their website that specifically mentioned Linux, but their "OpenSpeech Recognizer" uses VoiceXML, which is an open standard. More information is available at: http://www.speechworks.com.

5.2.5. Nuance

Nuance offers a speech recognition/natural language product (currently Nuance 8.0) for a variety of *nix platforms. It can handle very large vocabularies and uses a unique distributed architecture for scalability and fault tolerance. More information is available at: http://www.nuance.com.

5.2.6. Abbot/AbbotDemo

Abbot is a very large vocabulary, speaker-independent ASR system. It was originally developed by the Connectionist Speech Group at Cambridge University and was later transferred to SoftSound for commercialization. More information is available at: http://www.softsound.com.

AbbotDemo is a demonstration package of Abbot. This demo system has a vocabulary of about 5000 words and uses the connectionist/HMM continuous speech algorithm. This is a demonstration program with no source code.

5.2.7. Entropic

The fine people over at Entropic have been bought out by Micro$oft... Their products and support services have all but disappeared. Their support for HTK and ESPS/waves+ is gone, and their future is in the hands of M$. Their old website at http://www.entropic.com has more information.

K.K. Chin advised me that the original developers of HTK (the Speech, Vision and Robotics Group at Cambridge) are still providing support for it. There is also a "free" version available at: http://htk.eng.cam.ac.uk. Also note that Microsoft still owns the copyright to the current HTK code...

5.2.8. More Commercial Products

There are rumors of more commercial ASR products becoming available in the near future (including L&H). I talked with a couple of L&H representatives at Comdex 2000 (Vegas) and none of them could give me any information on a Linux release, or even if they planned on releasing any products for Linux. If you have any further information, please send any details to me at scook@gear21.com.