SpeM


Introduction

This page provides information on SpeM, a computational model of human speech recognition. SpeM (Speech-based model of speech recognition) was originally implemented as a tool for research in the field of human speech recognition (HSR). It is a new and extended implementation of the theory underlying the Shortlist model (Norris, 1994), a computational model of human word recognition. The main advance of SpeM over earlier computational models of HSR is that SpeM takes the acoustic speech signal as input, whereas Shortlist and other computational models of HSR only take handcrafted symbolic representations as input.

SpeM consists of two modules:
  • An automatic phone recogniser (APR)
  • A word search module
The word search module parses the probabilistic phone graph created by the APR in order to find the most likely (sequence of) words, and computes for each word its activation based on the accumulated acoustic evidence for that word.
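
The best-path idea behind such a search can be illustrated with a minimal C++ sketch. It is not SpeM's actual search: it ignores the lexicon and the word activations, and the Arc structure, the node numbering and the function name bestPhonePath are assumptions made for illustration only, not the SpeM graph format. The sketch assumes that nodes are numbered in time order and that each arc carries a phone label and a cost (a negative log probability).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical arc of a phone graph (not the SpeM file format).
// Nodes are assumed to be numbered in time order, so from < to.
struct Arc {
    int from;           // start node
    int to;             // end node
    std::string phone;  // phone label on the arc
    double cost;        // negative log probability, lower is better
};

// Returns the phone labels on the cheapest path from node 0 to the last
// node, or an empty vector if the last node cannot be reached.
std::vector<std::string> bestPhonePath(int numNodes, std::vector<Arc> arcs) {
    if (numNodes <= 0) return {};
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> best(numNodes, inf);  // cheapest cost to reach each node
    std::vector<int> backArc(numNodes, -1);   // index of the arc used to reach it
    best[0] = 0.0;

    // Sort by start node: since from < to, the cost of a node is final
    // before any of its outgoing arcs is relaxed.
    std::stable_sort(arcs.begin(), arcs.end(),
                     [](const Arc& a, const Arc& b) { return a.from < b.from; });

    for (std::size_t i = 0; i < arcs.size(); ++i) {
        const Arc& a = arcs[i];
        assert(a.from >= 0 && a.from < a.to && a.to < numNodes);
        if (best[a.from] == inf) continue;    // start node not reachable
        const double cost = best[a.from] + a.cost;
        if (cost < best[a.to]) {
            best[a.to] = cost;
            backArc[a.to] = static_cast<int>(i);
        }
    }

    // Trace back from the last node to node 0, then reverse.
    std::vector<std::string> path;
    int node = numNodes - 1;
    if (best[node] == inf) return path;
    while (node != 0) {
        const Arc& a = arcs[backArc[node]];
        path.push_back(a.phone);
        node = a.from;
    }
    std::reverse(path.begin(), path.end());
    return path;
}
```

Calling bestPhonePath on a small graph returns the phone labels along the cheapest path from the first to the last node. A word search would additionally restrict the paths to phone sequences that spell out entries of the lexicon.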

The word search module does not require an APR built with the software package used in the original SpeM experiments (Phicos; Steinbiss et al., 1993). It has also been used successfully in combination with HTK (Young et al., 2002) and MIT's SUMMIT recognition system (Glass, 2003). The only requirement is that the input graphs have the same structure as the example input graph file listed below.

If you have used SpeM in your research, please cite it using the reference given under 'Additional information' below. If you have any questions about using SpeM, please do not hesitate to contact me via e-mail.

Getting started

The files can be downloaded here in .zip format (for Windows) and here in .tar.gz format (to unpack: 'gzip -d spem.tar.gz', followed by 'tar xvf spem.tar').

  • Compile SpeM: g++ SpeM.v0.998.cpp -o spem.exe
  • Run SpeM: spem <config file> <input graph> [parameters]
The parameters can be given on the command line or can be put into the config file.

Example files
  • Config file: the names of the parameters are self-explanatory; where questions may arise, additional information is given in the config file.
  • Input graph: three graphs created by an automatic phone recogniser from three productions of the phrase 'ship inquiry'.
  • Lexicon: the first column is the orthographic transcription; the second column is the phonemic transcription. (In the example lexicon provided, both columns consist of the phonemic transcription, so the output of SpeM will in this case consist of the phonemic instead of the orthographic transcription.)
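
The two-column lexicon layout can be read with a few lines of C++. This is a minimal sketch and not SpeM's own lexicon reader: the names LexiconEntry and readLexicon are hypothetical, and it assumes that the columns are separated by whitespace, with the first token being the orthographic form and the rest of the line the phonemic transcription.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One lexicon line: orthographic form plus phonemic transcription.
struct LexiconEntry {
    std::string orthography;  // first column
    std::string phonemes;     // second column (rest of the line)
};

// Reads a whitespace-separated two-column lexicon (assumed layout).
// Blank lines and lines without a second column are skipped.
std::vector<LexiconEntry> readLexicon(std::istream& in) {
    std::vector<LexiconEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        LexiconEntry entry;
        if (!(fields >> entry.orthography)) continue;  // blank line
        // Everything after the first column is the transcription.
        std::getline(fields >> std::ws, entry.phonemes);
        // Strip trailing whitespace, including a Windows '\r'.
        const std::size_t last = entry.phonemes.find_last_not_of(" \t\r");
        entry.phonemes.erase(last == std::string::npos ? 0 : last + 1);
        if (entry.phonemes.empty()) continue;          // no second column
        entries.push_back(entry);
    }
    return entries;
}
```

Because the function takes a std::istream, it can be fed from a std::ifstream opened on a lexicon file or from a std::istringstream when experimenting.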

Language models

SpeM supports unigram and bigram language models (LMs), although the latter have not been tested. To use a language model in SpeM, carry out the following steps:

  1. Compile LMConvert: g++ LMConvert.v0.7.cpp -o lmconvert.exe
  2. Create a language model similar to the language model to be found here.
  3. Create a SpeM language model:
    • Unigram LMs: LMConvert <LM filename> uni <lowerBoundUni>
      Creates a filename.TMP if it does not yet exist.
      Default value <lowerBoundUni>: 5.
    • Bigram LMs: LMConvert <LM filename> bi <lowerBoundUni> <lowerBoundBi> <Discount>
      Creates a filename.TMP, filename.BI.TMP, and a filename.BO.TMP if they do not yet exist.
      Default value <lowerBoundBi>: 5; default value <Discount>: 0.5.
    The implementation of <lowerBoundUni>, <lowerBoundBi>, and <Discount> follows the HTK-implementation (Young et al., 2002).
    An example of a SpeM unigram LM can be found here.
  4. Add the path to the language model in the config file.
  5. Add a value for UniGramAlpha in the config file. This value, between 0 and 1, determines the influence of the language model on the overall score.
  6. Change the value for xGram in the config file to either 1 (unigram LM) or 2 (bigram LM).

Additional information

SpeM:

  • O. Scharenborg, D. Norris, L. ten Bosch & J.M. McQueen (2005). How should a speech recognizer work? Cognitive Science, 29:6, 867-918. [.pdf]

Unigram language models in SpeM:
  • O. Scharenborg & S. Seneff (2005). A two-pass strategy for handling OOVs in a large vocabulary recognition task. Proceedings of Interspeech, Lisbon, Portugal, pp. 1669-1672. [.pdf]
  • O. Scharenborg, S. Seneff & L. Boves (2007). A two-pass approach for handling out-of-vocabulary words in a large vocabulary recognition task. Computer Speech and Language, 21 (1), 206-218. [.pdf]

References

  • Glass, J.R., A probabilistic framework for segment-based speech recognition, Computer Speech and Language, 17, 137-152, 2003.
  • Norris, D., Shortlist: A connectionist model of continuous speech recognition, Cognition, 52, 189-234, 1994.
  • Steinbiss, V., Ney, H., Haeb-Umbach, R., Tran, B.-H., Essen, U., Kneser, R., Oerder, M., Meier, H.-G., Aubert, X., Dugast, C., Geller, D., The Philips research system for large-vocabulary continuous speech recognition, Proceedings of Eurospeech, Berlin, Germany, pp. 2125-2128, 1993.
  • Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., The HTK book (for HTK version 3.2). Technical Report, Cambridge University, Engineering Department, 2002.



Last updated: July 17th, 2007