eSpeak is a compact open-source software speech synthesizer for English and other languages, available for Linux and Windows.

eSpeak uses a “formant synthesis” method. This allows many languages to be provided in a small size. The speech is clear and can be used at high speeds, but it is not as natural or smooth as that of larger synthesizers, which are based on human speech recordings.


  • Includes different voices, whose characteristics can be altered.
  • Can produce speech output as a WAV file.
  • Supports SSML (Speech Synthesis Markup Language), though not completely, and also HTML.
  • Compact size. The program and its data, including many languages, total about 2 MB.
  • Can be used as a front end to MBROLA diphone voices. eSpeak converts text to phonemes with pitch and length information.
  • Can translate text into phoneme codes, so it could be adapted as a front end for other speech synthesis engines.
  • Potential for other languages. Several are included in varying stages of progress.
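
As a rough illustration of the WAV output and phoneme translation features listed above, the following sketch builds command lines for the espeak command-line program. It assumes the espeak executable is installed and on the PATH, and that it accepts the -v (voice), -w (write WAV file), -q (quiet) and -x (print phoneme mnemonics) options. The function names are hypothetical helpers, not part of eSpeak.

```python
import subprocess


def wav_command(text, wav_path, voice="en"):
    """Build an argument list that asks espeak to write speech to a WAV file.

    Assumes the espeak CLI options -v (voice) and -w (output WAV file).
    """
    return ["espeak", "-v", voice, "-w", wav_path, text]


def phoneme_command(text, voice="en"):
    """Build an argument list that asks espeak to print phoneme mnemonics.

    Assumes -q (do not speak) and -x (write phoneme mnemonics to stdout).
    """
    return ["espeak", "-v", voice, "-q", "-x", text]


def run(command):
    """Run a prepared command and return whatever it printed to stdout."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling run(wav_command("Hello world", "hello.wav")) would write the audio file, and run(phoneme_command("Hello world")) would return the phoneme string, provided espeak is available on the machine.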

Configuration Options

Variable       Default Value   Description
male           true            Voice gender, male or female
voice_number   1               Male supports voices 1-4; female supports voices 1-7
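
The two options above could be checked before use along these lines. This is only a sketch: the function name is hypothetical, and the mapping to an eSpeak voice variant string such as "en+m1" is an assumption about how these options are applied, so check which variants your installation actually provides.

```python
# Allowed voice numbers per gender, taken from the table above.
VOICE_RANGES = {
    True: range(1, 5),   # male: voices 1-4
    False: range(1, 8),  # female: voices 1-7
}


def voice_variant(male=True, voice_number=1, language="en"):
    """Validate the options and return a voice string like 'en+m1'.

    The '<language>+m<N>' / '<language>+f<N>' form follows eSpeak's
    voice variant syntax; using it for these options is an assumption.
    """
    male = bool(male)
    allowed = VOICE_RANGES[male]
    if voice_number not in allowed:
        raise ValueError(
            f"voice_number must be between {allowed.start} and "
            f"{allowed.stop - 1} for a {'male' if male else 'female'} voice"
        )
    prefix = "m" if male else "f"
    return f"{language}+{prefix}{voice_number}"
```

With the defaults (male true, voice_number 1) the function returns "en+m1"; a value outside the documented range, such as voice_number 5 for a male voice, raises ValueError.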