Speech Synthesis


What Does Speech Synthesis Mean?

Speech synthesis is the artificial simulation of human speech by a computer or other device. The counterpart of voice recognition, speech synthesis is mostly used to translate text into audio, and in applications such as voice-enabled services and mobile applications. It is also used in assistive technology to help vision-impaired individuals read text content.


Techopedia Explains Speech Synthesis

Homer Dudley’s VODER, which was based on the vocoder from Bell Laboratories, is considered the first fully functional voice synthesizer. The computer used in speech synthesis is known as a speech synthesizer or speech computer. The quality of a speech synthesizer is often judged by its similarity to the human voice. Most computer operating systems have incorporated speech synthesizers since the early 1990s. Synthesized speech is usually generated by concatenating pieces of recorded speech that are stored in a database.
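As a rough illustration of concatenating stored speech pieces, the Python sketch below joins waveforms looked up in a small table. It is a toy under stated assumptions, not how any particular synthesizer works: short sine tones stand in for real recordings, and the names `UNIT_DB`, `tone`, and `concatenate` are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def tone(freq_hz, n_samples=800):
    """Stand-in for a recorded speech unit: a short sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


# Hypothetical unit database: each unit name maps to its "recording".
# A real database would hold recorded human speech, not tones.
UNIT_DB = {
    "HH": tone(200),
    "AH": tone(300),
    "L": tone(400),
    "OW": tone(500),
}


def concatenate(units, db=UNIT_DB):
    """Join the stored waveforms for the requested units, in order."""
    samples = []
    for unit in units:
        if unit not in db:
            raise KeyError(f"no recording for unit {unit!r}")
        samples.extend(db[unit])
    return samples


waveform = concatenate(["HH", "AH", "L", "OW"])
```

Real concatenative systems typically also smooth the joins between units, which this sketch leaves out.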

The initial stage in speech synthesis is pre-processing, which resolves ambiguity about how a specific word should be read, including the handling of homographs (words that are spelled the same but pronounced differently). In the next stage, the computer uses phonemes to convert the text into a sequence of sounds. The last stage uses human recordings or basic sound generation techniques to mimic the human voice mechanism and read out the entire text.

One popular branch of speech synthesis is audio-visual, or multimodal, speech synthesis, which uses an animated face tightly synchronized with the synthesized speech. Multimodal speech synthesis also adds features such as non-verbal cues to help communicate the user’s words more accurately. Many speech synthesis systems allow users to choose the type of voice, such as a male or female voice.
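The pre-processing and phoneme stages described above can be sketched in Python as follows. This is a minimal illustration, assuming tiny hand-written tables: `ABBREVIATIONS`, `PAST_MARKERS`, and `LEXICON` are hypothetical, the phoneme symbols are ARPAbet-style, and the homograph rule (treat "read" as past tense after "have", "has", or "had") is a deliberately crude heuristic. The final sound-generation stage is omitted.

```python
import re

# Hypothetical tables for illustration; real systems use far larger ones.
ABBREVIATIONS = {"dr": "doctor", "mr": "mister"}
PAST_MARKERS = {"have", "has", "had"}

# ARPAbet-style phonemes. "read" has two entries because it is a homograph.
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "lee": ["L", "IY"],
    "has": ["HH", "AE", "Z"],
    "read": ["R", "IY", "D"],
    "read/past": ["R", "EH", "D"],
    "it": ["IH", "T"],
}


def preprocess(text):
    """Stage 1: lowercase, expand abbreviations, resolve the homograph."""
    words = re.findall(r"[a-z]+", text.lower())
    words = [ABBREVIATIONS.get(w, w) for w in words]
    resolved = []
    for i, word in enumerate(words):
        # Crude context rule: "read" after have/has/had is past tense.
        if word == "read" and i > 0 and words[i - 1] in PAST_MARKERS:
            word = "read/past"
        resolved.append(word)
    return resolved


def to_phonemes(words):
    """Stage 2: look up each word's phonemes (KeyError if unknown)."""
    phonemes = []
    for word in words:
        phonemes.extend(LEXICON[word])
    return phonemes


words = preprocess("Dr. Lee has read it")
phonemes = to_phonemes(words)
```

Here `words` becomes `["doctor", "lee", "has", "read/past", "it"]`, so the homograph is read with the past-tense vowel. A production system would need a fallback for words missing from the lexicon.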

Most speech synthesis systems can read text aloud in a very intelligible manner, although the voice can at times sound dull. Speech synthesis has, however, yet to develop the ability to fully imitate the wide spectrum of human intonations and cadences.



Margaret Rouse
Technology Expert

Margaret is an award-winning writer and teacher known for her ability to explain complex technical subjects to a non-technical business audience. Over the past twenty years, her IT definitions have been published by Que in an encyclopedia of technology terms and cited in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine, and Discovery Magazine. She joined Techopedia in 2011. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other's highly specialized languages.