Speech Translation Research at SRI International

Full Spontaneous Translation

SRI's newest translation technology permits bidirectional, voice-to-voice machine translation of spontaneous utterances.

Unlike the Phraselator or BPTS, the full spontaneous translation system is not restricted to prerecorded translations. It can translate a wider range of utterances, including utterances it has never seen before.

Our most advanced translation system is IraqComm™, which has been in use in Iraq since early 2006. Described below is our earlier work on a similar system for Pashto, a major language of Afghanistan.

Speech synthesis output

In the full translation system, the computer-generated translations (in both directions) are played through a speech synthesizer. While the synthesized speech used in the full translation system is smooth and fluent, it is necessarily of lower quality than the prerecorded human translations used in the Phraselator and BPTS.

The speech synthesis technology in our translation systems is provided by Cepstral LLC. Translations into English are synthesized in Cepstral's off-the-shelf English voice. Foreign-language voices are custom-built by Cepstral for this project from data that we provide.


Translation software

The full translation system relies entirely on computer-generated translations. This makes it more flexible than the Phraselator and BPTS, whose translations are hand-crafted in advance.

The heart of our computer-generated translation system is Gemini, a system developed in SRI's Artificial Intelligence Center. The Gemini system can both interpret and generate natural language utterances, which makes it well-suited to automatic translation work.

Full translation proceeds as follows. First, the Dynaspeak speech recognition system sends an utterance in the source language (say, English) to Gemini, which converts the utterance into a language-independent semantic form. Next, Gemini generates from this semantic form a grammatical, natural-sounding utterance in the target language (say, Pashto) that is semantically equivalent to the source utterance. This target-language translation is then output through the speech synthesizer.
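The interpret-then-generate flow described above can be illustrated with a minimal sketch. Everything in it is hypothetical and is not Gemini's actual design or interface: the Frame class, the two toy lexicons, and the single "where is the X" pattern are assumptions made for illustration, and French stands in for the target language. Speech recognition and synthesis are left out, so the sketch starts from recognized text and ends with target-language text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """Hypothetical language-independent meaning: a predicate plus one concept."""
    predicate: str
    concept: str


# Toy source-language (English) lexicon: word -> concept. Illustrative only.
EN_CONCEPTS = {"hospital": "HOSPITAL", "doctor": "DOCTOR", "water": "WATER"}

# Toy target-language (French stand-in) lexicon: concept -> noun phrase.
FR_NOUN_PHRASES = {"HOSPITAL": "l'hôpital", "DOCTOR": "le médecin", "WATER": "l'eau"}


def interpret(utterance: str) -> Optional[Frame]:
    """Map a source utterance to a semantic frame, or None if it is not understood."""
    words = utterance.lower().strip(" ?.!").split()
    if len(words) == 4 and words[:3] == ["where", "is", "the"]:
        concept = EN_CONCEPTS.get(words[3])
        if concept is not None:
            return Frame("LOCATE", concept)
    return None


def generate(frame: Frame) -> Optional[str]:
    """Render a semantic frame in the target language, or None if no rule applies."""
    noun_phrase = FR_NOUN_PHRASES.get(frame.concept)
    if frame.predicate == "LOCATE" and noun_phrase is not None:
        return f"où est {noun_phrase} ?"
    return None


def translate(utterance: str) -> Optional[str]:
    """Source text -> semantic frame -> target text; None if either step fails."""
    frame = interpret(utterance)
    if frame is None:
        return None
    return generate(frame)


if __name__ == "__main__":
    print(translate("Where is the hospital?"))
    print(translate("Good morning"))
```

Because the intermediate frame mentions no English or French words, the same interpret step could in principle feed a generator for any other target language. Either step can fail independently, which the sketch signals by returning None.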

Gemini's translation abilities rely on sophisticated grammars developed by linguists for both the source and target languages. Our grammars of English and Pashto each contain thousands of words and hundreds of grammatical rules.

This flexibility comes at a cost: computer-generated translation is more error-prone than translation hand-crafted in advance. In some cases Gemini cannot interpret the meaning of the input utterance; in other cases, Gemini cannot generate a suitable translation in the target language.

For this reason, the full translation system relies on two fall-back translation strategies. The first is a statistical translation system, which attempts to select a target translation with the highest probability of matching the source utterance. The second strategy is a word-by-word translation system, which produces a literal, word-by-word translation (often awkward and ungrammatical) of the source utterance. Both of these fall-back strategies tend to produce lower-quality translations than Gemini, but can be valuable in cases where Gemini fails to produce a translation.
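The ordering of these strategies can be sketched as a simple fallback chain. This is a hedged illustration, not the system's actual code: all function names, the lookup tables, and the placeholder target strings are assumptions. In particular, a table of scored candidates stands in for a real statistical model, and a one-entry dictionary stands in for the grammar-based system.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the grammar-based system: it covers one utterance.
GRAMMAR_COVERAGE = {"where is the doctor": "<grammar: where is the doctor>"}

# Hypothetical stand-in for a statistical model: scored candidate translations.
STAT_CANDIDATES = {
    "i need help": [("<stat: help requested>", 0.2), ("<stat: I need help>", 0.7)],
}

# Hypothetical word-level lexicon with placeholder target words.
WORD_LEXICON = {"i": "tgt_I", "need": "tgt_NEED", "water": "tgt_WATER"}


def normalize(utterance: str) -> str:
    return utterance.lower().strip(" ?.!")


def grammar_translate(utterance: str) -> Optional[str]:
    """Preferred strategy; returns None when the utterance is outside coverage."""
    return GRAMMAR_COVERAGE.get(normalize(utterance))


def statistical_translate(utterance: str) -> Optional[str]:
    """First fallback: pick the candidate with the highest probability, if any."""
    candidates = STAT_CANDIDATES.get(normalize(utterance))
    if not candidates:
        return None
    best_text, _best_prob = max(candidates, key=lambda pair: pair[1])
    return best_text


def word_by_word_translate(utterance: str) -> str:
    """Last resort: literal word substitution; unknown words pass through unchanged."""
    words = normalize(utterance).split()
    return " ".join(WORD_LEXICON.get(word, word) for word in words)


def translate_with_fallback(utterance: str) -> Tuple[str, str]:
    """Try each strategy in order of expected quality; report which one was used."""
    for name, strategy in (("grammar", grammar_translate),
                           ("statistical", statistical_translate)):
        result = strategy(utterance)
        if result is not None:
            return result, name
    return word_by_word_translate(utterance), "word-by-word"


if __name__ == "__main__":
    for text in ("Where is the doctor?", "I need help", "I need water"):
        print(text, "->", translate_with_fallback(text))
```

Returning the strategy name alongside the translation is a design choice of this sketch: it lets a caller see when a lower-quality fallback was used.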

History of full spontaneous translation

Work on the full translation system began in early 2002. The use of Gemini for producing computer-generated translations was inspired by a previous SRI project called the Spoken Language Translator, which ran from 1992 until 1999. The Spoken Language Translator, one of the first and most successful projects in the area of automatic speech translation, was able to translate among English, French, and Swedish in the domain of air travel planning. At the heart of the Spoken Language Translator was a natural language processing system called the Core Language Engine, a predecessor of Gemini.

More information about the Spoken Language Translator

 

©2006 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation.

Last modified Jun 11, 2007