Speech Translation Research at SRI International
Full Spontaneous Translation
SRI's newest translation technology permits bidirectional, voice-to-voice machine
translation of spontaneous utterances.
Unlike the Phraselator or BPTS,
the full spontaneous translation system is not restricted to prerecorded
translations. It can translate a wider range of utterances,
including novel utterances it has never seen before.
Our most advanced translation system is IraqComm™,
which has been in use in Iraq since early 2006. Described below is
our earlier work on a similar system for Pashto, a major language of
Afghanistan.
Speech synthesis output
In the full translation system, the computer-generated translations
(in both directions) are played through a speech synthesizer. While
the synthesized speech used in the full translation system is smooth
and fluent, it is necessarily of lower quality than the prerecorded
human translations used in the Phraselator and BPTS.
The speech synthesis technology in our translation systems is provided by
Cepstral LLC. Translations into English are synthesized in Cepstral's
off-the-shelf English voice. Foreign-language voices are custom-built by
Cepstral for this project, based on data that we provide.
Translation software
The full translation system relies entirely on computer-generated
translations. This makes it more flexible than the Phraselator and
BPTS, whose translations are hand-crafted in advance.
The heart of our computer-generated translation system is Gemini, a system developed in SRI's
Artificial Intelligence Center. The Gemini system can both interpret and generate natural language utterances, which makes it well-suited to automatic translation work.
Full translation proceeds as follows.
First, the Dynaspeak speech recognition system sends an
utterance in the source language (say, English) to Gemini, which converts
the utterance into a language-independent semantic form. Next, Gemini
generates from this semantic form a grammatical, natural-sounding utterance in the target language
(say, Pashto) that is semantically equivalent to the source utterance. This target-language translation is
then output through the speech synthesizer.
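The interpret-then-generate flow described above can be illustrated with a small sketch. This is not Gemini's implementation or API: the names SemanticForm, interpret, generate, and translate are hypothetical, the two source patterns stand in for a full grammar, and the bracketed target tokens are placeholders rather than real Pashto. Speech recognition and synthesis are left out; the sketch starts from recognized text and ends with target-language text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SemanticForm:
    """Toy language-independent meaning: a predicate plus one argument."""
    predicate: str
    argument: str


# Hypothetical source-language patterns (phrase prefix -> predicate).
# A real grammar would have thousands of words and hundreds of rules.
SOURCE_PATTERNS = {
    "where is the": "LOCATE",
    "do you have": "POSSESS_QUERY",
}

# Hypothetical target-language templates. The bracketed tokens are
# placeholders for target-language words, not real Pashto.
TARGET_TEMPLATES = {
    "LOCATE": "[{arg}] [WHERE] [IS]",
    "POSSESS_QUERY": "[YOU] [{arg}] [HAVE-Q]",
}


def interpret(utterance: str) -> Optional[SemanticForm]:
    """Map a source utterance to a semantic form, or None if not covered."""
    text = utterance.lower().strip(" ?.!")
    for prefix, predicate in SOURCE_PATTERNS.items():
        if text.startswith(prefix + " "):
            return SemanticForm(predicate, text[len(prefix):].strip())
    return None


def generate(form: SemanticForm) -> Optional[str]:
    """Produce a target utterance from a semantic form, or None if not covered."""
    template = TARGET_TEMPLATES.get(form.predicate)
    if template is None:
        return None
    return template.format(arg=form.argument.upper())


def translate(recognized_text: str) -> Optional[str]:
    """Interpret the source text, then generate the target text."""
    form = interpret(recognized_text)
    return generate(form) if form is not None else None
```

Because the semantic form is independent of both languages, the same interpret step could in principle feed generation into any target language that has its own templates. Either step may return None, which corresponds to the two failure cases an interpretation-based translator has to handle.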
Gemini's translation abilities rely on sophisticated
grammars developed by linguists for both the source and target
languages. Our grammars of English and Pashto each contain thousands of
words and hundreds of grammatical rules.
The flexibility of computer-generated translation comes at a cost: it is
more error-prone than the hand-crafted translations of the Phraselator and
BPTS. In some cases Gemini cannot interpret the meaning of the input
utterance; in other cases, Gemini cannot generate a suitable translation
in the target language.
For this reason, the full translation system relies on two fall-back
translation strategies. The first is a statistical translation
system, which attempts to select the target translation with the highest
probability of matching the source utterance. The second strategy is
a word-by-word translation system, which produces a
literal, word-by-word translation (often awkward and ungrammatical) of the source utterance. Both of
these fall-back strategies tend to produce lower-quality translations
than Gemini, but can be valuable in cases where Gemini fails to
produce a translation.
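The fall-back ordering described above can be sketched as a chain of strategies tried in turn. Everything here is an illustrative assumption rather than the actual system: the function names, the tiny lookup tables, and the probability values are invented for the example, and the bracketed target tokens are placeholders rather than real Pashto. The grammar-based step is reduced to a lookup that returns None when it has no coverage.

```python
from typing import Callable, Dict, List, Optional, Tuple

Translator = Callable[[str], Optional[str]]

# Hypothetical toy data standing in for the real grammars and models.
GRAMMAR_COVERAGE: Dict[str, str] = {
    "where is the hospital": "[HOSPITAL] [WHERE] [IS]",
}
SCORED_CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "i need water": [("[I] [WATER] [NEED]", 0.7), ("[WATER] [IS]", 0.2)],
}
LEXICON: Dict[str, str] = {"stop": "[STOP]", "the": "[THE]", "car": "[CAR]"}


def normalize(utterance: str) -> str:
    return utterance.lower().strip(" ?.!")


def grammar_based(utterance: str) -> Optional[str]:
    """Stand-in for the grammar-based translator; None means it failed."""
    return GRAMMAR_COVERAGE.get(normalize(utterance))


def statistical(utterance: str) -> Optional[str]:
    """Pick the candidate translation with the highest probability."""
    candidates = SCORED_CANDIDATES.get(normalize(utterance))
    if not candidates:
        return None
    best, _probability = max(candidates, key=lambda pair: pair[1])
    return best


def word_by_word(utterance: str) -> Optional[str]:
    """Literal translation; unknown words are passed through unchanged."""
    words = normalize(utterance).split()
    return " ".join(LEXICON.get(word, word) for word in words)


STRATEGIES: List[Tuple[str, Translator]] = [
    ("grammar", grammar_based),
    ("statistical", statistical),
    ("word-by-word", word_by_word),
]


def translate_with_fallback(utterance: str) -> Tuple[str, str]:
    """Return (strategy name, translation) from the first strategy that succeeds."""
    for name, strategy in STRATEGIES:
        result = strategy(utterance)
        if result is not None:
            return name, result
    raise ValueError("no strategy produced a translation")
```

Ordering the strategies from highest expected quality to lowest means a lower-quality translation is used only when the better strategies have produced nothing. Returning the strategy name alongside the translation would let a caller signal to the user when a fall-back result is being played.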
History of full spontaneous translation
Work on the full translation system began in early 2002.
The use of Gemini for producing computer-generated translations was
inspired by a previous SRI project called the Spoken Language Translator,
which lasted from 1992 until 1999. The Spoken Language Translator,
one of the first and most successful projects in the area of automatic
speech translation, was able to translate among English, French, and
Swedish in the domain of air travel planning.
At the heart of the Spoken Language Translator was a natural language processing system called the Core Language Engine, a predecessor of Gemini.
More information about the Spoken Language Translator