Multilingual Automated Speech-to-Speech Translator
1. INTRODUCTION
Automatic speech-to-speech (S2S) translation breaks down communication barriers between people who do not share a common language, and hence enables instant oral cross-lingual communication for many critical applications such as emergency medical care. Developing an accurate, efficient and robust S2S translation system poses many challenges, especially for colloquial speech and resource-deficient languages.
The IBM MASTOR speech-to-speech translation system has been developed for DARPA's (Defense Advanced Research Projects Agency) CAST and TRANSTAC programs, whose mission is to develop technologies that enable rapid deployment of real-time S2S translation of low-resource languages on portable devices. It originated from the IBM MARS S2S system for the air-travel reservation domain described in [1], which was later significantly improved in all components, including ASR (Automatic Speech Recognition), MT (Machine Translation) and TTS (Text-To-Speech), and evolved into the MASTOR multilingual S2S system covering much broader domains such as medical treatment and force protection. More recently, we have broadened our efforts to rapidly develop systems for under-studied languages, such as regional dialects of Arabic. The intent of this program is to provide language support to military, medical and humanitarian personnel during operations in foreign territories, by deciphering possibly critical communications with a two-way real-time speech-to-speech translation system designed for specific tasks such as medical triage and force protection.
The initial data collection effort for the project has shown that the domain of force protection and medical triage, though limited, is rather broad. In fact, domain coverage is hard to define where the speech of responding foreign-language speakers is concerned, as their responses are less constrained and may include out-of-domain words and concepts. Moreover, a flexible, casual or colloquial speaking style inevitably appears in human-to-human conversational communication. The project is therefore a great challenge that calls for major research efforts.
Among all the challenges of speech recognition and translation for under-studied languages, two main issues stand out: 1) the lack of an adequate amount of speech data representing the domain of interest and the oral language spoken by the target speakers, which makes it difficult to accurately estimate statistical models for speech recognition and translation; and 2) the lack of codified linguistic knowledge in the form of spelling standards, transcriptions, lexicons, dictionaries, or annotated corpora. Various approaches therefore have to be explored.
Another critical challenge is to embed complicated algorithms and programs in small devices for mobile users. A hand-held computing device may have a 256 MHz CPU and 64 MB of memory; fitting the programs, models and data files into this memory while operating the system in real time is a tremendous challenge.
In this paper, we will describe the overall framework of the MASTOR system and our approaches for each major component, i.e., speech recognition and translation. Various statistical approaches are explored and used to solve different technical challenges. We will show how we addressed the challenges that arise when building automatic speech recognition (ASR) and machine translation (MT) for colloquial Arabic on both the laptop and handheld PDA platforms.
2. SYSTEM OVERVIEW
IBM MASTOR (Multilingual Automatic Speech-To-Speech TranslatOR) is IBM’s highly trainable speech-to-speech translation system, targeting conversational spoken language translation between English and Mandarin Chinese for limited domains. Figure 1 depicts the architecture of MASTOR. The speech input is processed and decoded by a large-vocabulary speech recognition system. Then the transcribed text is analyzed by a statistical parser for semantic and syntactic features.
A sentence-level natural language generator based on maximum entropy (ME) modeling is used to generate sentences in the target language from the parser output. The produced sentence in target language is synthesized into speech by a high quality text-to-speech system.
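The maximum-entropy model underlying the generator is the standard log-linear form P(y|x) = exp(Σᵢ wᵢ fᵢ(x, y)) / Z(x). The sketch below computes that distribution for a toy feature set; the feature names and weights are invented for illustration and are not MASTOR's actual generation features.

```python
import math

def maxent_prob(features, weights, labels):
    """P(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x), the standard
    maximum-entropy (log-linear) form. Features are binary indicators;
    a feature fires for label y when (feature, y) has a weight."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in features)
              for y in labels}
    z = sum(math.exp(s) for s in scores.values())  # partition function Z(x)
    return {y: math.exp(scores[y]) / z for y in scores}

# Toy example: one concept feature pushing the generator toward "hello".
probs = maxent_prob(
    features=["concept=greeting"],
    weights={("concept=greeting", "hello"): 2.0},
    labels=["hello", "bye"],
)
```

In practice the weights wᵢ would be estimated from the annotated corpus (e.g. by iterative scaling or gradient methods), and the labels would range over target-language generation decisions rather than two toy words.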
The general framework of our speech translation system is illustrated in Figure 2. The MASTOR system comprises ASR (Automatic Speech Recognition), MT (Machine Translation) and TTS (Text-To-Speech) components. ASR converts the user's speech to text in the source language, MT translates the source text into the target language, and finally TTS creates synthesized speech from the target text. This cascaded approach allows us to deploy the power of existing advanced speech and language processing techniques while concentrating on the problems unique to speech-to-speech translation. Figure 3 illustrates the MASTOR GUI (Graphical User Interface) on laptop and PDA, respectively.
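The cascaded ASR → MT → TTS flow described above can be sketched as a simple chain of components. The class and the stub stages below are hypothetical stand-ins for illustration, not the actual MASTOR implementation.

```python
class CascadedS2STranslator:
    """Chains the three stages; each consumes the previous stage's output."""

    def __init__(self, asr, mt, tts):
        self.asr = asr  # speech -> source-language text
        self.mt = mt    # source-language text -> target-language text
        self.tts = tts  # target-language text -> synthesized speech

    def translate(self, audio):
        source_text = self.asr(audio)
        target_text = self.mt(source_text)
        return self.tts(target_text)

# Stub stages standing in for real ASR/MT/TTS engines:
demo = CascadedS2STranslator(
    asr=lambda audio: "where does it hurt",
    mt=lambda text: "<translated> " + text,
    tts=lambda text: ("waveform", text),
)
```

A key property of the cascade is that each stage can be developed, trained and replaced independently, at the cost of errors propagating downstream from ASR into MT.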
Figure 2. IBM MASTOR Speech-to-Speech Translation System
Baseline acoustic models for English and Mandarin were developed for large-vocabulary continuous speech and trained on over 200 hours of speech collected from about 2000 speakers per language. The Arabic dialect speech recognizer, however, was trained on only about 50 hours of dialectal speech, consisting of about 200K short utterances. A large effort was invested in the initial cleaning and normalization of the training data because of the large number of irregular dialectal words and spelling variations. We experimented with three approaches to pronunciation and acoustic modeling: grapheme, phonetic, and context-sensitive grapheme, as described in Section 3.A. We found that using context-sensitive pronunciation rules reduces the WER of the grapheme-based acoustic model by about 3% relative (from 36.7% to 35.8%). Based on these results, we decided to use context-sensitive grapheme models in our system.
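The idea behind context-sensitive grapheme modeling is that the same letter maps to a different acoustic unit depending on its neighbors. The sketch below applies such rules with a word-boundary marker; the rules and symbols are toy placeholders, not the actual Arabic rule set used in MASTOR.

```python
def graphemes_to_phones(word, rules):
    """Apply context-sensitive grapheme-to-phone rules.
    Each rule is (grapheme, predicate(left, right), phone); the first
    matching rule wins, otherwise the plain lowercased grapheme is used."""
    phones = []
    for i, g in enumerate(word):
        left = word[i - 1] if i > 0 else "#"                # '#' = word boundary
        right = word[i + 1] if i < len(word) - 1 else "#"
        for grapheme, predicate, phone in rules:
            if g == grapheme and predicate(left, right):
                phones.append(phone)
                break
        else:
            phones.append(g.lower())  # context-free fallback
    return phones

# Toy rules: 'A' is realized differently word-initially vs. medially.
toy_rules = [
    ("A", lambda left, right: left == "#", "a_init"),
    ("A", lambda left, right: left != "#", "a_med"),
]
```

A plain grapheme model would be the special case where every rule ignores its context, which is why the context-sensitive variant can only match or improve on it given enough data to estimate the extra units.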
The Arabic language model (LM) is an interpolation of a trigram LM, a class-based LM and a morphologically processed LM, all trained on a corpus of a few hundred thousand words. We also built a compact language model for the hand-held system, in which singletons are eliminated and bigram and trigram counts are pruned with increased thresholds; the resulting LM footprint is 10 MB.
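The two LM operations above, linear interpolation of component models and count-threshold pruning for the compact model, can be sketched as follows. This is a minimal toy illustration; the actual interpolation weights and pruning thresholds are tuned values not given here.

```python
def interpolate(probs, lams):
    """Linearly interpolate component-LM probabilities for one event:
    P(w|h) = sum_k lambda_k * P_k(w|h), with the lambdas summing to 1."""
    assert abs(sum(lams) - 1.0) < 1e-9
    return sum(lam * p for lam, p in zip(lams, probs))

def prune_ngram_counts(counts, min_count=2):
    """Drop n-grams seen fewer than min_count times; min_count=2
    removes singletons, as in the compact hand-held LM."""
    return {ng: c for ng, c in counts.items() if c >= min_count}

# Toy usage: three component models, then singleton pruning.
p = interpolate([0.1, 0.2, 0.4], [0.5, 0.3, 0.2])
compact = prune_ngram_counts({("a", "b", "c"): 1, ("a", "b"): 5})
```

Raising `min_count` trades model size against coverage, which is the essential knob for fitting the LM into the hand-held memory budget.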
We use two approaches to translation. The concept-based approach uses natural language understanding (NLU) and natural language generation models trained on an annotated corpus. The second approach is a phrase-based finite-state transducer trained on an un-annotated parallel corpus. A trainable, phrase-splicing and variable-substitution TTS system is adopted to synthesize speech from the translated sentences; it can seamlessly generate speech in mixed languages. In addition, a small-footprint TTS engine was developed for handheld devices using embedded concatenative TTS technologies. Next, we describe our approaches to automatic speech recognition and machine translation in greater detail.
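The core mechanism of phrase-based translation, matching source phrases against a learned phrase table and emitting their target sides, can be illustrated with a greedy longest-match monotone decoder. This is a drastic simplification of a phrase-based finite-state transducer (no reordering, no scores), and the phrase-table entries are invented for illustration.

```python
def phrase_translate(tokens, phrase_table, max_len=3):
    """Greedy longest-match, left-to-right phrase translation.
    phrase_table maps source-phrase tuples to target-token lists;
    unknown words are passed through unchanged."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest source phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            src = tuple(tokens[i:i + n])
            if src in phrase_table:
                out.extend(phrase_table[src])
                i += n
                break
        else:
            out.append(tokens[i])  # out-of-vocabulary passthrough
            i += 1
    return out

# Toy English -> romanized-Arabic phrase table (illustrative only).
toy_table = {
    ("good", "morning"): ["sabah", "alkhayr"],
    ("doctor",): ["duktur"],
}
```

A real phrase-based FST would additionally weight competing phrase segmentations with translation and language-model scores and search for the best-scoring path rather than committing greedily.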
Messages In This Thread
RE: Multilingual Automated Speech to speech Translator - by seminar class - 21-04-2011, 04:02 PM