seminar class · 02-05-2011, 02:39 PM

ABSTRACT
Language identification is the process of identifying the language being spoken from a sample of speech by an unknown speaker. Most previous work in this field rests on the fact that phoneme sequences have different occurrence probabilities in different languages, and the systems designed so far have tried to exploit this fact.
A language identification system in turn consists of two sub-systems. The first converts speech into an intermediate form, phoneme sequences; the second models each language through a probabilistic analysis of those sequences. This project targets both sub-systems. First, some algorithms for designing language models are discussed. Then an attempt is made to design an algorithm that extracts phoneme sequences in the form of more abstract classes, derived with statistical tools such as Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM).
1. Introduction
The problem of language identification (language ID) is defined as recognizing the language being spoken from a sample of speech by an unknown speaker [3]. Humans are by far the best language ID systems in operation today: their accuracy approaches one hundred percent for languages they know, and they can make a reasonable guess about languages they do not know. This project attempts to develop this ability in machines.
Several important applications already exist for language ID. A language ID system could be used as a 'front-end' to a telephone-based service, routing each caller to an operator fluent in the caller's language. Currently this is done either manually or by an IVRS-based system, and both approaches suffer from two main problems: speed and expense. It is very expensive to employ human call routers for AT&T, who between them must be able to route calls correctly in 140 languages, or for Reliance Infocomm, which handles more than 10 languages. For emergency services, the delay could be fatal. Another application is in wartime, when soldiers carrying out rescue operations in foreign lands need to communicate with local people. A further application, which was actually implemented during this project, is a Content Verification System (CVS) used to verify the speech data stored for different languages. As research in automatic speech recognition progresses, a language ID system will be necessary for any multi-lingual speech recognition system. One such system might be a fast information system, say at an airport, catering for multi-national clients. Another might be an automatic translation system. Both would first need to recognize the language being spoken before they could process it.
There are a number of ways to perform language ID: using the spectral features of speech, using a word lexicon, or identifying characteristics that are distinctive of particular languages, such as special phonemes. The methods discussed here are based on phonetic characteristics.
To design a language ID system we need a proper understanding of speech and its components. Speech basically consists of small units of sound called phonemes. For example, if you speak the word BAT, then \b, \@, \t are the three phonemes that together form this sound. For language identification using phonetic characteristics, speech is first converted into phoneme sequences, which can be done by various methods. In this project the phoneme recognizer is based on Hidden Markov Models (HMM). Once speech has been converted into phoneme sequences, a probabilistic analysis is carried out in three phases (training, tuning and testing), and the corpus is divided accordingly. In the second phase of the project I attempted to supersede the conventional HMM way of converting speech into phoneme sequences by using more abstract classes, derived with statistical tools such as Gaussian Mixture Models (GMM) and HMM, and then passing the result through the same language models that were designed in the first phase.
All experiments were conducted on two speech data sets, 'Radio broadcast data' and 'Customer care speech data', in two languages (Hindi and English). Each data set was in turn divided into three disjoint sets, so that the language models could be tested properly and the results would not be biased. Although results on the latter data set are more significant because of its application, performance was better on the former.
A brief discussion of the research done in this field, along with an example, is given in section 2. Sections 4 and 5 then discuss my own work and the proposed algorithm. Section 6 describes the tools I designed on the basis of that algorithm, followed by the conclusion and references in subsequent sections.
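The division of each data set into three disjoint sets can be sketched as follows. This is only an illustration: the function name split_corpus, the 60/20/20 proportions and the fixed random seed are assumptions, since the report does not state how the split was made.

```python
import random

def split_corpus(utterance_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle utterance IDs and cut them into training, tuning and testing sets.

    The fractions are hypothetical; whatever is left after the training and
    tuning cuts goes to the testing set, so no utterance is lost or repeated.
    """
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(len(ids) * fractions[0])
    n_tune = int(len(ids) * fractions[1])
    train = ids[:n_train]
    tune = ids[n_train:n_train + n_tune]
    test = ids[n_train + n_tune:]
    return train, tune, test

train, tune, test = split_corpus(range(100))
```

Because the three lists are slices of one shuffled list, they cannot overlap, which is the property that keeps the test results unbiased.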
2. Background Work
This section reviews the current methods used for language ID and discusses previous research in the area of feature vectors and language models.
2.1. Distinct Characteristics of Language
“Each language has a finite set of phonemes. As we learn our first language, we also learn to identify them. When listening to a foreign language, with phonemes not found in our first language, the presence of such sounds is readily apparent to us. Examples are the "clicks" found in some sub-Saharan African languages.
As the vocal apparatus used in the production of languages is universal, there is much overlap of the phoneme sets, and the total number of phonemes is finite. But there can be differences in the way the same phoneme is interpreted in two different languages. For example, in English, /l/ and /r/ (as in "leaf" and "reef") are two different phonemes, whereas in Japanese they are not.” [12]
By contrast, the frequency of occurrence of phones and the phonotactic rules of languages can differ significantly. Phonotactic rules govern the way phonemes are combined. For example, the phoneme clusters /sr/ and /sp/ are quite common in Tamil and German respectively (the latter could be represented as /shp/ in English), but are rare in English. This project exploits that difference to design an algorithm for language identification.
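One simple way to make phonotactic differences measurable is to count how often each pair of adjacent phonemes occurs. The sketch below is illustrative only; the function name bigram_frequencies and the toy phoneme sequence are assumptions, not data from the project.

```python
from collections import Counter

def bigram_frequencies(phonemes):
    """Return the relative frequency of each adjacent phoneme pair."""
    pairs = Counter(zip(phonemes, phonemes[1:]))
    total = sum(pairs.values())
    if total == 0:  # sequences shorter than two phonemes have no pairs
        return {}
    return {pair: count / total for pair, count in pairs.items()}

# Toy sequence in which the cluster /sp/ is frequent.
freqs = bigram_frequencies(["s", "p", "a", "s", "p"])
print(freqs[("s", "p")])
```

Comparing such frequency tables across languages shows which clusters are typical of one language and rare in another.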
2.2. Overview on Language Identifiers
A language ID system works as a single entity in many applications, but it is itself a set of three black boxes: a front-end processing system, a phoneme recognizer, and language models. Speech data is given as input to this set of boxes and then flows through the system as shown in the figure. The implementation of each sub-system is hidden from the others; only the interfaces are standardized, as with the OSI layers in networking. By standardization we mean that the format of the data passed from one sub-system to the next is fixed.
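The idea of three black boxes with fixed interfaces can be sketched as three interchangeable functions. All names and the toy stand-in stages below are hypothetical; they only show that each stage depends on the data format of the previous one and not on its implementation.

```python
from typing import Callable, List, Sequence

FeatureVectors = List[List[float]]  # one feature vector per speech frame
PhonemeSequence = List[str]         # one symbol per recognized phoneme

def identify_language(
    samples: Sequence[float],
    front_end: Callable[[Sequence[float]], FeatureVectors],
    recognizer: Callable[[FeatureVectors], PhonemeSequence],
    language_model: Callable[[PhonemeSequence], str],
) -> str:
    """Run speech samples through the three stages in order."""
    features = front_end(samples)
    phonemes = recognizer(features)
    return language_model(phonemes)

# Toy stand-ins (assumptions, not real signal processing).
def toy_front_end(samples):
    return [[s] for s in samples]

def toy_recognizer(features):
    return ["a" if vector[0] >= 0 else "b" for vector in features]

def toy_language_model(phonemes):
    return "English" if phonemes.count("a") >= phonemes.count("b") else "Hindi"

result = identify_language([0.5, -0.1, 0.3], toy_front_end,
                           toy_recognizer, toy_language_model)
```

Any stage can be replaced, for example a different front-end, as long as it returns data in the same agreed format.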
2.2.1. Front-End Processing
The main purpose of front-end processing is feature vector extraction. Many different algorithms exist for speech recognition and language identification. A common need among them is some form of parameterized representation (feature vectors) of the speech input. These feature vector streams may then be used to train or interrogate the language models that follow the feature extraction module in a typical language identification system [6]. There are, in principle, countless ways to encode speech, depending on which numerical measures are deemed useful. Over many years of speech recognition research, there has been a convergence towards a few spectrally based features that perform well. Of these, Linear Prediction (LP) and cepstral measures are the most widely used [5].
The final test of any such front-end is its effect on the accuracy of the overall language ID system. In this respect the front-end based on cepstral measures compares favorably with the others we came across in our investigation.
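As an illustration of a cepstral measure, the sketch below computes the real cepstrum of one speech frame, that is, the inverse Fourier transform of the log magnitude spectrum. It is a minimal sketch, not the front-end used in the project: the naive DFT, the function names and the floor value are assumptions, and practical front-ends usually add windowing and further processing.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); fine for short frames."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def real_cepstrum(frame, floor=1e-10):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    spectrum = dft(frame)
    # The floor avoids log(0) when a spectral bin is exactly zero.
    log_magnitude = [math.log(max(abs(x), floor)) for x in spectrum]
    return [c.real for c in dft(log_magnitude, inverse=True)]

# A unit impulse has a flat spectrum, so its cepstrum is zero everywhere.
coefficients = real_cepstrum([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

The first few coefficients of each frame would then form the feature vector passed on to the phoneme recognizer.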
2.2.2. Phoneme Recognizer
The basic aim of this sub-system is to generate phoneme sequences from the feature vector sequences. The phoneme set used here contains 56 phonemes, whose combinations are taken to represent speech in the various languages. We used Hidden Markov Models (HMMs) for this purpose.
An HMM is essentially a probabilistic state machine in which each state represents a phoneme. Two kinds of probability are attached to each state. The first is the probability of moving to a given next state (Bxx in the figure); the second is the probability of producing a given output sound when the state is reached (Ax(Oy) in the figure). HMMs are used to solve three basic problems; the one relevant here is finding the most probable state sequence given a sequence of output sounds [1]. In effect, when the processed speech vectors are passed through this system, it outputs a sequence of phonemes. A phoneme recognizer is usually specific to one language, depending on the training data used. This is discussed in later sections.
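Finding the most probable state sequence for a sequence of outputs is commonly done with the Viterbi algorithm. The report does not name the algorithm used, so the sketch below is an assumption, and the two-state model and its probabilities are toy values. Here trans_p plays the role of the transition probabilities (Bxx in the figure) and emit_p the role of the output probabilities (Ax(Oy) in the figure).

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (phoneme) sequence for the observations."""
    if not observations:
        return []
    # best[t][s] = (probability of the best path ending in s at time t,
    #               state that precedes s on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path

# Toy model: two phoneme states and three possible output symbols.
states = ["a", "t"]
start_p = {"a": 0.6, "t": 0.4}
trans_p = {"a": {"a": 0.7, "t": 0.3}, "t": {"a": 0.4, "t": 0.6}}
emit_p = {"a": {"o1": 0.5, "o2": 0.4, "o3": 0.1},
          "t": {"o1": 0.1, "o2": 0.3, "o3": 0.6}}
print(viterbi(["o1", "o2", "o3"], states, start_p, trans_p, emit_p))
# prints ['a', 'a', 't']
```

For long utterances, log probabilities are normally used instead of raw products to avoid numerical underflow.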
2.2.3. Language Models
These are the most important part of a language ID system. Their basic aim is to predict the language given a phoneme sequence, and for this purpose some kind of probabilistic analysis is done, the details of which depend on the implementation. This is precisely my area of work. There are various ways of implementing language models. One property most methods have in common is that they consist of two phases: training and recognition. Recognition can only be performed after training, which involves presenting the system with speech from the target languages (i.e. those we are trying to recognize). Different systems then model languages according to particular language-dependent features. During recognition, these features are compared with those of the utterances being tested in order to decide which language is the correct one. The simplest form of training uses only a sampled speech wave and the true identity of the language being spoken. More complex approaches use phonetic transcriptions (a sequence of symbols representing the sounds in each utterance), or orthographic transcriptions (the text of the words spoken) along with a pronunciation dictionary that maps each word to its phonetic representation. Such methods are obviously more costly and time-consuming, not least because fluent speakers of each target language are required. We have used the former, simpler approach for training.
Language models are discussed in greater depth in the next section.
2.3. Example: Phonetic Recognition/Language Modeling
In the last section we saw that it is the likelihood of occurrence of phonemes in a language that differentiates one language from another. Phonetic Recognition/Language Modeling (PRLM) is based on this principle. The system uses acoustic pre-processing for feature vector extraction, as discussed in section 2.2.1. A language-specific phoneme recognizer then converts the speech into phoneme sequences, and at the end lie the language models, which calculate the n-gram probabilities. The figure gives a graphical view of the system.
The disadvantage of this system is that it uses a single language-dependent phoneme recognizer. This can bias its results towards the language on which the recognizer was trained, because the phones present in the target languages do not always occur in the language used during training. We may therefore wish to incorporate sounds from more than one language into a PRLM-like system. One alternative is to use multiple PRLM systems in parallel, with recognizers trained on different languages, as shown in the figure.
While it enhances performance, this approach has two disadvantages: the need for labeled training speech in more than one language, and the increased processing time.
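The language-model stage of a PRLM-style system can be sketched with bigram (2-gram) probabilities. This is a minimal sketch under stated assumptions, not the project's algorithm: the function names, the add-one smoothing and the toy training sequences are hypothetical, and vocab_size=56 simply reuses the phoneme count quoted in section 2.2.2.

```python
import math
from collections import Counter

def train_bigram_model(sequences):
    """Count phoneme pairs and their left contexts over the training sequences."""
    pair_counts, context_counts = Counter(), Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts

def log_likelihood(model, seq, vocab_size=56, smoothing=1.0):
    """Sum of log P(cur | prev) with additive smoothing for unseen pairs."""
    pair_counts, context_counts = model
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        numerator = pair_counts[(prev, cur)] + smoothing
        denominator = context_counts[prev] + smoothing * vocab_size
        total += math.log(numerator / denominator)
    return total

def identify(models, seq):
    """Pick the language whose model gives the sequence the highest likelihood."""
    return max(models, key=lambda lang: log_likelihood(models[lang], seq))

# Toy training data for two hypothetical languages.
models = {
    "lang_a": train_bigram_model([["s", "t", "a", "t"], ["t", "a", "s", "t"]]),
    "lang_b": train_bigram_model([["s", "p", "a", "s", "p"], ["a", "s", "p", "a"]]),
}
print(identify(models, ["s", "p", "a"]))  # prints lang_b
```

Training corresponds to train_bigram_model and recognition to identify, which mirrors the two phases described in section 2.2.3.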
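One simple way to combine several PRLM branches running in parallel is to add, for each target language, the log-likelihoods produced by every branch and pick the language with the largest total. The report does not say how the branch outputs are combined, so this rule, the function name and the scores below are all assumptions made for illustration.

```python
def combine_parallel_prlm(scores_per_branch):
    """Sum per-language log-likelihoods across branches and pick the best.

    scores_per_branch: one {language: log-likelihood} dict per PRLM branch,
    each branch being built on a recognizer trained on a different language.
    """
    totals = {}
    for scores in scores_per_branch:
        for language, score in scores.items():
            totals[language] = totals.get(language, 0.0) + score
    best = max(totals, key=totals.get)
    return best, totals

# Hypothetical scores from two branches for the same utterance.
branches = [
    {"hindi": -10.0, "english": -12.0},
    {"hindi": -9.0, "english": -8.5},
]
best, totals = combine_parallel_prlm(branches)
print(best)  # prints hindi
```

Each extra branch adds one more recognizer pass per utterance, which is where the increased processing time comes from.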
3. Objectives
The major objectives set before starting this project were as follows:
• To design an algorithm for more accurate language ID, in the area of language models.
• To design an alternative to conventional front-end processing that is not language dependent.
• To build a tool based on the above algorithms for call routing in a Customer Care Center. This tool will also enable the administrator to manage the speech corpus.
• To build a tool based on the above algorithms for a 'Content Verification System' that verifies the data files present on data servers.
