DEVELOPMENT OF A NOVEL VOICE VERIFICATION SYSTEM USING WAVELETS

Submitted by
M. GLORIA JACINTH

Department of Electronics and Communication Engineering
MALLA REDDY ENGINEERING COLLEGE
Maisamaguda, Secunderabad-500014


ABSTRACT


This project presents a novel voice verification system based on wavelet transforms. Conventional signal processing techniques assume that the signal is stationary and are therefore ineffective for non-stationary signals such as voice. Voice signals, being highly dynamic, can be analyzed with far better accuracy using the wavelet transform. The developed system is a word-dependent voice verification system combining the Relative Spectral Algorithm (RASTA) and Linear Predictive Coding (LPC). The voice signal is first filtered with a special-purpose RASTA filter, then de-noised and decomposed to obtain the wavelet coefficients, on which a statistical computation is carried out. In addition, the formants (resonances) of the voice signal are detected using LPC. With the statistical computation on the coefficients alone, the accuracy of verifying a sample of an individual's voice against his own reference voice is quite high (around 75% to 80%). The reliability of the verification is strengthened by combining the evidence from these two quite different aspects of the individual voice. In the voice comparison experiments, four out of five individuals were verified correctly, and the results show a high percentage of accuracy. The accuracy of the system can be improved further by incorporating advanced pattern recognition techniques such as the Hidden Markov Model (HMM).





CHAPTER-1
1.1 INTRODUCTION

Speech is the most basic way for humans to convey information to one another. With a bandwidth of only about 4 kHz, speech can convey information together with the emotion of the human voice. People want to be able to hear someone's voice from anywhere in the world as if the person were in the same room. As a result, greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission.
Today the applications of speech coding and compression are numerous. Many involve real-time coding of speech signals for use in mobile satellite communications, cellular telephony, and audio for videophones or video teleconferencing systems. Other applications include the storage of speech for speech synthesis and playback, or for the transmission of voice at a later time. Examples include voice mail systems, voice memo wristwatches, voice logging recorders and interactive PC software. Traditionally, speech coders are classified into two categories: waveform coders and analysis/synthesis vocoders (from "voice coders"). Waveform coders attempt to reproduce the actual shape of the signal produced by the microphone and its associated analogue circuits.
A popular waveform coding technique is pulse code modulation (PCM), which is used in telephony today. Vocoders use an entirely different approach to speech coding, known as parameter coding or analysis/synthesis coding, in which no attempt is made to reproduce the exact speech waveform at the receiver, only a signal perceptually equivalent to it. These systems achieve much lower data rates by using a functional model of the human speaking mechanism at the receiver. One of the most popular techniques for analysis/synthesis coding of speech is Linear Predictive Coding (LPC). Higher quality vocoders include RELP (Residual Excited Linear Prediction) and CELP (Code Excited Linear Prediction).
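To make the idea concrete, the following Python fragment is a minimal sketch of LPC analysis by the autocorrelation method, not the coder of any particular standard; the analysis frame `frame`, sampling rate `fs`, predictor order and helper names are illustrative assumptions.

import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=12):
    # Autocorrelation-method LPC: find a[1..p] minimising the prediction
    # error of s[n] ~ sum_k a[k] * s[n-k].
    frame = frame * np.hamming(len(frame))         # taper the analysis frame
    r = np.correlate(frame, frame, mode="full")    # full autocorrelation
    r = r[len(frame) - 1:len(frame) + order]       # keep lags 0..order
    return solve_toeplitz((r[:-1], r[:-1]), r[1:])

def formant_estimates(a, fs):
    # Rough formant estimates: angles of the LPC polynomial roots in the
    # upper half of the z-plane, converted to Hz.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

Roots of the prediction polynomial lying close to the unit circle correspond to the resonances of the vocal tract, which is the property exploited when LPC is used for formant detection.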
This document looks at a new technique for analyzing and compressing speech signals using wavelets. Very simply, wavelets are mathematical functions of finite duration with an average value of zero that are useful in representing data or other functions.
Any signal can be represented by a set of scaled and translated versions of a basic function called the mother wavelet. The wavelet coefficients at different scales and positions result from taking the wavelet transform of the original signal, i.e. from projecting it onto this set of wavelet functions. The coefficients represent the signal in the wavelet domain, and all data operations can be performed using just the corresponding wavelet coefficients.
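As a minimal sketch, the Python fragment below decomposes a one-dimensional signal into wavelet coefficients and reconstructs it with the inverse transform using the PyWavelets library; the db4 mother wavelet, the five decomposition levels and the random placeholder signal are illustrative assumptions, not choices fixed by this project.

import numpy as np
import pywt

signal = np.random.randn(8000)           # placeholder for one second of 8 kHz speech

# Decompose the signal into approximation and detail coefficients.
coeffs = pywt.wavedec(signal, wavelet="db4", level=5)
print([len(c) for c in coeffs])          # coefficient counts, coarsest level first

# All later processing (denoising, statistics, compression) can operate on
# `coeffs` alone; the time-domain signal is recovered with the inverse transform.
reconstructed = pywt.waverec(coeffs, wavelet="db4")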
Speech is a non-stationary random process due to the time varying nature of the human speech production system. Non-stationary signals are characterized by numerous transitory drifts, trends and abrupt changes. The localization feature of wavelets, along with its time-frequency resolution properties makes them well suited for coding speech signals.
In designing a wavelet-based speech coder, the major issues explored in this project are:
i. Choosing optimal wavelets for speech,
ii. Decomposition level in wavelet transforms,
iii. Threshold criteria for the truncation of coefficients,
iv. Efficiently representing zero valued coefficients and
v. Quantizing and digitally encoding the coefficients.
The performance of the wavelet compression scheme in coding speech signals, and the quality of the reconstructed signals, are also evaluated; a brief sketch of the decomposition, thresholding and reconstruction steps is given below.
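The fragment below is a minimal sketch of the decomposition and thresholding steps (issues ii and iii above) together with a simple quality measure. The wavelet, the decomposition level and the rule of keeping only the largest 10% of coefficients are illustrative assumptions rather than the settings adopted in this project.

import numpy as np
import pywt

def compress(signal, wavelet="db4", level=5, keep=0.10):
    # Zero all but the largest `keep` fraction of wavelet coefficients.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    magnitudes = np.abs(np.concatenate(coeffs))
    thr = np.quantile(magnitudes, 1.0 - keep)             # global threshold
    return [pywt.threshold(c, thr, mode="hard") for c in coeffs]

def reconstruct(thresholded_coeffs, wavelet="db4"):
    return pywt.waverec(thresholded_coeffs, wavelet)

def snr_db(original, reconstructed):
    # Signal-to-noise ratio of the reconstruction, in decibels.
    reconstructed = reconstructed[:len(original)]
    noise = original - reconstructed
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))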



1.2 PRINCIPLES OF SPEAKER RECOGNITION

Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. Figure 1 shows the basic structures of speaker identification and verification systems.
Speaker recognition methods can also be divided into text-independent and text-dependent methods. In a text-independent system, speaker models capture characteristics of somebody’s speech which show up irrespective of what one is saying. In a text-dependent system, on the other hand, the recognition of the speaker’s identity is based on his or her speaking one or more specific phrases, like passwords, card numbers, PIN codes, etc.
Each of these speaker recognition technologies, identification and verification, text-independent and text-dependent, has its own advantages and disadvantages and may require different treatments and techniques. The choice of which technology to use is application-specific. The system that we will develop is classified as a text-independent speaker identification system, since its task is to identify the person who speaks regardless of what is being said.
At the highest level, all speaker recognition systems contain two main modules (refer to Figure 1): feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching involves the actual procedure to identify the unknown speaker by comparing extracted features from his/her voice input with the ones from a set of known speakers. We will discuss each module in detail in later sections.
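As an illustration of these two modules, the sketch below uses the mean and standard deviation of each wavelet sub-band as features and a nearest-neighbour search by Euclidean distance for matching; both choices are assumptions made for the sake of the example, not the exact method of this project.

import numpy as np
import pywt

def extract_features(signal, wavelet="db4", level=5):
    # Feature extraction: mean and standard deviation of each sub-band.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([s for c in coeffs for s in (np.mean(c), np.std(c))])

def match(test_features, speaker_models):
    # Feature matching: pick the enrolled speaker whose reference feature
    # vector is closest (Euclidean distance) to the test features.
    names = list(speaker_models)
    dists = [np.linalg.norm(test_features - speaker_models[n]) for n in names]
    return names[int(np.argmin(dists))]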


All speaker recognition systems operate in two distinct phases. The first is referred to as the enrollment session or training phase, while the second is referred to as the operation session or testing phase. In the training phase, each registered speaker has to provide samples of his or her speech so that the system can build or train a reference model for that speaker. In the case of speaker verification systems, a speaker-specific threshold is also computed from the training samples. During the testing (operational) phase (see Figure 1), the input speech is matched with the stored reference model(s) and a recognition decision is made.
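A minimal sketch of these two phases for a verification system is given below. It reuses the hypothetical extract_features helper from the previous sketch, and the rule for the speaker-specific threshold (1.5 times the largest training distance) is purely an illustrative assumption.

import numpy as np

def enroll(training_signals):
    # Training phase: build the reference model (mean feature vector) and a
    # speaker-specific acceptance threshold from the enrollment samples.
    # extract_features is the illustrative helper defined in the earlier sketch.
    feats = np.array([extract_features(s) for s in training_signals])
    model = feats.mean(axis=0)
    threshold = 1.5 * np.linalg.norm(feats - model, axis=1).max()
    return model, threshold

def verify(test_signal, model, threshold):
    # Testing phase: accept the identity claim only if the test features lie
    # within the speaker-specific threshold of the reference model.
    distance = np.linalg.norm(extract_features(test_signal) - model)
    return distance <= threshold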
Speaker recognition is a difficult task and it is still an active research area. Automatic speaker recognition is based on the premise that a person's speech exhibits characteristics that are unique to the speaker. This task is made difficult, however, by the high variability of input speech signals. The principal source of variance is the speaker himself: speech signals recorded in the training and testing sessions can differ greatly because voices change with time, health conditions (e.g. the speaker has a cold), speaking rate, and so on. There are also factors beyond speaker variability that present a challenge to speaker recognition technology, such as acoustical noise and variations in the recording environment (e.g. the speaker uses a different telephone handset).
1.3 GENERAL IDEA OF SPEECH RECOGNITION

Human speech presents a formidable pattern classification task for a speech recognition system. Numerous speech recognition techniques have been formulated, yet the very best techniques used today have recognition capabilities well below those of a child. This is because human speech is highly dynamic and complex. Several disciplines contribute to the understanding of human speech, and a basic grasp of them is needed in order to create an effective system. The following sections give a brief description of the disciplines that have been applied to speech recognition problems.
1.3.1 SIGNAL PROCESSING

This process extracts the important information from the speech signal in a well-organized manner. In signal processing, spectral analysis is used to characterize the time-varying properties of the speech signal. Several other types of processing are also applied before the spectral analysis stage to make the representation of the speech signal more accurate and robust.
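For instance, the time-varying spectral content can be examined with a short-time spectrogram, as in the brief Python sketch below; the sampling rate, frame length, overlap and placeholder signal are illustrative values.

import numpy as np
from scipy.signal import spectrogram

fs = 8000
speech = np.random.randn(2 * fs)                 # placeholder for real speech

# Short-time spectra characterize the time-varying properties of the signal.
freqs, times, Sxx = spectrogram(speech, fs=fs, nperseg=256, noverlap=128)
print(Sxx.shape)                                 # (frequency bins, time frames)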
1.3.2 ACOUSTICS

It is the science of understanding the relationship between the physical speech signal and the human vocal tract mechanisms that produce speech, and by which different speech sounds are distinguished.
1.3.3 PATTERN RECOGNITION

A set of algorithms used to process data and create prototypical patterns from a data ensemble, and to compare pairs of patterns based on the features extracted from the speech signal.
1.3.4 COMMUNICATION AND INFORMATION THEORY

The procedures for estimating parameters of the statistical models and the methods for recognizing the presence of speech patterns.
1.3.5 LINGUISTICS

This refers to the relationships between sounds, words in a sentence, meaning and logic of spoken words.
1.3.6 PHYSIOLOGY

This refers to the comprehension of the higher-order mechanisms within the human central nervous system that are responsible for the production and perception of speech in human beings.
1.3.7 COMPUTER SCIENCE

The study of effective algorithms for application in software and hardware. For example, the various methods used in a speech recognition system.
1.3.8 PSYCHOLOGY

The science of understanding the factors that enable the technology to be used effectively by human beings.
1.4 SPEECH PRODUCTION

Speech is the acoustic product of voluntary and well-controlled movements of the human vocal mechanism. During the generation of speech, air is drawn into the lungs by expanding the rib cage, entering via the nasal cavity, velum and trachea. It is then expelled by contracting the rib cage and increasing the lung pressure. During expulsion the air travels from the lungs and passes through the vocal cords, two symmetric pieces of ligament and muscle located in the larynx on the trachea. Speech is produced by the vibration of the vocal cords. Before the expulsion of air the larynx is initially closed; when the pressure produced by the expelled air is sufficient, the vocal cords are pushed apart, allowing air to pass through, and they close again as the air flow decreases. This relaxation cycle is repeated at frequencies in the range of 80 Hz to 300 Hz, depending on the speaker's age, sex, stress and emotional state. This succession of glottal openings and closures generates quasi-periodic pulses of air downstream of the vocal cords.
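The rate of these glottal pulses, i.e. the fundamental frequency, can be estimated from a voiced frame by searching the autocorrelation function over lags corresponding to the 80 Hz to 300 Hz range mentioned above. The Python fragment below is only an illustrative sketch; frame and fs are assumed inputs.

import numpy as np

def fundamental_frequency(frame, fs, fmin=80.0, fmax=300.0):
    # Pick the strongest autocorrelation peak whose lag corresponds to a
    # pitch inside the expected 80 Hz to 300 Hz range.
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return fs / lag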


The speech signal is a time-varying signal whose characteristics represent the different speech sounds produced. Events in speech are commonly labeled with one of three states. The first is the silence state, in which no speech is produced. The second is the unvoiced state, in which the vocal cords are not vibrating, so the output speech waveform is aperiodic and random in nature. The last is the voiced state, in which the vocal cords vibrate periodically as air is expelled from the lungs, resulting in output speech that is quasi-periodic. Figure 2 below shows a speech waveform with unvoiced and voiced states.
Speech is produced as a sequence of sounds, and the type of sound produced depends on the shape of the vocal tract. The vocal tract extends from the opening of the vocal cords to the lips, and its cross-sectional area depends on the positions of the tongue, lips, jaw and velum. The tongue, lips, jaw and velum therefore play an important part in the production of speech.