02-02-2012, 10:34 AM
SPEECH RECOGNITION
TOPICS
What is speech recognition?
Two main feature-extraction techniques used in speech recognition:
- LPC (Linear Predictive Coding)
- MFCC (Mel-Frequency Cepstral Coefficients)
What are LPC and MFCC exactly?
Calculation of these coefficients to improve reliability of speech recognition systems.
Shortcomings that need to be improved.
Applications
Conclusion
Introduction to Speech Recognition Technique
Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words.
These words can serve as input to further linguistic processing in order to achieve speech understanding.
A speech recognition system consists of the following components:
signal processing, speech decoding, and adaptation.
A speaker generates a word sequence which is passed through a communication channel to produce a waveform.
The speech waveform is passed to the signal-processing component of the speech recognizer, which generates a parameterized acoustic signal.
The speech decoder component decodes the
MEL FREQUENCY CEPSTRAL COEFFICIENTS
In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").
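The definition above (power spectrum, mel-scale warping, logarithm, cosine transform) can be sketched for a single frame as follows. This is only an illustrative sketch, not a reference implementation: the function names, the 26-filter / 13-coefficient / 512-point FFT settings, the Hamming window, and the 2595*log10(1 + f/700) mel formula are common conventions assumed here, not details taken from the slides.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (assumed here; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    # Unnormalized DCT-II: the "linear cosine transform" of the log energies.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13, n_fft=512):
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2 / n_fft   # short-term power spectrum
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)   # small offset avoids log(0)
    return dct_ii(log_energies, n_coeffs)

# Usage: one 25 ms frame of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(400) / sample_rate
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sample_rate)
print(coeffs.shape)  # (13,)
```

In practice a full utterance would be cut into overlapping frames and this computation repeated per frame; that framing step is left out here to keep the sketch short.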
Basic working principle of LPC
The general algorithm for linear predictive coding involves an analysis or encoding part and a synthesis or decoding part.
In the encoding, LPC takes the speech signal in blocks (frames) and determines, for each one, the excitation (input) signal and the filter coefficients that can reproduce the current block of speech.
This information is quantized and transmitted.
In the decoding, LPC rebuilds the filter based on the coefficients received.
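The analysis/synthesis split described above can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion for the analysis step, which is one common choice and not necessarily the one the slides have in mind. The names lpc_analyze and lpc_synthesize are hypothetical, and the quantization and transmission step is deliberately omitted, so the decoder here reproduces the frame exactly.

```python
import numpy as np

def lpc_analyze(frame, order):
    """Encoder side: return filter coefficients a (with a[0] = 1) and the
    residual, i.e. the excitation that drives the synthesis filter.
    Assumes the frame is not all zeros."""
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):            # Levinson-Durbin recursion
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                 # remaining prediction error energy
    # Residual e[n] = x[n] + sum_j a[j] * x[n - j]
    residual = np.convolve(frame, a)[:n]
    return a, residual

def lpc_synthesize(a, excitation):
    """Decoder side: rebuild the all-pole filter from a and run the
    excitation through it: x[n] = e[n] - sum_j a[j] * x[n - j]."""
    order = len(a) - 1
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for j in range(1, min(order, n) + 1):
            acc -= a[j] * out[n - j]
        out[n] = acc
    return out

# Usage: encode and decode one synthetic 240-sample frame.
rng = np.random.default_rng(0)
t = np.arange(240)
frame = (np.sin(0.2 * t) + 0.5 * np.sin(0.05 * t)
         + 0.01 * rng.standard_normal(240)) * np.hamming(240)
a, residual = lpc_analyze(frame, order=8)
rebuilt = lpc_synthesize(a, residual)
print(np.allclose(frame, rebuilt, atol=1e-6))
```

A real coder would quantize the coefficients and send a compact description of the excitation rather than the full residual, which is where the bit-rate saving comes from and why the decoded speech is only an approximation.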