26-11-2010, 12:05 PM
Speech Recognition with Hidden Markov Models
HMMs for Speech
Speech is modeled as the output of an HMM; the problem is to find the most likely model for a given speech observation sequence.
Speech is divided into a sequence of 10-msec frames, one frame per state transition (faster processing). Assume speech can be recognized using 10-msec chunks.
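As a rough illustration of the framing step, the sketch below cuts a sampled signal into consecutive 10-msec frames. The 16 kHz sample rate, the function name split_into_frames, and the use of non-overlapping frames are assumptions made for this example only; they are not taken from the original slides.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Split a list of samples into consecutive, non-overlapping frames."""
    # Number of samples in one frame (160 samples for 10 msec at 16 kHz).
    frame_len = sample_rate_hz * frame_ms // 1000
    # Drop any trailing partial frame.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# One second of dummy samples at an assumed 16 kHz sample rate.
samples = list(range(16000))
frames = split_into_frames(samples, 16000)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame would then be turned into one observation, so an utterance of T frames gives an observation sequence of length T.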
Each state can be associated with a sub-phoneme, a phoneme, or a sub-word.
Usually, sub-phonemes or sub-words are used, to account for spectral dynamics (coarticulation).
One HMM corresponds to one phoneme or word
For each HMM, determine the probability of the best state sequence that results in the observed speech.
Choose HMM with best match (probability) to observed speech.
Given the most likely HMM and state sequence, determine the corresponding phoneme and word sequence.
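The selection procedure above can be sketched with a small Viterbi search: score the best state sequence under each HMM, then keep the HMM with the highest score. This is a toy sketch under stated assumptions, not the implementation from the slides: observations are discrete symbols (0 or 1) rather than acoustic feature vectors, and the two models "A" and "B" with their probabilities are invented for illustration.

```python
import math


def safe_log(p):
    # log(0) is treated as -infinity so impossible paths never win.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Return (log probability, state path) of the best state sequence."""
    n = len(init)
    # delta[j]: best log score of any path ending in state j at the current frame.
    delta = [safe_log(init[j]) + safe_log(emit[j][obs[0]]) for j in range(n)]
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + safe_log(trans[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + safe_log(trans[best_i][j])
                             + safe_log(emit[j][o]))
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(range(n), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Two hypothetical 2-state left-to-right models over symbols {0, 1}.
models = {
    "A": dict(init=[1.0, 0.0], trans=[[0.5, 0.5], [0.0, 1.0]],
              emit=[[0.9, 0.1], [0.2, 0.8]]),
    "B": dict(init=[1.0, 0.0], trans=[[0.5, 0.5], [0.0, 1.0]],
              emit=[[0.1, 0.9], [0.8, 0.2]]),
}

obs = [0, 0, 1, 1]
results = {name: viterbi(obs, **m) for name, m in models.items()}
best_name = max(results, key=lambda name: results[name][0])
best_score, best_path = results[best_name]
print(best_name, best_path)
```

Working in log probabilities avoids numerical underflow on long observation sequences; the state path of the winning model is what would be mapped back to phoneme and word labels.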
[Figure: 7-state word model for “cat” with null states]
Null states do not emit observations, and are entered and exited at the same time t. Theoretically, they are unnecessary. Practically, they can make implementation easier.
States don’t have to correspond directly to phonemes, but are commonly labeled using phonemes.
This permits several different models for each phoneme, depending on the surrounding phonemes (context-sensitive), for example /ae/ in different contexts:
k-ae+t
p-ae+t
k-ae+p
The probability of an “illegal” state sequence is zero (never used), for example sil-k+ae followed by p-ae+t.
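The idea of a legal versus illegal sequence can be sketched as a simple consistency check on the labels. The sketch assumes the labels read as left-center+right (so k-ae+t is /ae/ preceded by /k/ and followed by /t/) and that a transition is legal only when the contexts of adjacent labels agree; the function names are hypothetical and chosen for this example.

```python
def parse_triphone(label):
    """Split a label such as 'k-ae+t' into (left, center, right)."""
    left, rest = label.split("-")
    center, right = rest.split("+")
    return left, center, right


def is_legal_transition(prev_label, next_label):
    """A transition is legal when the two context-dependent labels agree:
    the previous center must be the next left context, and the previous
    right context must be the next center."""
    _, prev_center, prev_right = parse_triphone(prev_label)
    next_left, next_center, _ = parse_triphone(next_label)
    return prev_center == next_left and prev_right == next_center


print(is_legal_transition("sil-k+ae", "k-ae+t"))  # contexts agree
print(is_legal_transition("sil-k+ae", "p-ae+t"))  # left context p does not match k
```

In a recognizer, transitions that fail such a check would simply be given probability zero, so they never appear in the best state sequence.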
Context-sensitive models give a much larger number of states to train on… (50 vs. 125,000 = 50³ for a full set of phonemes, 39 vs. 59,319 = 39³ for a reduced set).
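The counts quoted above are consistent with one context-sensitive unit per (left, center, right) phoneme combination, i.e. the cube of the phoneme-set size. The short check below assumes that interpretation; the function name is hypothetical.

```python
def context_dependent_count(n_phonemes):
    # One unit per (left context, center phoneme, right context) combination.
    return n_phonemes ** 3


for n in (50, 39):
    print(n, "context-independent vs.", context_dependent_count(n), "context-dependent")
```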
For more information about this article, please follow the link:
http://cslu.ogi.edu/people/hosom/cs552/l...speech.ppt