28-02-2017, 01:03 PM
Vowel recognition using formants
Let's run through an example of recognizing the simple vowel "uh" as in "hood". First, we recorded the word "hood" from each of the four members of this group and manually cut out the vowels using the soundeditor program. These vowels are stored in the sound files hood_a.se.bin, hood_j.se.bin, hood_s.se.bin and hood_t.se.bin (.au format). These files can be loaded into Matlab as vectors using the Matlab "auread" command. For clarity, most of the following plots use only three of these recordings.
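Matlab's auread does the .au decoding for us. For readers without Matlab, here is a minimal Python sketch of what that step involves, assuming the files are mono and use either 8-bit mu-law (encoding 1) or 16-bit linear PCM (encoding 3). `read_au` and `mulaw_to_linear` are hypothetical helper names, not part of any library. (Python used to ship a `sunau` module for this, but it was removed in Python 3.13.)

```python
import struct

AU_MAGIC = 0x2E736E64  # the four bytes ".snd" that open every .au file


def mulaw_to_linear(byte):
    """Decode one 8-bit G.711 mu-law byte to a linear sample in [-32124, 32124]."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def read_au(data):
    """Parse the bytes of a mono .au file into (samples in [-1, 1], sample rate).

    Only encoding 1 (8-bit mu-law) and encoding 3 (16-bit linear PCM) are handled.
    """
    # Header: six big-endian 32-bit fields.
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not an .au file")
    if channels != 1:
        raise ValueError("this sketch only handles mono files")
    body = data[offset:]
    if size != 0xFFFFFFFF:        # 0xFFFFFFFF means "data size unknown"
        body = body[:size]
    if encoding == 1:
        samples = [mulaw_to_linear(b) / 32768.0 for b in body]
    elif encoding == 3:
        count = len(body) // 2
        samples = [v / 32768.0 for v in struct.unpack(">%dh" % count, body[:2 * count])]
    else:
        raise ValueError("unsupported encoding %d" % encoding)
    return samples, rate
```

Because the parser works on raw bytes, the .se.bin extension does not matter: `read_au(open("hood_a.se.bin", "rb").read())` would return the samples and the sample rate, provided the file really uses one of the two encodings above.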
We then loaded these sounds into Matlab, normalized their amplitudes, and centered them. A frame of the normalized sounds is shown in uh_normalized. You will notice that the vowels are almost periodic: although the pitch (the frequency of the periods) varies from speaker to speaker, each vowel has a distinct shape that repeats.
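The post doesn't spell out the normalization. A minimal sketch, assuming "centered" means subtracting the mean and "normalized" means scaling the peak magnitude to 1, looks like this (the function name is hypothetical):

```python
def normalize_and_center(samples):
    """Subtract the mean (centering), then scale so the largest |sample| is 1."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    peak = max(abs(s) for s in centered)
    if peak == 0:                 # constant input: nothing left to scale
        return centered
    return [s / peak for s in centered]
```

Scaling by a constant does not move any spectral peaks, and removing the mean only affects the 0 Hz component, so this step makes the speakers' plots comparable without changing their formants.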
After normalization, we fit an auto-regressive (AR) model to the vowel signal. We calculate the formants from the frequency response of the AR model and then compare them with the formants of the standard vowels, picking the nearest one. In our example, all four vowels (one from each speaker) correctly matched "uh". In uh_formants, you can see how the formants of each speaker's vowel line up with the standard formant values for "uh"; the horizontal lines represent the standard formant frequencies. In uh_four_formants, we have superimposed the four formant plots for our samples. It is easy to see how close the formants of the different speakers are, which shows how similar the speech patterns of several speakers can be.
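The post does not say how the AR model was fitted. A common choice, assumed here, is the autocorrelation method solved with the Levinson-Durbin recursion, taking the formants as the local maxima of the model's magnitude response 1/|A(e^{jω})|. This is an illustrative Python sketch, not the authors' vowelrec.m; the default order of 10 is a rule of thumb for 8 kHz audio and is also an assumption.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] as used by the autocorrelation method."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the AR normal equations for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

    Returns the coefficient list a (with a[0] = 1) and the prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err            # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def ar_formants(x, fs, order=10, n_points=512, max_formants=3):
    """Estimate formant frequencies (Hz) as peaks of the AR magnitude response.

    The model gain only scales 1/|A| uniformly, so it is ignored here.
    """
    a, _ = levinson_durbin(autocorrelation(x, order), order)
    freqs, mags = [], []
    for i in range(n_points):
        f = 0.5 * fs * i / (n_points - 1)        # grid from 0 Hz to Nyquist
        w = 2 * math.pi * f / fs
        A = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        freqs.append(f)
        mags.append(1.0 / abs(A))
    peaks = [freqs[i] for i in range(1, n_points - 1)
             if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]]
    return peaks[:max_formants]
```

Applied to a normalized vowel, `ar_formants(samples, 8000)` returns up to three peak frequencies in Hz, lowest first. Because it simply reports local maxima, two nearby resonances that merge into one peak come back as a single formant.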
We wrote a Matlab function, vowelrec.m, that takes a vowel sound file and its sample rate and performs the steps described above. It plots the formants of the sound and returns the matching standard vowel.
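The matching step of a vowelrec-style function can be sketched as a nearest-neighbour search over a table of reference formants. The post does not list the standard values it used, so the table below holds approximate Peterson and Barney (1952) adult-male averages as a stand-in; Euclidean distance over the first three formants is likewise our assumption, and `match_vowel` is a hypothetical name.

```python
import math

# ASSUMPTION: approximate adult-male average formants (F1, F2, F3 in Hz) from
# Peterson and Barney (1952). The post does not list the standard values it used.
STANDARD_VOWELS = {
    "IY": (270, 2290, 3010),  # heed
    "IH": (390, 1990, 2550),  # hid
    "EH": (530, 1840, 2480),  # head
    "AE": (660, 1720, 2410),  # had
    "AA": (730, 1090, 2440),  # hod, "bob"
    "AO": (570, 840, 2410),   # hawed
    "UH": (440, 1020, 2240),  # hood
    "UW": (300, 870, 2240),   # who'd
    "AH": (640, 1190, 2390),  # hud, "bud"
    "ER": (490, 1350, 1690),  # heard
}


def match_vowel(formants, table=STANDARD_VOWELS):
    """Return the label whose reference formants are closest (Euclidean, in Hz).

    Only the first min(len(formants), 3) formants are compared, so a short list
    (e.g. when a formant was missed) is still matched, just on less information.
    """
    n = min(len(formants), 3)
    if n == 0:
        raise ValueError("need at least one formant")

    def distance(ref):
        return math.sqrt(sum((formants[i] - ref[i]) ** 2 for i in range(n)))

    return min(table, key=lambda label: distance(table[label]))
```

For example, `match_vowel([440, 1020, 2240])` returns "UH". With this table, dropping F2 from those same values (`match_vowel([440, 2240])`) returns "IY", which shows how a single lost formant can pull the match far from the true vowel.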
We conclude our demonstration by applying the same process to an artificially complicated word: "syphilis". This word contains three subtle vowels, in the syllables "sif", "ful" and "lus". The recording of "syphilis" and its three vowels can be found in syphillis_t.se.bin, syph_1.se.bin, syph_2.se.bin and syph_3.se.bin (all .au files). A plot of the word "syphilis" is in syphillis.gif, and a plot of its normalized vowels is in syph_normalized.gif. Vowels can be distinguished from consonants in these plots by their repetitive nature.
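We separated vowels from consonants by eye, but the same cue (periodicity) can be measured. As an illustration only, this sketch scores each frame by its peak normalized autocorrelation over lags corresponding to a 60 to 400 Hz pitch and labels frames scoring above 0.5 as vowel-like. The frame length, hop, pitch range and threshold are all assumptions, and the function names are hypothetical.

```python
import math


def periodicity(frame, fs, min_pitch=60.0, max_pitch=400.0):
    """Highest normalized autocorrelation over lags in the pitch range.

    Close to 1 for a strongly periodic (vowel-like) frame, small for
    noise-like consonants such as "s" or "f".
    """
    n = len(frame)
    lo = max(1, int(fs / max_pitch))
    hi = min(int(fs / min_pitch), n - 1)
    best = 0.0
    for lag in range(lo, hi + 1):
        head, tail = frame[:n - lag], frame[lag:]
        e1 = sum(v * v for v in head)
        e2 = sum(v * v for v in tail)
        if e1 == 0 or e2 == 0:
            continue
        r = sum(p * q for p, q in zip(head, tail))
        best = max(best, r / math.sqrt(e1 * e2))
    return best


def label_frames(x, fs, frame_len=256, hop=128, threshold=0.5):
    """Label each frame "V" (periodic, vowel-like) or "C" (not periodic)."""
    return ["V" if periodicity(x[s:s + frame_len], fs) > threshold else "C"
            for s in range(0, len(x) - frame_len + 1, hop)]
```

Normalizing by the energy of both overlapping segments (rather than of the whole frame) keeps a perfectly periodic frame near 1 even at long pitch lags, so low-pitched voices are not penalized.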
Next, we calculated the formants of these vowels and matched each one to a standard vowel. Unfortunately, although vowel matching worked very well for simple words like "head" or "bob", our matching program was much less successful with "syphilis". It matched "sif" to AH as in "bud", "ful" to UH as in "hood", and "lus" to IY as in "heed". The formants of these vowels, and of the standard vowels they were matched to, are shown in syph_formants.gif. As the last vowel shows, our formant calculation missed a formant, which led to the large discrepancy.
This shows that our program is less effective with vowel samples that are not sufficiently stable. Perhaps with a sampling frequency higher than 8 kHz we could have been more successful.
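The sampling-rate remark can be made concrete. An AR model fitted to audio sampled at $f_s$ can only place resonances below the Nyquist frequency $f_s/2$, and a commonly quoted rule of thumb (an assumption here, not something the post states) sets the model order $p$ from $f_s$:

```latex
f_{\max} = \frac{f_s}{2} = \frac{8000\ \text{Hz}}{2} = 4000\ \text{Hz},
\qquad
p \approx 2 + \frac{f_s}{1000\ \text{Hz}} = 2 + 8 = 10
```

So at 8 kHz every formant has to be found in the 0 to 4 kHz band with about five conjugate pole pairs ($p = 10$); a higher sampling rate widens that band, which is one plausible reason it might have helped.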
However, with simpler words, our method was extremely successful. As shown in vowels_analyzed.gif, given vowel samples from four different speakers, we achieved a correct match rate of 86%. The only vowel we could not match consistently was AO as in "hawed"; then again, none of us really knows how to pronounce "hawed" anyway.
In conclusion, our vowel recognition process is very successful for simple words with distinct vowels, but unsuccessful for phonetically complicated words.