ABSTRACT
Speech and audio processing techniques are used along with statistical pattern recognition principles to solve the problem of music instrument recognition. Only non-temporal, frame-level features are used, so that the proposed system scales from isolated notes to solo instrumental phrases without the need for temporal segmentation of solo music. Based on their effectiveness in speech, Line Spectral Frequencies (LSF) are proposed as features for music instrument recognition. The proposed system has also been evaluated using MFCC and LPCC features. Gaussian Mixture Model (GMM) and k-Nearest Neighbour (k-NN) classifiers are used for classification. The experimental dataset included the UIowa MIS and the C Music Corporation's RWC databases. When classifying 14 instruments, our best accuracies are about 95% at the instrument family level and about 90% at the individual instrument level.
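As an illustration of the proposed feature, here is a minimal sketch of frame-level LSF extraction. The report does not give an implementation; the use of librosa for LPC analysis, the function names, and the parameter values (prediction order 13, 1024-sample frames) are assumptions made for this example.

import numpy as np
import librosa

def lsf_from_lpc(a):
    # a: LPC polynomial coefficients [1, a1, ..., ap], e.g. from librosa.lpc.
    # Split A(z) into a palindromic and an antipalindromic polynomial:
    #   P(z) = A(z) + z^-(p+1) A(1/z),  Q(z) = A(z) - z^-(p+1) A(1/z)
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    # For a minimum-phase A(z), the roots of P and Q interlace on the unit
    # circle; the LSFs are their angles in (0, pi). The small margins drop
    # the trivial roots at z = +1 and z = -1.
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    return np.sort(angles[(angles > 1e-6) & (angles < np.pi - 1e-6)])

def lsf_features(path, order=13, frame=1024, hop=512):
    # Frame-level LSF features for one audio file (hypothetical parameters;
    # silent frames may need to be skipped before LPC analysis).
    y, sr = librosa.load(path, sr=None)
    frames = librosa.util.frame(y, frame_length=frame, hop_length=hop).T
    feats = []
    for f in frames:
        a = librosa.lpc(f * np.hanning(frame), order=order)
        feats.append(lsf_from_lpc(a))
    return np.array(feats)  # shape: (num_frames, order)

For a stable LPC filter the LSFs lie interlaced on the unit circle, which makes them bounded and numerically well-behaved as frame-level features.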
1. INTRODUCTION
While much has been achieved in the field of speech content analysis (Automatic Speech Recognition (ASR), Language Identification (LID), Speaker Identification (SID), etc.), music content analysis is still relatively in its infancy. Within the broad area of music content analysis, sound source recognition (the recognition of musical instruments and other sources) forms a very important part. Music content analysis has many applications, including media annotation, singer identification, music transcription, structured audio coding, and information retrieval. Drawing analogies from speech processing, ASR corresponds to automatic music transcription, LID to music genre recognition, and SID to music instrument recognition. The solutions to these three problems have reached a certain maturity in speech, and we look to draw from that maturity, although speech and music are quite different.

There has been a lot of work in the area of Music Instrument Recognition (MIR). A brief selection of the studies most relevant to the work presented here is discussed. Brown [2] used SID techniques to determine the properties most useful in identifying sounds from 4 woodwind instruments. Cepstral coefficients, bin-to-bin differences of constant-Q transform coefficients, and autocorrelation coefficients were used as features with Gaussian mixture model based classifiers, obtaining accuracies of about 79% to 84%. Excerpts from commercial CDs were used in her study rather than isolated notes. Marques [3] used Gaussian mixture models and support vector machines to classify 0.2 s segments of 9 instruments, obtaining an accuracy of about 70% with LPC, FFT-based cepstral coefficient, and MFCC feature sets. Marques also used solo music, not isolated notes. Martin [1] used a set of perceptual features derived from a log-lag correlogram to classify isolated notes from 27 instruments, with accuracies of about 86% at the instrument family level and about 71% at the individual instrument level; this system has been shown to be robust to noisy and reverberant notes. Eronen [4] also used a set of perceptually motivated features to classify isolated notes from 30 instruments, with accuracies of about 94% at the instrument family level and about 85% at the individual instrument level. Agostini [6] used spectral features only to classify 27 instruments, with an accuracy of about 96% at the instrument family level and about 92% at the individual instrument level. Eronen [5], in a study comparing different features for music instrument recognition, reported best accuracies of 77% at the instrument family level and 35% at the individual instrument level; the feature sets analyzed included LPCC on uniform as well as warped frequency scales, MFCC, and other features, with the best accuracies obtained for WLPCC with a prediction order of 13 and Bark-scale warping. Kitahara [7] classified tones from 19 instruments using a fundamental-frequency-dependent multivariate normal distribution of spectral, temporal, modulation, and other features, obtaining accuracies of about 90% at the instrument family level and about 80% at the individual instrument level. Except for Brown and Marques, all of these results use isolated notes.
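Several of the systems surveyed above, like the proposed one, score frame-level features with one Gaussian mixture model per instrument and label a clip by the best-scoring model. Continuing the sketch above under the same assumptions (scikit-learn for the GMMs; the mixture size and covariance type are illustrative choices, not values from the report):

import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(train_files, n_components=8):
    # train_files: dict mapping instrument name -> list of audio paths.
    # Fits one GMM per instrument on that instrument's pooled frames,
    # using the hypothetical lsf_features extractor sketched earlier.
    models = {}
    for inst, paths in train_files.items():
        X = np.vstack([lsf_features(p) for p in paths])
        models[inst] = GaussianMixture(
            n_components=n_components, covariance_type="diag").fit(X)
    return models

def classify(path, models):
    # Label the clip with the model giving the highest average
    # per-frame log-likelihood over all of its frames.
    X = lsf_features(path)
    return max(models, key=lambda inst: models[inst].score(X))

The analogous sketch for the paper's k-NN classifier would label each frame by its nearest training frames and take a majority vote over the clip.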
Download full report
http://googleurl?sa=t&source=web&cd=2&ve...FMusic.pdf&ei=TWtbTpr3DcfwrQfXovmbDQ&usg=AFQjCNEX0p_wcZthfyVxHzww2TJJQo1oYg