HEAL DSpace

On the effects of filterbank design and energy computation on robust speech recognition

DSpace/Manakin Repository


dc.contributor.author Dimitriadis, D en
dc.contributor.author Maragos, P en
dc.contributor.author Potamianos, A en
dc.date.accessioned 2014-03-01T01:36:33Z
dc.date.available 2014-03-01T01:36:33Z
dc.date.issued 2011 en
dc.identifier.issn 1558-7916 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/21335
dc.subject Bandpass filters en
dc.subject cepstrum analysis en
dc.subject error analysis en
dc.subject parameter estimation en
dc.subject robustness en
dc.subject spectral analysis en
dc.subject speech processing en
dc.subject speech recognition en
dc.subject time-frequency analysis en
dc.subject.classification Acoustics en
dc.subject.classification Engineering, Electrical & Electronic en
dc.subject.other Alternative approach en
dc.subject.other Alternative energy en
dc.subject.other Automatic speech recognition en
dc.subject.other Cepstrum en
dc.subject.other cepstrum analysis en
dc.subject.other Energy estimation en
dc.subject.other Energy operators en
dc.subject.other Error rate reduction en
dc.subject.other Experimental analysis en
dc.subject.other Feature sets en
dc.subject.other Filter bandwidth en
dc.subject.other Filter bank design en
dc.subject.other Mel-frequency cepstral coefficients en
dc.subject.other Noise conditions en
dc.subject.other Noise types en
dc.subject.other Noisy recordings en
dc.subject.other Noisy speech recognition en
dc.subject.other Noisy speech signals en
dc.subject.other Relative error rates en
dc.subject.other Robust speech recognition en
dc.subject.other robustness en
dc.subject.other Signal energy en
dc.subject.other Spectral analysis en
dc.subject.other Speech recognition performance en
dc.subject.other Teager-Kaiser operator en
dc.subject.other Time frequency analysis en
dc.subject.other Bandwidth en
dc.subject.other Calculations en
dc.subject.other Convolution en
dc.subject.other Design en
dc.subject.other Error analysis en
dc.subject.other Feature extraction en
dc.subject.other Filter banks en
dc.subject.other Parameter estimation en
dc.subject.other Spectrum analysis en
dc.subject.other Speech processing en
dc.subject.other Speech recognition en
dc.title On the effects of filterbank design and energy computation on robust speech recognition en
heal.type journalArticle en
heal.identifier.primary 10.1109/TASL.2010.2092766 en
heal.identifier.secondary http://dx.doi.org/10.1109/TASL.2010.2092766 en
heal.identifier.secondary 5638124 en
heal.language English en
heal.publicationDate 2011 en
heal.abstract In this paper, we examine how energy computation and filterbank design contribute to the overall front-end robustness, especially when the investigated features are applied to noisy speech signals in mismatched training-testing conditions. In prior work ("Auditory Teager energy cepstrum coefficients for robust speech recognition," D. Dimitriadis, P. Maragos, and A. Potamianos, in Proc. Eurospeech'05, Sep. 2005), a novel feature set called "Teager energy cepstrum coefficients" (TECCs) was proposed, employing a dense, smooth filterbank and alternative energy computation schemes. TECCs were shown to be more robust to noise and to exhibit improved performance compared to the widely used Mel frequency cepstral coefficients (MFCCs). In this paper, we attempt to interpret these results using a combined theoretical and experimental analysis framework. Specifically, we investigate in detail the connection between the filterbank design (i.e., the filter shape and bandwidth), the energy estimation scheme, and the automatic speech recognition (ASR) performance under a variety of additive and/or convolutional noise conditions. For this purpose: 1) the performance of filterbanks using triangular, Gabor, and Gammatone filters with various bandwidths and filter positions is examined under different noisy speech recognition tasks, and 2) the squared amplitude and Teager-Kaiser energy operators are compared as two alternative approaches to computing the signal energy. Our end goal is to understand how to select the filterbank and energy computation scheme that are maximally robust under both clean and noisy recording conditions.
Theoretical and experimental results show that: 1) the filter bandwidth is one of the most important factors affecting speech recognition performance in noise, while the shape of the filter is of secondary importance, and 2) the Teager-Kaiser operator outperforms (on average and for most noise types) the squared amplitude energy computation scheme for speech recognition in noisy conditions, especially for large filter bandwidths. Experimental results show that selecting the appropriate filterbank and energy computation scheme can lead to significant error rate reduction over both MFCC and perceptual linear prediction (PLP) features for a variety of speech recognition tasks. A relative error rate reduction of up to ~30% for MFCCs and 39% for PLPs is shown for the Aurora-3 Spanish Task. en
heal.publisher IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC en
heal.journalName IEEE Transactions on Audio, Speech and Language Processing en
dc.identifier.doi 10.1109/TASL.2010.2092766 en
dc.identifier.isi ISI:000293702300005 en
dc.identifier.volume 19 en
dc.identifier.issue 6 en
dc.identifier.spage 1504 en
dc.identifier.epage 1516 en
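
The abstract compares two ways of computing signal energy: the squared amplitude and the Teager-Kaiser energy operator. The sketch below illustrates the difference using the standard discrete-time Teager-Kaiser definition, Psi[x](n) = x(n)^2 - x(n-1) x(n+1), applied to a pure sinusoid. It is a minimal illustration only and does not reproduce the paper's filterbank or feature pipeline; the function names and the test-signal parameters are assumptions chosen for this example.

```python
import numpy as np


def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser operator: x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the first and last samples
    lack one of the two neighbours the operator needs.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def squared_amplitude_energy(x):
    """Instantaneous squared amplitude, x[n]^2, one value per sample."""
    x = np.asarray(x, dtype=float)
    return x ** 2


# Hypothetical test signal: A * cos(omega * n + phi), omega in rad/sample.
A, omega, phi = 0.5, 0.3, 0.1
n = np.arange(200)
x = A * np.cos(omega * n + phi)

psi = teager_kaiser_energy(x)
sq = squared_amplitude_energy(x)

# For a pure sinusoid the operator output is constant, A^2 * sin^2(omega),
# so it depends on frequency as well as amplitude.
expected_psi = A ** 2 * np.sin(omega) ** 2

print("Teager-Kaiser energy: min, max =", psi.min(), psi.max())
print("Closed form A^2 sin^2(omega) =", expected_psi)
print("Squared amplitude:    min, max =", sq.min(), sq.max())
```

On this signal the Teager-Kaiser output is flat, while the squared amplitude oscillates between 0 and A^2 and has to be averaged over a frame to give a stable energy estimate. How each scheme behaves on band-passed noisy speech is the subject of the paper and is not shown here.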


Files in this item

Files Size Format View

There are no files associated with this item.

This item appears in the following collection(s)
