dc.contributor.author |
Dimitriadis, D |
en |
dc.contributor.author |
Maragos, P |
en |
dc.contributor.author |
Potamianos, A |
en |
dc.date.accessioned |
2014-03-01T01:36:33Z |
|
dc.date.available |
2014-03-01T01:36:33Z |
|
dc.date.issued |
2011 |
en |
dc.identifier.issn |
1558-7916 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/21335 |
|
dc.subject |
Bandpass filters |
en |
dc.subject |
cepstrum analysis |
en |
dc.subject |
error analysis |
en |
dc.subject |
parameter estimation |
en |
dc.subject |
robustness |
en |
dc.subject |
spectral analysis |
en |
dc.subject |
speech processing |
en |
dc.subject |
speech recognition |
en |
dc.subject |
time-frequency analysis |
en |
dc.subject.classification |
Acoustics |
en |
dc.subject.classification |
Engineering, Electrical & Electronic |
en |
dc.subject.other |
Alternative approach |
en |
dc.subject.other |
Alternative energy |
en |
dc.subject.other |
Automatic speech recognition |
en |
dc.subject.other |
Cepstrum |
en |
dc.subject.other |
cepstrum analysis |
en |
dc.subject.other |
Energy estimation |
en |
dc.subject.other |
Energy operators |
en |
dc.subject.other |
Error rate reduction |
en |
dc.subject.other |
Experimental analysis |
en |
dc.subject.other |
Feature sets |
en |
dc.subject.other |
Filter bandwidth |
en |
dc.subject.other |
Filter bank design |
en |
dc.subject.other |
Mel-frequency cepstral coefficients |
en |
dc.subject.other |
Noise conditions |
en |
dc.subject.other |
Noise types |
en |
dc.subject.other |
Noisy recordings |
en |
dc.subject.other |
Noisy speech recognition |
en |
dc.subject.other |
Noisy speech signals |
en |
dc.subject.other |
Relative error rates |
en |
dc.subject.other |
Robust speech recognition |
en |
dc.subject.other |
robustness |
en |
dc.subject.other |
Signal energy |
en |
dc.subject.other |
Spectral analysis |
en |
dc.subject.other |
Speech recognition performance |
en |
dc.subject.other |
Teager-Kaiser operator |
en |
dc.subject.other |
Time frequency analysis |
en |
dc.subject.other |
Bandwidth |
en |
dc.subject.other |
Calculations |
en |
dc.subject.other |
Convolution |
en |
dc.subject.other |
Design |
en |
dc.subject.other |
Error analysis |
en |
dc.subject.other |
Feature extraction |
en |
dc.subject.other |
Filter banks |
en |
dc.subject.other |
Parameter estimation |
en |
dc.subject.other |
Spectrum analysis |
en |
dc.subject.other |
Speech processing |
en |
dc.subject.other |
Speech recognition |
en |
dc.title |
On the effects of filterbank design and energy computation on robust speech recognition |
en |
heal.type |
journalArticle |
en |
heal.identifier.primary |
10.1109/TASL.2010.2092766 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1109/TASL.2010.2092766 |
en |
heal.identifier.secondary |
5638124 |
en |
heal.language |
English |
en |
heal.publicationDate |
2011 |
en |
heal.abstract |
In this paper, we examine how energy computation and filterbank design contribute to the overall front-end robustness, especially when the investigated features are applied to noisy speech signals in mismatched training-testing conditions. In prior work ("Auditory Teager energy cepstrum coefficients for robust speech recognition," D. Dimitriadis, P. Maragos, and A. Potamianos, in Proc. Eurospeech'05, Sep. 2005), a novel feature set called "Teager energy cepstrum coefficients" (TECCs) was proposed, employing a dense, smooth filterbank and alternative energy computation schemes. TECCs were shown to be more robust to noise and to exhibit improved performance compared to the widely used Mel frequency cepstral coefficients (MFCCs). In this paper, we attempt to interpret these results using a combined theoretical and experimental analysis framework. Specifically, we investigate in detail the connection between the filterbank design (i.e., the filter shape and bandwidth), the energy estimation scheme, and the automatic speech recognition (ASR) performance under a variety of additive and/or convolutional noise conditions. For this purpose: 1) the performance of filterbanks using triangular, Gabor, and Gammatone filters with various bandwidths and filter positions is examined under different noisy speech recognition tasks, and 2) the squared amplitude and Teager-Kaiser energy operators are compared as two alternative approaches to computing the signal energy. Our end goal is to understand how to select the most efficient filterbank and energy computation scheme that are maximally robust under both clean and noisy recording conditions.
Theoretical and experimental results show that: 1) the filter bandwidth is one of the most important factors affecting speech recognition performance in noise, while the shape of the filter is of secondary importance, and 2) the Teager-Kaiser operator outperforms (on average and for most noise types) the squared amplitude energy computation scheme for speech recognition in noisy conditions, especially for large filter bandwidths. Experimental results show that selecting the appropriate filterbank and energy computation scheme can lead to significant error rate reduction over both MFCC and perceptual linear prediction (PLP) features for a variety of speech recognition tasks. A relative error rate reduction of up to approximately 30% for MFCCs and 39% for PLPs is shown for the Aurora-3 Spanish Task. |
en |
heal.publisher |
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
en |
heal.journalName |
IEEE Transactions on Audio, Speech, and Language Processing |
en |
dc.identifier.doi |
10.1109/TASL.2010.2092766 |
en |
dc.identifier.isi |
ISI:000293702300005 |
en |
dc.identifier.volume |
19 |
en |
dc.identifier.issue |
6 |
en |
dc.identifier.spage |
1504 |
en |
dc.identifier.epage |
1516 |
en |