dc.contributor.author |
Tsiakoulis, P |
en |
dc.contributor.author |
Potamianos, A |
en |
dc.contributor.author |
Dimitriadis, D |
en |
dc.date.accessioned |
2014-03-01T02:46:31Z |
|
dc.date.available |
2014-03-01T02:46:31Z |
|
dc.date.issued |
2009 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/32691 |
|
dc.subject |
AM-FM |
en |
dc.subject |
Filterbank overlap |
en |
dc.subject |
Instantaneous bandwidth |
en |
dc.subject |
Instantaneous frequency |
en |
dc.subject |
Speech recognition |
en |
dc.subject.other |
Automatic speech recognition |
en |
dc.subject.other |
Cepstral domain |
en |
dc.subject.other |
Frequency domains |
en |
dc.subject.other |
Instantaneous bandwidth |
en |
dc.subject.other |
Instantaneous frequency |
en |
dc.subject.other |
Relative error rates |
en |
dc.subject.other |
Speaker recognition |
en |
dc.subject.other |
Spectral moments |
en |
dc.subject.other |
Stand-alone |
en |
dc.subject.other |
Sub-bands |
en |
dc.subject.other |
Time averages |
en |
dc.subject.other |
Amplitude modulation |
en |
dc.subject.other |
Bandwidth |
en |
dc.subject.other |
Electric frequency measurement |
en |
dc.subject.other |
Speech recognition |
en |
dc.title |
Short-time instantaneous frequency and bandwidth features for speech recognition |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1109/ASRU.2009.5373305 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1109/ASRU.2009.5373305 |
en |
heal.identifier.secondary |
5373305 |
en |
heal.publicationDate |
2009 |
en |
heal.abstract |
In this paper, we investigate the performance of modulation-related features and normalized spectral moments for automatic speech recognition. We focus on the short-time averages of the amplitude-weighted instantaneous frequencies and bandwidths, computed at each subband of a mel-spaced filterbank. Similar features have been proposed in previous studies, and have been successfully combined with MFCCs for speech and speaker recognition. Our goal is to investigate the stand-alone performance of these features. First, it is experimentally shown that the proposed features are only moderately correlated in the frequency domain, and, unlike MFCCs, they do not require a transformation to the cepstral domain. Next, the filterbank parameters (number of filters and filter overlap) are investigated for the proposed features and compared with those of MFCCs. Results show that frequency-related features perform at least as well as MFCCs in clean conditions and yield superior results in noisy conditions, with up to 50% relative error rate reduction on the AURORA3 Spanish task. © 2009 IEEE. |
en |
heal.journalName |
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009 |
en |
dc.identifier.doi |
10.1109/ASRU.2009.5373305 |
en |
dc.identifier.spage |
103 |
en |
dc.identifier.epage |
106 |
en |
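
Illustrative note on the abstract: the record does not give the paper's formulas, so the following is only a minimal sketch of how a short-time, amplitude-weighted instantaneous frequency and bandwidth might be computed for a single frame of one subband signal. It assumes the frame has already been passed through one filter of the filterbank, uses an FFT-based analytic signal, and uses one common textbook definition of the weighted mean frequency and bandwidth. The function names (analytic_signal, weighted_frequency_and_bandwidth) are hypothetical and not taken from the paper.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence, built with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def weighted_frequency_and_bandwidth(band_frame, sample_rate):
    """Amplitude-weighted mean instantaneous frequency and bandwidth (Hz).

    Assumed definitions (not confirmed by the record):
      F   = sum(a^2 * f) / sum(a^2)
      B^2 = sum((a' / (2*pi))^2 + a^2 * (f - F)^2) / sum(a^2)
    where a is the instantaneous amplitude and f the instantaneous frequency.
    """
    z = analytic_signal(np.asarray(band_frame, dtype=float))
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))

    # Finite differences approximate the time derivatives.
    inst_freq = np.diff(phase) * sample_rate / (2.0 * np.pi)
    amp_rate = np.diff(amplitude) * sample_rate

    power = amplitude[:-1] ** 2
    total = np.sum(power)

    mean_freq = np.sum(power * inst_freq) / total
    bandwidth_sq = np.sum(
        (amp_rate / (2.0 * np.pi)) ** 2 + power * (inst_freq - mean_freq) ** 2
    ) / total
    return mean_freq, np.sqrt(bandwidth_sq)


if __name__ == "__main__":
    # Sanity check on a pure 1 kHz tone: frequency near 1000 Hz, bandwidth near 0.
    fs = 16000
    t = np.arange(1600) / fs
    tone = np.cos(2.0 * np.pi * 1000.0 * t)
    print(weighted_frequency_and_bandwidth(tone, fs))
```

In a full front end along the lines the abstract describes, a computation like this would be repeated for every frame and every subband of the mel-spaced filterbank. The filter design and the frame length are left out here because the record does not specify them.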