HEAL DSpace

Short-time instantaneous frequency and bandwidth features for speech recognition

DSpace/Manakin Repository


dc.contributor.author Tsiakoulis, P en
dc.contributor.author Potamianos, A en
dc.contributor.author Dimitriadis, D en
dc.date.accessioned 2014-03-01T02:46:31Z
dc.date.available 2014-03-01T02:46:31Z
dc.date.issued 2009 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/32691
dc.subject AM-FM en
dc.subject Filterbank overlap en
dc.subject Instantaneous bandwidth en
dc.subject Instantaneous frequency en
dc.subject Speech recognition en
dc.subject.other Automatic speech recognition en
dc.subject.other Cepstral domain en
dc.subject.other Frequency domains en
dc.subject.other Instantaneous bandwidth en
dc.subject.other Instantaneous frequency en
dc.subject.other Relative error rates en
dc.subject.other Speaker recognition en
dc.subject.other Spectral moments en
dc.subject.other Stand-alone en
dc.subject.other Sub-bands en
dc.subject.other Time averages en
dc.subject.other Amplitude modulation en
dc.subject.other Bandwidth en
dc.subject.other Electric frequency measurement en
dc.subject.other Speech recognition en
dc.title Short-time instantaneous frequency and bandwidth features for speech recognition en
heal.type conferenceItem en
heal.identifier.primary 10.1109/ASRU.2009.5373305 en
heal.identifier.secondary http://dx.doi.org/10.1109/ASRU.2009.5373305 en
heal.identifier.secondary 5373305 en
heal.publicationDate 2009 en
heal.abstract In this paper, we investigate the performance of modulation-related features and normalized spectral moments for automatic speech recognition. We focus on the short-time averages of the amplitude-weighted instantaneous frequencies and bandwidths, computed at each subband of a mel-spaced filterbank. Similar features have been proposed in previous studies, and have been successfully combined with MFCCs for speech and speaker recognition. Our goal is to investigate the stand-alone performance of these features. First, it is experimentally shown that the proposed features are only moderately correlated in the frequency domain, and, unlike MFCCs, they do not require a transformation to the cepstral domain. Next, the filterbank parameters (number of filters and filter overlap) are investigated for the proposed features and compared with those of MFCCs. Results show that frequency-related features perform at least as well as MFCCs in clean conditions and yield superior results in noisy conditions, with up to 50% relative error rate reduction on the AURORA3 Spanish task. © 2009 IEEE. en
heal.journalName Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009 en
dc.identifier.doi 10.1109/ASRU.2009.5373305 en
dc.identifier.spage 103 en
dc.identifier.epage 106 en
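
Illustrative sketch (not part of the record): the abstract refers to short-time averages of amplitude-weighted instantaneous frequency and bandwidth per subband, but the record does not give the formulas. The Python below assumes one common definition from the AM-FM literature, where the mean frequency is the instantaneous frequency weighted by squared amplitude, and the squared bandwidth adds an amplitude-modulation term to the weighted frequency spread. The demodulation method (an FFT-based analytic signal) and the function names `analytic_signal` and `weighted_freq_and_bandwidth` are assumptions chosen for illustration; the paper may use a different demodulator. The mel-spaced filterbank is omitted, so the input is taken to be one frame of an already band-passed signal.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real frame, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep Nyquist once
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def weighted_freq_and_bandwidth(frame, fs):
    """Amplitude-weighted mean frequency and bandwidth (both in Hz) of one frame.

    Assumed definitions (hypothetical, not taken from the record):
      F   = sum(f * a^2) / sum(a^2)
      B^2 = sum((a' / (2*pi))^2 + (f - F)^2 * a^2) / sum(a^2)
    The frame is assumed to have non-zero energy.
    """
    z = analytic_signal(frame)
    a = np.abs(z)                                   # instantaneous amplitude
    phase = np.unwrap(np.angle(z))
    f = np.gradient(phase) * fs / (2.0 * np.pi)     # instantaneous frequency, Hz
    a_dot = np.gradient(a) * fs                     # amplitude derivative, 1/s
    w = a ** 2
    energy = np.sum(w)
    mean_f = np.sum(f * w) / energy
    bw_sq = np.sum((a_dot / (2.0 * np.pi)) ** 2 + (f - mean_f) ** 2 * w) / energy
    return mean_f, np.sqrt(bw_sq)

# Usage: a 25 ms frame of a pure 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(200) / fs
tone = np.cos(2.0 * np.pi * 1000.0 * t)
mean_f, bw = weighted_freq_and_bandwidth(tone, fs)
```

For a pure tone the mean frequency should land near the tone frequency and the bandwidth near zero. In a full front end, a pair of values like this would be computed for every subband and every frame.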


Files in this item

There are no files associated with this item.
