HEAL DSpace

Face active appearance modeling and speech acoustic information to recover articulation

DSpace/Manakin Repository

Show simple item record

dc.contributor.author Katsamanis, A en
dc.contributor.author Papandreou, G en
dc.contributor.author Maragos, P en
dc.date.accessioned 2014-03-01T01:30:37Z
dc.date.available 2014-03-01T01:30:37Z
dc.date.issued 2009 en
dc.identifier.issn 1558-7916 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/19605
dc.subject Active appearance models (AAMs) en
dc.subject Audiovisual-to-articulatory speech inversion en
dc.subject Canonical correlation analysis (CCA) en
dc.subject Multimodal fusion en
dc.subject.classification Acoustics en
dc.subject.classification Engineering, Electrical & Electronic en
dc.subject.other Active appearance models en
dc.subject.other Active appearance models (AAMs) en
dc.subject.other Appearance modeling en
dc.subject.other Audio features en
dc.subject.other Audiovisual-to-articulatory speech inversion en
dc.subject.other Canonical correlation analysis en
dc.subject.other Canonical correlation analysis (CCA) en
dc.subject.other Dynamic information en
dc.subject.other Electromagnetic articulography en
dc.subject.other Face Tracking en
dc.subject.other Facial analysis en
dc.subject.other Ill posed en
dc.subject.other Ill-posedness en
dc.subject.other Inversion process en
dc.subject.other Inversion scheme en
dc.subject.other Line spectral frequencies en
dc.subject.other Linear mapping en
dc.subject.other Markovian en
dc.subject.other Mel-frequency cepstral coefficients en
dc.subject.other Model switching en
dc.subject.other Multi-modal en
dc.subject.other Multi-stream hidden Markov model en
dc.subject.other Multimodal fusion en
dc.subject.other Piecewise linear models en
dc.subject.other Points of interest en
dc.subject.other Speech acoustics en
dc.subject.other Speech inversion en
dc.subject.other Speech production en
dc.subject.other Visual feature extraction en
dc.subject.other Visual information en
dc.subject.other Visual modalities en
dc.subject.other Vocal-tracts en
dc.subject.other Face recognition en
dc.subject.other Feature extraction en
dc.subject.other Frequency estimation en
dc.subject.other Hidden Markov models en
dc.subject.other Piecewise linear techniques en
dc.subject.other Speech recognition en
dc.subject.other Visual communication en
dc.subject.other Audio acoustics en
dc.title Face active appearance modeling and speech acoustic information to recover articulation en
heal.type journalArticle en
heal.identifier.primary 10.1109/TASL.2008.2008740 en
heal.identifier.secondary http://dx.doi.org/10.1109/TASL.2008.2008740 en
heal.language English en
heal.publicationDate 2009 en
heal.abstract We are interested in recovering aspects of the vocal tract's geometry and dynamics from speech, a problem referred to as speech inversion. Traditional audio-only speech inversion techniques are inherently ill-posed since the same speech acoustics can be produced by multiple articulatory configurations. To alleviate the ill-posedness of the audio-only inversion process, we propose an inversion scheme which also exploits visual information from the speaker's face. The complex audiovisual-to-articulatory mapping is approximated by an adaptive piecewise linear model. Model switching is governed by a Markovian discrete process which captures articulatory dynamic information. Each constituent linear mapping is effectively estimated via canonical correlation analysis. In the described multimodal context, we investigate alternative fusion schemes which allow interaction between the audio and visual modalities at various synchronization levels. For facial analysis, we employ active appearance models (AAMs) and demonstrate fully automatic face tracking and visual feature extraction. Using the AAM features in conjunction with audio features such as Mel-frequency cepstral coefficients (MFCCs) or line spectral frequencies (LSFs) leads to effective estimation of the trajectories followed by certain points of interest in the speech production system. We report experiments on the QSMT and MOCHA databases which contain audio, video, and electromagnetic articulography data recorded in parallel. The results show that exploiting both audio and visual modalities in a multistream hidden Markov model-based scheme clearly improves performance relative to either audio-only or visual-only estimation. © 2009 IEEE. en
heal.publisher IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC en
heal.journalName IEEE Transactions on Audio, Speech, and Language Processing en
dc.identifier.doi 10.1109/TASL.2008.2008740 en
dc.identifier.isi ISI:000263639400002 en
dc.identifier.volume 17 en
dc.identifier.issue 3 en
dc.identifier.spage 411 en
dc.identifier.epage 422 en
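
The abstract above describes approximating the audiovisual-to-articulatory mapping with an adaptive piecewise linear model, each constituent linear mapping estimated via canonical correlation analysis (CCA). The following is a minimal sketch of that single-regime estimation step, not the authors' implementation: the feature dimensions, the synthetic data, and the use of scikit-learn's CCA are illustrative assumptions.

    # Minimal sketch: estimating one constituent linear audiovisual-to-
    # articulatory mapping via CCA. Dimensions and data are hypothetical.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    N = 500                             # frames assigned to one linear regime
    d_av, d_art = 30, 6                 # fused audio (MFCC/LSF) + AAM features; articulator dims
    X = rng.standard_normal((N, d_av))  # audiovisual feature vectors
    Y = (X @ rng.standard_normal((d_av, d_art))
         + 0.1 * rng.standard_normal((N, d_art)))  # synthetic stand-in for EMA trajectories

    # CCA finds paired projections of the two views with maximally
    # correlated components; regressing through that canonical space
    # yields the linear X -> Y predictor for this regime.
    cca = CCA(n_components=d_art).fit(X, Y)
    Y_hat = cca.predict(X)
    rmse = np.sqrt(np.mean((Y - Y_hat) ** 2))
    print(f"per-regime RMSE: {rmse:.3f}")

In the paper's scheme, a Markovian switching process would first assign each frame to a regime, and a mapping of this form would be fit per regime; the sketch above covers only the estimation of a single mapping.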


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)
