HEAL DSpace

Audiovisual-to-articulatory speech inversion using active appearance models for the face and Hidden Markov Models for the dynamics

DSpace/Manakin Repository

Show simple item record

dc.contributor.author Katsamanis, A en
dc.contributor.author Papandreou, G en
dc.contributor.author Maragos, P en
dc.date.accessioned 2014-03-01T02:45:09Z
dc.date.available 2014-03-01T02:45:09Z
dc.date.issued 2008 en
dc.identifier.issn 15206149 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/32169
dc.subject Articulatory en
dc.subject Audiovisual en
dc.subject Fusion en
dc.subject Hidden Markov models en
dc.subject Speech inversion en
dc.subject.other Acoustics en
dc.subject.other Computational grammars en
dc.subject.other Computer networks en
dc.subject.other Dynamics en
dc.subject.other Feature extraction en
dc.subject.other Learning systems en
dc.subject.other Markov processes en
dc.subject.other Object recognition en
dc.subject.other Signal processing en
dc.subject.other Speech en
dc.subject.other Speech recognition en
dc.subject.other Articulatory en
dc.subject.other Audiovisual en
dc.subject.other Fusion en
dc.subject.other International conferences en
dc.subject.other Speech inversion en
dc.subject.other Hidden Markov models en
dc.title Audiovisual-to-articulatory speech inversion using active appearance models for the face and Hidden Markov Models for the dynamics en
heal.type conferenceItem en
heal.identifier.primary 10.1109/ICASSP.2008.4518090 en
heal.identifier.secondary http://dx.doi.org/10.1109/ICASSP.2008.4518090 en
heal.identifier.secondary 4518090 en
heal.publicationDate 2008 en
heal.abstract We are interested in recovering aspects of the vocal tract's geometry and dynamics from auditory and visual speech cues. We approach the problem in a statistical framework based on Hidden Markov Models and demonstrate effective estimation of the trajectories followed by certain points of interest in the speech production system. Alternative fusion schemes are investigated to account for asynchrony between the modalities and to allow independent modeling of the dynamics of the involved streams. Visual cues are extracted from the speaker's face by means of Active Appearance Modeling. We report experiments on the QSMT database, which contains audio, video, and electromagnetic articulography data recorded in parallel. The results show that exploiting both audio and visual modalities in a multistream HMM-based scheme clearly improves performance relative to either audio-only or visual-only estimation. ©2008 IEEE. en
heal.journalName ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings en
dc.identifier.doi 10.1109/ICASSP.2008.4518090 en
dc.identifier.spage 2237 en
dc.identifier.epage 2240 en


Files in this item

Files Size Format View

There are no files associated with this item.

This item appears in the following collection(s)
