Audiovisual speech inversion by switching dynamical modeling governed by a Hidden Markov process

Katsamanis, A; Ananthakrishnan, G; Papandreou, G; Maragos, P; Engwall, O

dc.contributor.author	Katsamanis, A	en
dc.contributor.author	Ananthakrishnan, G	en
dc.contributor.author	Papandreou, G	en
dc.contributor.author	Maragos, P	en
dc.contributor.author	Engwall, O	en
dc.date.accessioned	2014-03-01T02:45:09Z
dc.date.available	2014-03-01T02:45:09Z
dc.date.issued	2008	en
dc.identifier.issn	22195491	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/32168
dc.relation.uri	http://www.scopus.com/inward/record.url?eid=2-s2.0-84863731362&partnerID=40&md5=3dc13d3b075c5904501658f11ce8135c	en
dc.relation.uri	http://cvsp.cs.ntua.gr/publications/confr/KAPME_KalmanHMMInversion_eusipco08.pdf	en
dc.relation.uri	http://www.speech.kth.se/prod/publications/files/3258.pdf	en
dc.relation.uri	http://www.eurasip.org/Proceedings/Eusipco/Eusipco2008/papers/1569105532.pdf	en
dc.relation.uri	http://cvsp.cs.ntua.gr/publications/confr/KatsamanisAnanthPapandreouMaragosEngwall_AV-Speechinvers-SwitchDynModel-HidMarkov_EUSIPCO2008.pdf	en
dc.subject	Active Appearance Model	en
dc.subject	Dynamic Model	en
dc.subject	Hidden Markov Process	en
dc.subject	Inverse Problem	en
dc.subject	Linear Dynamical System	en
dc.subject	Prediction Error	en
dc.subject	Radial Basis Function	en
dc.subject	Root Mean Square Error	en
dc.subject	Support Vector Machine	en
dc.subject	Visual Analysis	en
dc.subject	mel frequency cepstral coefficient	en
dc.subject	Markov Model	en
dc.subject.other	Active appearance models	en
dc.subject.other	Audio-visual speech	en
dc.subject.other	Classification analysis	en
dc.subject.other	Correlation coefficient	en
dc.subject.other	Dynamical modeling	en
dc.subject.other	Evaluation scheme	en
dc.subject.other	Hidden Markov process	en
dc.subject.other	Inversion problems	en
dc.subject.other	Mel-frequency cepstral coefficients	en
dc.subject.other	Prediction errors	en
dc.subject.other	Radial basis functions	en
dc.subject.other	Root mean squared errors	en
dc.subject.other	State sequences	en
dc.subject.other	Switching linear dynamical systems	en
dc.subject.other	Unified framework	en
dc.subject.other	Visual analysis	en
dc.subject.other	Hidden Markov models	en
dc.subject.other	Linear control systems	en
dc.subject.other	Radial basis function networks	en
dc.subject.other	Signal processing	en
dc.subject.other	Image segmentation	en
dc.title	Audiovisual speech inversion by switching dynamical modeling governed by a Hidden Markov process	en
heal.type	conferenceItem	en
heal.publicationDate	2008	en
heal.abstract	We propose a unified framework to recover articulation from audiovisual speech. The nonlinear audiovisual-to-articulatory mapping is modeled by means of a switching linear dynamical system. Switching is governed by a state sequence determined via a Hidden Markov Model alignment process. Mel Frequency Cepstral Coefficients are extracted from audio while visual analysis is performed using Active Appearance Models. The articulatory state is represented by the coordinates of points on important articulators, e.g., tongue and lips. To evaluate our inversion approach, instead of just using the conventional correlation coefficients and root mean squared errors, we introduce a novel evaluation scheme that is more specific to the inversion problem. Prediction errors in the positions of the articulators are weighted differently depending on their relative importance in the production of the corresponding sound. The applied weights are determined by an articulatory classification analysis using Support Vector Machines with a radial basis function kernel. Experiments are conducted on the audiovisual-articulatory MOCHA database. Copyright by EURASIP.	en
heal.journalName	European Signal Processing Conference	en
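
The weighted evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_rmse`, the per-frame weight layout, and the normalization by the sum of weights are all assumptions made for illustration. In the paper the weights come from an SVM-based articulatory classification analysis; here they are simply passed in as given numbers.

```python
import math


def weighted_rmse(predicted, measured, weights):
    """Weighted root mean squared error over articulator coordinates.

    predicted, measured: lists of frames, each frame a list of coordinates.
    weights: one list of non-negative weights per frame (hypothetical values;
             the paper derives its weights from an SVM classification analysis).
    """
    if not (len(predicted) == len(measured) == len(weights)):
        raise ValueError("predicted, measured and weights need the same number of frames")

    num = 0.0  # weighted sum of squared errors
    den = 0.0  # sum of weights, used for normalization (an assumption)
    for p, m, w in zip(predicted, measured, weights):
        if not (len(p) == len(m) == len(w)):
            raise ValueError("each frame needs one weight per coordinate")
        num += sum(wi * (pi - mi) ** 2 for wi, pi, mi in zip(w, p, m))
        den += sum(w)

    if den <= 0:
        raise ValueError("weights must sum to a positive value")
    return math.sqrt(num / den)
```

With all weights equal, this reduces to the conventional RMSE. Setting a coordinate's weight to zero removes that articulator from the score for that frame, which is the sense in which errors on less critical articulators count less.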