dc.contributor.author | Katsamanis, A | en
dc.contributor.author | Papandreou, G | en
dc.contributor.author | Maragos, P | en
dc.date.accessioned | 2014-03-01T02:44:29Z |
dc.date.available | 2014-03-01T02:44:29Z |
dc.date.issued | 2007 | en
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/31843 |
dc.subject | Canonical Correlation Analysis | en
dc.subject | Linear Model | en
dc.subject | Point of View | en
dc.subject | Posterior Probability | en
dc.subject | Speech Production | en
dc.subject | Audio Video | en
dc.subject | Hidden Markov Model | en
dc.subject | Point of Interest | en
dc.subject.other | Hidden Markov models | en
dc.subject.other | Markov processes | en
dc.subject.other | Signal processing | en
dc.subject.other | Technical presentations | en
dc.subject.other | Audio visuals | en
dc.subject.other | Audiovisual speeches | en
dc.subject.other | Canonical Correlation Analysis | en
dc.subject.other | Electromagnetic articulography | en
dc.subject.other | Points of interests | en
dc.subject.other | Speech productions | en
dc.subject.other | Statistical frameworks | en
dc.subject.other | Visual modalities | en
dc.subject.other | Visual speeches | en
dc.subject.other | Vocal tracts | en
dc.subject.other | Speech recognition | en
dc.title | Audiovisual-to-articulatory speech inversion using HMMs | en
heal.type | conferenceItem | en
heal.identifier.primary | 10.1109/MMSP.2007.4412915 | en
heal.identifier.secondary | http://dx.doi.org/10.1109/MMSP.2007.4412915 | en
heal.identifier.secondary | 4412915 | en
heal.publicationDate | 2007 | en
heal.abstract | We address the problem of audiovisual speech inversion, namely recovering the vocal tract's geometry from auditory and visual speech cues. We approach the problem in a statistical framework, combining ideas from multistream Hidden Markov Models and canonical correlation analysis, and demonstrate effective estimation of the trajectories followed by certain points of interest in the speech production system. Our experiments show that exploiting both audio and visual modalities clearly improves performance relative to either audio-only or visual-only estimation. We report experiments on the QSMT database which contains audio, video, and electromagnetic articulography data recorded in parallel. © 2007 IEEE. | en
heal.journalName | 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings | en
dc.identifier.doi | 10.1109/MMSP.2007.4412915 | en
dc.identifier.spage | 457 | en
dc.identifier.epage | 460 | en