HEAL DSpace

Audiovisual-to-articulatory speech inversion using HMMs

dc.contributor.author Katsamanis, A en
dc.contributor.author Papandreou, G en
dc.contributor.author Maragos, P en
dc.date.accessioned 2014-03-01T02:44:29Z
dc.date.available 2014-03-01T02:44:29Z
dc.date.issued 2007 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/31843
dc.subject Canonical Correlation Analysis en
dc.subject Linear Model en
dc.subject Point of View en
dc.subject Posterior Probability en
dc.subject Speech Production en
dc.subject Audio Video en
dc.subject Hidden Markov Model en
dc.subject Point of Interest en
dc.subject.other Hidden Markov models en
dc.subject.other Markov processes en
dc.subject.other Signal processing en
dc.subject.other Technical presentations en
dc.subject.other Audio visuals en
dc.subject.other Audiovisual speeches en
dc.subject.other Canonical Correlation Analysis en
dc.subject.other Electromagnetic articulography en
dc.subject.other Points of interests en
dc.subject.other Speech productions en
dc.subject.other Statistical frameworks en
dc.subject.other Visual modalities en
dc.subject.other Visual speeches en
dc.subject.other Vocal tracts en
dc.subject.other Speech recognition en
dc.title Audiovisual-to-articulatory speech inversion using HMMs en
heal.type conferenceItem en
heal.identifier.primary 10.1109/MMSP.2007.4412915 en
heal.identifier.secondary http://dx.doi.org/10.1109/MMSP.2007.4412915 en
heal.identifier.secondary 4412915 en
heal.publicationDate 2007 en
heal.abstract We address the problem of audiovisual speech inversion, namely recovering the vocal tract's geometry from auditory and visual speech cues. We approach the problem in a statistical framework, combining ideas from multistream Hidden Markov Models and canonical correlation analysis, and demonstrate effective estimation of the trajectories followed by certain points of interest in the speech production system. Our experiments show that exploiting both audio and visual modalities clearly improves performance relative to either audio-only or visual-only estimation. We report experiments on the QSMT database which contains audio, video, and electromagnetic articulography data recorded in parallel. © 2007 IEEE. en
heal.journalName 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings en
dc.identifier.doi 10.1109/MMSP.2007.4412915 en
dc.identifier.spage 457 en
dc.identifier.epage 460 en


Files in this item

Files Size Format View

There are no files associated with this item.
