dc.contributor.author |
Papandreou, G |
en |
dc.contributor.author |
Katsamanis, A |
en |
dc.contributor.author |
Pitsikalis, V |
en |
dc.contributor.author |
Maragos, P |
en |
dc.date.accessioned |
2014-03-01T01:29:47Z |
|
dc.date.available |
2014-03-01T01:29:47Z |
|
dc.date.issued |
2009 |
en |
dc.identifier.issn |
1558-7916 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/19344 |
|
dc.subject |
Active appearance models (AAMs) |
en |
dc.subject |
Audiovisual automatic speech recognition (AV-ASR) |
en |
dc.subject |
Multimodal fusion |
en |
dc.subject |
Uncertainty compensation |
en |
dc.subject.classification |
Acoustics |
en |
dc.subject.classification |
Engineering, Electrical & Electronic |
en |
dc.subject.other |
Active appearance models |
en |
dc.subject.other |
Active appearance models (AAMs) |
en |
dc.subject.other |
Adaptivity |
en |
dc.subject.other |
Audio features |
en |
dc.subject.other |
Audio visual speech recognition |
en |
dc.subject.other |
Audiovisual automatic speech recognition (AV-ASR) |
en |
dc.subject.other |
Environmental conditions |
en |
dc.subject.other |
Feature measurement |
en |
dc.subject.other |
Learning rules |
en |
dc.subject.other |
Measurement Noise |
en |
dc.subject.other |
Multi-modal |
en |
dc.subject.other |
Multimodal fusion |
en |
dc.subject.other |
Multimodal integration |
en |
dc.subject.other |
Multiple streams |
en |
dc.subject.other |
On-stream |
en |
dc.subject.other |
Person-independent |
en |
dc.subject.other |
Uncertainty compensation |
en |
dc.subject.other |
Uncertainty estimates |
en |
dc.subject.other |
Uncertainty estimation |
en |
dc.subject.other |
Visual feature extraction |
en |
dc.subject.other |
Feature extraction |
en |
dc.subject.other |
Remelting |
en |
dc.subject.other |
Uncertainty analysis |
en |
dc.subject.other |
Speech recognition |
en |
dc.title |
Adaptive multimodal fusion by uncertainty compensation with application to audiovisual speech recognition |
en |
heal.type |
journalArticle |
en |
heal.identifier.primary |
10.1109/TASL.2008.2011515 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1109/TASL.2008.2011515 |
en |
heal.language |
English |
en |
heal.publicationDate |
2009 |
en |
heal.abstract |
While the accuracy of feature measurements heavily depends on changing environmental conditions, studying the consequences of this fact in pattern recognition tasks has received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty for each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules which are easy and efficient to implement. Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures. We further show that multimodal fusion methods relying on stream weights can naturally emerge from our scheme under certain assumptions; this connection provides valuable insights into the adaptivity properties of our multimodal uncertainty compensation approach. We show how these ideas can be practically applied for audiovisual speech recognition. In this context, we propose improved techniques for person-independent visual feature extraction and uncertainty estimation with active appearance models, and also discuss how enhanced audio features along with their uncertainty estimates can be effectively computed. We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using either synchronous or asynchronous multimodal integration models. © 2009 IEEE. |
en |
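Editor's note: the abstract's core mechanism, compensating each stream's Gaussian observation scores by its estimated measurement-noise variance so that unreliable streams are automatically down-weighted, can be illustrated with a minimal sketch. This is not the paper's code; the function names, the diagonal-covariance assumption, and the simple two-stream score fusion below are illustrative assumptions only.

# Minimal sketch (assumed, not from the paper): uncertainty-compensated
# Gaussian scoring for an audio and a visual feature stream with diagonal
# covariances. Adding the estimated noise variance to the model variance
# flattens the likelihood of an unreliable stream, so its contribution to
# the fused score shrinks automatically.
import numpy as np

def log_gauss_diag(x, mean, var, noise_var):
    """Log N(x; mean, var + noise_var) with diagonal covariances."""
    v = var + noise_var                          # uncertainty compensation
    return -0.5 * np.sum(np.log(2.0 * np.pi * v) + (x - mean) ** 2 / v)

def fused_log_score(audio_obs, visual_obs, audio_model, visual_model):
    """Sum of uncertainty-compensated log-likelihoods for one model/state.

    Each *_obs is a (feature, noise_variance) pair and each *_model is a
    (mean, variance) pair; all names are placeholders for illustration.
    """
    (xa, na), (xv, nv) = audio_obs, visual_obs
    (ma, va), (mv, vv) = audio_model, visual_model
    return log_gauss_diag(xa, ma, va, na) + log_gauss_diag(xv, mv, vv, nv)

# Example: a noisy audio frame (large noise variance) yields a flatter,
# less decisive likelihood than the cleaner visual frame.
rng = np.random.default_rng(0)
d = 4
audio_model = (np.zeros(d), np.ones(d))
visual_model = (np.zeros(d), np.ones(d))
audio_obs = (rng.normal(size=d), 5.0 * np.ones(d))   # high audio uncertainty
visual_obs = (rng.normal(size=d), 0.1 * np.ones(d))  # low visual uncertainty
print(fused_log_score(audio_obs, visual_obs, audio_model, visual_model))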
heal.publisher |
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
en |
heal.journalName |
IEEE Transactions on Audio, Speech, and Language Processing |
en |
dc.identifier.doi |
10.1109/TASL.2008.2011515 |
en |
dc.identifier.isi |
ISI:000263639400003 |
en |
dc.identifier.volume |
17 |
en |
dc.identifier.issue |
3 |
en |
dc.identifier.spage |
423 |
en |
dc.identifier.epage |
435 |
en |