dc.contributor.author |
Papandreou, G |
en |
dc.contributor.author |
Katsamanis, A |
en |
dc.contributor.author |
Pitsikalis, V |
en |
dc.contributor.author |
Maragos, P |
en |
dc.date.accessioned |
2014-03-01T01:29:47Z |
|
dc.date.available |
2014-03-01T01:29:47Z |
|
dc.date.issued |
2009 |
en |
dc.identifier.issn |
1558-7916 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/19344 |
|
dc.subject |
Active appearance models (AAMs) |
en |
dc.subject |
Audiovisual automatic speech recognition (AV-ASR) |
en |
dc.subject |
Multimodal fusion |
en |
dc.subject |
Uncertainty compensation |
en |
dc.subject.classification |
Acoustics |
en |
dc.subject.classification |
Engineering, Electrical & Electronic |
en |
dc.subject.other |
Active appearance models |
en |
dc.subject.other |
Active appearance models (AAMs) |
en |
dc.subject.other |
Adaptivity |
en |
dc.subject.other |
Audio features |
en |
dc.subject.other |
Audio visual speech recognition |
en |
dc.subject.other |
Audiovisual automatic speech recognition (AV-ASR) |
en |
dc.subject.other |
Environmental conditions |
en |
dc.subject.other |
Feature measurement |
en |
dc.subject.other |
Learning rules |
en |
dc.subject.other |
Measurement Noise |
en |
dc.subject.other |
Multi-modal |
en |
dc.subject.other |
Multimodal fusion |
en |
dc.subject.other |
Multimodal integration |
en |
dc.subject.other |
Multiple streams |
en |
dc.subject.other |
On-stream |
en |
dc.subject.other |
Person-independent |
en |
dc.subject.other |
Uncertainty compensation |
en |
dc.subject.other |
Uncertainty estimates |
en |
dc.subject.other |
Uncertainty estimation |
en |
dc.subject.other |
Visual feature extraction |
en |
dc.subject.other |
Feature extraction |
en |
dc.subject.other |
Remelting |
en |
dc.subject.other |
Uncertainty analysis |
en |
dc.subject.other |
Speech recognition |
en |
dc.title |
Adaptive multimodal fusion by uncertainty compensation with application to audiovisual speech recognition |
en |
heal.type |
journalArticle |
en |
heal.identifier.primary |
10.1109/TASL.2008.2011515 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1109/TASL.2008.2011515 |
en |
heal.language |
English |
en |
heal.publicationDate |
2009 |
en |
heal.abstract |
While the accuracy of feature measurements heavily depends on changing environmental conditions, studying the consequences of this fact in pattern recognition tasks has received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition, where multiple streams of complementary time-evolving features are integrated. For such applications, provided that the measurement noise uncertainty for each feature stream can be estimated, the proposed framework leads to highly adaptive multimodal fusion rules which are easy and efficient to implement. Our technique is widely applicable and can be transparently integrated with either synchronous or asynchronous multimodal sequence integration architectures. We further show that multimodal fusion methods relying on stream weights can naturally emerge from our scheme under certain assumptions; this connection provides valuable insights into the adaptivity properties of our multimodal uncertainty compensation approach. We show how these ideas can be practically applied for audiovisual speech recognition. In this context, we propose improved techniques for person-independent visual feature extraction and uncertainty estimation with active appearance models, and also discuss how enhanced audio features along with their uncertainty estimates can be effectively computed. We demonstrate the efficacy of our approach in audiovisual speech recognition experiments on the CUAVE database using either synchronous or asynchronous multimodal integration models. © 2009 IEEE. |
en |
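Editor's note: the abstract's core mechanism, compensating each stream's Gaussian observation scores by its estimated measurement-noise variance so that unreliable streams are automatically down-weighted, can be illustrated with a minimal sketch. This is not the paper's code; the function names, the diagonal-covariance assumption, and the simple two-stream score fusion below are illustrative assumptions only.

# Minimal sketch (assumed, not from the paper): uncertainty-compensated
# Gaussian scoring for an audio and a visual feature stream with diagonal
# covariances. Adding the estimated noise variance to the model variance
# flattens the likelihood of an unreliable stream, so its contribution to
# the fused score shrinks automatically.
import numpy as np

def log_gauss_diag(x, mean, var, noise_var):
    """Log N(x; mean, var + noise_var) with diagonal covariances."""
    v = var + noise_var                          # uncertainty compensation
    return -0.5 * np.sum(np.log(2.0 * np.pi * v) + (x - mean) ** 2 / v)

def fused_log_score(audio_obs, visual_obs, audio_model, visual_model):
    """Sum of uncertainty-compensated log-likelihoods for one model/state.

    Each *_obs is a (feature, noise_variance) pair and each *_model is a
    (mean, variance) pair; all names are placeholders for illustration.
    """
    (xa, na), (xv, nv) = audio_obs, visual_obs
    (ma, va), (mv, vv) = audio_model, visual_model
    return log_gauss_diag(xa, ma, va, na) + log_gauss_diag(xv, mv, vv, nv)

# Example: a noisy audio frame (large noise variance) yields a flatter,
# less decisive likelihood than the cleaner visual frame.
rng = np.random.default_rng(0)
d = 4
audio_model = (np.zeros(d), np.ones(d))
visual_model = (np.zeros(d), np.ones(d))
audio_obs = (rng.normal(size=d), 5.0 * np.ones(d))   # high audio uncertainty
visual_obs = (rng.normal(size=d), 0.1 * np.ones(d))  # low visual uncertainty
print(fused_log_score(audio_obs, visual_obs, audio_model, visual_model))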
heal.publisher |
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
en |
heal.journalName |
IEEE Transactions on Audio, Speech, and Language Processing |
en |
dc.identifier.doi |
10.1109/TASL.2008.2011515 |
en |
dc.identifier.isi |
ISI:000263639400003 |
en |
dc.identifier.volume |
17 |
en |
dc.identifier.issue |
3 |
en |
dc.identifier.spage |
423 |
en |
dc.identifier.epage |
435 |
en |