| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Papandreou, G | en |
| dc.contributor.author | Katsamanis, A | en |
| dc.contributor.author | Pitsikalis, V | en |
| dc.contributor.author | Maragos, P | en |
| dc.date.accessioned | 2014-03-01T02:44:51Z | |
| dc.date.available | 2014-03-01T02:44:51Z | |
| dc.date.issued | 2007 | en |
| dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/31975 | |
| dc.subject | Audio Visual Speech Recognition | en |
| dc.subject | Measurement Noise | en |
| dc.subject | multimodal fusion | en |
| dc.subject | Speech Recognition | en |
| dc.subject | Time Varying | en |
| dc.subject.other | Audio visuals | en |
| dc.subject.other | Audiovisual speech recognitions | en |
| dc.subject.other | Complementary features | en |
| dc.subject.other | Learning rules | en |
| dc.subject.other | Measurement noises | en |
| dc.subject.other | Multimodal fusions | en |
| dc.subject.other | Multiple streams | en |
| dc.subject.other | On streams | en |
| dc.subject.other | Uncertain features | en |
| dc.subject.other | Signal processing | en |
| dc.subject.other | Speech analysis | en |
| dc.subject.other | Technical presentations | en |
| dc.subject.other | Uncertainty analysis | en |
| dc.subject.other | Speech recognition | en |
| dc.title | Multimodal fusion and learning with uncertain features applied to audiovisual speech recognition | en |
| heal.type | conferenceItem | en |
| heal.identifier.primary | 10.1109/MMSP.2007.4412868 | en |
| heal.identifier.secondary | http://dx.doi.org/10.1109/MMSP.2007.4412868 | en |
| heal.identifier.secondary | 4412868 | en |
| heal.publicationDate | 2007 | en |
| heal.abstract | We study the effect of uncertain feature measurements and show how classification and learning rules should be adjusted to compensate for it. Our approach is particularly fruitful in multimodal fusion scenarios, such as audio-visual speech recognition, where multiple streams of complementary features whose reliability is time-varying are integrated. For such applications, by taking the measurement noise uncertainty of each feature stream into account, the proposed framework leads to highly adaptive multimodal fusion rules for classification and learning which are widely applicable and easy to implement. We further show that previous multimodal fusion methods relying on stream weights fall under our scheme under certain assumptions; this provides novel insights into their applicability for various tasks and suggests new practical ways for estimating the stream weights adaptively. The potential of our approach is demonstrated in audio-visual speech recognition experiments. ©2007 IEEE. |
| heal.journalName | 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings | en |
| dc.identifier.doi | 10.1109/MMSP.2007.4412868 | en |
| dc.identifier.spage | 264 | en |
| dc.identifier.epage | 267 | en |