HEAL DSpace

Multimodal user's affective state analysis in naturalistic interaction

DSpace/Manakin Repository

Show simple item record

dc.contributor.author Caridakis, G en
dc.contributor.author Karpouzis, K en
dc.contributor.author Wallace, M en
dc.contributor.author Kessous, L en
dc.contributor.author Amir, N en
dc.date.accessioned 2014-03-01T01:33:46Z
dc.date.available 2014-03-01T01:33:46Z
dc.date.issued 2010 en
dc.identifier.issn 1783-7677 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/20585
dc.subject Affective computing en
dc.subject Emotion dynamics en
dc.subject Emotion recognition en
dc.subject Multimodal analysis en
dc.subject Recurrent neural network en
dc.subject.other Affective computing en
dc.subject.other Affective state en
dc.subject.other Approximation capabilities en
dc.subject.other Audio-visual database en
dc.subject.other Audio-visual material en
dc.subject.other Dimensional representation en
dc.subject.other Dynamic events en
dc.subject.other Emotion recognition en
dc.subject.other Emotional state en
dc.subject.other Facial Expressions en
dc.subject.other Human machine interaction en
dc.subject.other Human-centered computing en
dc.subject.other Multi-modal en
dc.subject.other Multimodal analysis en
dc.subject.other Prosody information en
dc.subject.other Real world situations en
dc.subject.other Recognition rates en
dc.subject.other Short term memory en
dc.subject.other Video sequences en
dc.subject.other Human computer interaction en
dc.subject.other Video recording en
dc.subject.other Recurrent neural networks en
dc.title Multimodal user's affective state analysis in naturalistic interaction en
heal.type journalArticle en
heal.identifier.primary 10.1007/s12193-009-0030-8 en
heal.identifier.secondary http://dx.doi.org/10.1007/s12193-009-0030-8 en
heal.publicationDate 2010 en
heal.abstract Affective and human-centered computing have attracted a great deal of attention in recent years, mainly due to the abundance of environments and applications able to exploit and adapt to multimodal input from their users. Combining facial expressions with prosody information allows us to capture a user's emotional state in an unobtrusive manner, relying on the best-performing modality when the other suffers from noise or poor sensing conditions. In this paper, we describe a multi-cue, dynamic approach to detecting emotion in naturalistic video sequences, where input is taken from nearly real-world situations, in contrast to the controlled recording conditions of most audiovisual material. Recognition is performed by a recurrent neural network, whose short-term memory and approximation capabilities cater for modeling dynamic events in facial and prosodic expressivity. This approach also differs from existing work in that it models user expressivity with a dimensional representation, instead of detecting discrete 'universal emotions', which are scarce in everyday human-machine interaction. The algorithm is deployed on an audiovisual database recorded to simulate human-human discourse, which therefore contains less extreme expressivity and subtle variations of a number of emotion labels. Results show that, in turns lasting more than a few frames, recognition rates rise to 98%. © OpenInterface Association 2009. en
heal.journalName Journal on Multimodal User Interfaces en
dc.identifier.doi 10.1007/s12193-009-0030-8 en
dc.identifier.volume 3 en
dc.identifier.issue 1 en
dc.identifier.spage 49 en
dc.identifier.epage 66 en
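
The abstract above describes a recurrent-network approach that fuses per-frame facial expressivity and prosodic features and maps each interaction turn to a dimensional (valence/activation) emotion estimate rather than a discrete emotion label. The following is a minimal, hypothetical PyTorch sketch of such a model; the feature dimensions, layer sizes, and quadrant-style output head are illustrative assumptions, not the configuration reported in the paper.

import torch
import torch.nn as nn

class MultimodalEmotionRNN(nn.Module):
    def __init__(self, facial_dim=10, prosody_dim=6, hidden_dim=32, n_outputs=4):
        super().__init__()
        # Elman-style recurrent layer: its hidden state provides the short-term
        # memory that models the dynamics of facial and prosodic expressivity.
        self.rnn = nn.RNN(input_size=facial_dim + prosody_dim,
                          hidden_size=hidden_dim, batch_first=True)
        # Map the final hidden state to a dimensional representation, here four
        # valence/activation quadrants instead of discrete 'universal emotions'.
        self.head = nn.Linear(hidden_dim, n_outputs)

    def forward(self, facial_seq, prosody_seq):
        # facial_seq:  (batch, frames, facial_dim)
        # prosody_seq: (batch, frames, prosody_dim)
        fused = torch.cat([facial_seq, prosody_seq], dim=-1)
        _, h_n = self.rnn(fused)           # h_n: (1, batch, hidden_dim)
        return self.head(h_n.squeeze(0))   # logits over quadrants

# Example: one turn of 40 frames with synthetic features standing in for the
# tracked facial and prosodic measurements (random tensors, for illustration).
model = MultimodalEmotionRNN()
facial = torch.randn(1, 40, 10)
prosody = torch.randn(1, 40, 6)
quadrant_logits = model(facial, prosody)
print(quadrant_logits.argmax(dim=-1))

In practice the per-frame facial and prosodic features would come from the face-tracking and prosody-extraction front ends the paper relies on; the random tensors here only demonstrate the data flow of turn-level, multimodal recognition.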


Files in this item

Files Size Format View

There are no files associated with this item.

This item appears in the following Collection(s)
