dc.contributor.author | Kessous, L | en
dc.contributor.author | Castellano, G | en
dc.contributor.author | Caridakis, G | en
dc.date.accessioned | 2014-03-01T01:33:46Z |
dc.date.available | 2014-03-01T01:33:46Z |
dc.date.issued | 2010 | en
dc.identifier.issn | 1783-7677 | en
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/20583 |
dc.subject | Affective body language | en
dc.subject | Affective speech | en
dc.subject | Emotion recognition | en
dc.subject | Facial expression | en
dc.subject | Multimodal fusion | en
dc.subject.other | Acoustic analysis | en
dc.subject.other | Affective body language | en
dc.subject.other | Affective speech | en
dc.subject.other | Automatic classification | en
dc.subject.other | Bayesian classifier | en
dc.subject.other | Bi-modal data | en
dc.subject.other | Bimodal emotion recognition | en
dc.subject.other | Body gesture | en
dc.subject.other | Emotion recognition | en
dc.subject.other | Emotional expressions | en
dc.subject.other | Facial expressions | en
dc.subject.other | Feature level | en
dc.subject.other | Gesture-speech | en
dc.subject.other | Multi-modal | en
dc.subject.other | Multi-modal approach | en
dc.subject.other | Multi-modal data | en
dc.subject.other | Multi-modal fusion | en
dc.subject.other | Native language | en
dc.subject.other | Recognition rates | en
dc.subject.other | System-based | en
dc.subject.other | Unimodal | en
dc.subject.other | Automatic indexing | en
dc.subject.other | Classifiers | en
dc.subject.other | Face recognition | en
dc.subject.other | Feature extraction | en
dc.subject.other | Gesture recognition | en
dc.subject.other | Linguistics | en
dc.subject.other | Query languages | en
dc.subject.other | Speech recognition | en
dc.title | Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis | en
heal.type | journalArticle | en
heal.identifier.primary | 10.1007/s12193-009-0025-5 | en
heal.identifier.secondary | http://dx.doi.org/10.1007/s12193-009-0025-5 | en
heal.publicationDate | 2010 | en
heal.abstract | In this paper a study on multimodal automatic emotion recognition during a speech-based interaction is presented. A database was constructed consisting of people pronouncing a sentence in a scenario where they interacted with an agent using speech. Ten people pronounced a sentence corresponding to a command while making 8 different emotional expressions. Gender was equally represented, with speakers of several different native languages including French, German, Greek and Italian. Facial expression, gesture and acoustic analysis of speech were used to extract features relevant to emotion. For the automatic classification of unimodal, bimodal and multimodal data, a system based on a Bayesian classifier was used. After performing an automatic classification of each modality, the different modalities were combined using a multimodal approach. Fusion of the modalities at the feature level (before running the classifier) and at the results level (combining the results of the classifiers for each modality) was compared. Fusing the multimodal data resulted in a large increase in the recognition rates in comparison to the unimodal systems: the multimodal approach increased the recognition rate by more than 10% when compared to the most successful unimodal system. Bimodal emotion recognition based on all combinations of the modalities (i.e., 'face-gesture', 'face-speech' and 'gesture-speech') was also investigated. The results show that the best pairing is 'gesture-speech'. Using all three modalities resulted in a 3.3% classification improvement over the best bimodal results. © OpenInterface Association 2009. | en
heal.journalName | Journal on Multimodal User Interfaces | en
dc.identifier.doi | 10.1007/s12193-009-0025-5 | en
dc.identifier.volume | 3 | en
dc.identifier.issue | 1 | en
dc.identifier.spage | 33 | en
dc.identifier.epage | 48 | en
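The abstract above contrasts fusion at the feature level (concatenating modality features before classification) with fusion at the results level (combining per-modality classifier outputs) using a Bayesian classifier. Below is a minimal illustrative sketch of those two fusion strategies, assuming a Gaussian naive Bayes classifier and synthetic placeholder features; it is not the authors' implementation, and all variable names, feature dimensions and data are hypothetical.

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n_samples = 80                                   # hypothetical sample count
y = rng.integers(0, 8, size=n_samples)           # 8 emotion classes, as in the study
X_face = rng.normal(size=(n_samples, 10))        # placeholder facial-expression features
X_gesture = rng.normal(size=(n_samples, 6))      # placeholder body-gesture features
X_speech = rng.normal(size=(n_samples, 12))      # placeholder acoustic speech features

# Feature-level fusion: concatenate all modality features, then train one classifier.
feature_level_clf = GaussianNB().fit(np.hstack([X_face, X_gesture, X_speech]), y)

# Results-level (decision-level) fusion: train one classifier per modality and
# combine their class posteriors, here by taking their product.
modality_clfs = [GaussianNB().fit(X, y) for X in (X_face, X_gesture, X_speech)]
posteriors = [clf.predict_proba(X)
              for clf, X in zip(modality_clfs, (X_face, X_gesture, X_speech))]
combined = np.prod(posteriors, axis=0)           # shape: (n_samples, n_classes)
# Predictions are made on the training data here purely for illustration.
results_level_pred = modality_clfs[0].classes_[combined.argmax(axis=1)]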