heal.abstract |
Emotions prepare people to deal with important events and thus play a vital role in their relationships and decision-making. The possible applications of an interface capable of assessing human emotional states make Emotion Recognition an exciting research field. There are three major contenders for the role of a general model of emotion mechanisms; the most common is the basic emotion, or categorical, approach, which assumes the existence of a small, fixed number of discrete emotions. Emotional information is conveyed by a wide range of multimodal cues,
with facial expressions being the most widely used by the research community. In this thesis we focus on expression from the body, motivated by the fact that the COVID-19
pandemic has profoundly changed social norms and affected all aspects of our lives, especially social life. Nowadays people extensively wear face masks, as this is one of the essential means of preventing transmission of the virus. As a result,
reading emotion from the face can be strongly impaired by the presence of a mask. To tackle the problem of Emotion Recognition we use Deep Learning, as such methods have yielded excellent results thanks to the
massive amounts of digital data available in combination with powerful processing hardware, and have in most cases outperformed conventional machine learning methods. In this
thesis we aim to conduct insightful studies and reach useful applicative conclusions in the area of Affective Bodily Expression Recognition. We adopt a proven deep learning-based visual recognition model, the Temporal Segment Network,
and perform an experimental study of the effect that face occlusion caused by a face mask has on emotion recognition performance. This is achieved by creating a medical face mask application tool and applying it to a children's emotion database named EmoReact. We compare results across input modalities and show that,
although performance drops considerably when only the face is used, with the full body we observe little to no decrease. Moreover, incorporating the whole body into the input yields superior results compared to the masked face crop alone. We enhance our model with several
proven techniques and almost fully recover the performance lost due to the face mask. Lastly, as an essential step towards a real-world emotion recognition interface, we create a real-time setup of the model, present multiple input
versions of it, and study the face mask effect in-the-wild. |
en |