heal.abstract |
Computer Vision is a field of Artificial Intelligence that aims to enable computer systems to derive meaningful information from visual inputs, such as images and videos, in order to take actions or make recommendations based on them. It has found applications in a variety of areas and industries, such as transportation, education, healthcare, security systems, and smartphones, which is why its continued advancement is considered necessary. Another area where computer vision has emerged is the study of children's interactions through the analysis of video data, which is crucial for understanding how children develop social, emotional, and cognitive skills, particularly in early childhood.
Key to these interactions is the child’s engagement, which indicates how focused the child is during an interaction and how different approaches change the child’s behavior. From a visual perspective, engagement is often measured through behaviors such as gaze, which indicates attention and focus. Understanding child engagement is crucial for assessing children’s learning, social interactions, and cognitive development. Gaze estimation, as a key aspect of non-verbal communication, plays a pivotal role in interpreting attention levels and identifying atypical behavioral patterns, such as those seen in children with Autism Spectrum Disorder (ASD).
This thesis leverages state-of-the-art computer vision techniques to automatically track and assess children's gaze and engagement levels. It also studies the effectiveness of Temporal Segment Networks (TSN) in improving the performance of the proposed models on these tasks. We design and implement frameworks for both engagement estimation and gaze tracking, and we evaluate them using only RGB data as input, on challenging datasets that capture children interacting freely in different environments. Finally, we propose a new architecture that combines the engagement and gaze estimation networks, together with the TSN method, to further improve engagement estimation.
By utilizing machine learning models, the thesis aims to provide an accurate and scalable solution for real-time analysis of these behaviors. The findings contribute to the growing body of research in early childhood development, offering potential tools for professionals to better understand and support children’s developmental needs, particularly in identifying early signs of disorders such as ASD. |
el |