dc.contributor.author | Κολιός, Παναγιώτης | el |
dc.contributor.author | Kolios, Panagiotis | en |
dc.date.accessioned | 2023-09-01T08:40:03Z | |
dc.date.available | 2023-09-01T08:40:03Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/57941 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.25638 | |
dc.rights | Αναφορά Δημιουργού - Μη Εμπορική Χρήση - Παρόμοια Διανομή 3.0 Ελλάδα | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/gr/ | * |
dc.subject | Μηχανική Μάθηση | el |
dc.subject | Τεχνητή Νοημοσύνη | el |
dc.subject | Νευρωνικά Δίκτυα | el |
dc.subject | Causality | en |
dc.subject | Cognitive Science | en |
dc.subject | Natural Language Processing | en |
dc.subject | Deep Learning | en |
dc.subject | Transformers | en |
dc.title | Towards Neural Models with System 2 - Type Capabilities | en |
heal.type | bachelorThesis | |
heal.classification | Computer Science - Machine Learning | en |
heal.language | el | |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2023-05-30 | |
heal.abstract | The subject of this thesis is the study and development of properties and characteristics of neural networks that aim to allow these models to store and manage information in a manner that resembles the way the human brain does. Our study revolves around the well-known transformer model and its variants, which have mainly been applied to natural language problems. We present topics from the areas of cognitive science, neuroscience and the study of causality that are related to the way the human mind thinks and works. We examine the division of the human mind into System 1 and System 2, which Kahneman analyzes in his book Thinking, Fast and Slow. We then describe selected parts of the human brain, demonstrating its tendency towards specialization and the division of information-processing tasks, and list the main features of Global Workspace Theory, a theoretical model whose goal is to describe the way the various brain regions communicate with each other. We also explain how conclusions derived from studies in the field of causality justify the organization of information-processing modules into independent mechanisms. We subsequently examine the possibility of integrating the aforementioned ideas into modern neural networks. We build on the work of Goyal and Bengio on inductive biases, which express the assumptions made by the training algorithms regarding the training conditions, as well as the hypotheses made by the neural models regarding the (causal) model that generated the data set. We present a set of attempts to incorporate various types of inductive biases, either into the training process or into well-known neural network architectures. We focus on efforts that seek to promote specialization across different parts of a model, mainly through competitive processes between those parts, leading to modular architectures. We propose two modifications to the architectures of neural models based on the transformer. We first propose replacing the feed-forward networks located in the transformer layers with a set of parallel networks of the same kind, trained through a competitive process whose winners acquire the right to process the respective elements of the input. We also propose applying a similar scheme to the training of the attention heads of the same model, again based on a competition procedure among the heads of each layer. We apply the second method, whose goal is to train specialized attention heads, to the transformer model, which we train on neural machine translation problems, as well as to the BERT model, which we train on a language modeling problem. The two models show no clear signs of improvement on these problems compared to the baseline models. We examine the possible causes of this behavior and suggest a variety of possible solutions, as well as directions for future research. | en |
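The following is a minimal, hypothetical sketch of the first proposed modification described in the abstract: replacing a transformer layer's feed-forward network with a set of parallel networks that compete per token. The class and parameter names, the learned linear scorer, and the top-1 winner-take-all rule are illustrative assumptions rather than the thesis's exact mechanism; a similar competition could be applied among the attention heads of each layer.

# Hypothetical sketch (not the thesis's exact method): K parallel feed-forward
# networks replace the single FFN of a transformer layer, and a learned scorer
# lets them compete per token; only the winning network's output is kept.
import torch
import torch.nn as nn

class CompetingFFN(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # Assumed gating mechanism: a linear scorer producing per-token competition scores.
        self.scorer = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        winners = self.scorer(x).argmax(dim=-1)           # index of the winning network per token
        out = torch.zeros_like(x)
        for k, expert in enumerate(self.experts):
            mask = (winners == k).unsqueeze(-1).to(x.dtype)
            out = out + mask * expert(x)                  # keep expert k's output only where it won
        return out
        # Note: the hard argmax gate is not differentiable with respect to the scorer;
        # this sketch only illustrates the routing, not the training of the competition.

# Usage: drop-in replacement for the feed-forward sublayer of a transformer block.
ffn = CompetingFFN(d_model=512, d_ff=2048, num_experts=4)
tokens = torch.randn(2, 10, 512)
print(ffn(tokens).shape)  # torch.Size([2, 10, 512])

Computing every expert on every token keeps the sketch simple; an efficient implementation would dispatch only the winning tokens to each network.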
heal.advisorName | Ποταμιάνος, Αλέξανδρος | el |
heal.advisorName | Potamianos, Alexandros | en |
heal.committeeMemberName | Ποταμιάνος, Αλέξανδρος | el |
heal.committeeMemberName | Κόλλιας, Στέφανος | el |
heal.committeeMemberName | Τζαφέστας, Κωνσταντίνος | el |
heal.academicPublisher | Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Σημάτων, Ελέγχου και Ρομποτικής | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 210 σ. | el |
heal.fullTextAvailability | false | |