HEAL DSpace

Towards Neural Models with System 2 - Type Capabilities

dc.contributor.author Κολιός, Παναγιώτης el
dc.contributor.author Kolios, Panagiotis en
dc.date.accessioned 2023-09-01T08:40:03Z
dc.date.available 2023-09-01T08:40:03Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/57941
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.25638
dc.rights Attribution-NonCommercial-ShareAlike 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/gr/ *
dc.subject Machine Learning el
dc.subject Artificial Intelligence el
dc.subject Neural Networks el
dc.subject Causality en
dc.subject Cognitive Science en
dc.subject Natural Language Processing en
dc.subject Deep Learning en
dc.subject Transformers en
dc.title Towards Neural Models with System 2 - Type Capabilities en
heal.type bachelorThesis
heal.classification Computer Science - Machine Learning en
heal.language el
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2023-05-30
heal.abstract This thesis studies and develops properties of neural networks that aim to let these models store and manage information in a way that resembles how the human brain does. Our study revolves around the well-known transformer model and its variants, which have mainly been applied to natural language problems. We present topics from cognitive science, neuroscience and the study of causality that relate to how the human mind thinks and works. We examine the division of the human mind into System 1 and System 2, which Kahneman analyzes in his book Thinking, Fast and Slow. We then describe selected parts of the human brain, demonstrating its tendency towards specialization and division of information processing tasks, and list the main features of Global Workspace Theory, a theoretical model that describes how the various parts of the brain communicate with each other. We also explain how conclusions from studies in the field of causality justify organizing information processing modules into independent mechanisms. We subsequently examine the possibility of integrating these ideas into modern neural networks. We build on the work of Goyal and Bengio on inductive biases, which aim to determine the assumptions made by the training algorithms regarding the training conditions, as well as the hypotheses made by the neural models regarding the (causal) model that generated the data set. We present a set of attempts to incorporate various types of inductive biases, either in the training process or in well-known neural network architectures. We focus on efforts that seek to promote the specialization of the various model areas, mainly through competitive processes between those areas, leading to modular architectures.
We propose two modifications to neural architectures based on the transformer model. First, we propose replacing the feed-forward network in each transformer layer with a set of parallel networks of the same kind, trained through a competitive process whose winners acquire the right to process the respective elements of the input vector. Second, we propose a similar scheme for training the attention heads of the same model, also based on competition among the heads of each layer. We apply the second method, whose goal is to train specialized attention heads, to the transformer model, which we train on neural machine translation problems, and to the BERT model, which we train on a natural language modeling problem. Neither model shows clear signs of improvement on these problems compared to the baseline models. We examine the possible causes of this behavior and suggest a variety of possible solutions, as well as directions for future research. en
heal.advisorName Ποταμιάνος, Αλέξανδρος el
heal.advisorName Potamianos, Alexandros en
heal.committeeMemberName Ποταμιάνος, Αλέξανδρος el
heal.committeeMemberName Κόλλιας, Στέφανος el
heal.committeeMemberName Τζαφέστας, Κωνσταντίνος el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering. Division of Signals, Control and Robotics el
heal.academicPublisherID ntua
heal.numberOfPages 210 p. el
heal.fullTextAvailability false


Attribution-NonCommercial-ShareAlike 3.0 Greece. Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 Greece