HEAL DSpace

An exploration of deep learning architectures for handwritten text recognition

DSpace/Manakin Repository

dc.contributor.author Τασσοπούλου, Βασιλική el
dc.contributor.author Tassopoulou, Vasiliki en
dc.date.accessioned 2020-05-07T08:42:14Z
dc.date.available 2020-05-07T08:42:14Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/50408
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.18106
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ *
dc.subject Connectionist temporal classification en
dc.subject Convolutional neural networks en
dc.subject Decoding algorithms en
dc.subject Handwritten text recognition en
dc.subject Multitask learning en
dc.subject Handwritten text recognition el
dc.subject Connectionist temporal classification el
dc.subject Decoding algorithms el
dc.subject Sequence modeling el
dc.subject Convolutional neural networks el
dc.title An exploration of deep learning architectures for handwritten text recognition en
heal.type bachelorThesis
heal.classification Computer Vision en
heal.classification Pattern Recognition en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2019-11-06
heal.abstract This thesis studies the Handwritten Text Recognition problem using deep learning models. We experiment with a variety of tasks spanning the whole pipeline that makes up our final model. First, we implement the baseline architecture and experiment with dynamic data augmentation. We implement two new augmentation techniques, the local affine transform and the local morphological transform. Our motivation is to apply transformations that augment individual letters rather than the whole text line. Overall, we find that dynamic data augmentation helps the model generalize and improves recognition rates. Next, we experiment with the CTC alignments that our model learns. We augment the target sequence with bigrams in addition to unigrams. We train these more complex alignments to obtain a bigram-level visual language model, which we use in two new CTC beam search decoding algorithms, extended to integrate the obtained bigram information in order to improve recognition rates. We then experiment with multitask architectures with CTC, both hierarchical and block. These experiments culminate in a significant improvement in the recognition rate. With the multitask approach we exploit language information (domain knowledge) in two ways: in the learning procedure, via the n-grams selected as target units, and in the decoding process, via statistical language models. Finally, we implement a fully convolutional architecture in which both the optical and the sequential models are composed of convolutions. We show that the CTC layer can be successfully employed on top of a CNN. We also find that one-dimensional convolution can sufficiently model the temporal relationships among the features.
In addition, our fully convolutional model converges fast, has significantly lower training and inference time, and also has fewer parameters than the aforementioned architectures. en
heal.advisorName Μαραγκός, Πέτρος el
heal.committeeMemberName Τζαφέστας, Κωνσταντίνος el
heal.committeeMemberName Ψυλάκης, Χαράλαμπος el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering. Division of Communications, Electronics and Information Systems. Laboratory of Acoustic Communication and Mass Media Technology el
heal.academicPublisherID ntua
heal.numberOfPages 162 p. el
heal.fullTextAvailability false
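
The abstract refers to decoding the output of a CTC-trained model. As background only, the following is a minimal sketch of the simplest CTC decoder, best-path (greedy) decoding: take the most likely label per frame, collapse consecutive repeats, then drop blanks. It is not the bigram-aware beam search developed in the thesis; the function name, the toy alphabet, and the choice of index 0 for the blank symbol are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding over a list of per-frame score vectors."""
    # 1. Pick the highest-scoring label index in every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # 2. Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy example (hypothetical scores): three symbols, "-" standing for blank.
alphabet = ["-", "a", "b"]
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, collapsed with the previous frame)
    [0.90, 0.05, 0.05],  # blank, separates the two distinct "a" labels
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
]
decoded = ctc_greedy_decode(frames, alphabet)
print(decoded)
```

The blank between the second and fourth frames is what lets the decoder emit a doubled letter; without it the two "a" runs would be collapsed into one.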


The following licenses are associated with this item:

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Greece