Automatic speech recognition of football commentators using deep learning techniques

Τσιλιγιάννη, Ελένη; Tsiligianni, Eleni

dc.contributor.author	Τσιλιγιάννη, Ελένη	el
dc.contributor.author	Tsiligianni, Eleni	en
dc.date.accessioned	2024-02-09T08:01:51Z
dc.date.available	2024-02-09T08:01:51Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/58810
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.26506
dc.description	National Technical University of Athens--Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Programme (Δ.Π.Μ.Σ.) "Data Science and Machine Learning"	en
dc.rights	Attribution-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nd/3.0/gr/	*
dc.subject	End-to-End Automatic Speech Recognition	en
dc.subject	Word Error Rate	en
dc.subject	Self-Supervised Learning	en
dc.subject	Acoustic Model	en
dc.subject	Language Model	en
dc.subject	Deep Learning	en
dc.subject	Connectionist Temporal Classification	en
dc.title	Automatic speech recognition of football commentators using deep learning techniques	en
heal.type	masterThesis
heal.secondaryTitle	Automatic Recognition in football commentator Speech using Deep Learning techniques	en
heal.classification	Deep Machine Learning, Data Science, Signal Processing	en
heal.language	el
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2023-09-15
heal.abstract	The aim of this master's thesis is to study automatic speech recognition systems and to build a special-purpose system of this kind, applied to football match commentary data. We studied the classical methods of building ASR systems that are used in production, as well as more recent ones, such as end-to-end systems that mostly rely on transformers, as our own system does. We considered 3 datasets: 2 were general-purpose in scope and content, while the third covered the target domain. The raw form of this dataset was initially obtained as video; the audio signal was then extracted to produce a feature set that can be fed correctly and efficiently into a deep learning architecture, which serves as the acoustic model. The resulting reference text was manually corrected and annotated, so that the performance of each experiment could be computed correctly and the experiments compared. The experiments that followed analyzed both the training and the fine-tuning of the model for the different dataset combinations. The thesis concludes with proposals for improving performance and for future extensions of the system's use.	en
heal.abstract	The purpose of this master's thesis is to study Automatic Speech Recognition (ASR) systems and to develop a domain-specific system of this kind, trained on football commentator speech data. We examined the standard techniques for building ASR systems used in production, as well as well-performing state-of-the-art ones, the so-called End-to-End (E2E) ASR systems that rely on Transformers. The model built in this thesis is an End-to-End, Transformer-based model that also uses a Language Model. We consider 3 datasets: 2 of general context covering various domains, and a third targeting a specific domain, football-related speech. The raw data of the latter set were downloaded in video format, so we extracted the audio signal in order to compute the features that are fed, correctly and efficiently, into a deep learning network, our acoustic model. The ground-truth text produced this way had to go through manual error correction and annotation, so that the system's performance could be computed and the various experiments compared. In the experiments that followed, we analyzed both the training and the fine-tuning of the model for each dataset combination. The thesis concludes with proposals for improving performance and with scenarios in which the E2E ASR system could be used and ported in the future.	en
heal.advisorName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Στάμου, Γεώργιος	el
heal.committeeMemberName	Γιαννακόπουλος, Θεόδωρος	el
heal.committeeMemberName	Βουλόδημος, Αθανάσιος	el
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering	en
heal.academicPublisherID	ntua
heal.numberOfPages	72 p.	en
heal.fullTextAvailability	false
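The abstract says the manually corrected ground-truth transcripts were used to compute the system's performance and compare experiments, and Word Error Rate is listed among the subjects. As a minimal sketch (not the thesis's own evaluation code, which is not available here), WER can be computed as the word-level Levenshtein distance between reference and hypothesis, (S + D + I), divided by the number of reference words N. The function name `word_error_rate` and the plain whitespace tokenization without normalization are assumptions of this illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level edit distance.

    Assumption: words are split on whitespace and no normalization
    (casing, punctuation) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the striker shoots and scores"
    hyp = "the striker shoot scores"
    # One substitution (shoots -> shoot) and one deletion (and) over 5 words.
    print(f"WER = {word_error_rate(ref, hyp):.2f}")  # prints "WER = 0.40"
```

Note that WER is not bounded by 1: a short reference with many inserted words can yield values above 1.0, which is why it is reported as a rate rather than an accuracy.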