Lyrics and vocal melody generation conditioned on accompaniment

Melistas, Thomas; Μελίστας, Θωμάς

dc.contributor.author	Melistas, Thomas	en
dc.contributor.author	Μελίστας, Θωμάς	el
dc.date.accessioned	2021-04-02T06:54:42Z
dc.date.available	2021-04-02T06:54:42Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/53258
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.20956
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	Δημιουργία στίχων και μουσικής	el
dc.subject	Βαθιά μάθηση	el
dc.subject	Επεξεργασία φυσικής γλώσσας	el
dc.subject	Γλωσσικά μοντέλα	el
dc.subject	Μουσική ανάλυση	el
dc.subject	Lyrics and symbolic music generation	en
dc.subject	Deep learning	en
dc.subject	Natural language processing	en
dc.subject	Language modeling	en
dc.subject	Music analysis	en
dc.title	Lyrics and vocal melody generation conditioned on accompaniment	en
dc.title	Αυτόματη παραγωγή στίχων και φωνητικής μελωδίας βάσει της μουσικής υπόκρουσης με τεχνικές βαθιάς μηχανικής μάθησης	el
dc.contributor.department	Division of Signals, Control and Robotics	en
heal.type	bachelorThesis
heal.classification	Deep Learning	en
heal.classification	Natural Language Processing	en
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2021-03-12
heal.abstract	The purpose of this dissertation is to study the generation of lyrics and vocal melody for a given instrumental music piece, a novel and previously unexplored task. In recent years, research interest has grown both in lyrics generation, treated as a case of language modelling with domain-specific structure and attributes, and in symbolic music generation. The correlation between lyrics and the corresponding vocal melody has also started to gain attention, and a few models have been developed that generate lyrics conditioned on melody, and vice versa. While these research directions are very promising, they fail to capture the general musical context of the songwriting process. In the majority of contemporary music, singing coexists with accompaniment, and its function is twofold: to provide a melodic line that is grounded in the instrumental part and advances it musically, and to promote the unfolding of a story through lyrical imagery. Moreover, previous research on the matter has followed a proof-of-concept approach, working at the level of one or a few sentences, which is insufficient for capturing the structure and the recurring musical and lyrical themes present in a song. Our work models lyrics and vocal melody generation for a given music piece as a sequence-to-sequence task, using for the first time an efficient-attention Transformer architecture trained on text event sequences that describe entire songs. We build a symbolic music dataset suitable for the described task and apply music theory analysis, successfully compressing our training data and making it key-independent. As a result, our models become faster to train and more robust. Furthermore, we propose a novel architecture that decouples lyric and melody generation, while also making it possible to use any pretrained language model and, optionally, to condition on predefined lyrics.
Finally, the output is used together with a singing voice synthesis model to create and add vocals to instrumental tracks, which we use for qualitative evaluation. To the best of our knowledge, this is the first attempt to study both the melodic and lyrical content of singing in relation to the musical context in which it is found and, through that, to automate the process a singer or songwriter would follow to enrich an instrumental music piece with vocals. We believe that our work can fuel human creativity and provide interesting musical ideas.	en
heal.abstract	Θέμα της παρούσας διπλωματικής εργασίας είναι η αυτόματη παραγωγή στίχων και φωνητικής μελωδίας βάσει της μουσικής υπόκρουσης. Πρόκειται για ένα ανεξερεύνητο μέχρι στιγμής πρόβλημα. Τα τελευταία χρόνια υπάρχει ένα ολοένα αυξανόμενο ενδιαφέρον για την παραγωγή στίχων με γλωσσικά μοντέλα, λαμβάνοντας υπόψιν τις ιδιαιτερότητες στην δομή και το περιεχόμενο. Παράλληλα, έχει υπάρξει ενδιαφέρον για τη συσχέτιση στίχων και φωνητικής μελωδίας, ενώ έχουν αναπτυχθεί μοντέλα που μπορούν να προβλέπουν φωνητικές μελωδίες βάσει στίχων και το αντίστροφο. Ενώ η έρευνα σε αυτόν τον τομέα φαίνεται αρκετά υποσχόμενη μέχρι στιγμής, αποτυγχάνει να λάβει υπόψιν της το γενικότερο μουσικό πλαίσιο. Στη σύγχρονη μουσική, το τραγούδι συνυπάρχει μαζί με την ορχηστρική μουσική και έχει δύο βασικές λειτουργίες. Προσθέτει μια μελωδία η οποία ταιριάζει με την μουσική, και ενδύει στιχουργικά ένα κομμάτι λέγοντας μια ιστορία και προκαλώντας συναισθήματα. Επίσης, η προηγούμενη έρευνα αρκείται στο να μελετάει τη συσχέτιση μιας ή μερικών προτάσεων στίχων και δεν αναλύει την δομή των στίχων και της μελωδίας σε ολόκληρα κομμάτια και τα μοτίβα που προκύπτουν. Η παρούσα εργασία μοντελοποιεί το παραπάνω ως ένα πρόβλημα μετάφρασης ουσιαστικά από μία ακολουθία (μουσική) σε μια άλλη (φωνητικά), χρησιμοποιώντας για πρώτη φορά μοντέλα με μηχανισμούς προσοχής γραμμικής πολυπλοκότητας, τα οποία έχουν εκπαιδευτεί σε αναπαραστάσεις συμβολικής μουσικής. Χρησιμοποιούμε μουσικοθεωρητική ανάλυση για να μικρύνουμε το μέγεθος των ακολουθιών των δεδομένων μας και να τα κάνουμε ανεξάρτητα της μουσικής κλίμακας στην οποία είναι γραμμένα, πετυχαίνοντας έτσι πιο γρήγορη εκπαίδευση και πιο εύρωστα μοντέλα. Επίσης, δημιουργούμε και εφαρμόζουμε μια νέα αρχιτεκτονική για να χωρίσουμε την παραγωγή των στίχων και της μελωδίας, προσφέροντας τη δυνατότητα να χρησιμοποιηθεί κάποιο προεκπαιδευμένο γλωσσικό μοντέλο, αλλά και να χρησιμοποιηθούν δοσμένοι στίχοι, προαιρετικά. 
Τέλος, χρησιμοποιούμε ένα μοντέλο σύνθεσης φωνής για να διεξάγουμε ποιοτική αξιολόγηση των αποτελεσμάτων. Από όσο γνωρίζουμε, αυτή είναι η πρώτη απόπειρα να μελετηθεί ταυτόχρονα στιχουργικά και μελωδικά η συσχέτιση τραγουδιού και μουσικής. Ο κώδικας και τα εκπαιδευμένα μοντέλα αυτής της εργασίας μιμούνται τη διαδικασία που ακολουθεί ένας τραγουδιστής/τραγουδοποιός για να προσθέσει φωνητικά σε ένα κομμάτι και πιστεύουμε ότι μπορεί να προσφέρει έμπνευση σε καλλιτέχνες και όχι μόνο.	el
heal.advisorName	Potamianos, Alexandros	en
heal.committeeMemberName	Stamou, Georgios	en
heal.committeeMemberName	Giannakopoulos, Theodoros	en
heal.committeeMemberName	Potamianos, Alexandros	el
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering. Division of Signals, Control and Robotics	en
heal.academicPublisherID	ntua
heal.numberOfPages	119 p.	en
heal.fullTextAvailability	false