HEAL DSpace

Singing voice separation using waveform-level deep neural networks

dc.contributor.author Παπαντωνάκης, Παναγιώτης el
dc.contributor.author Papantonakis, Panagiotis en
dc.date.accessioned 2022-02-09T10:52:23Z
dc.date.available 2022-02-09T10:52:23Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/54607
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.22305
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ *
dc.subject Source separation en
dc.subject Singing voice separation en
dc.subject Conv-TasNet en
dc.subject Wave-U-Net en
dc.subject Convolutional neural networks en
dc.subject Διαχωρισμός πηγών el
dc.subject Διαχωρισμός φωνητικών el
dc.subject Συνελικτικά νευρωνικά δίκτυα el
dc.title Singing voice separation using waveform-level deep neural networks en
dc.title Διαχωρισμός Φωνητικών χρησιμοποιώντας Βαθιά Νευρωνικά Δίκτυα σε Επίπεδο Κυματομορφών el
heal.type bachelorThesis
heal.classification Pattern Recognition en
heal.classification Machine Learning en
heal.classification Computer Audition en
heal.language el
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2021-11-05
heal.abstract Singing Voice Separation (SVS) is an important task in computer audition that has been studied intensively for many years. The problem is the automatic isolation of the vocal component from a given musical mixture, without prior knowledge of the properties of the constituent signals. Recently, SVS techniques operating in the waveform domain have grown in both number and quality, with some models achieving state-of-the-art results. In this thesis we experiment with two of the top-performing deep architectures in the waveform domain, using the MUSDB18 dataset. In the first part we reimplement Wave-U-Net, a deep autoencoder architecture with skip connections, along with several modifications already proposed in other studies. We then perform an ablation study over model configurations, enabling one or more modifications at a time, to examine their effect on the model's performance. In the second part we experiment with Conv-TasNet and propose multiple novel modifications to it. Conv-TasNet transforms the waveform input into a latent space suitable for separation, constructs and applies a multiplicative mask for each source, and then transforms the signal back to the time domain. Preliminary exploratory experiments indicated that a parallel multi-band separation technique, which splits the encoded signal into latent-space bands and then processes each band individually using multiple separators, provided a significant performance boost. We therefore analysed its efficacy and scalability in depth. The results show that the proposed method achieves competitive performance by exploiting the discriminative characteristics of each band and producing specialised separators, while keeping the number of trainable parameters the same.
In the last part of the thesis, we combine the proposed multi-band modification with two encoders proposed in other studies: a trainable one that combines features derived from both the waveform and time-frequency domains, and a fixed one that models the human auditory system using a gammatone filterbank. The former encoder yields no improvement, whereas the results for the latter point towards performance improvements when it is assisted by a linear layer for band selection. en
heal.advisorName Μαραγκός, Πέτρος el
heal.committeeMemberName Τζαφέστας, Κωνσταντίνος el
heal.committeeMemberName Ποταμιάνος, Γεράσιμος el
heal.committeeMemberName Μαραγκός, Πέτρος el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering. Division of Signals, Control and Robotics en
heal.academicPublisherID ntua
heal.numberOfPages 111 p. en
heal.fullTextAvailability false
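
The encode, mask, decode pipeline with multi-band separation that the abstract describes can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the random encoder and decoder bases, the `toy_separator` stand-in (random softmax masks instead of a trained network), and all sizes are assumptions chosen only to show the data flow. In Conv-TasNet itself the encoder, separator, and decoder are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, basis, stride):
    """Frame the waveform and project each frame onto the encoder basis (ReLU)."""
    win = basis.shape[1]
    n_frames = 1 + (len(x) - win) // stride
    frames = np.stack([x[i * stride:i * stride + win] for i in range(n_frames)])
    return np.maximum(frames @ basis.T, 0.0)  # (n_frames, n_channels)

def toy_separator(band, n_sources, rng):
    """Hypothetical stand-in for a trained separator: random softmax masks."""
    logits = rng.standard_normal((n_sources,) + band.shape)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)  # masks sum to 1 over sources

def multiband_masks(latent, n_bands, n_sources, rng):
    """Split latent channels into bands and run one separator per band."""
    bands = np.array_split(latent, n_bands, axis=1)
    masks = [toy_separator(b, n_sources, rng) for b in bands]
    return np.concatenate(masks, axis=2)  # (n_sources, n_frames, n_channels)

def decode(latent, basis, stride, length):
    """Map latent frames back to waveform frames and overlap-add them."""
    win = basis.shape[1]
    out = np.zeros(length)
    for i, frame in enumerate(latent @ basis):
        out[i * stride:i * stride + win] += frame
    return out

# Assumed toy sizes: 1600 samples, 64 latent channels, 16-sample window.
n_channels, win, stride, n_sources, n_bands = 64, 16, 8, 2, 4
x = rng.standard_normal(1600)                       # stand-in mixture
enc_basis = rng.standard_normal((n_channels, win))  # assumed random, not learned
dec_basis = rng.standard_normal((n_channels, win))

latent = encode(x, enc_basis, stride)
masks = multiband_masks(latent, n_bands, n_sources, rng)
# Multiplicative masking: one masked copy of the latent per source.
sources = np.stack([decode(m * latent, dec_basis, stride, len(x)) for m in masks])
```

Because the toy masks sum to one across sources and the decoder is linear, the source estimates add up to the decoded unmasked latent, which is a convenient sanity check for this kind of masking pipeline.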

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Greece.