Singing voice separation using waveform-level deep
neural networks

Παπαντωνάκης, Παναγιώτης; Papantonakis, Panagiotis

dc.contributor.author	Παπαντωνάκης, Παναγιώτης	el
dc.contributor.author	Papantonakis, Panagiotis	en
dc.date.accessioned	2022-02-09T10:52:23Z
dc.date.available	2022-02-09T10:52:23Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/54607
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.22305
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	Source separation	en
dc.subject	Singing voice separation	en
dc.subject	Conv-TasNet	en
dc.subject	Wave-U-Net	en
dc.subject	Convolutional neural networks	en
dc.subject	Διαχωρισμός πηγών	el
dc.subject	Διαχωρισμός φωνητικών	el
dc.subject	Συνελικτικά νευρωνικά δίκτυα	el
dc.title	Singing voice separation using waveform-level deep neural networks	en
dc.title	Διαχωρισμός Φωνητικών χρησιμοποιώντας Βαθιά Νευρωνικά Δίκτυα σε Επίπεδο Κυματομορφών	el
heal.type	bachelorThesis
heal.classification	Αναγνώριση Προτύπων	el
heal.classification	Μηχανική Μάθηση	el
heal.classification	Computer Audition	en
heal.language	el
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2021-11-05
heal.abstract	Singing Voice Separation (SVS) is an important task in Computer Audition that has been studied intensively for many years. The problem can be described as the automatic isolation of the vocal component from a given musical mixture, without prior knowledge of the properties of the participating signals. Recently, SVS techniques in the waveform domain have increased in both number and quality, with some models achieving state-of-the-art results. In this thesis we experiment with two of the top-performing deep architectures in the waveform domain, using the MUSDB18 dataset. In the first part we reimplement Wave-U-Net, a deep autoencoder architecture with skip connections, along with several modifications already proposed in other studies. We then perform an ablation study on different model configurations, enabling one or more modifications at a time, in order to examine their effect on the model's performance. In the second part we experiment with Conv-TasNet and propose multiple novel modifications to it. Conv-TasNet transforms the waveform input to a latent space suitable for separation, constructs and applies a multiplicative mask for each source, and then transforms the signal back to the time domain. Preliminary exploratory experiments indicated that a parallel multi-band separation technique, which splits the encoded signal into latent-space bands and then processes each band individually using multiple separators, could be beneficial to the model, as it provided a significant performance boost. We therefore proceeded with an in-depth analysis of its efficacy and scalability. The results show that the proposed method achieves competitive performance by taking advantage of the discriminative characteristics of each band and generating specialised separators, while keeping the number of trainable parameters the same. 
In the last part of the thesis, we combine the proposed multi-band modification with two different encoders proposed in other studies: a trainable one that combines features derived from both the waveform and time-frequency domains, and a fixed one that models the human auditory system using a gammatone filterbank. Although the former encoder shows no improvement, the results for the latter point towards performance improvements, with the assistance of a linear layer for band selection.	en
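The encode, mask, decode pipeline and the multi-band split described in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names (`frame`, `overlap_add`, `multiband_separate`), the layer sizes, and the single linear layer with a sigmoid that stands in for each band's separator are all assumptions made for illustration, and random weights replace trained parameters.

```python
import numpy as np

def frame(signal, size, hop):
    """Cut a 1-D signal into overlapping frames of shape (n_frames, size)."""
    n_frames = 1 + (len(signal) - size) // hop
    return np.stack([signal[i * hop : i * hop + size] for i in range(n_frames)])

def overlap_add(frames, hop):
    """Sum overlapping frames back into a 1-D signal."""
    n_frames, size = frames.shape
    out = np.zeros((n_frames - 1) * hop + size)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + size] += f
    return out

def multiband_separate(mixture, n_basis=16, size=8, n_bands=4, seed=0):
    """Hypothetical multi-band masking: one stand-in separator per latent band.

    n_basis must be divisible by n_bands.
    """
    rng = np.random.default_rng(seed)
    hop = size // 2
    band_width = n_basis // n_bands
    # Random weights stand in for trained parameters (assumption).
    enc = rng.standard_normal((size, n_basis))       # waveform frame -> latent
    dec = rng.standard_normal((n_basis, size))       # latent -> waveform frame
    sep = rng.standard_normal((n_bands, band_width, band_width))

    # Encoder: non-negative latent representation, shape (T, n_basis).
    latent = np.maximum(frame(mixture, size, hop) @ enc, 0.0)
    # Split the latent channels into equally sized bands.
    bands = np.split(latent, n_bands, axis=1)
    masked = []
    for band, w in zip(bands, sep):
        # Each band gets its own separator, which emits a multiplicative mask.
        mask = 1.0 / (1.0 + np.exp(-(band @ w)))
        masked.append(band * mask)
    # Re-join the bands and decode back to the time domain.
    vocal_latent = np.concatenate(masked, axis=1)
    return overlap_add(vocal_latent @ dec, hop)

mixture = np.sin(np.linspace(0.0, 20.0, 64))
estimate = multiband_separate(mixture)
```

With untrained weights the output is meaningless as audio. The sketch only shows the data flow, in which each band is masked by its own separator before the bands are concatenated and decoded.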
heal.advisorName	Μαραγκός, Πέτρος	el
heal.committeeMemberName	Τζαφέστας, Κωνσταντίνος	el
heal.committeeMemberName	Ποταμιάνος, Γεράσιμος	el
heal.committeeMemberName	Μαραγκός, Πέτρος	el
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Σημάτων, Ελέγχου και Ρομποτικής	el
heal.academicPublisherID	ntua
heal.numberOfPages	111 σ.	el
heal.fullTextAvailability	false