dc.contributor.author | Παπαντωνάκης, Παναγιώτης | el |
dc.contributor.author | Papantonakis, Panagiotis | en |
dc.date.accessioned | 2022-02-09T10:52:23Z | |
dc.date.available | 2022-02-09T10:52:23Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/54607 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.22305 | |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Greece | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ | * |
dc.subject | Source separation | en |
dc.subject | Singing voice separation | en |
dc.subject | Conv-TasNet | en |
dc.subject | Wave-U-Net | en |
dc.subject | Convolutional neural networks | en |
dc.subject | Διαχωρισμός πηγών | el |
dc.subject | Διαχωρισμός φωνητικών | el |
dc.subject | Συνελικτικά νευρωνικά δίκτυα | el |
dc.title | Singing voice separation using waveform-level deep neural networks | en |
dc.title | Διαχωρισμός Φωνητικών χρησιμοποιώντας Βαθιά Νευρωνικά Δίκτυα σε Επίπεδο Κυματομορφών | el |
heal.type | bachelorThesis | |
heal.classification | Pattern Recognition | en |
heal.classification | Machine Learning | en |
heal.classification | Computer Audition | en |
heal.language | el | |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2021-11-05 | |
heal.abstract | Singing Voice Separation (SVS) is an important task in Computer Audition that has been studied intensively for many years. The problem can be described as the automatic isolation of the vocal component from a given musical mixture, without prior knowledge of the properties of the participating signals. Recently, both the quantity and the quality of SVS techniques in the waveform domain have increased, with some models achieving state-of-the-art results. In this thesis we experiment with two of the top-performing deep architectures in the waveform domain, using the MUSDB18 dataset. In the first part we reimplement Wave-U-Net, a deep autoencoder architecture with skip connections, along with several modifications already proposed by other studies. We then perform an ablation study on different model configurations, enabling one or more modifications at a time, in order to examine their effect on the model’s performance. In the second part we experiment with Conv-TasNet, an architecture that transforms the waveform input into a latent space suitable for separation, constructs and applies a multiplicative mask for each source, and then transforms the signal back to the time domain; for this architecture we propose multiple novel modifications. Preliminary, exploratory experiments indicated that a parallel multi-band separation technique, which splits the encoded signal into latent-space bands and then processes each band individually using multiple separators, could benefit the model, as it provided a significant performance boost. We therefore analysed it in depth with regard to its efficacy and scalability. The results show that the proposed method achieves competitive performance by taking advantage of the discriminative characteristics of each band and generating specialised separators, while keeping the number of trainable parameters the same.
In the last part of the thesis, we combine the proposed multi-band modification with two different encoders proposed in other studies: a trainable one that combines features derived from both the waveform and time-frequency domains, and a fixed one that models the human auditory system using a gammatone filterbank. Although the results for the former encoder show no improvement, the results for the latter point towards performance improvements, with the assistance of a linear layer for band selection. | en |
heal.advisorName | Maragos, Petros | en |
heal.committeeMemberName | Tzafestas, Konstantinos | en |
heal.committeeMemberName | Potamianos, Gerasimos | en |
heal.committeeMemberName | Maragos, Petros | en |
heal.academicPublisher | National Technical University of Athens. School of Electrical and Computer Engineering. Division of Signals, Control and Robotics | en |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 111 p. | en |
heal.fullTextAvailability | false |