COVID-19 Diagnosis from cough samples using Deep Learning methods

Βαλεργάκη, Παρασκευή; Valergaki, Paraskefi

dc.contributor.author	Βαλεργάκη, Παρασκευή	el
dc.contributor.author	Valergaki, Paraskefi	en
dc.date.accessioned	2023-01-24T07:52:53Z
dc.date.available	2023-01-24T07:52:53Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/56850
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.24548
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.subject	COVID-19	en
dc.subject	Multistage Transfer Learning	en
dc.subject	Deep Learning	en
dc.subject	Cough Detection	en
dc.subject	Image Classification	en
dc.subject	Ensemble Learning	en
dc.subject	Convolutional Recurrent Neural Networks	en
dc.subject	Interpretability	en
dc.title	COVID-19 Diagnosis from cough samples using Deep Learning methods	en
heal.type	bachelorThesis
heal.classification	Neural Networks	en
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2022-08-30
heal.abstract	Coronavirus disease 2019 (COVID-19) has affected the lives of millions of people around the globe. By July 2022, there had been 569,771,691 confirmed cases of COVID-19 globally and 6,383,776 deaths. The virus is mainly transmitted through droplets generated when an infected person coughs, sneezes, or exhales. The most common symptoms are fever, cough, and fatigue. The current diagnostic method is Reverse-Transcription Polymerase Chain Reaction (RT-PCR) testing. However, RT-PCR testing has several drawbacks: limited availability, cost, long turnaround times, and the risk of further infection if sampling is done improperly. Furthermore, in-person testing puts medical staff, particularly those with limited protection, at serious risk of infection. Vaccination remains a key component of the approach needed to reduce the impact of SARS-CoV-2. Unfortunately, new variants of the virus can eventually reduce the effectiveness of vaccines, leading to reinfections. Therefore, the need for constant testing remains, as immunity is often threatened by mutations. This thesis aims to demonstrate the feasibility of automatically detecting COVID-19 from cough sounds. This type of screening is contactless and easy to apply, and it can reduce the workload in testing centres as well as limit transmission. Two datasets containing coughs from people on all continents are used, namely the "Coswara" and the "Cambridge" datasets. Class imbalance was addressed with an ensemble learning approach so that the Covid class is not underrepresented. The preprocessing step uses cough detection to identify whether and when a cough is present in the raw audio recordings. Because the datasets are crowdsourced, the sounds were recorded in differing environments with microphones of uncertain quality, so the models could be highly prone to overfitting to unwanted signals. To address this issue, the sound files that the cough detector classifies as cough are denoised. Data augmentation is applied to address data scarcity, since deep learning architectures are data hungry. The audio samples are then converted to mel spectrograms. For the COVID-19 classification task, nine different deep learning architectures are tested and presented. Specifically, CNNs combined with bidirectional Long Short-Term Memory (BiLSTM) and bidirectional Gated Recurrent Unit (BiGRU) networks, together with an attention mechanism, are implemented. Three networks pretrained on ImageNet, and an ensemble model composed of them, are presented as well. VGG-13 and DenseNet Speech, an architecture used in a prior study for voice recognition and keyword spotting, are also implemented. Temporal CRNNs (TCRNNs) appear to produce promising and consistent results in COVID-19 detection. The multistage transfer learning process consists of three transfer learning stages and uses all of the available datasets; this pretraining on cough-related tasks leads to better classification results on the Cambridge dataset. Finally, an attempt was made to interpret InceptionResNetV2 on mel spectrograms using Local Interpretable Model-agnostic Explanations (LIME). The best classification results, obtained with TCRNNs under 5-fold cross-validation, reach an accuracy of 76.67% and an AUC of 76.16%. These results suggest that cough sounds can potentially serve as a helpful triage or diagnostic tool for COVID-19 infection. Since this type of cough audio classification is cost-effective and easy to deploy, it is potentially a useful and viable means of contactless COVID-19 screening.	en
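The abstract states that an ensemble counters the skew toward non-COVID samples but does not say how the ensemble is built. One common scheme, assumed here rather than taken from the thesis, partitions the majority (non-COVID) class into disjoint chunks roughly the size of the minority (COVID) class, trains one model per balanced subset, and averages the members' probabilities. The function names `balanced_subsets` and `ensemble_predict` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def balanced_subsets(positives: Sequence, negatives: Sequence,
                     seed: int = 0) -> List[Tuple[list, list]]:
    """Pair every minority-class sample with a disjoint chunk of the majority class.

    Hypothetical helper: each returned (samples, labels) subset is roughly
    balanced and would train one ensemble member.
    """
    if not positives:
        raise ValueError("need at least one minority-class sample")
    rng = random.Random(seed)
    neg = list(negatives)
    rng.shuffle(neg)
    n_pos = len(positives)
    n_chunks = max(1, len(neg) // n_pos)
    subsets = []
    for i in range(n_chunks):
        # The last chunk absorbs leftover negatives so no sample is discarded.
        if i == n_chunks - 1:
            chunk = neg[i * n_pos:]
        else:
            chunk = neg[i * n_pos:(i + 1) * n_pos]
        samples = list(positives) + chunk
        labels = [1] * n_pos + [0] * len(chunk)
        subsets.append((samples, labels))
    return subsets


def ensemble_predict(models: Sequence[Callable[[object], float]], x) -> float:
    """Average the COVID-19 probabilities returned by each ensemble member."""
    return sum(m(x) for m in models) / len(models)
```

With 10 COVID and 35 non-COVID recordings this yields three members trained on 10 vs 10, 10 vs 10 and 10 vs 15 samples; every member sees all COVID samples, so the minority class is never underrepresented within a member.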
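The abstract names mel spectrograms as the model input but gives no extraction parameters. The pure-Python sketch below shows the standard computation: Hann-windowed frames, power spectrum, triangular filters on the HTK mel scale, and log compression. The values of `n_fft`, `hop` and `n_mels`, and the choice of the HTK mel formula, are assumptions. A real pipeline would use an FFT-based library such as librosa rather than this naive DFT, which is written out only for clarity.

```python
import math
from typing import List, Optional, Sequence


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sr: int, fmin: float = 0.0,
                   fmax: Optional[float] = None) -> List[List[float]]:
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels bands."""
    fmax = sr / 2 if fmax is None else fmax
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 points equally spaced in mel define the triangle edges.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))
            elif center < f < right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|DFT|^2 of a Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append(re * re + im * im)
    return spec


def log_mel_spectrogram(signal: Sequence[float], sr: int, n_fft: int = 512,
                        hop: int = 256, n_mels: int = 40) -> List[List[float]]:
    """Return a (n_frames, n_mels) list of log-mel energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sr)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        p = power_spectrum(signal[start:start + n_fft])
        mel = [sum(w * v for w, v in zip(row, p)) for row in bank]
        frames.append([10.0 * math.log10(max(e, 1e-10)) for e in mel])
    return frames
```

The resulting 2-D array is what image-style models (VGG-13, the ImageNet-pretrained networks) or CRNNs would consume, with the frame axis treated as time.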
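The abstract says the CNN-BiLSTM and CNN-BiGRU models use an attention mechanism but does not specify which one. A common choice, assumed here, is attention pooling: each time step of the recurrent output gets a scalar score from a weight vector `w`, the scores are softmax-normalised, and the weighted sum of time steps becomes a fixed-length vector for the classifier. In a trained network `w` would be learned (often after a tanh projection); in this hypothetical `attention_pool` it is passed in as a plain list.

```python
import math
from typing import List, Sequence


def attention_pool(hidden: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Collapse a (T, D) sequence of recurrent outputs into one D-dim vector.

    score_t = h_t . w,  alpha = softmax(score),  context = sum_t alpha_t * h_t
    """
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(hidden[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
```

When all scores are equal this reduces to mean pooling; a large score on one frame (for example, the frame containing the cough burst) lets that frame dominate the pooled representation.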
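The reported accuracy and AUC come from 5-fold cross-validation. The abstract does not state whether the folds were stratified or how per-fold scores were aggregated, so this sketch assumes stratified folds (each fold keeps the COVID/non-COVID ratio) and computes AUC as the Mann-Whitney probability that a random positive outscores a random negative. `stratified_kfold`, `roc_auc` and the 0.5 decision threshold in `accuracy` are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 5,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs with roughly constant class ratios."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin into the folds
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label."""
    return sum((s >= threshold) == bool(y) for y, s in zip(labels, scores)) / len(labels)
```

Per-fold metrics would then typically be averaged across the five folds; AUC is threshold-free, which is why it can differ from accuracy on an imbalanced test fold.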
heal.abstract	Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has affected the lives of millions of people around the world. By July 2022, there had been 569,771,691 confirmed COVID-19 cases worldwide and 6,383,776 recorded deaths. The virus is mainly transmitted through droplets generated when an infected person coughs, sneezes, or exhales. The most common symptoms are fever, cough, and fatigue. The current diagnostic method is based on Reverse-Transcription Polymerase Chain Reaction (RT-PCR) testing. However, limited availability, cost, and long turnaround times are some drawbacks of RT-PCR testing. Moreover, this diagnostic method puts medical staff at risk of infection during sample collection. Vaccination is an important weapon against the coronavirus. Unfortunately, variants of the virus can eventually reduce the effectiveness of vaccines, subsequently leading to reinfections. Therefore, the need for continuous testing remains, as immunity is often threatened by mutations. This diploma thesis investigates Deep Learning methods for detecting COVID-19 from cough sounds. This type of screening is contactless and easy to apply, and it can reduce the workload in testing centres as well as limit transmission. Two datasets containing cough recordings from participants in various countries are used: the Coswara dataset and the Cambridge dataset. Dataset imbalance was addressed with an ensemble learning approach so that the Covid class is not underrepresented. The preprocessing stage includes cough detection to determine whether and when a cough is present in the raw recordings. The datasets are crowdsourced, which means that the collected sounds come from different environments and microphone quality is uncertain. As a result, the models could be highly prone to overfitting to unwanted signals. To address this issue, the audio files classified as cough signals are denoised. Data augmentation is applied to compensate for the small size of the dataset, since Deep Learning architectures require large amounts of data. The audio samples are then converted to mel spectrograms. To classify the samples as COVID-19 or non-COVID-19, nine different deep learning architectures are tested and presented in this thesis. Specifically, Convolutional Neural Networks (CNNs) combined with bidirectional Long Short-Term Memory (BiLSTM) and bidirectional Gated Recurrent Unit (BiGRU) networks, together with an attention mechanism, are implemented. Three networks pretrained on ImageNet, and an ensemble model composed of them, are also presented. VGG-13 and DenseNet Speech, an architecture used in a previous study for voice recognition and keyword spotting, are implemented as well. The study showed that CRNNs give promising results in COVID-19 detection. Next, a multistage transfer learning process consisting of three transfer learning stages uses all of the available datasets. This pretraining on cough-related tasks leads to better classification results on the Cambridge dataset. In addition, an attempt was made to interpret InceptionResNetV2 on mel spectrograms using Local Interpretable Model-Agnostic Explanations (LIME). The best classification results, obtained with TCRNNs under 5-fold cross-validation, reach an accuracy of 76.67% and an AUC of 76.16%. These results indicate that cough recordings can potentially be used as a screening or diagnostic tool for COVID-19.	en
heal.advisorName	Νικήτα, Κωνσταντίνα	el
heal.committeeMemberName	Σταφυλοπάτης, Ανδρέας-Γεώργιος	el
heal.committeeMemberName	Στάμου, Γεώργιος	el
heal.academicPublisher	National Technical University of Athens. School of Electrical and Computer Engineering. Division of Information Transmission Systems and Material Technology. Laboratory of Biomedical Simulations and Imaging Technology	en
heal.academicPublisherID	ntua
heal.numberOfPages	97 p.	en
heal.fullTextAvailability	false