Hardware Acceleration Techniques for Computation and Data
Intensive Machine Learning and Bioinformatic Applications

Κολιογεώργη, Κωνσταντίνα; Koliogeorgi, Konstantina

dc.contributor.author	Κολιογεώργη, Κωνσταντίνα	el
dc.contributor.author	Koliogeorgi, Konstantina	en
dc.date.accessioned	2023-08-22T10:01:18Z
dc.date.available	2023-08-22T10:01:18Z
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/57908
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.25605
dc.rights	Default License
dc.subject	FPGA Acceleration	en
dc.subject	Design Space Exploration	en
dc.subject	Short Read Alignment	en
dc.subject	SW/HW Co-design	en
dc.subject	Next-Generation Sequencing	en
dc.subject	Επιτάχυνση Υλικού	el
dc.subject	Γλώσσες Σύνθεσης Υψηλού Επιπέδου	el
dc.subject	Αλληλούχιση γονιδιώματος	el
dc.subject	Μετασχηματισμός Κώδικα	el
dc.subject	Διερεύνηση Χώρου Σχεδίασης	el
dc.title	Hardware Acceleration Techniques for Computation and Data Intensive Machine Learning and Bioinformatic Applications	en
dc.title	Τεχνικές Επιτάχυνσης σε Hardware για εφαρμογές Τεχνητής Νοημοσύνης και Βιοπληροφορικής απαιτητικές σε Υπολογισμούς και Δεδομένα	el
heal.type	doctoralThesis
heal.classification	Computer Science	en
heal.language	el
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2023-05-02
heal.abstract	In this thesis, we focus on the hardware acceleration of two representative applications of modern healthcare: an ML-based prediction analysis and Read Alignment of genomic data. Both fields have experienced intense growth in recent decades and generate an immense amount of raw data. Creating value and making decisions based on these data has proved to be a challenging task, as both the size of the datasets and the computational intensity of the algorithms continue to escalate. To cope with this issue, high-performance techniques such as hardware acceleration have been examined. A surge of works leverages different programming models and frameworks to develop efficient FPGA-based accelerators, thanks to the bit-level customization capabilities of the devices. However, the frameworks available for programming such devices cannot always fully exploit the acceleration prospects of the applications in a straightforward manner. Furthermore, in complex applications, existing solutions are characterized by a narrow view of real integration aspects, such as system-wide communication and accelerator call overheads. The core contribution of this research work is the delivery of efficient solutions through strategic exploration of the design space and the synergy of hardware and software code modifications. The first application that this thesis examines is the efficient hardware acceleration of Support Vector Machine (SVM) classifiers. SVMs have played a crucial role in providing data fusion and high-accuracy classification solutions for various complex, non-linear problems. In this thesis, we explore an application in which SVM hardware co-processors perform classification for ECG signal arrhythmia detection. The proposed methodology for accelerating the SVM has been implemented as a framework on top of the state-of-the-art Vivado High-Level Synthesis (HLS) tool. 
We propose a systematic two-level approach for SVM acceleration, which first optimizes the global structure of the original SVM's behavioral description to assist the tool in inferring the inherent data- and instruction-level parallelism of the algorithm. The second level of optimization further refines the design through a targeted design exploration that matches the accelerator's memory architecture to its computation and memory access patterns. In the second part of the thesis, we study the effect of acceleration techniques on one of the major bottlenecks of a typical genomic pipeline, namely short read alignment. We perform extensive profiling on a popular aligner and identify the Smith-Waterman string-matching algorithm as the bottleneck within alignment. Our approach is to provide a dataflow implementation of this task that targets FPGA devices and takes into account the implications of integrating the accelerator into the original software tool. We therefore present GANDAFL, a novel genome alignment dataflow architecture for the Smith-Waterman Matrix-fill and Traceback stages that performs high-throughput short-read alignment on Next Generation Sequencing data. We then propose a radical software restructuring of the widely used Bowtie2 aligner that implements an aggregation-batching strategy and feeds the accelerator in a high-throughput streaming fashion with minimized transfer and call overheads. The standalone solution delivers up to 116x and 2x speedup over state-of-the-art software and hardware accelerators respectively, and the GANDAFL-enhanced Bowtie2 aligner delivers a 1.9x speedup. We also examine an alternative approach to accelerating short read alignment. We introduce a high-throughput alignment system that combines Banded Smith-Waterman accelerators with pre-filtering for alignment optimization, based on a profile-driven accelerator methodology. 
Extensive profiling of genomic datasets reveals low edit thresholds that can be leveraged by a heuristic variant of Smith-Waterman, i.e. Banded Smith-Waterman, to create resource-efficient accelerators that are customized to the edit profile of the input. We therefore design and deliver a highly optimized dataflow implementation of Banded Smith-Waterman seed-extension targeting FPGA devices, which is leveraged within a multi-dataflow accelerated system. The multi-dataflow system covers the full range of edits and therefore achieves both high-throughput and high-accuracy alignments. The evaluation shows that the proposed Banded Smith-Waterman accelerator delivers a 34x speedup over state-of-the-art software aligners and 1.53x and 3x over state-of-the-art dataflow and RTL Smith-Waterman accelerators respectively. The multi-dataflow system delivers average speedups of 1.8x over state-of-the-art multi-accelerator FPGA solutions that employ generic and input-agnostic accelerators.	en
heal.abstract	Σε αυτή τη διατριβή επικεντρωνόμαστε στην υλοποίηση υλικού επιτάχυνσης για δύο αντιπροσωπευτικές εφαρμογές του σύγχρονου τομέα της υγείας: μια ανάλυση πρόβλεψης που βασίζεται στη μηχανική μάθηση και την ευθυγράμμιση ανάγνωσης γονιδιωματικών δεδομένων. Και οι δύο τομείς βιώνουν έντονη ανάπτυξη τις τελευταίες δεκαετίες και παράγουν έναν τεράστιο όγκο ακατέργαστων δεδομένων, πλούσιο σε πληροφορία. Η ερμηνεία και η λήψη αποφάσεων βασισμένων σε αυτά τα δεδομένα έχουν αποδειχθεί δύσκολες εργασίες καθώς τα δεδομένα και η υπολογιστική πολυπλοκότητα των αλγορίθμων αυξάνονται εκθετικά. Για να αντιμετωπιστεί αυτό το πρόβλημα, έχουν εξεταστεί τεχνικές υψηλής απόδοσης όπως η επιτάχυνση σε hardware. Υπάρχει μια πληθώρα ερευνητικών εργασιών που αξιοποιούν διαφορετικά μοντέλα προγραμματισμού για να αναπτύξουν αποτελεσματικούς επιταχυντές βασισμένους σε FPGA, χάρη στην ευελιξία προγραμματισμού τους σε επίπεδο bit. Ωστόσο, τα διαθέσιμα μοντέλα προγραμματισμού για τον προγραμματισμό τέτοιων συσκευών δεν μπορούν πάντα να εκμεταλλευτούν πλήρως τις προοπτικές επιτάχυνσης των εφαρμογών με απλό τρόπο. Επιπλέον, σε πολύπλοκες εφαρμογές, οι υπάρχουσες λύσεις χαρακτηρίζονται από μια περιορισμένη οπτική στην ενσωμάτωση των επιταχυντών σε ένα ρεαλιστικό σύστημα, όπως η επικοινωνία σε επίπεδο συστήματος και οι πρόσθετοι χρόνοι κλήσης των επιταχυντών. Στο τρέχον διδακτορικό, η κύρια συνεισφορά βασίζεται στην παροχή αποτελεσματικών λύσεων μέσω της στρατηγικής εξερεύνησης του χώρου σχεδιασμού και της συνέργειας βελτιστοποιήσεων του κώδικα τόσο σε επίπεδο υλικού όσο και λογισμικού. Η πρώτη εφαρμογή που εξετάζεται σε αυτή τη διατριβή είναι η αποδοτική επιτάχυνση υλικού των ταξινομητών Support Vector Machine (SVM). Σε αυτήν τη διατριβή, εξετάζουμε μια εφαρμογή στην οποία οι επιταχυντές υλικού SVM εκτελούν ταξινόμηση για την ανίχνευση αρρυθμιών σήματος ECG. Η προτεινόμενη μεθοδολογία για την επιτάχυνση του SVM έχει υλοποιηθεί χρησιμοποιώντας το εργαλείο Vivado High-Level Synthesis (HLS). 
Προτείνουμε μια συστηματική προσέγγιση δύο επιπέδων για την επιτάχυνση του SVM, η οποία πρώτα βελτιστοποιεί τη γενική δομή της αρχικής περιγραφής συμπεριφοράς του SVM για να βοηθήσει το εργαλείο να αναγνωρίσει τον εγγενή παραλληλισμό σε επίπεδο δεδομένων και εντολών του αλγορίθμου. Το δεύτερο επίπεδο βελτιστοποίησης βελτιώνει επιπρόσθετα το σχεδιασμό μέσω μιας στρατηγικής εξερεύνησης του χώρου σχεδιασμού που σχεδιάζει τη μνήμη του επιταχυντή βάσει των μοτίβων υπολογισμού και πρόσβασης στη μνήμη του. Στο δεύτερο μέρος της διδακτορικής εργασίας, μελετάμε την επίδραση των τεχνικών επιτάχυνσης σε ένα από τα πιο υπολογιστικά απαιτητικά κομμάτια της επεξεργασίας γονιδιώματος, που είναι η ευθυγράμμιση ακολουθιών DNA στο ανθρώπινο γονιδίωμα. Εκτελούμε ανάλυση της απόδοσης ενός εργαλείου αλληλούχισης (το Bowtie2) και εντοπίζουμε τον αλγόριθμο Smith-Waterman ως το πιο χρονοβόρο κομμάτι. Η προσέγγισή μας είναι να παρέχουμε μια υλοποίηση ροής δεδομένων που στοχεύει συσκευές FPGA λαμβάνοντας υπόψη τις συνέπειες της ενσωμάτωσης του επιταχυντή στο εργαλείο αλληλούχισης και επομένως σε ένα πραγματικό σύστημα. Προτείνουμε το GANDAFL, μια νέα αρχιτεκτονική ροής δεδομένων ευθυγράμμισης γονιδιώματος για τον Smith-Waterman για την εκτέλεση ευθυγράμμισης υψηλής απόδοσης σε δεδομένα αλληλουχίας επόμενης γενιάς. Στη συνέχεια, προτείνουμε μια ριζική αναδιάρθρωση του κώδικα του Bowtie2 η οποία ομαδοποιεί πολλά μεμονωμένα αιτήματα αλληλούχισης και τα τροφοδοτεί στον επιταχυντή με υψηλό ρυθμό απόδοσης ελαχιστοποιώντας έξοδα μεταφοράς και κλήσεων. Ο επιταχυντής προσφέρει έως και 116 και 2 φορές επιτάχυνση σε σύγκριση με πρόσφατους επιταχυντές λογισμικού και υλικού, αντίστοιχα, και η βελτιωμένη με GANDAFL ευθυγράμμιση Bowtie2 προσφέρει επιτάχυνση 1,9 επί του συνολικού συστήματος. Τέλος εξετάζουμε μια εναλλακτική προσέγγιση, η οποία συνδυάζει μια ευριστική υλοποίηση του Smith-Waterman και ένα στάδιο φιλτραρίσματος των αρχικών δεδομένων. 
Μελέτη των δεδομένων εισόδου υποδεικνύει ότι η αλληλούχιση συνήθως είναι ακριβής και εντοπίζεται μικρός αριθμός διαφοροποιήσεων από το ανθρώπινο γονιδίωμα. Αυτό μειώνει το χώρο αναζήτησης των λύσεων και μας επιτρέπει να χρησιμοποιήσουμε τον ευριστικό Banded Smith Waterman ο οποίος επιτελεί την ίδια λειτουργία, εντοπίζει λιγότερες διαφοροποιήσεις και καταναλώνει λιγότερους πόρους στο υλικό. Προτείνουμε λοιπόν ένα σύστημα που πλέον αποτελείται από πολλούς επιταχυντές και καλύπτει έως έναν αριθμό διαφοροποιήσεων ενώ εντοπίζει πλέον τις αλληλουχίσεις με ταχύτερο ρυθμό. Το προτεινόμενο σύστημα αποδίδει επιτάχυνση έως 34 φορές σε σχέση με λογισμικά ενώ είναι έως 3 φορές ταχύτερο από σύγχρονους επιταχυντές.	el
heal.advisorName	Σούντρης, Δημήτριος	el
heal.advisorName	Soudris, Dimitrios	en
heal.committeeMemberName	Pekmestzi, Kiamal
heal.committeeMemberName	Gaydadjiev, Georgi
heal.committeeMemberName	Xydis, Sotirios
heal.committeeMemberName	Pnevmatikatos, Dionisios
heal.committeeMemberName	Alexopoulos, Leonidas
heal.committeeMemberName	Mutlu, Onur
heal.committeeMemberName	Soudris, Dimitrios
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών	el
heal.academicPublisherID	ntua
heal.numberOfPages	237
heal.fullTextAvailability	false