Mapping, Characterization and Acceleration of Apache Spark Applications

dc.contributor.author Σταμέλος, Ιωάννης el
dc.contributor.author Stamelos, Ioannis en
dc.date.accessioned 2017-10-11T10:00:43Z
dc.date.issued 2017-10-11
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/45733
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.14735
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ *
dc.subject Ανάλυση μεγάλων δεδομένων el
dc.subject Ενσωματωμένα συστήματα el
dc.subject Μηχανική εκμάθηση el
dc.subject Επιταχυντές υλικού el
dc.subject Κατανεμημένα συστήματα el
dc.subject Apache Spark en
dc.subject Embedded systems en
dc.subject FPGA accelerators en
dc.subject Machine learning en
dc.subject Big data analytics en
dc.title Mapping, Characterization and Acceleration of Apache Spark Applications en
heal.type bachelorThesis
heal.classification Big Data Analytics en
heal.dateAvailable 2018-10-10T21:00:00Z
heal.language el
heal.language en
heal.access campus
heal.recordProvider ntua el
heal.publicationDate 2017-07-04
heal.abstract Emerging web applications such as big data analytics have significantly increased the workload on data centers in recent years. In 2015, total data center network traffic was around 4.7 Exabytes, and it is estimated to exceed 8.5 Exabytes by the end of 2018. Growing demands for both performance and energy efficiency have led companies to develop energy-efficient platforms for heterogeneous data centers: they have recently started deploying FPGA accelerators and offloading part of the workload to embedded processors (i.e. ARM processors) at data center scale. For this reason, we first map Apache Spark, a widely used, fault-tolerant, general-purpose cluster computing framework, onto several embedded systems, including the Raspberry Pi 3, DragonBoard 410c and PYNQ-Z1. We present the whole procedure of mapping and deploying Spark on the embedded devices, along with any necessary configuration. Subsequently, we build a heterogeneous cluster consisting of four PYNQ-Z1 nodes and a typical Intel-based one, and go through all the steps and configuration needed to deploy Spark on it. We then present a proposed framework for the seamless utilization of hardware accelerators in Spark applications, together with a set of libraries that hide the accelerator's low-level details and thereby simplify the incorporation of hardware accelerators in Spark. In the last part of the thesis, we first explore the capabilities of the embedded platforms by collecting execution metrics for a set of typical machine learning and graph processing algorithms, and compare the performance and energy efficiency of each system with those of a mainstream high-performance server. Finally, the proposed framework is evaluated on a machine learning application, using logistic regression as the use case.
The overall evaluation shows that, in general, execution time on the embedded systems is 6.2x to 13x higher than on a typical data center server, but the embedded platforms are 2x to 3.5x better in terms of energy efficiency. On the other hand, the evaluation of the proposed framework for utilizing hardware accelerators in Spark shows that PYNQ's heterogeneous accelerator-based ZYNQ MPSoC can achieve up to 2x system speedup and 18x better energy efficiency compared to a Xeon system. Especially for embedded applications, the proposed framework can achieve up to 36x speedup and 29x lower energy consumption compared to the software-only implementation on low-power embedded processors (ARM processors). en
heal.advisorName Σούντρης, Δημήτριος el
heal.committeeMemberName Πεκμεστζή, Κιαμάλ el
heal.committeeMemberName Γκούμας, Γεώργιος el
heal.committeeMemberName Σούντρης, Δημήτριος el
heal.academicPublisher Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών. Εργαστήριο Μικροϋπολογιστών και Ψηφιακών Συστημάτων VLSI el
heal.academicPublisherID ntua
heal.numberOfPages 254 p. en
heal.fullTextAvailability true


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Greece