Exploiting Partial Reconfiguration of SoC FPGAs: A Hardware-Software Co-design for Accelerating Cryptographic Systems

Ιερωνυμάκης, Γεώργιος; Ieronymakis, Georgios

dc.contributor.author	Ιερωνυμάκης, Γεώργιος	el
dc.contributor.author	Ieronymakis, Georgios	en
dc.date.accessioned	2017-07-21T10:00:24Z
dc.date.available	2017-07-21T10:00:24Z
dc.date.issued	2017-07-21
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/45307
dc.identifier.uri	http://dx.doi.org/10.26240/heal.ntua.14062
dc.rights	Attribution-NonCommercial 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc/3.0/gr/	*
dc.subject	Ετερογενής υπολογιστική	el
dc.subject	Ανασυγκρότητος Υπολογισμός	el
dc.subject	Κρυπτογράφηση	el
dc.subject	Reconfigurable Computing	en
dc.subject	FPGA	en
dc.subject	Xilinx Zynq-7000	en
dc.subject	Partial Reconfiguration	en
dc.subject	Cryptography	en
dc.subject	AES	en
dc.subject	SHA3	en
dc.subject	Scatter/Gather DMA	en
dc.subject	Heterogeneous Computing	en
dc.subject	AXI4-Stream	en
dc.subject	HW/SW co-design	en
dc.subject	AMBA	en
dc.title	Exploiting Partial Reconfiguration of SoC FPGAs: A Hardware-Software Co-design for Accelerating Cryptographic Systems	en
dc.contributor.department	microlab	el
heal.type	bachelorThesis
heal.classification	Computer architecture	en
heal.language	el
heal.language	en
heal.access	free
heal.recordProvider	ntua	el
heal.publicationDate	2017-03-24
heal.abstract	In recent years, the continued push for the best possible computing performance has led to the emergence of heterogeneous computing and heterogeneous platforms. These systems gain performance and energy efficiency by adding dissimilar accelerators as co-processors with specialized processing capabilities to handle specific compute-intensive tasks. Field Programmable Gate Arrays (FPGAs) have attracted the interest of system architects because they support rapid prototyping and fast accelerator development. As their name denotes, FPGAs are programmable “in the field”: their internal logic can be configured after fabrication and modified, if needed, without the re-fabrication that ASICs require. Partial Reconfiguration (PR) takes this flexibility one step further by allowing an operating FPGA design to modify a part of itself while the rest of the system continues to function normally, without compromising the integrity of the computation running on the parts of the device that are not being reconfigured. This technique reduces the amount of resources required to implement a given function, with consequent reductions in cost and power consumption; it provides flexibility in the algorithms and protocols available to an application; and it accelerates computing by enabling a design to respond to new computation requirements much faster. This thesis explores PR technology on FPGAs and applies the knowledge acquired to implement a cryptographic system on a Xilinx Zynq-7000 SoC device. Zynq combines programmable logic and an embedded ARM processor on a single chip, thus forming a system-on-a-chip (SoC) with a fast interconnect between the two and good power efficiency. For the purposes of this thesis we chose four cryptographic modules (AES128, AES192, AES256 and SHA3-512). 
First, we made the modifications needed to use the cryptographic modules in the SoC and designed AXI4-Stream compliant interfaces to enable communication between the peripherals and the processor, making the necessary compromises with respect to the different modules’ architectures, the processing system’s limitations and PR’s restrictions. Then, we connected the peripherals to the processing system through an AXI DMA IP in Scatter/Gather mode. Scatter/Gather mode provided high-speed communication and let us apply an interrupt coalescing strategy to reduce the number of interrupts occupying the ARM processor, which allowed it to handle the peripherals more efficiently. We also applied a decoupling strategy to isolate the reconfigurable modules during PR, so that undesirable output signals would not affect the rest of the design. Finally, we evaluated our work and constructed a benchmark to show the acceleration advantages of PR. In this benchmark, the system adapted to the computation requirements by reconfiguring idle peripherals into others that were needed, distributing the computational load among them and thus reducing the total computation time. As a result, we achieved almost full hardware utilization and approached the optimal speedup.	en
heal.advisorName	Πεκμεστζή, Κιαμάλ	el
heal.committeeMemberName	Πεκμεστζή, Κιαμάλ	el
heal.committeeMemberName	Σούντρης, Δημήτριος	el
heal.committeeMemberName	Γκούμας, Γεώργιος	el
heal.academicPublisher	Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών	el
heal.academicPublisherID	ntua
heal.numberOfPages	110 σ.	el
heal.fullTextAvailability	true