HEAL DSpace

Development and Evaluation with AI Tools & Devices: Google Edge TPU for General-Purpose Computing

dc.contributor.author Σάκος, Χρόνης el
dc.contributor.author Sakos, Chronis en
dc.date.accessioned 2022-10-14T09:57:01Z
dc.date.available 2022-10-14T09:57:01Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/55937
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.23635
dc.rights Default License
dc.subject Artificial Intelligence el
dc.subject Embedded Systems el
dc.subject Convolutional Neural Networks el
dc.subject General-purpose computing el
dc.subject Accelerators el
dc.subject Artificial Intelligence en
dc.subject Embedded Systems en
dc.subject Edge TPU accelerator en
dc.subject Convolutional Neural Networks en
dc.subject GEMM en
dc.title Development and Evaluation with AI Tools & Devices: Google Edge TPU for General-Purpose Computing en
heal.type bachelorThesis
heal.classification Artificial Intelligence el
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2022-06-30
heal.abstract Artificial Intelligence (AI) is one of the fastest-growing areas of research and has revolutionized a variety of application domains. Modern artificial neural networks (ANNs) are increasingly computationally complex, and as a result general-purpose CPUs struggle to provide sufficient performance. To bring AI to a broader customer base, developers therefore turn to smaller, more power-efficient AI microchips and accelerators. Anticipating this trend, Google provides Tensor Processing Units (TPUs) to accelerate AI inference in data centers and at the edge. In this thesis, targeting embedded AI, we focus on the Edge TPU. The Edge TPU is a small Application-Specific Integrated Circuit (ASIC) that delivers high performance within a small physical and power footprint, enabling the deployment of high-accuracy AI at the edge. It is dedicated hardware that parallelizes certain computations in order to speed up inference. The Edge TPU can perform 4 Trillion Operations Per Second (TOPS), using 0.5 Watt for each TOPS (2 TOPS per Watt). However, the architecture and instructions of such an AI-specific accelerator impose hardware challenges and limitations on non-AI, general-purpose workloads. Our goal in this thesis is to address this challenge by proposing a custom methodology for building Edge TPU-compatible networks for general-purpose calculations. Moreover, we propose a way to overcome the barrier of 8-bit-only operations on the TPU by breaking N-bit algebraic computations into 8-bit parts. In this way, we support both element-wise and matrix multiplication at larger bit-widths without a significant decrease in performance. Initially, we benchmark the TPU to explore and evaluate its capabilities, using both pre-trained and custom networks.
For our Ship Detection network, we achieve 1000-2000 FPS with no significant accuracy loss. The experimental results reveal significant acceleration compared with the ARM A53 co-processor and other embedded devices. Overall, the Edge TPU provides remarkable speedup for medium- and large-sized CNNs and MLPs, as well as for custom models dominated by matrix multiplications. Matrix multiplication is accelerated by up to 4x compared with 8-bit quantized ARM execution and by up to 7x compared with 32-bit floating-point execution. Moreover, for classic Digital Signal Processing (DSP) operations, such as the Sobel Edge Detector and Image Binning, the Edge TPU provides up to 6x better performance than the ARM A53. en
heal.advisorName Σούντρης, Δημήτριος el
heal.committeeMemberName Τσανάκας, Παναγιώτης el
heal.committeeMemberName Σιώζιος, Κωνσταντίνος el
heal.committeeMemberName Σούντρης, Δημήτριος el
heal.academicPublisher National Technical University of Athens. School of Electrical and Computer Engineering. Division of Information Technology and Computers. Microcomputers and Digital Systems VLSI Laboratory el
heal.academicPublisherID ntua
heal.numberOfPages 144 p. el
heal.fullTextAvailability false
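
The abstract mentions breaking N-bit algebraic computations into 8-bit parts so that element-wise and matrix multiplications can use 8-bit-only operations. The record does not describe how the thesis does this, so the following is only a minimal sketch of one standard way to do it (schoolbook splitting into 8-bit limbs), written in plain Python on the CPU. The function names `split_limbs`, `limb_plane`, `matmul_8bit`, `matmul_nbit`, and `mul_nbit` are hypothetical, values are assumed to be unsigned, and Edge TPU quantization details (scales, zero points, accumulator widths) are ignored.

```python
def split_limbs(x, n_limbs):
    """Split a non-negative integer into n_limbs 8-bit parts, least significant first."""
    return [(x >> (8 * i)) & 0xFF for i in range(n_limbs)]


def limb_plane(matrix, index, n_limbs):
    """Matrix holding only the index-th 8-bit part of every entry."""
    return [[split_limbs(v, n_limbs)[index] for v in row] for row in matrix]


def matmul_8bit(a, b):
    """Matrix product whose operand entries must each fit in 8 bits.

    The sums use Python integers, standing in for a wider accumulator
    (an assumption of this sketch, not a statement about the hardware).
    """
    assert all(0 <= v <= 0xFF for row in a + b for v in row)
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))] for i in range(len(a))]


def matmul_nbit(a, b, n_bits=16):
    """Multiply matrices of unsigned n_bits-wide integers via 8-bit partial products.

    With A = sum_p A_p * 2^(8p) and B = sum_q B_q * 2^(8q), the product is
    A @ B = sum_{p,q} (A_p @ B_q) * 2^(8(p+q)).
    """
    n_limbs = (n_bits + 7) // 8
    rows, cols = len(a), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for p in range(n_limbs):
        a_p = limb_plane(a, p, n_limbs)
        for q in range(n_limbs):
            partial = matmul_8bit(a_p, limb_plane(b, q, n_limbs))
            shift = 8 * (p + q)
            for i in range(rows):
                for j in range(cols):
                    out[i][j] += partial[i][j] << shift
    return out


def mul_nbit(x, y, n_bits=16):
    """Element-wise case for one pair of unsigned values, using the same splitting."""
    n_limbs = (n_bits + 7) // 8
    xs, ys = split_limbs(x, n_limbs), split_limbs(y, n_limbs)
    return sum((xs[p] * ys[q]) << (8 * (p + q))
               for p in range(n_limbs) for q in range(n_limbs))


if __name__ == "__main__":
    a = [[300, 65535], [1, 257]]
    b = [[2, 1000], [40000, 3]]
    direct = [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
              for i in range(2)]
    assert matmul_nbit(a, b) == direct
    assert mul_nbit(51234, 60001) == 51234 * 60001
```

In this sketch a 16-bit product costs four 8-bit partial products plus shifts and additions, and the count grows quadratically with the number of 8-bit parts. Whether the thesis uses this exact scheme cannot be determined from the record.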