HEAL DSpace

Exploring Kernel approximations for TinyML inference acceleration on microcontrollers

DSpace/Manakin Repository

Show simple item record

dc.contributor.author Mentzos, Georgios en
dc.contributor.author Μέντζος, Γεώργιος el
dc.date.accessioned 2024-04-12T09:57:06Z
dc.date.available 2024-04-12T09:57:06Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/59156
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.26852
dc.rights Αναφορά Δημιουργού 3.0 Ελλάδα (Attribution 3.0 Greece)
dc.rights.uri http://creativecommons.org/licenses/by/3.0/gr/
dc.subject Προσεγγιστικός Υπολογισμός el
dc.subject Μικροσκοπική Μηχανική Μάθηση el
dc.subject Συνελικτικό Νευρωνικό Δίκτυο el
dc.subject Μικροελεγκτές el
dc.subject Προσαρμοσμένη Σχεδίαση el
dc.subject Approximate Computing en
dc.subject Microcontrollers en
dc.subject TinyML en
dc.subject Convolutional Neural Network en
dc.subject Bespoke Design en
dc.title Exploring Kernel approximations for TinyML inference acceleration on microcontrollers en
heal.type bachelorThesis
heal.classification Computer Science and Engineering en
heal.classification Embedded Systems en
heal.language el
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2023-10-24
heal.abstract The rapid growth of always-on microcontroller-based IoT devices has opened up numerous applications, from smart manufacturing to personalized healthcare. Despite the widespread adoption of energy-efficient microcontroller units (MCUs) in the Tiny Machine Learning (TinyML) domain, they face significant limitations in terms of performance and memory (RAM, Flash), especially when considering deep networks for complex classification tasks. In this work, we combine approximate computing and software kernel design to accelerate the inference of approximate CNN models on MCUs. Our kernel-based approximation framework first unpacks the operands of each convolution layer and then performs an offline significance calculation for each operand. Subsequently, through a design space exploration, it employs a computation-skipping approximation strategy based on the calculated significance, offering various trade-offs between reduced computations and classification accuracy. Our evaluation, conducted on an STM32-Nucleo board using three popular CNNs trained on the CIFAR-10 dataset, demonstrates that our Pareto-optimal solutions can yield significant benefits. Compared to state-of-the-art exact inference methods, our approach achieves a 9% reduction in latency with negligible Top-1 accuracy loss (<1%) on MCUs with a cache-enabled architecture. Furthermore, when targeting non-cached MCUs, the latency reduction increases substantially to 37%, again at the expense of less than 1% Top-1 accuracy loss. The various trade-offs explored in this thesis hold the potential to enable more practical applications and the deployment of deeper networks on compact MCUs. en
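The computation-skipping idea described in the abstract can be sketched as follows. This is a minimal illustrative example only: the thesis targets C kernels on STM32 MCUs, and its exact significance metric and skipping policy are not specified in this record, so using absolute weight magnitude as the significance proxy and a quantile-based skip threshold are assumptions of this sketch, not the author's method.

```python
# Hedged sketch of significance-based computation skipping for a 1-D
# convolution. Assumptions (not from the thesis): significance is
# approximated by absolute weight magnitude, and a fixed fraction of
# the least-significant taps is skipped.
import numpy as np

def significance(weights):
    # Offline step: rank each operand (weight) by a simple proxy for
    # its contribution to the output -- here, absolute magnitude.
    return np.abs(weights)

def approx_conv1d(x, w, skip_ratio=0.25):
    # Skip the least-significant fraction of multiply-accumulates,
    # trading a little accuracy for fewer computations.
    sig = significance(w)
    threshold = np.quantile(sig, skip_ratio)
    keep = sig >= threshold  # in practice this mask is fixed offline
    out = np.zeros(len(x) - len(w) + 1)
    for i in range(len(out)):
        window = x[i:i + len(w)]
        # Only the kept taps are multiplied; skipped taps cost nothing.
        out[i] = np.dot(window[keep], w[keep])
    return out

x = np.arange(8, dtype=float)
w = np.array([0.5, -0.05, 1.0, 0.02])
print(approx_conv1d(x, w, skip_ratio=0.5))
```

Sweeping `skip_ratio` yields the kind of computation/accuracy trade-off curve the thesis explores via design space exploration.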
heal.advisorName Σούντρης, Δημήτριος el
heal.committeeMemberName Σούντρης, Δημήτριος el
heal.committeeMemberName Τσανάκας, Παναγιώτης el
heal.committeeMemberName Ξύδης, Σωτήριος el
heal.academicPublisher Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών. Εργαστήριο Μικροϋπολογιστών και Ψηφιακών Συστημάτων VLSI el
heal.academicPublisherID ntua
heal.numberOfPages 92 σ. el
heal.fullTextAvailability false

