dc.contributor.author | Mentzos, Georgios | en |
dc.contributor.author | Μέντζος, Γεώργιος | el |
dc.date.accessioned | 2024-04-12T09:57:06Z | |
dc.date.available | 2024-04-12T09:57:06Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/59156 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.26852 | |
dc.rights | Αναφορά Δημιουργού 3.0 Ελλάδα | * |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/gr/ | * |
dc.subject | Προσεγγιστικός Υπολογισμός | el |
dc.subject | Μικροσκοπική Μηχανική Μάθηση | el |
dc.subject | Συνελικτικό Νευρωνικό Δίκτυο | el |
dc.subject | Μικροελεγκτές | el |
dc.subject | Προσαρμοσμένη Σχεδίαση | el |
dc.subject | Approximate Computing | en |
dc.subject | Microcontrollers | en |
dc.subject | TinyML | en |
dc.subject | Convolutional Neural Network | en |
dc.subject | Bespoke Design | en |
dc.title | Exploring Kernel approximations for TinyML inference acceleration on microcontrollers | en |
heal.type | bachelorThesis | |
heal.classification | Computer Science and Engineering | en |
heal.classification | Embedded Systems | en |
heal.language | el | |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2023-10-24 | |
heal.abstract | The rapid growth of always-on microcontroller-based IoT devices has opened up numerous applications, from smart manufacturing to personalized healthcare. Despite the widespread adoption of energy-efficient microcontroller units (MCUs) in the Tiny Machine Learning (TinyML) domain, they face significant limitations in terms of performance and memory (RAM, Flash), especially when considering deep networks for complex classification tasks. In this work, we combine approximate computing and software kernel design to accelerate the inference of approximate CNN models on MCUs. Our kernel-based approximation framework first unpacks the operands of each convolution layer and then performs an offline significance calculation for each operand. Subsequently, through a design space exploration, it employs a computation-skipping approximation strategy based on the calculated significance, offering various trade-offs between reduced computations and classification accuracy. Our evaluation, conducted on an STM32-Nucleo board using three popular CNNs trained on the CIFAR-10 dataset, demonstrates that our Pareto-optimal solutions can yield significant benefits. Compared to state-of-the-art exact inference methods, our approach achieves a 9% reduction in latency with negligible Top-1 accuracy loss (<1%) on MCUs with a cache-enabled architecture. Furthermore, when targeting non-cached MCUs, the latency reduction increases to 37%, again at the expense of less than 1% Top-1 accuracy loss. The various trade-offs explored in this thesis hold the potential to enable more practical applications and the deployment of deeper networks on compact MCUs. | en |
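The abstract describes an offline per-operand significance calculation followed by computation skipping inside the convolution kernel. The thesis code itself is not part of this record; the following is only a minimal illustrative sketch of that general idea, assuming weight magnitude as the significance metric and a 1-D convolution for brevity (the function names, the `skip_ratio` parameter, and the metric are hypothetical, not the thesis implementation):

```python
# Illustrative sketch (not the thesis code): significance-based
# computation skipping for a 1-D convolution. "Significance" of each
# weight is approximated offline by its absolute magnitude; MACs whose
# weights rank below a chosen skip ratio are omitted at inference time.

def significance_mask(weights, skip_ratio):
    """Offline step: mark the least-significant weights to skip."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    n_skip = int(len(weights) * skip_ratio)
    skipped = set(ranked[:n_skip])
    return [i not in skipped for i in range(len(weights))]

def approx_conv1d(signal, weights, keep):
    """Inference step: valid-mode 1-D convolution, skipping masked MACs."""
    out = []
    for start in range(len(signal) - len(weights) + 1):
        acc = 0.0
        for k, w in enumerate(weights):
            if keep[k]:              # computation skipping
                acc += signal[start + k] * w
        out.append(acc)
    return out

weights = [0.9, -0.05, 0.4, 0.01]
keep = significance_mask(weights, skip_ratio=0.5)  # drop the 2 smallest-|w|
y = approx_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], weights, keep)
```

Sweeping `skip_ratio` reproduces, in miniature, the trade-off the abstract reports: more skipped MACs lower latency at the cost of output (and ultimately classification) accuracy.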
heal.advisorName | Σούντρης, Δημήτριος | el |
heal.committeeMemberName | Σούντρης, Δημήτριος | el |
heal.committeeMemberName | Τσανάκας, Παναγιώτης | el |
heal.committeeMemberName | Ξύδης, Σωτήριος | el |
heal.academicPublisher | Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών. Εργαστήριο Μικροϋπολογιστών και Ψηφιακών Συστημάτων VLSI | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 92 σ. | el |
heal.fullTextAvailability | false | |