HEAL DSpace

Alleviating Data Scarcity In Industrial Machine Learning Applications


dc.contributor.author Θεοδωρόπουλος, Σπύρος-Χριστόφορος
dc.date.accessioned 2025-09-22T09:40:47Z
dc.date.available 2025-09-22T09:40:47Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/62510
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.30206
dc.rights Attribution-NonCommercial-NoDerivs 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ *
dc.subject Data Scarcity en
dc.subject Oversampling en
dc.subject Deep Learning en
dc.subject Open-set Recognition en
dc.subject Industrial Visual Quality Inspection en
dc.title Alleviating Data Scarcity In Industrial Machine Learning Applications en
dc.contributor.department Information Technology and Computers en
heal.type doctoralThesis
heal.classification Artificial Intelligence en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2025-06-05
heal.abstract Visual defect recognition and its manufacturing applications have been an emerging topic in recent AI research, as an integral part of a manufacturing process that is becoming increasingly automated with the advent of Industry 4.0 and Industry 5.0. Although AI-driven computer vision algorithms and deep neural networks are a highly beneficial solution to this problem, they face several issues that may impede their adoption in practical real-life settings such as a manufacturing shop floor. For instance, defect datasets are often severely imbalanced and can be additionally burdened with the need to separate classes of high visual similarity. Another issue arising during an AI classifier's continuous operation is the frequent lack of robustness to novel defects appearing for the first time. The aim of this thesis is to deal with such challenges by providing augmentations to AI solutions, on either the data or the model level, addressing real-life and benchmark scenarios from the manufacturing domain. The initial focus is imbalanced learning. Although various data augmentation methods have been proposed to mitigate class imbalance, they often fail to cope with very small minority classes or have fidelity issues with smaller defects while, at the same time, needing significant computational resources to train. In addition, vector-based oversampling struggles to produce high-fidelity inputs and is hard to apply to custom CNN architectures, which often perform better for this type of problem. Our work presents an image-level oversampling method based on an instance-based image generator that can be applied to any CNN directly during training without increasing the order of the required training time. It is based on identifying a small number of the most uncertain base samples close to the estimated class boundaries and using them as seeds for augmentation. 
The resulting images are of high visual quality and preserve small class differences; they also improve the classifier boundary, leading to higher recall scores than other state-of-the-art approaches. Aside from class imbalance, the lack of real-world data and the strict safety constraints that must be imposed on manufacturing AI deployments dictate the need for handling novel inputs. Such unanticipated inputs can pose a significant risk to cyber-physical applications, as a resulting out-of-context decision could compromise the integrity of the production process. While recent machine learning methods can theoretically tackle this problem from different angles (e.g., open-set recognition, semi-supervised learning, intelligent data augmentation), applying them to a real-life setting with a small, imbalanced dataset and high inter-class similarity can be challenging. This work confronts such a use case, aiming to automate the visual quality inspection of shaver shell brand prints from the electronics industry, which is characterized by data scarcity and the existence of small local defects. To that end, we introduce a novel data augmentation approach based on latent-space manipulation of StyleGAN, where defect data is intentionally synthesized to simulate novel inputs that can help form a boundary of the model's knowledge. Our approach shows promising results compared to well-established open-set recognition and semi-supervised methods applied to the same problem, while its consistent performance across classifier embeddings indicates lower coupling to the final classifier. The above-mentioned method still requires enough data to train a GAN, which might not always be possible or cost-effective. Collecting ever more defect data is also often not a solution, as defects occur rarely in production and the ramp-up time of the AI-driven quality inspector becomes significantly longer. 
To cope with smaller datasets, we apply an innovative approach based on neurosymbolic AI. Specifically, we use a Logic Tensor Network that expresses the outputs of an unsupervised out-of-distribution detector as symbolic rules and uses them to drive the training of a neural network classifier. The resulting algorithm shows improved results in comparison to other related methods, especially in terms of defect recall, meaning that few defects remain undetected even if they are completely novel. More specifically, it achieves similar or better recall scores than semi-supervised and unsupervised methods when handling novel defects, and significantly outperforms them on defects that were seen during training. Similarly, when compared to supervised methods, it maintains high performance on known defects while significantly improving on novel ones. These best-of-both-worlds results are illustrated through higher F1-scores on the majority of the test datasets of manufacturing products. en
heal.advisorName Τσανάκας, Παναγιώτης
heal.committeeMemberName Κυριαζής, Δημοσθένης
heal.committeeMemberName Αμδίτης, Άγγελος
heal.committeeMemberName Σταφυλοπάτης, Ανδρέας-Γεώργιος
heal.committeeMemberName Βουλόδημος, Αθανάσιος
heal.committeeMemberName Μαρινάκης, Ευάγγελος
heal.committeeMemberName Ξύδης, Σωτήριος
heal.academicPublisher School of Electrical and Computer Engineering en
heal.academicPublisherID ntua
heal.numberOfPages 144
heal.fullTextAvailability false
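
The abstract describes picking a small number of the most uncertain base samples near the estimated class boundaries and using them as seeds for augmentation. A minimal sketch of that selection step is given below. It is not the thesis implementation: the uncertainty measure (the margin between the two highest predicted class probabilities) and the names `top2_margin` and `select_uncertain_seeds` are assumptions made purely for illustration, and the image generator that would consume the seeds is left out.

```python
from typing import List, Sequence


def top2_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities.

    A small gap means the classifier is undecided between two classes,
    which is used here as an assumed proxy for closeness to a class boundary.
    """
    ordered = sorted(probs, reverse=True)
    return ordered[0] - ordered[1]


def select_uncertain_seeds(
    probabilities: Sequence[Sequence[float]],
    labels: Sequence[int],
    target_class: int,
    k: int,
) -> List[int]:
    """Return indices of the k most uncertain samples of `target_class`.

    `probabilities[i]` holds the classifier's predicted class probabilities
    for sample i, and `labels[i]` holds its ground-truth class.
    """
    candidates = [
        (top2_margin(p), i)
        for i, (p, y) in enumerate(zip(probabilities, labels))
        if y == target_class
    ]
    candidates.sort()  # smallest margin (most uncertain) first
    return [i for _, i in candidates[:k]]


# Hypothetical predictions for four samples of a two-class problem.
probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.51, 0.49]]
labels = [1, 1, 0, 1]
seeds = select_uncertain_seeds(probs, labels, target_class=1, k=2)
print(seeds)
```

In this toy example the minority class is assumed to be class 1; the returned indices would then be handed to whatever generator produces the augmented images.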


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Greece