HEAL DSpace

Enhancing image generation with LLM-Based prompt optimization for diffusion models

dc.contributor.author Τερζάκη Παπαδοπούλου, Στυλιανή el
dc.contributor.author Terzaki Papadopoulou, Styliani en
dc.date.accessioned 2025-04-28T09:11:17Z
dc.date.available 2025-04-28T09:11:17Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/61790
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.29486
dc.rights Αναφορά Δημιουργού-Μη Εμπορική Χρήση-Όχι Παράγωγα Έργα 3.0 Ελλάδα (Attribution-NonCommercial-NoDerivs 3.0 Greece) *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ *
dc.subject Generative AI en
dc.subject Diffusion Models en
dc.subject Large Language Models en
dc.subject Prompt Engineering en
dc.subject Text-to-Image Synthesis en
dc.subject Γεννητική Τεχνητή Νοημοσύνη el
dc.subject Μοντέλα Διάχυσης el
dc.subject Μεγάλα Γλωσσικά Μοντέλα el
dc.subject Μηχανική Προτροπής el
dc.subject Σύνθεση Εικόνας από Κείμενο el
dc.title Enhancing image generation with LLM-Based prompt optimization for diffusion models en
heal.type masterThesis
heal.classification Machine learning en
heal.language el
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2024-10-30
heal.abstract The rapid evolution of Generative AI has introduced transformative tools for creating high-quality synthetic data, with applications in art, entertainment, and commercial design. Among these tools, Diffusion Models have emerged as particularly effective for high-resolution image generation, which they achieve by learning to reverse a gradual noise-addition process. However, the quality of images produced by diffusion models such as Stable Diffusion depends heavily on the prompts that guide them. Generating images that are precise and aligned with user intentions therefore requires refined prompt engineering, an area where Large Language Models (LLMs) offer promising solutions. Using LLMs for prompt optimization can improve the coherence, detail, and overall aesthetic appeal of generated images, which makes prompt engineering a critical factor in generative AI. This thesis explores the integration of LLM-based prompt optimization with text-to-image Diffusion Models to achieve better alignment, clarity, and aesthetic consistency in generated images. Our approach investigates several prompt engineering methodologies: zero-shot learning through role prompting, which assigns a specialized prompt-generation role to the LLM; few-shot learning, where targeted prompt examples guide the model's output; and few-shot learning combined with negative prompts, a technique designed to exclude undesirable characteristics and thereby further refine image quality. Our experiments show that LLM-enhanced prompts improve both quantitative metrics, such as CLIP score, and qualitative factors, including visual coherence and realism. Compared with baseline prompts, optimized prompts generate images with greater semantic accuracy and superior aesthetic qualities. 
This study highlights the efficacy of LLMs in elevating the standard of generative model outputs, establishing optimized prompting as a versatile tool for AI-driven content creation in diverse fields, from digital art to marketing and beyond. en
heal.advisorName Βουλόδημος, Αθανάσιος el
heal.committeeMemberName Στάμου, Γεώργιος el
heal.committeeMemberName Κόλλιας, Στέφανος el
heal.academicPublisher Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών (National Technical University of Athens. School of Electrical and Computer Engineering) el
heal.academicPublisherID ntua
heal.numberOfPages 95 σ. (95 pages) el
heal.fullTextAvailability false

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Greece (Αναφορά Δημιουργού-Μη Εμπορική Χρήση-Όχι Παράγωγα Έργα 3.0 Ελλάδα).