dc.contributor.author | Τερζάκη Παπαδοπούλου, Στυλιανή | el |
dc.contributor.author | Terzaki Papadopoulou, Styliani | en |
dc.date.accessioned | 2025-04-28T09:11:17Z | |
dc.date.available | 2025-04-28T09:11:17Z | |
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/61790 | |
dc.identifier.uri | http://dx.doi.org/10.26240/heal.ntua.29486 | |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Greece | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/gr/ | * |
dc.subject | Generative AI | en |
dc.subject | Diffusion Models | en |
dc.subject | Large Language Models | en |
dc.subject | Prompt Engineering | en |
dc.subject | Text-to-Image Synthesis | en |
dc.subject | Γεννητική Τεχνητή Νοημοσύνη | el |
dc.subject | Μοντέλα Διάχυσης | el |
dc.subject | Μεγάλα Γλωσσικά Μοντέλα | el |
dc.subject | Μηχανική Προτροπής | el |
dc.subject | Σύνθεση Εικόνας από Κείμενο | el |
dc.title | Enhancing image generation with LLM-Based prompt optimization for diffusion models | en |
heal.type | masterThesis | |
heal.classification | Machine learning | en |
heal.language | el | |
heal.language | en | |
heal.access | free | |
heal.recordProvider | ntua | el |
heal.publicationDate | 2024-10-30 | |
heal.abstract | The rapid evolution of Generative AI has introduced transformative tools for creating high-quality synthetic data, with applications in art, entertainment, and commercial design. Among these tools, Diffusion Models have emerged as particularly effective for high-resolution image generation by reversing processes of noise addition. However, the quality of images produced by diffusion models, such as Stable Diffusion, is highly contingent on the prompts that guide these models. Ensuring precision and alignment with user intentions in image generation thus requires refined prompt engineering, an area where Large Language Models (LLMs) offer promising solutions. Leveraging LLMs for prompt optimization can enhance the coherence, detail, and overall aesthetic appeal of generated images, making prompt engineering a critical factor in generative AI. This thesis explores the integration of LLM-based prompt optimization with text-to-image Diffusion Models to achieve improved alignment, clarity, and aesthetic consistency in generated images. Our approach investigates multiple prompt engineering methodologies, including zero-shot learning through role prompting, which assigns specialized prompt-generation roles to the LLM; few-shot learning, where targeted prompt examples guide the model's output; and few-shot learning combined with negative prompts, a technique designed to exclude undesirable characteristics, thereby refining image quality further. Our experiments reveal that LLM-enhanced prompts improve both quantitative metrics, such as CLIP score, and qualitative factors, including visual coherence and realism. Compared to baseline prompts, optimized prompts generate images with enhanced semantic accuracy and superior aesthetic qualities. This study highlights the efficacy of LLMs in elevating the standard of generative model outputs, establishing optimized prompting as a versatile tool for AI-driven content creation in diverse fields, from digital art to marketing and beyond. | en |
heal.advisorName | Βουλόδημος, Αθανάσιος | el |
heal.committeeMemberName | Στάμου, Γεώργιος | el |
heal.committeeMemberName | Κόλλιας, Στέφανος | el |
heal.academicPublisher | Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών | el |
heal.academicPublisherID | ntua | |
heal.numberOfPages | 95 p. | el |
heal.fullTextAvailability | false |
The following licenses are associated with this item: