HEAL DSpace

Multi-objective query optimization for massively parallel processing in Cloud Computing

dc.contributor.author Γεωργουλάκης Μισεγιάννης, Μιχαήλ el
dc.contributor.author Georgoulakis Misegiannis, Michail en
dc.date.accessioned 2022-04-19T11:19:53Z
dc.date.available 2022-04-19T11:19:53Z
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/55115
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.22813
dc.rights Default License
dc.subject Query optimization en
dc.subject Cloud computing en
dc.subject multi-objective optimization en
dc.subject Apache spark en
dc.subject Massively parallel processing en
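dc.subject Βελτιστοποίηση ερωτημάτων (Query optimization) el
dc.subject Υπολογιστικό νέφος (Cloud computing) el
dc.subject Βελτιστοποίηση με πολλαπλά κριτήρια (Multi-criteria optimization) el
dc.subject Περιβάλλον παράλληλης επεξεργασίας (Parallel processing environment) el
dc.subject Μοντέλο κοστολόγησης (Cost model) el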
dc.title Multi-objective query optimization for massively parallel processing in Cloud Computing en
heal.type bachelorThesis
heal.classification Βάσεις Δεδομένων (Databases) el
heal.classification Databases en
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2021-11-08
heal.abstract Data processing has become a prominent topic, as large volumes of data that need to be analyzed are produced every minute. The transition to the big data era was eased by the commercial rise of cloud computing and by massively parallel processing frameworks such as Apache Spark, which process data in a parallel and distributed manner. Query optimization is a classic DBMS problem in which the query optimizer selects the most efficient way to execute a query. Features of cloud computing such as its pricing policy led us to treat query optimization in cloud environments as a multi-objective optimization problem with two objectives: execution time and monetary cost. In this thesis, we propose a baseline query optimizer architecture for efficient multi-objective query optimization in a cloud-like environment. We implement components of this system and use it as the basis for our experiments. Working with Apache Spark lets us benefit from parallel processing and gain useful insights into processing big data in a distributed, cloud-like environment. However, solving multi-objective query optimization problems with Spark has a significant limitation: Catalyst, the optimizer of Spark SQL, relies mostly on heuristics rather than cost-based estimates. As a result, it is difficult to enumerate and compare alternative query plans and to apply query optimization techniques that have been used successfully in relational databases. To overcome this limitation, we reimplemented a state-of-the-art cost model for Spark SQL from scratch, which provides theoretical cost estimates for alternative query execution plans. We evaluate its accuracy with large-scale experiments, and we present an additional formula, integrated into the cost model, that estimates the monetary cost of a query plan on Amazon EC2 from its execution time and the computing resources it uses.
The cost model and the formula allow us to solve multi-objective query optimization problems. After implementing a baseline query optimization system, we integrate a state-of-the-art technique, multi-objective parametric query optimization, into our system and examine its applicability in our setting, since the technique was originally evaluated in a relational database. In this technique, a query is modeled as a function of a set of parameters, which must be factors to which the optimization objectives are sensitive. en
heal.advisorName Καντερέ, Βασιλική el
heal.committeeMemberName Καντερέ, Βασιλική el
heal.committeeMemberName D'Orazio, Laurent en
heal.committeeMemberName Παπαβασιλείου, Συμεών el
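heal.academicPublisher Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών (National Technical University of Athens. School of Electrical and Computer Engineering. Computer Science Division) el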
heal.academicPublisherID ntua
heal.numberOfPages 145 pages
heal.fullTextAvailability false
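
The abstract combines two ingredients: a formula that turns an estimated execution time and the computing resources used into a monetary cost on Amazon EC2, and a multi-objective comparison of alternative plans on time and cost. The Python sketch below is illustrative only and is not the thesis's formula or code. It assumes a simple billing model (execution hours multiplied by the number of instances and a per-instance hourly price), uses a made-up placeholder price rather than a real EC2 rate, and treats the plan names and estimates as hypothetical inputs that a cost model would supply. It keeps only the Pareto-optimal plans, i.e. those for which no other plan is at least as fast and at least as cheap while being strictly better in one of the two objectives.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PlanEstimate:
    """Cost-model estimates for one candidate execution plan (hypothetical structure)."""
    name: str
    exec_time_s: float  # estimated execution time in seconds
    num_instances: int  # number of VM instances the plan runs on


def monetary_cost(plan: PlanEstimate, price_per_instance_hour: float) -> float:
    """Assumed billing model: hours of execution x instances x hourly price per instance."""
    return (plan.exec_time_s / 3600.0) * plan.num_instances * price_per_instance_hour


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if objective vector a is no worse than b in both objectives and better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(plans: List[PlanEstimate], price_per_instance_hour: float) -> List[PlanEstimate]:
    """Keep the plans whose (time, money) vector is not dominated by any other plan."""
    scored = [(p.exec_time_s, monetary_cost(p, price_per_instance_hour), p) for p in plans]
    front = []
    for t, c, p in scored:
        if not any(dominates((t2, c2), (t, c)) for t2, c2, _ in scored):
            front.append(p)
    return front


# Hypothetical estimates and a placeholder price (not an actual EC2 rate).
PRICE = 0.5
plans = [
    PlanEstimate("A", exec_time_s=3600, num_instances=4),   # 1.0 h on 4 instances -> cost 2.0
    PlanEstimate("B", exec_time_s=1800, num_instances=16),  # 0.5 h on 16 instances -> cost 4.0
    PlanEstimate("C", exec_time_s=3600, num_instances=8),   # 1.0 h on 8 instances -> cost 4.0
]
front = pareto_front(plans, PRICE)
print([p.name for p in front])  # prints ['A', 'B']
```

Plan C runs as long as plan A on twice as many instances, so it is dominated and dropped; A (cheaper) and B (faster) remain as trade-offs for a user preference or a higher-level policy to choose between.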