dc.contributor.author |
Γεωργουλάκης Μισεγιάννης, Μιχαήλ
|
el |
dc.contributor.author |
Georgoulakis Misegiannis, Michail
|
en |
dc.date.accessioned |
2022-04-19T11:19:53Z |
|
dc.date.available |
2022-04-19T11:19:53Z |
|
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/55115 |
|
dc.identifier.uri |
http://dx.doi.org/10.26240/heal.ntua.22813 |
|
dc.rights |
Default License |
|
dc.subject |
Query optimization |
en |
dc.subject |
Cloud computing |
en |
dc.subject |
Multi-objective optimization |
en |
dc.subject |
Apache Spark |
en |
dc.subject |
Massively parallel processing |
en |
dc.subject |
Βελτιστοποίηση ερωτημάτων |
el |
dc.subject |
Υπολογιστικό νέφος |
el |
dc.subject |
Βελτιστοποίηση με πολλαπλά κριτήρια |
el |
dc.subject |
Περιβάλλον παράλληλης επεξεργασίας |
el |
dc.subject |
Μοντέλο κοστολόγησης |
el |
dc.title |
Multi-objective query optimization for massively parallel processing in Cloud Computing |
en |
heal.type |
bachelorThesis |
|
heal.classification |
Βάσεις Δεδομένων |
el |
heal.classification |
Databases |
en |
heal.language |
en |
|
heal.access |
free |
|
heal.recordProvider |
ntua |
el |
heal.publicationDate |
2021-11-08 |
|
heal.abstract |
Data processing has become a prominent topic, as large volumes of data that need to be analyzed are produced every minute. The transition to the big data era was eased by the commercial rise of cloud computing and by massively parallel processing frameworks such as Apache Spark, which process this data in a parallel and distributed manner. Query optimization is a traditional DBMS problem, in which the query optimizer selects the most efficient way to execute a query. Features of cloud computing, such as its pricing policy, led us to treat query optimization in cloud environments as a multi-objective optimization problem with two objectives: execution time and monetary cost.
In this thesis, we propose a baseline system architecture for efficient multi-objective query optimization in a cloud-like environment. We implement components of this system and use it as the basis for our experiments.
Working with Apache Spark allows us to benefit from parallel processing and to gain useful insights into processing big data in a distributed, cloud-like environment. However, solving multi-objective query optimization problems with Spark comes with a significant limitation: Catalyst, the optimizer of Spark SQL, relies mostly on heuristics rather than on cost-based estimation. As a result, it is difficult to enumerate and compare alternative query plans, and to apply query optimization techniques that have been used successfully in relational databases.
To overcome this limitation, we reimplemented a state-of-the-art cost model for Spark SQL from scratch to provide theoretical estimates of the costs of alternative query execution plans. We evaluate its accuracy with large-scale experiments, and we present a formula, integrated into the cost model, that estimates the monetary cost of a query plan on Amazon EC2 from its execution time and the computing resources used. Together, the cost model and the formula allow us to provide solutions to multi-objective query optimization problems.
After implementing the baseline query optimization system, we integrate a state-of-the-art technique, multi-objective parametric query optimization, into it and examine its relevance in this setting, since the technique was originally evaluated in a relational database. In this technique, a query is modeled as a function of a set of parameters, which must be factors to which the optimization objectives are sensitive. |
en |
heal.advisorName |
Καντερέ, Βασιλική |
el |
heal.committeeMemberName |
Καντερέ, Βασιλική |
el |
heal.committeeMemberName |
D'Orazio, Laurent |
en |
heal.committeeMemberName |
Παπαβασιλείου, Συμεών |
el |
heal.academicPublisher |
Εθνικό Μετσόβιο Πολυτεχνείο. Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών. Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών |
el |
heal.academicPublisherID |
ntua |
|
heal.numberOfPages |
145 σ. |
el |
heal.fullTextAvailability |
false |
|