HEAL DSpace

Modeling Big Data Applications and Operators in Cloud environments


dc.contributor.author Giannakopoulos, Ioannis K. en
dc.contributor.author Γιαννακόπουλος, Ιωάννης Κ. el
dc.date.accessioned 2020-07-22T10:07:40Z
dc.date.available 2020-07-22T10:07:40Z
dc.date.issued 2020-07-22
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/50906
dc.identifier.uri http://dx.doi.org/10.26240/heal.ntua.18604
dc.description.abstract The Big Data revolution has created new requirements for the design of applications and operators that can handle the volume of the data sources. The adoption of distributed architectures and the increasing popularity of the Cloud paradigm have complicated their structure, making the problem of modeling their behavior increasingly difficult. Moreover, the wide variety of existing datasets has complicated the problem of selecting the appropriate inputs for a given operator, since examining the utility of the data for a given workflow is a largely manual process that requires exhaustive execution over all of the available datasets. This thesis attempts to model the behavior of an arbitrary Big Data operator from two different viewpoints. First, we wish to model the operator’s performance when deployed under different resource configurations. To this end, we present an adaptive performance modeling methodology that relies on recursively partitioning the configuration space into disjoint regions, distributing a predefined number of samples to each region based on different region characteristics (i.e., size, modeling error) and deploying the given operator for the selected samples. The performance is then approximated for the entire space using a combination of linear models for each subregion. Intuitively, this approach attempts to balance the conflicting goals of exploring the configuration space and exploiting the obtained knowledge by focusing on areas with higher approximation error. Second, in order to accelerate data analysis, we wish to model the operator’s output when deployed over different datasets. Based on the observation that similar datasets tend to affect the operators applied to them similarly, we propose a content-based methodology that models the output of a provided operator for all datasets.
Our approach measures the similarity between the different datasets in the light of some fundamental properties commonly used in data analysis tasks, i.e., the statistical distribution, the dataset size and the tuple ordering. These similarities are next projected to a low-dimensional metric space that is used as an input domain by Neural Networks in order to approximate the operator’s output for all datasets, given the actual operator output for a mere subset of them. Our evaluation, conducted using several real-world operators applied to real and synthetic datasets, indicated that the introduced methodologies manage to accurately model the operator’s behavior from both angles. The adoption of a divide-and-conquer approach that equally respects space exploration and knowledge exploitation for the performance modeling part proved to be the main reason that our scheme outperforms other state-of-the-art methodologies. At the same time, the construction of a low-dimensional dataset metric space for the second part proved to be particularly informative, allowing Machine Learning models to approximate operator output for a wide variety of operators with diverse characteristics. en
dc.rights Attribution 3.0 Greece *
dc.rights.uri http://creativecommons.org/licenses/by/3.0/gr/ *
dc.subject Modeling en
dc.subject Big data en
dc.subject Cloud computing en
dc.subject Machine learning en
dc.subject Performance el
dc.title Μοντελοποίηση εφαρμογών και τελεστών μεγάλων δεδομένων σε περιβάλλοντα υπολογιστικών νεφών el
dc.title Modeling Big Data Applications and Operators in Cloud environments en
dc.contributor.department Computing Systems Laboratory en
heal.type doctoralThesis
heal.classification Computer Engineering en
heal.classification Distributed Systems en
heal.classification Cloud Computing en
heal.language el
heal.language en
heal.access free
heal.recordProvider ntua el
heal.publicationDate 2018-11-26
heal.advisorName Koziris, Nectarios en
heal.committeeMemberName Koziris, Nectarios en
heal.committeeMemberName Tsanakas, Panagiotis en
heal.committeeMemberName Tsoumakos, Dimitrios en
heal.committeeMemberName Pallis, Georgios en
heal.committeeMemberName Kotidis, Yannis en
heal.committeeMemberName Rousopoulou, Dimitra en
heal.committeeMemberName Pitoura, Evangelia en
heal.academicPublisher Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών el
heal.academicPublisherID ntua
heal.numberOfPages 182 pp.
heal.fullTextAvailability true



Except where otherwise noted, this item's license is described as Attribution 3.0 Greece.