HEAL DSpace

Optimizing joins in a map-reduce environment

DSpace/Manakin Repository


dc.contributor.author Afrati, FN en
dc.contributor.author Ullman, JD en
dc.date.accessioned 2014-03-01T02:46:54Z
dc.date.available 2014-03-01T02:46:54Z
dc.date.issued 2010 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/32931
dc.subject IT Value en
dc.subject Large Data en
dc.subject.other Dimension tables en
dc.subject.other Fixed numbers en
dc.subject.other Key attributes en
dc.subject.other MAP process en
dc.subject.other New approaches en
dc.subject.other Social Networks en
dc.subject.other Star join en
dc.subject.other Very large datum en
dc.subject.other Database systems en
dc.subject.other Optimization en
dc.subject.other Technology en
dc.title Optimizing joins in a map-reduce environment en
heal.type conferenceItem en
heal.identifier.primary 10.1145/1739041.1739056 en
heal.identifier.secondary http://dx.doi.org/10.1145/1739041.1739056 en
heal.publicationDate 2010 en
heal.abstract Implementations of map-reduce are being used to perform many operations on very large data. We examine strategies for joining several relations in the map-reduce environment. Our new approach begins by identifying the "map-key," the set of attributes that identify the Reduce process to which a Map process must send a particular tuple. Each attribute of the map-key gets a "share," which is the number of buckets into which its values are hashed, to form a component of the identifier of a Reduce process. Relations have their tuples replicated in limited fashion, the degree of replication depending on the shares for those map-key attributes that are missing from their schema. We study the problem of optimizing the shares, given a fixed number of Reduce processes. An algorithm for detecting and fixing problems where an attribute is "mistakenly" included in the map-key is given. Then, we consider two important special cases: chain joins and star joins. In each case we are able to determine the map-key and determine the shares that yield the least replication. While the method we propose is not always superior to the conventional way of using map-reduce to implement joins, there are some important cases involving large-scale data where our method wins, including: (1) analytic queries in which a very large fact table is joined with smaller dimension tables, and (2) queries involving paths through graphs with high out-degree, such as the Web or a social network. Copyright 2010 ACM. en
heal.journalName Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings en
dc.identifier.doi 10.1145/1739041.1739056 en
dc.identifier.spage 99 en
dc.identifier.epage 110 en
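
Illustrative sketch (not part of the repository record): the abstract describes a map-key whose attributes each receive a share, with tuples replicated across the shares of any map-key attribute missing from their schema. The Python below is one possible reading of that idea for a hypothetical three-relation chain join R(A,B), S(B,C), T(C,D) with map-key (B, C). The relation schemas, all function names, the use of Python's built-in hash, and the closed-form shares (obtained here by minimizing r*c + t*b subject to b*c = k) are assumptions made for illustration; the record itself gives none of them.

```python
import math
from collections import defaultdict


def optimal_shares(r, t, k):
    """Real-valued shares (b, c) with b*c = k that minimize r*c + t*b.

    r and t are the sizes of R and T; k is the number of Reduce processes.
    Assumed closed form for this example only; round to integers in practice.
    """
    return math.sqrt(k * r / t), math.sqrt(k * t / r)


def bucket(value, share):
    # Python's built-in hash stands in for the hash function (assumption).
    return hash(value) % share


def map_r(tup, b, c):
    # R(A, B) lacks C, so each tuple is replicated to all c buckets of C.
    _, b_val = tup
    return [((bucket(b_val, b), j), ("R", tup)) for j in range(c)]


def map_s(tup, b, c):
    # S(B, C) has both map-key attributes, so it goes to exactly one reducer.
    b_val, c_val = tup
    return [((bucket(b_val, b), bucket(c_val, c)), ("S", tup))]


def map_t(tup, b, c):
    # T(C, D) lacks B, so each tuple is replicated to all b buckets of B.
    c_val, _ = tup
    return [((i, bucket(c_val, c)), ("T", tup)) for i in range(b)]


def three_way_join(R, S, T, b, c):
    """Simulate the map and reduce phases in memory.

    Returns the join result and the number of tuple copies sent to reducers.
    """
    reducers = defaultdict(lambda: {"R": [], "S": [], "T": []})
    sent = 0
    for mapper, relation in ((map_r, R), (map_s, S), (map_t, T)):
        for tup in relation:
            for key, (name, value) in mapper(tup, b, c):
                reducers[key][name].append(value)
                sent += 1

    result = set()
    for parts in reducers.values():
        # Each reducer joins only the fragments it received.
        for a, b1 in parts["R"]:
            for b2, c1 in parts["S"]:
                if b1 != b2:
                    continue
                for c2, d in parts["T"]:
                    if c1 == c2:
                        result.add((a, b1, c1, d))
    return result, sent


R = [(1, 10), (2, 20)]
S = [(10, 100), (20, 200), (30, 300)]
T = [(100, "x"), (200, "y"), (100, "z")]
joined, copies = three_way_join(R, S, T, b=2, c=3)
```

With shares b = 2 and c = 3 there are six reducers, and the number of tuple copies sent is |R|*c + |S| + |T|*b. This is the replication cost that choosing the shares is meant to keep small under these assumptions.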


Files in this item

There are no files associated with this item.
