HEAL DSpace

Optimizing multiway joins in a map-reduce environment

dc.contributor.author Afrati, FN en
dc.contributor.author Ullman, JD en
dc.date.accessioned 2014-03-01T01:36:36Z
dc.date.available 2014-03-01T01:36:36Z
dc.date.issued 2011 en
dc.identifier.issn 1041-4347 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/21351
dc.subject joins en
dc.subject Map-reduce en
dc.subject parallel computing en
dc.subject query optimization en
dc.subject.classification Computer Science, Artificial Intelligence en
dc.subject.classification Computer Science, Information Systems en
dc.subject.classification Engineering, Electrical & Electronic en
dc.subject.other Dimension tables en
dc.subject.other Fixed numbers en
dc.subject.other joins en
dc.subject.other MAP process en
dc.subject.other Map-reduce en
dc.subject.other Multi-way join en
dc.subject.other query optimization en
dc.subject.other Social Networks en
dc.subject.other Star join en
dc.subject.other Very large datum en
dc.subject.other Parallel architectures en
dc.subject.other User interfaces en
dc.subject.other Optimization en
dc.title Optimizing multiway joins in a map-reduce environment en
heal.type journalArticle en
heal.identifier.primary 10.1109/TKDE.2011.47 en
heal.identifier.secondary http://dx.doi.org/10.1109/TKDE.2011.47 en
heal.identifier.secondary 5710932 en
heal.language English en
heal.publicationDate 2011 en
heal.abstract Implementations of map-reduce are being used to perform many operations on very large data. We examine strategies for joining several relations in the map-reduce environment. Our new approach begins by identifying the map-key, the set of attributes that identify the Reduce process to which a Map process must send a particular tuple. Each attribute of the map-key gets a share, which is the number of buckets into which its values are hashed, to form a component of the identifier of a Reduce process. Relations have their tuples replicated in limited fashion, the degree of replication depending on the shares for those map-key attributes that are missing from their schema. We study the problem of optimizing the shares, given a fixed number of Reduce processes. An algorithm for detecting and fixing problems where a variable is mistakenly included in the map-key is given. Then, we consider two important special cases: chain joins and star joins. In each case, we are able to determine the map-key and determine the shares that yield the least replication. While the method we propose is not always superior to the conventional way of using map-reduce to implement joins, there are some important cases involving large-scale data where our method wins, including: 1) analytic queries in which a very large fact table is joined with smaller dimension tables, and 2) queries involving paths through graphs with high out-degree, such as the Web or a social network. © 2006 IEEE. en
heal.publisher IEEE COMPUTER SOC en
heal.journalName IEEE Transactions on Knowledge and Data Engineering en
dc.identifier.doi 10.1109/TKDE.2011.47 en
dc.identifier.isi ISI:000292888400002 en
dc.identifier.volume 23 en
dc.identifier.issue 9 en
dc.identifier.spage 1282 en
dc.identifier.epage 1298 en
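
The scheme described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a three-relation chain join R(A,B), S(B,C), T(C,D), takes the map-key to be (B, C), and uses Python's built-in `hash` modulo the share as the bucketing function. The names `bucket`, `shares_join`, `b_share`, and `c_share` are hypothetical and do not come from the record. Tuples of R lack C and are therefore replicated across every C bucket, tuples of T lack B and are replicated across every B bucket, and tuples of S are sent to exactly one Reduce process.

```python
from collections import defaultdict


def bucket(value, share):
    """Hash a value into one of `share` buckets (hypothetical hash choice)."""
    return hash(value) % share


def shares_join(R, S, T, b_share, c_share):
    """Join R(A,B), S(B,C), T(C,D) in one simulated map-reduce round.

    Returns the set of joined tuples and the number of tuple copies
    shipped from Map to Reduce processes.
    """
    # Each Reduce process is identified by a pair (bucket of B, bucket of C),
    # so there are b_share * c_share Reduce processes in total.
    reducers = defaultdict(lambda: {"R": [], "S": [], "T": []})
    shipped = 0

    # Map side.
    for a, b in R:  # C is missing from R: replicate over every C bucket
        for y in range(c_share):
            reducers[(bucket(b, b_share), y)]["R"].append((a, b))
            shipped += 1
    for b, c in S:  # S has both map-key attributes: exactly one copy
        reducers[(bucket(b, b_share), bucket(c, c_share))]["S"].append((b, c))
        shipped += 1
    for c, d in T:  # B is missing from T: replicate over every B bucket
        for x in range(b_share):
            reducers[(x, bucket(c, c_share))]["T"].append((c, d))
            shipped += 1

    # Reduce side: each process joins only the tuples it received.
    result = set()
    for parts in reducers.values():
        r_by_b = defaultdict(list)
        for a, b in parts["R"]:
            r_by_b[b].append(a)
        t_by_c = defaultdict(list)
        for c, d in parts["T"]:
            t_by_c[c].append(d)
        for b, c in parts["S"]:
            for a in r_by_b.get(b, []):
                for d in t_by_c.get(c, []):
                    result.add((a, b, c, d))
    return result, shipped


# Toy data, invented purely for illustration.
R = [(1, 10), (2, 10), (3, 11)]
S = [(10, 20), (11, 21), (12, 22)]
T = [(20, 100), (21, 101), (21, 102)]
joined, shipped = shares_join(R, S, T, b_share=2, c_share=2)
print(sorted(joined))
print(shipped)
```

With shares of 2 for both B and C (four Reduce processes), each tuple of R and T is shipped twice and each tuple of S once. Choosing the shares to minimize this replication for a fixed number of Reduce processes is the optimization problem the abstract refers to.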


Files in this item

There are no files associated with this item.
