HEAL DSpace

Distributed indexing of web scale datasets for the cloud

dc.contributor.author Konstantinou, I en
dc.contributor.author Angelou, E en
dc.contributor.author Tsoumakos, D en
dc.contributor.author Koziris, N en
dc.date.accessioned 2014-03-01T02:46:46Z
dc.date.available 2014-03-01T02:46:46Z
dc.date.issued 2010 en
dc.identifier.uri http://hdl.handle.net/123456789/32831
dc.subject cloud computing en
dc.subject Hadoop en
dc.subject HBase en
dc.subject MapReduce en
dc.subject NoSQL en
dc.subject.other Cloud computing en
dc.subject.other Cluster prototype en
dc.subject.other Data sets en
dc.subject.other Distributed architecture en
dc.subject.other Indexing systems en
dc.subject.other Open sources en
dc.subject.other Response time en
dc.subject.other Semi-structured en
dc.subject.other Text-indexing en
dc.subject.other Unstructured data en
dc.subject.other Distributed computer systems en
dc.subject.other World Wide Web en
dc.subject.other Indexing (of information) en
dc.title Distributed indexing of web scale datasets for the cloud en
heal.type conferenceItem en
heal.identifier.primary 10.1145/1779599.1779600 en
heal.identifier.secondary 1779600 en
heal.identifier.secondary http://dx.doi.org/10.1145/1779599.1779600 en
heal.publicationDate 2010 en
heal.abstract In this paper, we present a distributed architecture for indexing and serving large and diverse datasets. It incorporates and extends the functionality of Hadoop, the open-source MapReduce framework, and of HBase, a distributed, sparse NoSQL database, to create a fully parallel indexing system. Experiments with structured, semi-structured and unstructured data of various sizes demonstrate the flexibility, speed and robustness of our implementation and contrast it with similarly oriented projects. Our 11-node cluster prototype completed full-text indexing of 150 GB of raw content in less than 3 hours, while the system's response time under a sustained load of more than 1000 queries/sec remained on the order of milliseconds. © 2010 ACM. en
heal.journalName ACM International Conference Proceeding Series en
dc.identifier.doi 10.1145/1779599.1779600 en

