Adaptive-sampling algorithms for answering aggregation queries on Web sites

dc.contributor.author Afrati, FN en
dc.contributor.author Lekeas, PV en
dc.contributor.author Li, C en
dc.date.accessioned 2014-03-01T01:27:50Z
dc.date.available 2014-03-01T01:27:50Z
dc.date.issued 2008 en
dc.identifier.issn 0169-023X en
dc.identifier.uri http://hdl.handle.net/123456789/18598
dc.subject Adaptive sampling en
dc.subject Aggregation queries en
dc.subject Web site en
dc.subject.other Algorithms en
dc.subject.other Data mining en
dc.subject.other Database systems en
dc.subject.other Internet en
dc.subject.other Query processing en
dc.subject.other Statistical methods en
dc.subject.other Adaptive-sampling algorithms en
dc.subject.other Aggregation queries en
dc.subject.other Synthetic data sets en
dc.subject.other Websites en
dc.title Adaptive-sampling algorithms for answering aggregation queries on Web sites en
heal.type journalArticle en
heal.identifier.primary 10.1016/j.datak.2007.09.014 en
heal.identifier.secondary http://dx.doi.org/10.1016/j.datak.2007.09.014 en
heal.publicationDate 2008 en
heal.abstract Many Web sites publish their data in a hierarchical structure. For instance, Amazon.com organizes its pages on books as a hierarchy, in which each leaf node corresponds to a collection of pages of books in the same class (e.g., books on Data Mining). Users can easily browse this class by following a path from the root to the corresponding leaf node, such as "Computers & Internet - Databases - Storage - Data Mining". Business applications often need to submit aggregation queries on such data, such as "finding the average price of books on Data Mining". However, it is computationally expensive to compute the exact answer to such a query because of the large amount of data, its dynamic nature, and limited Web-access resources. In this paper, we study how to answer such aggregation queries approximately with quality guarantees using sampling. We study how to use adaptive-sampling techniques that allocate the resources adaptively based on partial samples retrieved from different nodes in the hierarchy. Based on statistical methods, we study how to estimate the quality of the answer using the sample. Our experimental study using real and synthetic data sets validates the proposed techniques. © 2007 Elsevier B.V. All rights reserved. en
heal.journalName Data and Knowledge Engineering en
dc.identifier.doi 10.1016/j.datak.2007.09.014 en
dc.identifier.volume 64 en
dc.identifier.issue 2 en
dc.identifier.spage 462 en
dc.identifier.epage 490 en
