dc.contributor.author |
Karras, P |
en |
dc.contributor.author |
Mamoulis, N |
en |
dc.contributor.author |
Sacharidis, D |
en |
dc.date.accessioned |
2014-03-01T02:51:05Z |
|
dc.date.available |
2014-03-01T02:51:05Z |
|
dc.date.issued |
2007 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/35358 |
|
dc.subject |
Efficiency |
en |
dc.subject |
Histograms |
en |
dc.subject |
Synopses |
en |
dc.subject |
Wavelets |
en |
dc.subject.other |
Abstracting |
en |
dc.subject.other |
Computational complexity |
en |
dc.subject.other |
Hierarchical systems |
en |
dc.subject.other |
Information systems |
en |
dc.subject.other |
Wavelet analysis |
en |
dc.subject.other |
Histogram construction algorithms |
en |
dc.subject.other |
Histograms |
en |
dc.subject.other |
Space efficiency |
en |
dc.subject.other |
Synopses |
en |
dc.subject.other |
Data mining |
en |
dc.title |
Exploiting duality in summarization with deterministic guarantees |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1145/1281192.1281235 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1145/1281192.1281235 |
en |
heal.publicationDate |
2007 |
en |
heal.abstract |
Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size $n$, but also on the space budget $B$. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a $\frac{B \log^2 n}{\log \epsilon^*}$ factor and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of $\frac{\log^2 B}{\log \epsilon^* + \log n}$ in time and $B\left(1 - \frac{\log B}{\log n}\right)$ in space, where $\epsilon^*$ is the optimal error. These complexity advantages offer both a space-efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation. © 2007 ACM. |
en |
heal.journalName |
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
en |
dc.identifier.doi |
10.1145/1281192.1281235 |
en |
dc.identifier.spage |
380 |
en |
dc.identifier.epage |
389 |
en |