HEAL DSpace

Exploiting duality in summarization with deterministic guarantees

dc.contributor.author Karras, P en
dc.contributor.author Mamoulis, N en
dc.contributor.author Sacharidis, D en
dc.date.accessioned 2014-03-01T02:51:05Z
dc.date.available 2014-03-01T02:51:05Z
dc.date.issued 2007 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/35358
dc.subject Efficiency en
dc.subject Histograms en
dc.subject Synopses en
dc.subject Wavelets en
dc.subject.other Abstracting en
dc.subject.other Computational complexity en
dc.subject.other Hierarchical systems en
dc.subject.other Information systems en
dc.subject.other Wavelet analysis en
dc.subject.other Histogram construction algorithms en
dc.subject.other Histograms en
dc.subject.other Space efficiency en
dc.subject.other Synopses en
dc.subject.other Data mining en
dc.title Exploiting duality in summarization with deterministic guarantees en
heal.type conferenceItem en
heal.identifier.primary 10.1145/1281192.1281235 en
heal.identifier.secondary http://dx.doi.org/10.1145/1281192.1281235 en
heal.publicationDate 2007 en
heal.abstract Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state of the art, our histogram construction algorithm reduces time complexity by (at least) a factor of $\frac{B \log^2 n}{\log \epsilon^*}$, and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of $\frac{\log^2 B}{\log \epsilon^* + \log n}$ in time and $B\left(1 - \frac{\log B}{\log n}\right)$ in space, where $\epsilon^*$ is the optimal error. These complexity advantages offer both a space efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation. © 2007 ACM. en
heal.journalName Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining en
dc.identifier.doi 10.1145/1281192.1281235 en
dc.identifier.spage 380 en
dc.identifier.epage 389 en
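The abstract's central idea, solving the space-bounded problem through its error-bounded dual, can be illustrated with a minimal Python sketch. It assumes a plain (non-hierarchical) histogram under the maximum absolute error, where each bucket stores the midpoint of its minimum and maximum values. The function names `min_buckets` and `min_max_error` are hypothetical, and the binary search over an explicit O(n^2) set of candidate errors is a generic textbook approach chosen for clarity. It is not the authors' algorithm and does not achieve the complexity bounds stated above.

```python
from typing import Sequence


def min_buckets(data: Sequence[float], eps: float) -> int:
    """Dual problem (illustrative): fewest contiguous buckets such that every
    value lies within eps of its bucket's representative, assumed here to be
    the midpoint of the bucket's min and max (so bucket error = range / 2)."""
    if not data:
        return 0
    buckets = 1
    lo = hi = data[0]
    for x in data[1:]:
        new_lo, new_hi = min(lo, x), max(hi, x)
        if (new_hi - new_lo) / 2 <= eps:
            # Greedily extend the current bucket; this is optimal because any
            # sub-range of a feasible bucket is also feasible.
            lo, hi = new_lo, new_hi
        else:
            # Close the current bucket and start a new one at x.
            buckets += 1
            lo = hi = x
    return buckets


def min_max_error(data: Sequence[float], budget: int) -> float:
    """Primal problem (illustrative): smallest maximum error achievable with at
    most `budget` buckets, found by binary search over candidate errors using
    the dual solver as a feasibility test."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    if not data:
        return 0.0
    # The optimal error is half the range of some bucket, i.e. |a - b| / 2
    # for some pair of values (hypothetical brute-force candidate set).
    candidates = sorted({abs(a - b) / 2 for a in data for b in data})
    lo, hi = 0, len(candidates) - 1  # candidates[hi] always needs one bucket
    while lo < hi:
        mid = (lo + hi) // 2
        if min_buckets(data, candidates[mid]) <= budget:
            hi = mid  # feasible: try a smaller error
        else:
            lo = mid + 1  # infeasible: a larger error is required
    return candidates[lo]
```

For example, `min_max_error([1, 2, 10, 11, 20, 21], 3)` returns `0.5` (buckets `[1, 2]`, `[10, 11]`, `[20, 21]`), while a budget of 2 raises the optimal error to `5.0`. The design relies on monotonicity: a larger error bound never needs more buckets, so the dual solver can act as the test inside a binary search.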


Files in this item

There are no files associated with this item.
