Approximating a collection of frequent sets

Afrati, F; Gionis, A; Mannila, H

dc.contributor.author	Afrati, F	en
dc.contributor.author	Gionis, A	en
dc.contributor.author	Mannila, H	en
dc.date.accessioned	2014-03-01T02:42:30Z
dc.date.available	2014-03-01T02:42:30Z
dc.date.issued	2004	en
dc.identifier.uri	https://dspace.lib.ntua.gr/xmlui/handle/123456789/31022
dc.subject	Foundations of data mining	en
dc.subject	Mining frequent itemsets	en
dc.subject.other	Algorithms	en
dc.subject.other	Computational complexity	en
dc.subject.other	Data mining	en
dc.subject.other	Database systems	en
dc.subject.other	Polynomial approximation	en
dc.subject.other	Probability	en
dc.subject.other	Foundations of data mining	en
dc.subject.other	Frequent patterns	en
dc.subject.other	Mining frequent itemsets	en
dc.subject.other	Polynomial-time approximation algorithms	en
dc.subject.other	Set theory	en
dc.title	Approximating a collection of frequent sets	en
heal.type	conferenceItem	en
heal.identifier.primary	10.1145/1014052.1014057	en
heal.identifier.secondary	http://dx.doi.org/10.1145/1014052.1014057	en
heal.publicationDate	2004	en
heal.abstract	One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is that the size of the output collection can be far too large to be carefully examined and understood by the users. Even restricting the output to the border of the frequent item-set collection does not help much in alleviating the problem. In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a collection of sets by k sets is defined to be the size of the collection covered by the k sets, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide simple polynomial-time approximation algorithms. We give empirical evidence showing that the approximation methods work well in practice.	en
heal.journalName	KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining	en
dc.identifier.doi	10.1145/1014052.1014057	en
dc.identifier.spage	12	en
dc.identifier.epage	19	en