dc.contributor.author |
Mylonas, P |
en |
dc.contributor.author |
Wallace, M |
en |
dc.contributor.author |
Kollias, S |
en |
dc.date.accessioned |
2014-03-01T02:43:00Z |
|
dc.date.available |
2014-03-01T02:43:00Z |
|
dc.date.issued |
2004 |
en |
dc.identifier.issn |
0302-9743 |
en |
dc.identifier.uri |
https://dspace.lib.ntua.gr/xmlui/handle/123456789/31189 |
|
dc.subject |
Curse of Dimensionality |
en |
dc.subject |
Error Propagation |
en |
dc.subject |
Feature Selection |
en |
dc.subject |
Hierarchical Clustering |
en |
dc.subject |
K Nearest Neighbor |
en |
dc.subject |
Number of Clusters |
en |
dc.subject.classification |
Computer Science, Theory & Methods |
en |
dc.subject.other |
Algorithms |
en |
dc.subject.other |
Database systems |
en |
dc.subject.other |
Error analysis |
en |
dc.subject.other |
Feature extraction |
en |
dc.subject.other |
Problem solving |
en |
dc.subject.other |
Set theory |
en |
dc.subject.other |
Data sets |
en |
dc.subject.other |
Feature scales |
en |
dc.subject.other |
Hierarchical clustering |
en |
dc.subject.other |
Partitioning clustering |
en |
dc.subject.other |
Hierarchical systems |
en |
dc.title |
Using k-nearest neighbor and feature selection as an improvement to hierarchical clustering |
en |
heal.type |
conferenceItem |
en |
heal.identifier.primary |
10.1007/978-3-540-24674-9_21 |
en |
heal.identifier.secondary |
http://dx.doi.org/10.1007/978-3-540-24674-9_21 |
en |
heal.language |
English |
en |
heal.publicationDate |
2004 |
en |
heal.abstract |
Clustering of data is a difficult problem that is related to various fields and applications. The challenge grows as input space dimensions become larger and feature scales differ from each other. Hierarchical clustering methods are more flexible than their partitioning counterparts, as they do not require the number of clusters as input. Still, plain hierarchical clustering does not provide a satisfactory framework for extracting meaningful results in such cases. Major drawbacks have to be tackled, such as the curse of dimensionality and initial error propagation, as well as complexity and data set size issues. In this paper we propose an unsupervised extension to hierarchical clustering by means of feature selection, in order to overcome the first drawback, thus increasing the robustness of the whole algorithm. The results of applying this clustering to a portion of the dataset in question are then refined and extended to the whole dataset through a classification step using the k-nearest neighbor technique, in order to tackle the latter two problems. The performance of the proposed methodology is demonstrated through application to a variety of well-known publicly available data sets. |
en |
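The two-stage scheme described in the abstract — hierarchical clustering on a small portion of the data, followed by k-nearest-neighbor classification to extend the cluster labels to the full dataset — can be sketched roughly as follows. This is a minimal illustrative sketch on toy synthetic data, not the paper's implementation: it uses plain single-linkage agglomerative clustering and omits the paper's unsupervised feature-selection step, and all data and parameter choices (two clusters, k=3) are assumptions for the example.

```python
import math
import random

def dist(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, n_clusters):
    # Plain single-linkage hierarchical clustering: start with singleton
    # clusters and repeatedly merge the closest pair until n_clusters remain.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = (float("inf"), 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels

def knn_classify(sample, sample_labels, query, k=3):
    # Extend the clustering of the sample to a new point by majority
    # vote among its k nearest labeled neighbors.
    nearest = sorted(range(len(sample)),
                     key=lambda i: dist(sample[i], query))[:k]
    votes = [sample_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

random.seed(0)
# Two well-separated synthetic 2-D clusters (toy data, an assumption).
data = ([(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(30)]
        + [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(30)])
sample = data[:10] + data[30:40]          # cluster only a small portion
sample_labels = agglomerative(sample, 2)  # hierarchical clustering step
full_labels = [knn_classify(sample, sample_labels, p) for p in data]  # k-NN step
```

Clustering only the sample sidesteps the quadratic-or-worse cost of hierarchical clustering on the full dataset, which is the complexity/size concern the abstract raises; the k-NN pass then labels the remaining points cheaply.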
heal.publisher |
SPRINGER-VERLAG BERLIN |
en |
heal.journalName |
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) |
en |
heal.bookName |
LECTURE NOTES IN COMPUTER SCIENCE |
en |
dc.identifier.doi |
10.1007/978-3-540-24674-9_21 |
en |
dc.identifier.isi |
ISI:000221610800021 |
en |
dc.identifier.volume |
3025 |
en |
dc.identifier.spage |
191 |
en |
dc.identifier.epage |
200 |
en |