HEAL DSpace

Exploiting wikipedia knowledge for conceptual hierarchical clustering of documents

DSpace/Manakin Repository

dc.contributor.author Spanakis, G en
dc.contributor.author Siolas, G en
dc.contributor.author Stafylopatis, A en
dc.date.accessioned 2014-03-01T02:08:59Z
dc.date.available 2014-03-01T02:08:59Z
dc.date.issued 2012 en
dc.identifier.issn 00104620 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/29761
dc.subject conceptual clustering en
dc.subject document clustering en
dc.subject document representation en
dc.subject Wikipedia knowledge en
dc.subject.other Application programmer's interfaces en
dc.subject.other Bag-of-words models en
dc.subject.other Clustering process en
dc.subject.other Computational costs en
dc.subject.other Conceptual clustering en
dc.subject.other Document Clustering en
dc.subject.other Document Representation en
dc.subject.other F-measure en
dc.subject.other Hierarchical clustering en
dc.subject.other Link structure en
dc.subject.other Textual content en
dc.subject.other Wikipedia en
dc.subject.other Clustering algorithms en
dc.subject.other Knowledge representation en
dc.subject.other Semantics en
dc.subject.other Websites en
dc.title Exploiting wikipedia knowledge for conceptual hierarchical clustering of documents en
heal.type journalArticle en
heal.identifier.primary 10.1093/comjnl/bxr024 en
heal.identifier.secondary http://dx.doi.org/10.1093/comjnl/bxr024 en
heal.publicationDate 2012 en
heal.abstract In this paper, we propose a novel method for conceptual hierarchical clustering of documents using knowledge extracted from Wikipedia. The proposed method overcomes the disadvantages of the classic bag-of-words model through the exploitation of Wikipedia textual content and link structure. A robust and compact document representation is built in real-time using the Wikipedia application programmer's interface, without the need to store any Wikipedia information locally. The clustering process is hierarchical and extends the idea of frequent items by using Wikipedia article titles for selecting cluster labels that are descriptive and important for the examined corpus. Experiments show that the proposed technique greatly improves over the baseline approach, both in terms of F-measure and entropy on the one hand and computational cost on the other. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. en
heal.journalName Computer Journal en
dc.identifier.doi 10.1093/comjnl/bxr024 en
dc.identifier.volume 55 en
dc.identifier.issue 3 en
dc.identifier.spage 299 en
dc.identifier.epage 312 en


Files in this item

There are no files associated with this item.
