HEAL DSpace

An automatic web wrapper for extracting information from web sources, using clustering techniques

DSpace/Manakin Repository


dc.contributor.author Papadakis, N en
dc.contributor.author Skoutas, D en
dc.contributor.author Raftopoulos, K en
dc.contributor.author Varvarigou, T en
dc.date.accessioned 2014-03-01T02:43:06Z
dc.date.available 2014-03-01T02:43:06Z
dc.date.issued 2005 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/31234
dc.subject Information Retrieval en
dc.subject Web Pages en
dc.subject.other Automated systems en
dc.subject.other Clustering techniques en
dc.subject.other Extracting information en
dc.subject.other Further utilization en
dc.subject.other Human intervention en
dc.subject.other In-buildings en
dc.subject.other Precision and recall en
dc.subject.other Semantic relationships en
dc.subject.other Semi-structured en
dc.subject.other Web page en
dc.subject.other Web sources en
dc.subject.other Web wrappers en
dc.subject.other Automation en
dc.subject.other Internet en
dc.subject.other World Wide Web en
dc.subject.other Semantic Web en
dc.title An automatic web wrapper for extracting information from web sources, using clustering techniques en
heal.type conferenceItem en
heal.identifier.primary 10.1109/SAINT.2005.12 en
heal.identifier.secondary http://dx.doi.org/10.1109/SAINT.2005.12 en
heal.identifier.secondary 1386093 en
heal.publicationDate 2005 en
heal.abstract We present a fully automated system (wrapper) for extracting information from semistructured web pages. The need for such systems arises from the goal of going beyond "human browsing" by automating information retrieval, enabling further utilization by targeted applications. The key idea in our novel system is to exploit the format of the information contained in the web pages, discover the underlying structure, and finally map it to semantic relationships. To do this, we identify one section of the web page as the one containing the useful information, and we then extract the semantic tokens contained in this section using clustering techniques and other statistical tools. Our innovation consists in building a system that operates without human intervention or training and yet achieves excellent extraction precision and recall. en
heal.journalName Proceedings - 2005 Symposium on Applications and the Internet, SAINT'2005 en
dc.identifier.doi 10.1109/SAINT.2005.12 en
dc.identifier.spage 24 en
dc.identifier.epage 30 en


Files in this item

Files Size Format View

There are no files associated with this item.

This item appears in the following collection(s)
