HEAL DSpace

To compare or not to compare: Making entity resolution more efficient

dc.contributor.author Papadakis, G en
dc.contributor.author Ioannou, E en
dc.contributor.author Niederee, C en
dc.contributor.author Palpanas, T en
dc.contributor.author Nejdl, W en
dc.date.accessioned 2014-03-01T02:53:30Z
dc.date.available 2014-03-01T02:53:30Z
dc.date.issued 2011 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/36371
dc.subject attribute-agnostic blocking en
dc.subject data cleaning en
dc.subject entity resolution en
dc.subject.other Approximate methods en
dc.subject.other attribute-agnostic blocking en
dc.subject.other Blocking method en
dc.subject.other Blocking technique en
dc.subject.other Data cleaning en
dc.subject.other entity resolution en
dc.subject.other Heterogeneous data en
dc.subject.other Its efficiencies en
dc.subject.other Real world data en
dc.subject.other Schema information en
dc.subject.other Second layer en
dc.subject.other User-generated content en
dc.subject.other Web 2.0 en
dc.subject.other Approximation theory en
dc.subject.other Information management en
dc.subject.other Semantics en
dc.subject.other User interfaces en
dc.subject.other Virtual reality en
dc.subject.other Efficiency en
dc.title To compare or not to compare: Making entity resolution more efficient en
heal.type conferenceItem en
heal.identifier.primary 10.1145/1999299.1999302 en
heal.identifier.secondary http://dx.doi.org/10.1145/1999299.1999302 en
heal.publicationDate 2011 en
heal.abstract Blocking methods are crucial for making the inherently quadratic task of Entity Resolution more efficient. The blocking methods proposed in the literature rely on the homogeneity of data and the availability of binding schema information; thus, they are inapplicable to the voluminous, noisy, and highly heterogeneous data of Web 2.0 user-generated content. To deal with such data, attribute-agnostic blocking has recently been introduced, following a two-fold strategy: the first layer places entities into overlapping blocks in order to achieve high effectiveness, while the second layer reduces the number of unnecessary comparisons in order to enhance efficiency. In this paper, we present a set of techniques that can be plugged into the second strategy layer of attribute-agnostic blocking to further improve its efficiency. We introduce a technique that eliminates redundant comparisons, and, based on this, we incorporate an approximate method for pruning comparisons that are highly likely to involve non-matching entities. We also introduce a novel measure for quantifying the redundancy a blocking method entails and explain how it can be used to tune the comparison-pruning process a priori. We apply our blocking techniques to two large, real-world data sets and report results that demonstrate a substantial increase in efficiency at a negligible (if any) cost in effectiveness. © 2011 ACM. en
heal.journalName Proceedings of the International Workshop on Semantic Web Information Management, SWIM 2011 en
dc.identifier.doi 10.1145/1999299.1999302 en
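
The abstract's notion of redundant comparisons in overlapping blocks can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a simple token-based, attribute-agnostic blocking (one block per token found in any attribute value) and uses a plain set of already-seen pairs to show how many comparisons are repeated. The function names `token_blocks` and `count_comparisons` and the sample entities are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocks(entities):
    """Place each entity in one block per token found in any attribute value.

    Attribute names are ignored, so no schema information is needed
    (an assumed, simplified form of attribute-agnostic blocking).
    """
    blocks = defaultdict(set)
    for entity_id, attributes in entities.items():
        for value in attributes.values():
            for token in str(value).lower().split():
                blocks[token].add(entity_id)
    # Blocks holding a single entity yield no comparisons, so drop them.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def count_comparisons(blocks):
    """Return (comparisons summed over all blocks, set of distinct pairs)."""
    total = 0
    distinct = set()
    for ids in blocks.values():
        for pair in combinations(sorted(ids), 2):
            total += 1          # every block repeats the pair it contains
            distinct.add(pair)  # a pair only needs to be compared once
    return total, distinct


# Hypothetical entities with heterogeneous attribute names.
entities = {
    "e1": {"name": "John Smith", "city": "Athens"},
    "e2": {"fullName": "Smith John", "location": "Athens"},
    "e3": {"label": "Maria Smith"},
}

blocks = token_blocks(entities)
total, distinct = count_comparisons(blocks)
print(f"comparisons across blocks: {total}, distinct pairs: {len(distinct)}")
```

In this toy input the pair (e1, e2) shares three blocks, so two of its three comparisons are redundant. The seen-set approach is only a naive baseline chosen for clarity; its memory grows with the number of distinct pairs.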

