HEAL DSpace

Eliminating the redundancy in blocking-based entity resolution methods



dc.contributor.author Papadakis, G en
dc.contributor.author Ioannou, E en
dc.contributor.author Niederee, C en
dc.contributor.author Palpanas, T en
dc.contributor.author Nejdl, W en
dc.date.accessioned 2014-03-01T02:53:15Z
dc.date.available 2014-03-01T02:53:15Z
dc.date.issued 2011 en
dc.identifier.issn 15525996 en
dc.identifier.uri https://dspace.lib.ntua.gr/xmlui/handle/123456789/36192
dc.subject data cleaning en
dc.subject entity resolution en
dc.subject redundancy-based blocking en
dc.subject.other Abstract levels en
dc.subject.other Blocking method en
dc.subject.other Citation matching en
dc.subject.other Computational costs en
dc.subject.other Data cleaning en
dc.subject.other entity resolution en
dc.subject.other Heterogeneous data en
dc.subject.other Novel techniques en
dc.subject.other Optimal solutions en
dc.subject.other Real world data en
dc.subject.other Real-world objects en
dc.subject.other redundancy-based blocking en
dc.subject.other Resolution methods en
dc.subject.other Space complexity en
dc.subject.other Space limitations en
dc.subject.other Time efficiencies en
dc.subject.other Redundancy en
dc.subject.other Virtual reality en
dc.subject.other Digital libraries en
dc.title Eliminating the redundancy in blocking-based entity resolution methods en
heal.type conferenceItem en
heal.identifier.primary 10.1145/1998076.1998093 en
heal.identifier.secondary http://dx.doi.org/10.1145/1998076.1998093 en
heal.publicationDate 2011 en
heal.abstract Entity resolution is the task of identifying entities that refer to the same real-world object. It has important applications in the context of digital libraries, such as citation matching and author disambiguation. Blocking is an established methodology for efficiently addressing this problem; it clusters similar entities together, and compares solely entities inside each cluster. In order to effectively deal with the current large, noisy and heterogeneous data collections, novel blocking methods that rely on redundancy have been introduced: they associate each entity with multiple blocks in order to increase recall, thus increasing the computational cost, as well. In this paper, we introduce novel techniques that remove the superfluous comparisons from any redundancy-based blocking method. They improve the time-efficiency of the latter without any impact on the end result. We present the optimal solution to this problem that discards all redundant comparisons at the cost of quadratic space complexity. For applications with space limitations, we also present an alternative, lightweight solution that operates at the abstract level of blocks in order to discard a significant part of the redundant comparisons. We evaluate our techniques on two large, real-world data sets and verify the significant improvements they convey when integrated into existing blocking methods. © 2011 ACM. en
heal.journalName Proceedings of the ACM/IEEE Joint Conference on Digital Libraries en
dc.identifier.doi 10.1145/1998076.1998093 en
dc.identifier.spage 85 en
dc.identifier.epage 94 en
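
The abstract's central point can be shown with a small example. Redundancy-based blocking puts each entity in several blocks, so the same pair can be compared more than once. An exact fix has to remember which pairs it has already compared, and that memory can grow quadratically. The Python sketch below assumes token blocking (one block per distinct token) as the redundancy-based method. It skips repeats by storing every executed pair in a set. The function names and sample records are hypothetical, and this is one straightforward reading of the abstract, not the paper's own algorithm.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Hypothetical redundancy-based blocking: one block per distinct token,
    so an entity is placed in as many blocks as it has distinct tokens."""
    blocks = defaultdict(set)
    for eid, text in entities.items():
        for token in set(text.lower().split()):
            blocks[token].add(eid)
    # Blocks holding a single entity produce no comparisons, so drop them.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def naive_comparisons(blocks):
    """Comparisons executed when every block is processed independently."""
    return sum(len(ids) * (len(ids) - 1) // 2 for ids in blocks.values())


def comparisons_with_dedup(blocks):
    """Yield each distinct pair once by remembering every executed pair.
    The `executed` set can grow to O(n^2) pairs for n entities."""
    executed = set()
    for token in sorted(blocks):
        for a, b in combinations(sorted(blocks[token]), 2):
            if (a, b) in executed:
                continue  # already compared in an earlier block
            executed.add((a, b))
            yield a, b


entities = {
    1: "Papadakis entity resolution",
    2: "Papadakis entity resolution blocking",
    3: "blocking methods",
}
blocks = token_blocking(entities)
print(naive_comparisons(blocks))               # 4: pair (1, 2) sits in three blocks
print(sorted(comparisons_with_dedup(blocks)))  # [(1, 2), (2, 3)]
```

Only two of the four naive comparisons are distinct, and the set-based filter removes the other two without changing the result. The price is the set of remembered pairs, which is the space cost the abstract attributes to the optimal solution.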

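The lightweight alternative described in the abstract works on whole blocks, not individual comparisons. The abstract does not name the technique, so the sketch below uses one illustrative choice of our own: drop any block whose entity set is contained in a block that has already been kept. Every comparison in the contained block also happens in the larger one, so no distinct comparison is lost. No per-pair memory is needed. Function names and the toy blocks are hypothetical.

```python
from itertools import combinations


def remove_subsumed_blocks(blocks):
    """Hypothetical block-level pruning: drop every block whose entity set is a
    subset of an already kept block; its comparisons all recur in that block."""
    kept = {}
    # Visit larger blocks first so a containing block is kept before the
    # blocks it subsumes; ties are broken by key for a deterministic result.
    for key in sorted(blocks, key=lambda k: (-len(blocks[k]), k)):
        ids = blocks[key]
        if any(ids <= other for other in kept.values()):
            continue  # fully covered by a kept block
        kept[key] = ids
    return kept


def count_comparisons(blocks):
    """Comparisons executed when every block is processed independently."""
    return sum(len(ids) * (len(ids) - 1) // 2 for ids in blocks.values())


def count_distinct_pairs(blocks):
    """Number of distinct entity pairs across all blocks (the ideal count)."""
    return len({p for ids in blocks.values() for p in combinations(sorted(ids), 2)})


toy_blocks = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {1, 2}}
pruned = remove_subsumed_blocks(toy_blocks)
print(sorted(pruned))                  # ['a', 'b']
print(count_comparisons(toy_blocks))   # 7
print(count_comparisons(pruned))       # 6
print(count_distinct_pairs(toy_blocks))  # 5
```

In this toy input, block `c` is dropped. Pair (1, 2) is still compared in both `a` and `b`, because neither block contains the other. This shows how a block-level method can discard part, but not all, of the redundant comparisons. The pairwise subset checks cost O(B^2) for B blocks, which is fine for a sketch but would need indexing on large block collections.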

Files in this item

There are no files associated with this item.