Artículo / Article
An alignment comparator for entity resolution with multi-valued attributes
Fecha
2014-11-22Registro en:
03029743
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
8857
272
284
Autor
Mazzucchi-Augel, Pablo N
Ceballos Cancino, Héctor Gibrán
Institución
Resumen
Entity matching is a problem that concerns many data management processes. If we consider matching between entities represented by RDF individuals we might find attributes values lists with variable-length for some properties, which will lead us to the problem of comparing multi-valued attributes, e.g. comparing author names lists for determining publication matching. This matching technique would be more complex than comparing fixed-length records, but less complex than comparing XML documents. Instead of comparing a single string, representing the concatenation of these values, each value of one vector should be compared against all values of the other vector. We propose a set of heuristics to address the alignment and comparison process of multi-valued attributes and evaluate them in the context of bibliographic databases. Our first results show that it is possible to reduce the comparisons amount and provide an aggregated similarity metric that outperforms the average similarity of cross product comparisons.