Using metric space indexing for complete and efficient record linkage
Item metadata
dc.contributor.author | Akgün, Özgür | |
dc.contributor.author | Dearle, Alan | |
dc.contributor.author | Kirby, Graham Njal Cameron | |
dc.contributor.author | Christen, Peter | |
dc.contributor.editor | Phung, Dinh | |
dc.contributor.editor | Tseng, Vincent S. | |
dc.contributor.editor | Webb, Geoff | |
dc.contributor.editor | Ho, Bao | |
dc.contributor.editor | Ganji, Mohadeseh | |
dc.contributor.editor | Rashidi, Lida | |
dc.date.accessioned | 2018-07-10T12:30:05Z | |
dc.date.available | 2018-07-10T12:30:05Z | |
dc.date.issued | 2018 | |
dc.identifier | 252460914 | |
dc.identifier | 2bf3e6cc-02c5-4043-9b29-015862135919 | |
dc.identifier | 85049367618 | |
dc.identifier.citation | Akgün, Ö, Dearle, A, Kirby, G N C & Christen, P 2018, Using metric space indexing for complete and efficient record linkage. In D Phung, V S Tseng, G Webb, B Ho, M Ganji & L Rashidi (eds), Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part III. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10939 LNCS, Springer, Cham, pp. 89-101, 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018), Melbourne, Victoria, Australia, 3/06/18. https://doi.org/10.1007/978-3-319-93040-4_8 | en |
dc.identifier.citation | conference | en |
dc.identifier.isbn | 9783319930398 | |
dc.identifier.isbn | 9783319930404 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.other | ORCID: /0000-0002-4422-0190/work/46569125 | |
dc.identifier.other | ORCID: /0000-0001-9519-938X/work/46569180 | |
dc.identifier.other | ORCID: /0000-0002-1157-2421/work/167496147 | |
dc.identifier.uri | https://hdl.handle.net/10023/15181 | |
dc.description.abstract | Record linkage is the process of identifying records that refer to the same real-world entities in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their similarity. Linkage is usually performed in a three-step process: first, groups of similar candidate records are identified using indexing, then pairs within the same group are compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as locality sensitive hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity, or they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing (MSI) to perform complete linkage, resulting in a parameter-free process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. An evaluation on real-world data from several domains shows that linkage using MSI can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process. | |
dc.format.extent | 13 | |
dc.format.extent | 430206 | |
dc.language.iso | eng | |
dc.publisher | Springer | |
dc.relation.ispartof | Advances in Knowledge Discovery and Data Mining | en |
dc.relation.ispartofseries | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en |
dc.rights | © 2018, Springer. This work has been made available online in accordance with the publisher’s policies. This is the author created, accepted version manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at https://doi.org/10.1007/978-3-319-93040-4_8 | en |
dc.subject | Entity resolution | en |
dc.subject | Data matching | en |
dc.subject | Similarity search | en |
dc.subject | Blocking | en |
dc.subject | QA75 Electronic computers. Computer science | en |
dc.subject | ZA4050 Electronic information resources | en |
dc.subject | Theoretical Computer Science | en |
dc.subject | General Computer Science | en |
dc.subject | DAS | en |
dc.subject | BDC | en |
dc.subject | R2C | en |
dc.subject | ~DC~ | en |
dc.subject.lcc | QA75 | en |
dc.subject.lcc | ZA4050 | en |
dc.title | Using metric space indexing for complete and efficient record linkage | en |
dc.type | Conference item | en |
dc.contributor.sponsor | Economic & Social Research Council | en |
dc.contributor.sponsor | Economic & Social Research Council | en |
dc.contributor.sponsor | Scottish Funding Council | en |
dc.contributor.institution | University of St Andrews.School of Computer Science | en |
dc.contributor.institution | University of St Andrews.Office of the Principal | en |
dc.contributor.institution | University of St Andrews.Centre for Interdisciplinary Research in Computational Algebra | en |
dc.identifier.doi | 10.1007/978-3-319-93040-4_8 | |
dc.date.embargoedUntil | 2018-06-17 | |
dc.identifier.grantnumber | ES/L007487/1 | en |
dc.identifier.grantnumber | ES/K00574X/2 | en |
dc.identifier.grantnumber |  | en |
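The abstract describes linkage as a single similarity-search step. One dataset is indexed in a metric space, and for each record of the other dataset every indexed record within a distance threshold is retrieved; the retrieved pairs are the links. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It uses Levenshtein edit distance as the metric and a simple pivot table (in the spirit of LAESA) as a stand-in metric index. It takes the first indexed records as pivots and uses a hypothetical threshold `t`; the names `PivotIndex` and `link` are also hypothetical. Pruning relies only on the triangle inequality, so a range query never discards a record within `t` of the query. This is the sense in which metric space search is complete, in contrast to approximate blocking schemes such as locality sensitive hashing.

```python
from typing import Callable, List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; it satisfies the metric axioms."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                    # deletion
                            curr[j - 1] + 1,                # insertion
                            prev[j - 1] + int(ca != cb)))   # substitution
        prev = curr
    return prev[-1]


class PivotIndex:
    """Hypothetical pivot table: precomputes d(x, p) for every record x and pivot p."""

    def __init__(self, records: Sequence[str], pivots: Sequence[str],
                 dist: Callable[[str, str], int] = levenshtein) -> None:
        self.records = list(records)
        self.pivots = list(pivots)
        self.dist = dist
        self.table = [[dist(x, p) for p in self.pivots] for x in self.records]

    def range_query(self, q: str, t: int) -> List[int]:
        """Return indices of all records x with dist(q, x) <= t (no false negatives)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        hits = []
        for i, row in enumerate(self.table):
            # Triangle inequality: |d(q,p) - d(x,p)| <= d(q,x). If this lower
            # bound exceeds t for any pivot, x cannot lie within t of q.
            if any(abs(dq - dx) > t for dq, dx in zip(q_to_pivots, row)):
                continue
            # Only surviving candidates pay for an exact distance computation.
            if self.dist(q, self.records[i]) <= t:
                hits.append(i)
        return hits


def link(left: Sequence[str], right: Sequence[str], t: int,
         n_pivots: int = 2) -> List[Tuple[int, int]]:
    """Index `right`, query with each record of `left`; every returned pair is a link."""
    index = PivotIndex(right, pivots=list(right[:n_pivots]))
    return [(i, j) for i, q in enumerate(left) for j in index.range_query(q, t)]


if __name__ == "__main__":
    left = ["john smith", "mary jones", "peter brown"]
    right = ["jon smith", "mary jones", "petra browne", "alan kirby"]
    print(link(left, right, t=2))  # prints [(0, 0), (1, 1)]
```

Here "peter brown" and "petra browne" are at edit distance 3, so they are not linked at `t=2`. The pivot table trades memory for fewer exact distance computations. Tree-based metric indexes reduce the work further on large datasets, but the completeness guarantee rests on the same triangle-inequality argument.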
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.