St Andrews Research Repository


Using metric space indexing for complete and efficient record linkage

File
akgun2018pakdd_camera_ready.pdf (420.1Kb)
Date
2018
Author
Akgün, Özgür
Dearle, Alan
Kirby, Graham Njal Cameron
Christen, Peter
Funder
Economic & Social Research Council
Economic & Social Research Council
Scottish Funding Council
Grant ID
ES/L007487/1
ES/K00574X/2
Keywords
Entity resolution
Data matching
Similarity search
Blocking
QA75 Electronic computers. Computer science
ZA4050 Electronic information resources
Theoretical Computer Science
Computer Science(all)
Abstract
Record linkage is the process of identifying records that refer to the same real-world entities in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their similarity. Linkage is usually performed in a three-step process: first, groups of similar candidate records are identified using indexing, then pairs within the same group are compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as locality sensitive hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity, or they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing (MSI) to perform complete linkage, resulting in a parameter-free process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. An evaluation on real-world data from several domains shows that linkage using MSI can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process.
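A metric space index answers range queries exactly: it returns all records within distance t of a query record. That is why the abstract can describe indexing, comparison and classification as a single step. The sketch below illustrates this under stated assumptions and is not the authors' implementation. It assumes string records, Levenshtein edit distance as the metric, a simple LAESA-style pivot table as the index (the structures evaluated in the paper may differ), and a user-chosen distance threshold. The names `levenshtein`, `PivotIndex` and `link` are hypothetical and were introduced only for this illustration.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance; a true metric, so the triangle inequality holds."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


class PivotIndex:
    """Hypothetical minimal pivot table over a set of records."""

    def __init__(self, records: Sequence[str], pivots: Sequence[str]) -> None:
        self.records = list(records)
        self.pivots = list(pivots)
        # Precompute the distance from every record to every pivot.
        self.table = [[levenshtein(r, p) for p in self.pivots] for r in self.records]

    def range_search(self, query: str, threshold: int) -> List[int]:
        """Return indices of all records x with d(query, x) <= threshold."""
        q_to_p = [levenshtein(query, p) for p in self.pivots]
        hits = []
        for idx, row in enumerate(self.table):
            # Triangle inequality: d(q, x) >= |d(q, p) - d(x, p)| for every pivot p.
            # If this lower bound exceeds the threshold, x cannot be a match.
            if any(abs(qp - xp) > threshold for qp, xp in zip(q_to_p, row)):
                continue
            # Comparison and classification in one step: link iff within range.
            if levenshtein(query, self.records[idx]) <= threshold:
                hits.append(idx)
        return hits


def link(left: Sequence[str], right: Sequence[str],
         pivots: Sequence[str], threshold: int) -> List[Tuple[int, int]]:
    """Link each left record to every right record within the threshold."""
    index = PivotIndex(right, pivots)
    return [(i, j) for i, q in enumerate(left) for j in index.range_search(q, threshold)]


if __name__ == "__main__":
    left = ["john smith", "mary jones"]
    right = ["jon smith", "mary jones", "peter brown"]
    print(link(left, right, pivots=["john smith", "peter brown"], threshold=2))
    # prints [(0, 0), (1, 1)]
```

The pruning test only discards records whose lower bound already exceeds the threshold. The result is therefore identical to comparing every pair, which is the completeness property the abstract contrasts with blocking and locality sensitive hashing. The choice of pivots affects only how many full distance computations are avoided, not which links are found.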
Citation
Akgün, Ö., Dearle, A., Kirby, G. N. C. & Christen, P. 2018, Using metric space indexing for complete and efficient record linkage. In D. Phung, V. S. Tseng, G. Webb, B. Ho, M. Ganji & L. Rashidi (eds), Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part III. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10939 LNCS, Springer, Cham, pp. 89-101, 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018), Melbourne, Victoria, Australia, 3/06/18. https://doi.org/10.1007/978-3-319-93040-4_8
Publication
Advances in Knowledge Discovery and Data Mining
DOI
https://doi.org/10.1007/978-3-319-93040-4_8
ISSN
0302-9743
Type
Conference item
Rights
© 2018, Springer. This work has been made available online in accordance with the publisher’s policies. This is the author created, accepted version manuscript following peer review and may differ slightly from the final published version. The final published version of this work is available at https://doi.org/10.1007/978-3-319-93040-4_8
Collections
  • Centre for Interdisciplinary Research in Computational Algebra (CIRCA) Research
  • Computer Science Research
  • University of St Andrews Research
URI
http://hdl.handle.net/10023/15181

Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
