Item metadata

dc.contributor.author: Akgün, Özgür
dc.contributor.author: Dearle, Alan
dc.contributor.author: Kirby, Graham Njal Cameron
dc.contributor.author: Christen, Peter
dc.contributor.editor: Phung, Dinh
dc.contributor.editor: Tseng, Vincent S.
dc.contributor.editor: Webb, Geoff
dc.contributor.editor: Ho, Bao
dc.contributor.editor: Ganji, Mohadeseh
dc.contributor.editor: Rashidi, Lida
dc.date.accessioned: 2018-07-10T12:30:05Z
dc.date.available: 2018-07-10T12:30:05Z
dc.date.issued: 2018
dc.identifier: 252460914
dc.identifier: 2bf3e6cc-02c5-4043-9b29-015862135919
dc.identifier: 85049367618
dc.identifier.citation: Akgün, Ö, Dearle, A, Kirby, G N C & Christen, P 2018, Using metric space indexing for complete and efficient record linkage. in D Phung, V S Tseng, G Webb, B Ho, M Ganji & L Rashidi (eds), Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part III. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10939 LNCS, Springer, Cham, pp. 89-101, 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018), Melbourne, Victoria, Australia, 3/06/18. https://doi.org/10.1007/978-3-319-93040-4_8 [en]
dc.identifier.citation: conference [en]
dc.identifier.isbn: 9783319930398
dc.identifier.isbn: 9783319930404
dc.identifier.issn: 0302-9743
dc.identifier.other: ORCID: /0000-0002-4422-0190/work/46569125
dc.identifier.other: ORCID: /0000-0001-9519-938X/work/46569180
dc.identifier.uri: https://hdl.handle.net/10023/15181
dc.description.abstract: Record linkage is the process of identifying records that refer to the same real-world entities in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their similarity. Linkage is usually performed in a three-step process: first, groups of similar candidate records are identified using indexing, then pairs within the same group are compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as locality sensitive hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity, or they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing (MSI) to perform complete linkage, resulting in a parameter-free process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. An evaluation on real-world data from several domains shows that linkage using MSI can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process.
dc.format.extent: 13
dc.format.extent: 430206
dc.language.iso: eng
dc.publisher: Springer
dc.relation.ispartof: Advances in Knowledge Discovery and Data Mining [en]
dc.relation.ispartofseries: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) [en]
dc.subject: Entity resolution [en]
dc.subject: Data matching [en]
dc.subject: Similarity search [en]
dc.subject: Blocking [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: ZA4050 Electronic information resources [en]
dc.subject: Theoretical Computer Science [en]
dc.subject: Computer Science(all) [en]
dc.subject: DAS [en]
dc.subject: BDC [en]
dc.subject: R2C [en]
dc.subject: ~DC~ [en]
dc.subject.lcc: QA75 [en]
dc.subject.lcc: ZA4050 [en]
dc.title: Using metric space indexing for complete and efficient record linkage [en]
dc.type: Conference item [en]
dc.contributor.sponsor: Economic & Social Research Council [en]
dc.contributor.sponsor: Economic & Social Research Council [en]
dc.contributor.sponsor: Scottish Funding Council [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
dc.contributor.institution: University of St Andrews. Office of the Principal [en]
dc.contributor.institution: University of St Andrews. Centre for Interdisciplinary Research in Computational Algebra [en]
dc.identifier.doi: 10.1007/978-3-319-93040-4_8
dc.date.embargoedUntil: 2018-06-17
dc.identifier.grantnumber: ES/L007487/1 [en]
dc.identifier.grantnumber: ES/K00574X/2 [en]
dc.identifier.grantnumber: [en]
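
The abstract describes linking records by searching a metric space index rather than by grouping candidates first. A minimal sketch of that general idea follows. It is an illustration under stated assumptions, not the authors' implementation: the pivot table stands in for whatever metric index the paper uses, Levenshtein edit distance is assumed as the metric, and the names `PivotIndex`, `range_search`, `link` and the `threshold` parameter are hypothetical. The point it shows is that triangle-inequality pruning skips comparisons without ever discarding a pair that lies within the threshold.

```python
from typing import Callable, List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; it satisfies the metric axioms."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class PivotIndex:
    """Hypothetical pivot-table index: stores each record's distance to a few pivots."""

    def __init__(self, records: Sequence[str],
                 distance: Callable[[str, str], int],
                 num_pivots: int = 2) -> None:
        self.records = list(records)
        self.distance = distance
        # Assumption: the first few records serve as pivots.
        self.pivots = self.records[:num_pivots]
        self.table = [[distance(r, p) for p in self.pivots] for r in self.records]

    def range_search(self, query: str, threshold: int) -> List[str]:
        """Return every indexed record within `threshold` of `query`."""
        query_to_pivots = [self.distance(query, p) for p in self.pivots]
        matches = []
        for record, row in zip(self.records, self.table):
            # Triangle inequality: |d(q, p) - d(r, p)| <= d(q, r).
            # If the lower bound already exceeds the threshold, skip the comparison.
            if any(abs(qp - rp) > threshold for qp, rp in zip(query_to_pivots, row)):
                continue
            if self.distance(query, record) <= threshold:
                matches.append(record)
        return matches


def link(left: Sequence[str], right: Sequence[str], threshold: int) -> List[Tuple[str, str]]:
    """Link each record in `left` to all records in `right` within `threshold`."""
    index = PivotIndex(right, levenshtein)
    return [(l, r) for l in left for r in index.range_search(l, threshold)]


if __name__ == "__main__":
    left = ["jon smith", "mary jones"]
    right = ["john smith", "marie jones", "alan turing"]
    print(link(left, right, threshold=2))
    # [('jon smith', 'john smith'), ('mary jones', 'marie jones')]
```

Because the pruning rule only uses a lower bound on the true distance, the result is identical to an exhaustive pairwise comparison at the same threshold. The threshold here is an illustrative assumption and is not taken from the paper.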

