St Andrews Research Repository


Defining disease phenotypes in primary care electronic health records by a machine learning approach: a case study in identifying rheumatoid arthritis

View/Open
Sullivan_2016_PLoS_DiseasePhenotypes_CC.pdf (1.105Mb)
Date
02/05/2016
Author
Zhou, Shang-Ming
Fernandez-Gutierrez, Fabiola
Kennedy, Jonathan
Cooksey, Roxanne
Atkinson, Mark
Denaxas, Spiros
Siebert, Stefan
Dixon, William G.
O'Neill, Terence W.
Choy, Ernest
Sudlow, Cathie
UK Biobank Follow-up and Outcomes Group
Brophy, Sinead
Sullivan, Frank
Keywords
RA0421 Public health. Hygiene. Preventive Medicine
ZA4050 Electronic information resources
T-DAS
BDC
Abstract
Objectives: 1) To use a data-driven method to examine which clinical codes (risk factors) for a medical condition in primary care electronic health records (EHRs) can accurately predict a diagnosis of that condition in secondary care EHRs. 2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis (RA) using primary care EHRs.
Methods: This study linked routine primary and secondary care EHRs in Wales, UK. A machine learning based scheme was used to identify patients with rheumatoid arthritis from primary care EHRs via the following steps: i) selection of variables by comparing the relative frequencies of Read codes in the primary care dataset between disease cases and non-disease controls (disease status being defined by the secondary care diagnosis); ii) reduction of predictors/associated variables using a Random Forest method; iii) induction of decision rules from a decision tree model. The proposed method was then extensively validated on an independent dataset, and its performance was compared with that of two existing deterministic algorithms for RA which had been developed using expert clinical knowledge.
Results: Primary care EHRs were available for 2,238,360 patients over the age of 16, of whom 20,667 were also linked in the secondary care rheumatology clinical system. In the linked dataset, 900 predictors (out of a total of 43,100 variables) in the primary care record were found more frequently in those with RA than in those without. These variables were reduced to 37 groups of related clinical codes, which were used to develop a decision tree model. The final algorithm identified 8 predictors related to diagnostic codes for RA, medication codes (such as those for disease modifying anti-rheumatic drugs), and the absence of alternative diagnoses such as psoriatic arthritis. The proposed data-driven method performed as well as the expert clinical knowledge based methods.
Conclusion: Data-driven schemes, such as ensemble machine learning methods, have the potential to identify the most informative predictors in a cost-effective and rapid way, allowing rheumatoid arthritis or other complex medical conditions to be classified accurately and reliably in primary care EHRs.
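As an illustration only, the first step described in the abstract (comparing how often each clinical code appears in cases versus controls) might be sketched as below. This is not the authors' implementation: the code labels (`RA_DIAG`, `DMARD`, and so on), the `min_ratio` and `min_case_freq` thresholds, and the smoothing floor are all hypothetical choices made for the example, and the abstract does not state the selection criteria actually used. The later Random Forest reduction and decision tree induction steps are not covered.

```python
from collections import Counter


def code_frequencies(records):
    """Return, for each code, the fraction of patients with that code at least once."""
    counts = Counter()
    for codes in records:
        counts.update(set(codes))  # count each code once per patient
    n = len(records)
    return {code: c / n for code, c in counts.items()}


def select_candidate_codes(case_records, control_records,
                           min_ratio=2.0, min_case_freq=0.05):
    """Keep codes that are markedly more frequent in cases than in controls.

    Both thresholds are assumptions for this sketch, not values from the paper.
    """
    case_freq = code_frequencies(case_records)
    control_freq = code_frequencies(control_records)
    # Assumed smoothing floor so codes never seen in controls do not divide by zero.
    floor = 1.0 / (len(control_records) + 1)
    selected = []
    for code, f_case in case_freq.items():
        if f_case < min_case_freq:
            continue
        f_control = max(control_freq.get(code, 0.0), floor)
        ratio = f_case / f_control
        if ratio >= min_ratio:
            selected.append((code, ratio))
    # Highest ratio first; ties broken alphabetically for a stable order.
    return sorted(selected, key=lambda item: (-item[1], item[0]))


# Toy data with made-up code labels: one list of codes per patient.
cases = [
    ["RA_DIAG", "DMARD", "BP_CHECK"],
    ["RA_DIAG", "BP_CHECK"],
    ["DMARD", "RA_DIAG"],
    ["BP_CHECK", "DMARD"],
]
controls = [
    ["BP_CHECK"],
    ["BP_CHECK", "PSA_DIAG"],
    ["DMARD", "BP_CHECK"],
    ["BP_CHECK"],
]

selected = select_candidate_codes(cases, controls)
for code, ratio in selected:
    print(f"{code}: {ratio:.2f}x more frequent in cases")
```

In this toy data the routine blood pressure check is common in both groups and is dropped, while the diagnosis and medication labels are retained. A frequency ratio is only one plausible screening statistic; a real pipeline would also need to handle rare codes and multiple testing.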
Citation
Zhou, S-M, Fernandez-Gutierrez, F, Kennedy, J, Cooksey, R, Atkinson, M, Denaxas, S, Siebert, S, Dixon, W G, O'Neill, T W, Choy, E, Sudlow, C, UK Biobank Follow-up and Outcomes Group, Brophy, S & Sullivan, F 2016, 'Defining disease phenotypes in primary care electronic health records by a machine learning approach: a case study in identifying rheumatoid arthritis', PLoS One, vol. 11, no. 5, e0154515. https://doi.org/10.1371/journal.pone.0154515
Publication
PLoS One
Status
Peer reviewed
DOI
https://doi.org/10.1371/journal.pone.0154515
ISSN
1932-6203
Type
Journal article
Rights
© 2016 Zhou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Description
The work was supported by the UK Biobank, and undertaken with the support of the National Centre for Population Health and Wellbeing Research (NCPHWR) and the Farr Institute of Health Informatics Research. The NCPHWR is funded by Health and Care Research Wales (grant ref.: CA02). The Farr Institute is funded by a consortium of ten UK research organisations (grant ref.: MR/K006525/1): Arthritis Research UK, the British Heart Foundation, Cancer Research UK, the Economic and Social Research Council, the Engineering and Physical Sciences Research Council, the Medical Research Council, the National Institute of Health Research, the National Institute for Social Care and Health Research (Welsh Government) and the Chief Scientist Office (Scottish Government Health Directorates). WGD was supported by an MRC Clinician Scientist Fellowship (G0902272).
Collections
  • University of St Andrews Research
URL
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154515#sec018
URI
http://hdl.handle.net/10023/11028

Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
