Item metadata

dc.contributor.author: Schrempf, Patrick
dc.contributor.author: Watson, Hannah
dc.contributor.author: Park, Eunsoo
dc.contributor.author: Pajak, Maciej
dc.contributor.author: MacKinnon, Hamish
dc.contributor.author: Muir, Keith W.
dc.contributor.author: Harris-Birtill, David
dc.contributor.author: O’Neil, Alison Q.
dc.identifier.citation: Schrempf, P, Watson, H, Park, E, Pajak, M, MacKinnon, H, Muir, K W, Harris-Birtill, D & O’Neil, A Q 2021, 'Templated text synthesis for expert-guided multi-label extraction from radiology reports', Machine Learning and Knowledge Extraction, vol. 3, no. 2, pp. 299-317.
dc.identifier.other: Bibtex: make3020015
dc.identifier.other: ORCID: /0000-0003-2484-6855/work/91341053
dc.identifier.other: ORCID: /0000-0002-0740-3668/work/91341085
dc.description: Funding: This work is part of the Industrial Centre for AI Research in digital Diagnostics (iCAIRD), which is funded by Innovate UK on behalf of UK Research and Innovation (UKRI) project number 104690. The Data Lab has also provided support and funding. [en]
dc.description.abstract: Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data which is time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on pure data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct” which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture i.e., PubMedBERT, we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
dc.relation.ispartof: Machine Learning and Knowledge Extraction [en]
dc.subject: Radiology report labelling [en]
dc.subject: Data synthesis [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: R Medicine [en]
dc.title: Templated text synthesis for expert-guided multi-label extraction from radiology reports [en]
dc.type: Journal article [en]
dc.contributor.sponsor: Technology Strategy Board [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
dc.description.status: Peer reviewed [en]
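
Illustrative note on the abstract: the template-based synthesis it describes can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the template strings, the three status names (positive, negative, uncertain), the tiny synonym dictionary standing in for a medical ontology, and the function name synthesise. It only shows the general idea of filling expert-written templates with ontology terms to produce labelled training sentences.

```python
# Hypothetical templates: each pairs a sentence pattern with the status
# that an expert says the pattern implies for the entities it mentions.
TEMPLATES = [
    ("There is {entity}.", "positive"),
    ("No evidence of {entity}.", "negative"),
    # Two differential diagnoses: both are labelled as uncertain.
    ("Likely represents {entity} or {other}.", "uncertain"),
]

# Hypothetical label -> synonyms mapping, standing in for a medical ontology.
ONTOLOGY = {
    "lacunar infarct": ["lacunar infarct", "lacune"],
    "haemorrhage": ["haemorrhage", "bleed"],
}


def synthesise(templates, ontology):
    """Fill every template with every synonym and return (text, labels) pairs."""
    examples = []
    for template, status in templates:
        for label, synonyms in ontology.items():
            for synonym in synonyms:
                if "{other}" in template:
                    # Pair the entity with each *different* label's first synonym.
                    for other_label, other_synonyms in ontology.items():
                        if other_label == label:
                            continue
                        text = template.format(entity=synonym, other=other_synonyms[0])
                        examples.append((text, {label: status, other_label: status}))
                else:
                    text = template.format(entity=synonym)
                    examples.append((text, {label: status}))
    return examples


if __name__ == "__main__":
    for text, labels in synthesise(TEMPLATES, ONTOLOGY):
        print(text, labels)
```

In a sketch like this, synonyms that never occur in the real reports still yield labelled sentences, which is the sense in which the abstract speaks of injecting knowledge about unseen entities.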
