Templated text synthesis for expert-guided multi-label extraction from radiology reports

Schrempf, Patrick; Watson, Hannah; Park, Eunsoo; Pajak, Maciej; MacKinnon, Hamish; Muir, Keith W.; Harris-Birtill, David; O’Neil, Alison Q.

Files in this item

Name: Schrempf_2021_Templated_text_synthesis_MAKE_03_00015.pdf
Size: 1.409Mb
Format: PDF

Item metadata

dc.contributor.author	Schrempf, Patrick
dc.contributor.author	Watson, Hannah
dc.contributor.author	Park, Eunsoo
dc.contributor.author	Pajak, Maciej
dc.contributor.author	MacKinnon, Hamish
dc.contributor.author	Muir, Keith W.
dc.contributor.author	Harris-Birtill, David
dc.contributor.author	O’Neil, Alison Q.
dc.date.accessioned	2021-03-24T15:30:03Z
dc.date.available	2021-03-24T15:30:03Z
dc.date.issued	2021-03-24
dc.identifier	273470282
dc.identifier	b1eeb9ad-5aaa-4175-be3e-1900b54cbbb0
dc.identifier	000646866800001
dc.identifier	85113449876
dc.identifier.citation	Schrempf, P, Watson, H, Park, E, Pajak, M, MacKinnon, H, Muir, K W, Harris-Birtill, D & O’Neil, A Q 2021, 'Templated text synthesis for expert-guided multi-label extraction from radiology reports', Machine Learning and Knowledge Extraction, vol. 3, no. 2, pp. 299-317. https://doi.org/10.3390/make3020015	en
dc.identifier.issn	2504-4990
dc.identifier.other	Bibtex: make3020015
dc.identifier.other	ORCID: /0000-0003-2484-6855/work/91341053
dc.identifier.other	ORCID: /0000-0002-0740-3668/work/91341085
dc.identifier.uri	https://hdl.handle.net/10023/21706
dc.description	Funding: This work is part of the Industrial Centre for AI Research in digital Diagnostics (iCAIRD), which is funded by Innovate UK on behalf of UK Research and Innovation (UKRI) project number 104690. The Data Lab has also provided support and funding.	en
dc.description.abstract	Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data which is time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on pure data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct” which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture i.e., PubMedBERT, we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
dc.format.extent	19
dc.format.extent	1478282
dc.language.iso	eng
dc.relation.ispartof	Machine Learning and Knowledge Extraction	en
dc.subject	NLP	en
dc.subject	Radiology report labelling	en
dc.subject	BERT	en
dc.subject	Data synthesis	en
dc.subject	Templates	en
dc.subject	QA75 Electronic computers. Computer science	en
dc.subject	R Medicine	en
dc.subject	E-DAS	en
dc.subject.lcc	QA75	en
dc.subject.lcc	R	en
dc.title	Templated text synthesis for expert-guided multi-label extraction from radiology reports	en
dc.type	Journal article	en
dc.contributor.sponsor	Technology Strategy Board	en
dc.contributor.institution	University of St Andrews. School of Computer Science	en
dc.identifier.doi	https://doi.org/10.3390/make3020015
dc.description.status	Peer reviewed	en
dc.identifier.url	https://www.mdpi.com/2504-4990/3/2/15	en
dc.identifier.grantnumber	TS/S013121/1	en

This item appears in the following Collection(s)

University of St Andrews Research
