

Item metadata

dc.contributor.author: Buhr, Christoph R
dc.contributor.author: Smith, Harry
dc.contributor.author: Huppertz, Tilman
dc.contributor.author: Bahr-Hamm, Katharina
dc.contributor.author: Matthias, Christoph
dc.contributor.author: Cuny, Clemens
dc.contributor.author: Snijders, Jan Phillipp
dc.contributor.author: Ernst, Benjamin Philipp
dc.contributor.author: Blaikie, Andrew
dc.contributor.author: Kelsey, Tom
dc.contributor.author: Kuhn, Sebastian
dc.contributor.author: Eckrich, Jonas
dc.date.accessioned: 2024-06-04T16:30:07Z
dc.date.available: 2024-06-04T16:30:07Z
dc.date.issued: 2024-05-23
dc.identifier: 302369391
dc.identifier: 0474e647-31de-427e-a662-ee2893b76830
dc.identifier: 85193978711
dc.identifier.citation: Buhr, C R, Smith, H, Huppertz, T, Bahr-Hamm, K, Matthias, C, Cuny, C, Snijders, J P, Ernst, B P, Blaikie, A, Kelsey, T, Kuhn, S & Eckrich, J 2024, 'Assessing unknown potential—quality and limitations of different large language models in the field of otorhinolaryngology', Acta Oto-Laryngologica, vol. Latest Articles, pp. 1-6. https://doi.org/10.1080/00016489.2024.2352843
dc.identifier.issn: 1651-2251
dc.identifier.other: Bibtex: doi:10.1080/00016489.2024.2352843
dc.identifier.other: ORCID: /0000-0001-7913-6872/work/160316516
dc.identifier.other: ORCID: /0000-0002-8091-1458/work/160317043
dc.identifier.uri: https://hdl.handle.net/10023/29986
dc.description.abstract:
Background: Large Language Models (LLMs) might offer a solution for the lack of trained health personnel, particularly in low- and middle-income countries. However, their strengths and weaknesses remain unclear.
Aims/objectives: Here we benchmark different LLMs (Bard 2023.07.13, Claude 2, ChatGPT 4) against six consultants in otorhinolaryngology (ORL).
Material and methods: Case-based questions were extracted from the literature and German state examinations. Answers from Bard 2023.07.13, Claude 2, ChatGPT 4, and six ORL consultants were rated blindly on a 6-point Likert scale for medical adequacy, comprehensibility, coherence, and conciseness. Given answers were compared to validated answers and evaluated for hazards. A modified Turing test was performed and character counts were compared.
Results: The LLMs' answers were rated inferior to the consultants' in all categories. Yet, the difference between consultants and LLMs was marginal, with the clearest disparity in conciseness and the smallest in comprehensibility. Among the LLMs, Claude 2 was rated best in medical adequacy and conciseness. Consultants' answers matched the validated solution in 93% (228/246), ChatGPT 4 in 85% (35/41), Claude 2 in 78% (32/41), and Bard 2023.07.13 in 59% (24/41). Answers were rated as potentially hazardous in 10% (24/246) for ChatGPT 4, 14% (34/246) for Claude 2, 19% (46/246) for Bard 2023.07.13, and 6% (71/1230) for consultants.
Conclusions and significance: Despite the consultants' superior performance, LLMs show potential for clinical application in ORL. Future studies should assess their performance on a larger scale.
dc.format.extent: 6
dc.format.extent: 1576870
dc.language.iso: eng
dc.relation.ispartof: Acta Oto-Laryngologica
dc.subject: 3rd-NDAS
dc.title: Assessing unknown potential—quality and limitations of different large language models in the field of otorhinolaryngology
dc.type: Journal article
dc.contributor.institution: University of St Andrews. School of Medicine
dc.contributor.institution: University of St Andrews. Sir James Mackenzie Institute for Early Diagnosis
dc.contributor.institution: University of St Andrews. Infection and Global Health Division
dc.contributor.institution: University of St Andrews. School of Computer Science
dc.contributor.institution: University of St Andrews. Centre for Interdisciplinary Research in Computational Algebra
dc.identifier.doi: 10.1080/00016489.2024.2352843
dc.description.status: Peer reviewed

