Urology consultants versus large language models: potentials and hazards for medical advice in urology

Eckrich, Johanna; Ellinger, Jörg; Cox, Alexander; Stein, Johannes; Ritter, Manuel; Blaikie, Andrew; Kuhn, Sebastian; Buhr, Christoph Raphael

Files in this item

Name: Eckrich_2024_BJUI-C_Urology-consultants_CC.pdf
Size: 575.5Kb
Format: PDF

Item metadata

dc.contributor.author	Eckrich, Johanna
dc.contributor.author	Ellinger, Jörg
dc.contributor.author	Cox, Alexander
dc.contributor.author	Stein, Johannes
dc.contributor.author	Ritter, Manuel
dc.contributor.author	Blaikie, Andrew
dc.contributor.author	Kuhn, Sebastian
dc.contributor.author	Buhr, Christoph Raphael
dc.date.accessioned	2024-04-03T09:30:07Z
dc.date.available	2024-04-03T09:30:07Z
dc.date.issued	2024-04-03
dc.identifier	300916197
dc.identifier	a52d52e9-4d1c-49c4-92ad-3488ac199a47
dc.identifier	85189903280
dc.identifier.citation	Eckrich, J, Ellinger, J, Cox, A, Stein, J, Ritter, M, Blaikie, A, Kuhn, S & Buhr, C R 2024, 'Urology consultants versus large language models: potentials and hazards for medical advice in urology', BJUI Compass, vol. Early View. https://doi.org/10.1002/bco2.359	en
dc.identifier.issn	2688-4526
dc.identifier.other	RIS: urn:7195D15AB83DA25B09E60EA5010BAE91
dc.identifier.other	ORCID: /0000-0001-7913-6872/work/157140228
dc.identifier.uri	https://hdl.handle.net/10023/29591
dc.description.abstract	Background: Current interest surrounding large language models (LLMs) will lead to an increase in their use for medical advice. Although LLMs offer huge potential, they also pose misinformation hazards. Objective: This study evaluates three LLMs answering urology-themed clinical case-based questions by comparing the quality of their answers with those provided by urology consultants. Methods: Forty-five case-based questions were answered by consultants and LLMs (ChatGPT 3.5, ChatGPT 4, Bard). Answers were blindly rated by four consultants using a six-step Likert scale in the categories ‘medical adequacy’, ‘conciseness’, ‘coherence’ and ‘comprehensibility’. Possible misinformation hazards were identified; a modified Turing test was included, and the character count was matched. Results: Higher ratings in every category were recorded for the consultants. LLMs' overall performance in language-focused categories (coherence and comprehensibility) was relatively high. Medical adequacy was significantly poorer compared with the consultants. Possible misinformation hazards were identified in 2.8% to 18.9% of answers generated by LLMs compared with <1% of consultants' answers. LLM answers were rated as less concise and had a higher character count. Among individual LLMs, ChatGPT 4 performed best in medical accuracy (p < 0.0001) and coherence (p = 0.001), whereas Bard received the lowest scores. Responses were attributed to their correct source with 98% accuracy for LLMs and 99% for consultants. Conclusions: The quality of consultant answers was superior to that of LLM answers in all categories. High semantic scores for LLM answers were found; however, the lack of medical accuracy led to potential misinformation hazards from LLM ‘consultations’. Further investigations are necessary for new generations.
dc.format.extent	7
dc.format.extent	589386
dc.language.iso	eng
dc.relation.ispartof	BJUI Compass	en
dc.subject	Artificial intelligence (AI)	en
dc.subject	Bard	en
dc.subject	Chatbots	en
dc.subject	ChatGPT	en
dc.subject	Digital health	en
dc.subject	Global health	en
dc.subject	Large language models (LLMs)	en
dc.subject	Low- and middle-income countries (LMICs)	en
dc.subject	Telehealth	en
dc.subject	Telemedicine	en
dc.subject	Urology	en
dc.subject	DAS	en
dc.title	Urology consultants versus large language models: potentials and hazards for medical advice in urology	en
dc.type	Journal article	en
dc.contributor.institution	University of St Andrews. School of Medicine	en
dc.contributor.institution	University of St Andrews. Sir James Mackenzie Institute for Early Diagnosis	en
dc.contributor.institution	University of St Andrews. Infection and Global Health Division	en
dc.identifier.doi	https://doi.org/10.1002/bco2.359
dc.description.status	Peer reviewed	en

This item appears in the following Collection(s)

University of St Andrews Research