Item metadata

dc.contributor.author: Buhr, Christoph Raphael
dc.contributor.author: Smith, Harry
dc.contributor.author: Huppertz, Tilman
dc.contributor.author: Bahr-Hamm, Katharina
dc.contributor.author: Matthias, Christoph
dc.contributor.author: Blaikie, Andrew
dc.contributor.author: Kelsey, Tom
dc.contributor.author: Kuhn, Sebastian
dc.contributor.author: Eckrich, Jonas
dc.date.accessioned: 2024-02-22T12:30:11Z
dc.date.available: 2024-02-22T12:30:11Z
dc.date.issued: 2023-12-05
dc.identifier: 297350302
dc.identifier: 9a3fb956-eaa4-43f1-9bb0-08afc2528e31
dc.identifier: 85180330765
dc.identifier.citation: Buhr, C R, Smith, H, Huppertz, T, Bahr-Hamm, K, Matthias, C, Blaikie, A, Kelsey, T, Kuhn, S & Eckrich, J 2023, 'ChatGPT versus consultants: blinded evaluation on answering otorhinolaryngology case-based questions', JMIR Medical Education, vol. 9, e49183. https://doi.org/10.2196/49183 [en]
dc.identifier.issn: 2369-3762
dc.identifier.other: ORCID: /0000-0002-8091-1458/work/148888725
dc.identifier.other: ORCID: /0000-0001-7913-6872/work/153977058
dc.identifier.uri: https://hdl.handle.net/10023/29325
dc.description.abstract: Background: Large language models (LLMs), such as ChatGPT (OpenAI), are increasingly used in medicine and supplement standard search engines as information sources. This leads to more “consultations” of LLMs about personal medical symptoms. Objective: This study aims to evaluate ChatGPT’s performance in answering clinical case-based questions in otorhinolaryngology (ORL) in comparison to ORL consultants’ answers. Methods: We used 41 case-based questions from established ORL study books and past German state examinations for doctors. The questions were answered by both ORL consultants and ChatGPT 3. ORL consultants rated all responses, except their own, on medical adequacy, conciseness, coherence, and comprehensibility using a 6-point Likert scale. They also identified (in a blinded setting) whether the answer was created by an ORL consultant or ChatGPT. Additionally, the character count was compared. Due to the rapidly evolving pace of technology, a comparison between responses generated by ChatGPT 3 and ChatGPT 4 was included to give an insight into the evolving potential of LLMs. Results: Ratings in all categories were significantly higher for ORL consultants (P<.001). Although inferior to the scores of the ORL consultants, ChatGPT’s scores were relatively higher in the semantic categories (conciseness, coherence, and comprehensibility) than in medical adequacy. ORL consultants identified ChatGPT as the source correctly in 98.4% (121/123) of cases. ChatGPT’s answers had a significantly higher character count compared to ORL consultants (P<.001). Comparison between responses generated by ChatGPT 3 and ChatGPT 4 showed a slight improvement in medical accuracy as well as better coherence of the answers provided. In contrast, neither the conciseness (P=.06) nor the comprehensibility (P=.08) improved significantly despite a significant increase in the mean character count of 52.5% (n=(1470-964)/964; P<.001). Conclusions: While ChatGPT provided longer answers to medical problems, medical adequacy and conciseness were significantly lower compared to ORL consultants’ answers. LLMs have potential as augmentative tools for medical care, but their “consultation” for medical problems carries a high risk of misinformation, as their high semantic quality may mask contextual deficits.
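The 52.5% figure in the abstract is simply the relative increase in mean character count between ChatGPT 3 and ChatGPT 4 responses; restated as a worked calculation from the two means reported there (964 and 1470 characters):

\[
\frac{1470 - 964}{964} = \frac{506}{964} \approx 0.525 = 52.5\%
\]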
dc.format.extent: 9
dc.format.extent: 374457
dc.language.iso: eng
dc.relation.ispartof: JMIR Medical Education [en]
dc.subject: Large language models [en]
dc.subject: LLMs [en]
dc.subject: LLM [en]
dc.subject: Artificial intelligence [en]
dc.subject: AI [en]
dc.subject: ChatGPT [en]
dc.subject: Otorhinolaryngology [en]
dc.subject: ORL [en]
dc.subject: Digital health [en]
dc.subject: Chatbots [en]
dc.subject: Global health [en]
dc.subject: Low- and middle-income countries [en]
dc.subject: Telemedicine [en]
dc.subject: Telehealth [en]
dc.subject: Language model [en]
dc.subject: Chatbot [en]
dc.subject: QA75 Electronic computers. Computer science [en]
dc.subject: R Medicine (General) [en]
dc.subject: 3rd-DAS [en]
dc.subject.lcc: QA75 [en]
dc.subject.lcc: R1 [en]
dc.title: ChatGPT versus consultants: blinded evaluation on answering otorhinolaryngology case-based questions [en]
dc.type: Journal article [en]
dc.contributor.institution: University of St Andrews. School of Computer Science [en]
dc.contributor.institution: University of St Andrews. Sir James Mackenzie Institute for Early Diagnosis [en]
dc.contributor.institution: University of St Andrews. Centre for Interdisciplinary Research in Computational Algebra [en]
dc.contributor.institution: University of St Andrews. Infection and Global Health Division [en]
dc.contributor.institution: University of St Andrews. School of Medicine [en]
dc.identifier.doi: https://doi.org/10.2196/49183
dc.description.status: Peer reviewed [en]

