
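To promote the academic achievements of its faculty and raise its international visibility, the National Taiwan University College of Public Health submitted the following article, written by Associate Professor John Tayu Lee (李達宇), to the ASPPH Friday Letter.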
Evaluation of performance of generative large language models for stroke care
The article was published in the ASPPH Friday Letter on November 7, 2025.
Title:
Evaluation of Performance of Generative Large Language Models for Stroke Care
AUTHOR
John Tayu Lee ✉️ (full-time Associate Professor at this Institute), Li VC, Wu JJ, Chen HH, Su SS, Chang BP, Lai RL, Liu CH, Chen CT, Tanapima V, Shen TK, Atun R.
JOURNAL NPJ Digit Med.
PUBLISHED 2025.07.29
Abstract
Stroke is a leading cause of global morbidity and mortality, disproportionately impacting lower socioeconomic groups. In this study, we evaluated three generative LLMs (GPT, Claude, and Gemini) across four stages of stroke care: prevention, diagnosis, treatment, and rehabilitation. We applied three prompt engineering techniques, Zero-Shot Learning (ZSL), Chain of Thought (COT), and Talking Out Your Thoughts (TOT), to realistic stroke scenarios. Clinical experts assessed the outputs across five domains: (1) accuracy; (2) hallucinations; (3) specificity; (4) empathy; and (5) actionability, based on clinical competency benchmarks. Overall, the LLMs demonstrated suboptimal performance with inconsistent scores across domains. Each prompt engineering method showed strengths in specific areas: TOT did well in empathy and actionability, COT was strong in structured reasoning during diagnosis, and ZSL provided concise, accurate responses with fewer hallucinations, especially in the treatment stage. However, none consistently met high clinical standards across all stroke care stages.
Keywords
Health care, Vascular diseases