2025/11/21
[ASPPH Friday Letter] Prof. John Tayu Lee: Evaluation of performance of generative large language models for stroke care

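To promote the academic achievements of its faculty and raise the College's international visibility, the National Taiwan University College of Public Health submitted the following article, written by Prof. John Tayu Lee, to the ASPPH Friday Letter.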

 

Evaluation of performance of generative large language models for stroke care

 

The article was published in the ASPPH Friday Letter on November 7, 2025.

 

Title:

 

Evaluation of Performance of Generative Large Language Models for Stroke Care


AUTHOR

John Tayu Lee ✉️ (李達宇, full-time Associate Professor at this Institute), Li VC, Wu JJ, Chen HH, Su SS, Chang BP, Lai RL, Liu CH, Chen CT, Tanapima V, Shen TK, Atun R.

 

 

JOURNAL  NPJ Digit Med.

PUBLISHED 2025.07.29

 

Abstract

 

Stroke is a leading cause of global morbidity and mortality, disproportionately impacting lower socioeconomic groups. In this study, we evaluated three generative LLMs (GPT, Claude, and Gemini) across four stages of stroke care: prevention, diagnosis, treatment, and rehabilitation. We applied three prompt engineering techniques, Zero-Shot Learning (ZSL), Chain of Thought (COT), and Talking Out Your Thoughts (TOT), to realistic stroke scenarios. Clinical experts assessed the outputs across five domains: (1) accuracy; (2) hallucinations; (3) specificity; (4) empathy; and (5) actionability, based on clinical competency benchmarks. Overall, the LLMs demonstrated suboptimal performance with inconsistent scores across domains. Each prompt engineering method showed strengths in specific areas: TOT performed well in empathy and actionability, COT was strong in structured reasoning during diagnosis, and ZSL provided concise, accurate responses with fewer hallucinations, especially in the treatment stage. However, none consistently met high clinical standards across all stroke care stages.

 

Keywords

 

 

Health care, Vascular diseases