RT Journal Article
SR Electronic
T1 AI-Assisted Summarization of Radiologic Reports: Evaluating GPT3davinci, BARTcnn, LongT5booksum, LEDbooksum, LEDlegal, and LEDclinical
JF American Journal of Neuroradiology
JO Am. J. Neuroradiol.
FD American Society of Neuroradiology
SP 244
OP 248
DO 10.3174/ajnr.A8102
VO 45
IS 2
A1 Chien, Aichi
A1 Tang, Hubert
A1 Jagessar, Bhavita
A1 Chang, Kai-wei
A1 Peng, Nanyun
A1 Nael, Kambiz
A1 Salamon, Noriko
YR 2024
UL http://www.ajnr.org/content/45/2/244.abstract
AB BACKGROUND AND PURPOSE: The review of clinical reports is an essential part of monitoring disease progression. Synthesizing multiple imaging reports is also important for clinical decisions. It is critical to aggregate information quickly and accurately. Machine learning natural language processing (NLP) models hold promise to address an unmet need for report summarization. MATERIALS AND METHODS: We evaluated NLP methods to summarize longitudinal aneurysm reports. A total of 137 clinical reports and 100 PubMed case reports were used in this study. Models were 1) compared against expert-generated summary using longitudinal imaging notes collected in our institute and 2) compared using publicly accessible PubMed case reports. Five AI models were used to summarize the clinical reports, and a sixth model, the online GPT3davinci NLP large language model (LLM), was added for the summarization of PubMed case reports. We assessed the summary quality through comparison with expert summaries using quantitative metrics and quality reviews by experts. RESULTS: In clinical summarization, BARTcnn had the best performance (BERTscore = 0.8371), followed by LongT5booksum and LEDlegal. In the analysis using PubMed case reports, GPT3davinci demonstrated the best performance, followed by models BARTcnn and then LEDbooksum (BERTscore = 0.894, 0.872, and 0.867, respectively). CONCLUSIONS: AI NLP summarization models demonstrated great potential in summarizing longitudinal aneurysm reports, though none yet reached the level of quality for clinical usage. We found the online GPT LLM outperformed the others; however, the BARTcnn model is potentially more useful because it can be implemented on-site.
Future work to improve summarization, address other types of neuroimaging reports, and develop structured reports may allow NLP models to ease clinical workflow. ABBREVIATIONS: BART = bidirectional and auto-regressive transformer; BERT = bidirectional encoder representations from transformer; LED = longformer-encoder-decoder; LLM = large language model; NLP = natural language processing; ROUGE = recall-oriented understudy for gisting evaluation