RT Journal Article SR Electronic T1 AI-Assisted Summarization of Radiologic Reports: Evaluating GPT3davinci, BARTcnn, LongT5booksum, LEDbooksum, LEDlegal, and LEDclinical JF American Journal of Neuroradiology JO Am. J. Neuroradiol. FD American Society of Neuroradiology SP 244 OP 248 DO 10.3174/ajnr.A8102 VO 45 IS 2 A1 Chien, Aichi A1 Tang, Hubert A1 Jagessar, Bhavita A1 Chang, Kai-wei A1 Peng, Nanyun A1 Nael, Kambiz A1 Salamon, Noriko YR 2024 UL http://www.ajnr.org/content/45/2/244.abstract AB BACKGROUND AND PURPOSE: The review of clinical reports is an essential part of monitoring disease progression. Synthesizing multiple imaging reports is also important for clinical decisions. It is critical to aggregate information quickly and accurately. Machine learning natural language processing (NLP) models hold promise to address an unmet need for report summarization. MATERIALS AND METHODS: We evaluated NLP methods to summarize longitudinal aneurysm reports. A total of 137 clinical reports and 100 PubMed case reports were used in this study. Models were 1) compared against expert-generated summary using longitudinal imaging notes collected in our institute and 2) compared using publicly accessible PubMed case reports. Five AI models were used to summarize the clinical reports, and a sixth model, the online GPT3davinci NLP large language model (LLM), was added for the summarization of PubMed case reports. We assessed the summary quality through comparison with expert summaries using quantitative metrics and quality reviews by experts. RESULTS: In clinical summarization, BARTcnn had the best performance (BERTscore = 0.8371), followed by LongT5booksum and LEDlegal.
In the analysis using PubMed case reports, GPT3davinci demonstrated the best performance, followed by models BARTcnn and then LEDbooksum (BERTscore = 0.894, 0.872, and 0.867, respectively). CONCLUSIONS: AI NLP summarization models demonstrated great potential in summarizing longitudinal aneurysm reports, though none yet reached the level of quality for clinical usage. We found the online GPT LLM outperformed the others; however, the BARTcnn model is potentially more useful because it can be implemented on-site. Future work to improve summarization, address other types of neuroimaging reports, and develop structured reports may allow NLP models to ease clinical workflow. ABBREVIATIONS: BART = bidirectional and auto-regressive transformer; BERT = bidirectional encoder representations from transformer; LED = longformer-encoder-decoder; LLM = large language model; NLP = natural language processing; ROUGE = recall-oriented understudy for gisting evaluation