ABSTRACT
BACKGROUND AND PURPOSE: Accurate glioma segmentation has the potential to enhance clinical decision-making and treatment planning. Uncertainty quantification methods, including conformal prediction (CP), can improve the reliability of segmentation models. CP quantifies uncertainty with statistical confidence guarantees. This study aims to use CP in glioma segmentation.
MATERIALS AND METHODS: We used the publicly available UCSF and UPenn glioma datasets. The UCSF dataset (495 cases) was split into training (70%), validation (10%), calibration (10%), and test (10%) sets, and the UPenn dataset (147 cases) was divided into external calibration (30%) and external test (70%) sets. A UNet model was trained, and its optimal threshold was set to 0.5 using prediction normalization. To apply CP, the conformal threshold was selected based on the internal/external calibration nonconformity score, and CP was subsequently applied to the internal/external test sets, with coverage (the proportion of true labels within prediction sets) reported for all. We defined the uncertainty ratio (UR) and assessed its correlation with the Dice score coefficient (DSC) and 95th percentile Hausdorff distance (HD95). Additionally, we categorized cases into certain and uncertain groups based on UR and compared their DSC and HD95. We also evaluated the correlation between UR and the evaluation metrics (DSC and HD95) of the BraTS fusion model segmentation (BFMS), and compared evaluation metrics in the certain and uncertain subgroups.
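The calibration and coverage steps described above can be illustrated with a generic split-conformal sketch. This is not the authors' implementation: the abstract does not specify the nonconformity score, the miscoverage level, or the UR formula, so the score used here (one minus the probability assigned to the true class, computed per voxel), the `alpha` value, and all function names are assumptions chosen for illustration. Voxels within one image are also not exchangeable, so the usual coverage guarantee holds only approximately in this simplified voxel-wise setting.

```python
import numpy as np


def nonconformity(probs, labels):
    """Assumed score: 1 - probability the model assigns to the true class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels).astype(bool)
    # Foreground voxels: 1 - p(foreground). Background voxels: p(foreground).
    return np.where(labels, 1.0 - probs, probs)


def conformal_threshold(cal_scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of the calibration scores."""
    scores = np.sort(np.ravel(cal_scores))
    n = scores.size
    k = int(np.ceil((n + 1) * (1.0 - alpha)))  # 1-based rank of the quantile
    if k > n:
        return np.inf  # too few calibration points: every label is included
    return scores[k - 1]


def prediction_sets(probs, q_hat):
    """Per-voxel label sets: a label is included if its score is <= q_hat."""
    probs = np.asarray(probs, dtype=float)
    include_fg = (1.0 - probs) <= q_hat  # foreground label in the set
    include_bg = probs <= q_hat          # background label in the set
    return include_bg, include_fg


def coverage(probs, labels, q_hat):
    """Proportion of voxels whose true label lies in the prediction set."""
    include_bg, include_fg = prediction_sets(probs, q_hat)
    labels = np.asarray(labels).astype(bool)
    covered = np.where(labels, include_fg, include_bg)
    return float(covered.mean())


# Synthetic demonstration only: random "probabilities", not real model output.
rng = np.random.default_rng(0)
cal_labels = rng.random(5000) < 0.2
cal_probs = np.clip(0.8 * cal_labels + 0.1 + rng.normal(0, 0.15, 5000), 0, 1)
test_labels = rng.random(5000) < 0.2
test_probs = np.clip(0.8 * test_labels + 0.1 + rng.normal(0, 0.15, 5000), 0, 1)

q_hat = conformal_threshold(nonconformity(cal_probs, cal_labels), alpha=0.1)
print("threshold:", q_hat)
print("empirical coverage:", coverage(test_probs, test_labels, q_hat))
```

In this sketch, voxels whose prediction set contains both labels (or neither) are the ones the model is unsure about; a case-level summary such as the UR could be built from them, but the exact definition used in the study is not given in the abstract.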
RESULTS: The base model achieved a DSC of 0.86 and 0.83, and an HD95 of 7.35 and 11.71 on the internal and external test sets, respectively. The CP coverage was 0.9982 for the internal test set and 0.9977 for the external test set. Statistical analysis showed significant correlations between UR and evaluation metrics for test sets (p values <0.001). Additionally, certain cases had significantly better evaluation metrics (higher DSC and lower HD95) than uncertain cases in test sets and the BFMS (p values <0.001).
CONCLUSIONS: CP effectively quantifies uncertainty in glioma segmentation. Using conformal segmentation (CONSeg) improves the reliability of segmentation models and enhances human-computer interaction. Additionally, CONSeg can identify uncertain cases and suggest them for manual segmentation.
ABBREVIATIONS: CP = conformal prediction; UR = uncertainty ratio; DSC = Dice score coefficient; BFMS = BraTS fusion model segmentation; DL = deep learning; UQ = uncertainty quantification; BCE = binary cross-entropy; BMOT = base model optimal threshold; NCST = nonconformity score threshold; CONSeg = conformal segmentation; BMPN = base model prediction normalization.
Footnotes
The authors declare no conflicts of interest related to the content of this article.
- © 2025 by American Journal of Neuroradiology