Abstract
BACKGROUND AND PURPOSE: The DWI-FLAIR mismatch is used to determine thrombolytic eligibility in patients with acute ischemic stroke when the time since stroke onset is unknown. Commercial software packages have been developed for automated DWI-FLAIR classification. We aimed to use e-Stroke software to automatically classify the DWI-FLAIR mismatch in a cohort of patients with acute ischemic stroke and to compare its performance with that of 2 expert neuroradiologists.
MATERIALS AND METHODS: In this retrospective study, patients with acute ischemic stroke who had MR imaging and a known time since stroke onset were included. The DWI-FLAIR mismatch was evaluated by 2 neuroradiologists blinded to the time since stroke onset and automatically by the e-Stroke software. After 4 weeks, the neuroradiologists re-evaluated the MR images, this time using the e-Stroke predictions as a computer-assisted tool. The diagnostic performance of the e-Stroke software and of the neuroradiologists was evaluated for prediction of DWI-FLAIR mismatch status.
RESULTS: A total of 157 patients met the inclusion criteria, of whom 82 (52%) had a time since stroke onset of ≤4.5 hours. On the basis of consensus reads, 81 patients (51.6%) had a DWI-FLAIR mismatch. The diagnostic accuracy (area under the curve/sensitivity/specificity) of e-Stroke software for determination of the DWI-FLAIR mismatch was 0.72/90.0/53.9. The diagnostic accuracy (area under the curve/sensitivity/specificity) for neuroradiologists 1 and 2 was 0.76/69.1/84.2 and 0.82/91.4/73.7, respectively; both improved significantly (P < .05), to 0.83/79.0/86.8 and 0.89/92.6/85.5, respectively, when e-Stroke predictions were used as a computer-assisted tool. The interrater agreement (κ) for determination of DWI-FLAIR status improved from 0.49 to 0.57 with the computer-assisted tool.
CONCLUSIONS: This automated quantitative approach for DWI-FLAIR mismatch provides results comparable with those of human experts and can improve the diagnostic accuracies of expert neuroradiologists in the determination of DWI-FLAIR status.
ABBREVIATIONS:
- AIS: acute ischemic stroke
- AUC: area under the curve
- CAT: computer-assisted tool
- ROC: receiver operating characteristic
- rSIR: relative signal intensity ratio
- TSS: time since stroke onset
In patients with acute ischemic stroke (AIS), a time since stroke onset (TSS) of <4.5 hours has been used as a criterion for thrombolytic eligibility.1 Recently, advanced imaging has played a critical role in showing that more patients may benefit from thrombolytic therapy when a “tissue clock” concept is used rather than the TSS alone. For example, in the Extending the Time for Thrombolysis in Emergency Neurological Deficits (EXTEND) trial,2 perfusion imaging was successfully used to extend the thrombolytic window up to 9 hours in patients who had salvageable brain tissue.
The DWI-FLAIR mismatch has been used as a tissue clock imaging biomarker that may guide the appropriate use of thrombolytic therapy better than the TSS alone.1,3 Generally, stroke lesions become more visible on FLAIR images as the time from stroke onset increases. This concept was used in the design of the Efficacy and Safety of MRI-based Thrombolysis in Wake-up Stroke: a Randomised, Double-blind, Placebo-controlled Trial (WAKE-UP),3 which showed the benefit of thrombolytic treatment in patients with AIS with unknown onset or wake-up stroke as long as they had a DWI-FLAIR mismatch. However, the DWI-FLAIR mismatch has some limitations. It is subjective, which introduces variability among human interpreters that may depend in part on their level of expertise. A binary reporting standard (negative or positive) is also limiting because the signal intensity difference between DWI and FLAIR lies on a continuum and may be weakly positive or weakly negative rather than absolute. These limitations have resulted in modest interobserver agreement and diagnostic accuracy.4,5
Advances in image segmentation and machine learning techniques have shown promising results in the automated analysis of MR images to determine DWI-FLAIR status.6–8 In this study, we aimed to use a commercially available automated image-segmentation algorithm (e-Stroke software; Brainomix) to classify the DWI-FLAIR mismatch in a cohort of patients with AIS and to compare its performance with that of expert neuroradiologists. Specifically, we performed the following: 1) comparison of the diagnostic accuracy of the e-Stroke DWI-FLAIR mismatch output with that of expert neuroradiologists in the determination of the TSS; 2) assessment of the diagnostic accuracy of the e-Stroke DWI-FLAIR mismatch output in the prediction of the tissue clock, as determined by consensus reads of 2 expert neuroradiologists; and 3) evaluation of the added value of the e-Stroke DWI-FLAIR mismatch output to the diagnostic performance of expert neuroradiologists when used as a computer-assisted tool (CAT).
MATERIALS AND METHODS
Study Design and Patient Selection
In this retrospective study, consecutive patients with AIS who underwent pretreatment MR imaging between September 2011 and August 2021 and had a known TSS were included. Institutional review board approval was obtained. Clinical characteristics, including age, sex, NIHSS score, TSS, and the location of arterial occlusion (if known), were documented. Patients were excluded if they had an unknown or questionable TSS or poor MR image quality that impaired diagnostic evaluation by the neuroradiologists.
Image Acquisition
MR imaging was performed on either a 1.5T (Avanto; Siemens) or a 3T (Magnetom Trio; Siemens) MR imaging scanner at our hospital. DWI was acquired using a single-shot spin-echo EPI sequence (TR/TE = 4900/98 ms [1.5T] or 4100/95 ms [3T]; FOV = 220 × 220 mm; matrix = 128 × 128; 30 slices × 5 mm). Diffusion gradients were applied along 3 orthogonal directions with b = 0 and 1000 s/mm². FLAIR images were acquired with TR/TE = 9000/89 ms at 1.5T and 9000/122 ms at 3T, TI = 2504 ms at 1.5T and 2500 ms at 3T, matrix = 256 × 256, and 30 slices × 5 mm.
Image Analysis
For automated image analysis, MR diffusion and FLAIR images were uploaded to e-Stroke software (e-MRI module, Version 11.1; Brainomix) for automated image processing and quantitative analysis. The software used an ADC threshold of 620 × 10⁻⁶ mm²/s to guide segmentation and generated a volume of interest that was used as an infarction mask.9 The FLAIR images were spatially realigned in 3D with the b=0 image from the DWI data set using a standard 3D rigid registration, which estimates a transformation with 6 degrees of freedom (3 rotations and 3 translations).10,11 The coregistered and flipped FLAIR images were first smoothed with a 3D median filter (size: 7, 7, and 1 mm in the x, y, and z dimensions) and then used to compute voxelwise relative FLAIR maps. Tissue masks were generated by thresholding the b=0 image to remove CSF. The threshold was obtained with a K-means algorithm that grouped the voxels within the brain mask into 2 clusters (CSF range and tissue range).12 For each voxel within the brain, the value in the intensity-normalized FLAIR image was divided by the value of its contralateral voxel, yielding a relative FLAIR map. Voxel-based relative signal intensity ratios (rSIRs) were computed within the infarction mask and reported as median and interquartile range. The software then automatically classified each case as a match if the median rSIR was ≥1.15 and as a mismatch otherwise.13
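The rSIR steps described above (ADC-threshold infarct mask, 2-cluster CSF removal on the b=0 image, contralateral ratio, median-based classification) can be sketched with NumPy. This is a minimal illustration, not Brainomix's implementation, and the function names are hypothetical. It assumes the volumes are already rigidly coregistered on one voxel grid with the midsagittal plane at the centre of the left-right array axis, so a plain array flip stands in for the contralateral mapping; it omits the 3D median smoothing and the intensity normalization (a global scale factor cancels when a voxel is divided by its mirror voxel in the same image); and it uses a simple 1D two-cluster K-means for the CSF threshold.

```python
import numpy as np

ADC_THRESHOLD = 620e-6    # mm^2/s; voxels below this are treated as infarct
RSIR_MATCH_CUTOFF = 1.15  # median rSIR >= 1.15 -> "match", otherwise "mismatch"


def two_means_threshold(values, n_iter=100):
    """1D K-means with k=2 (CSF vs tissue); returns the midpoint between centroids."""
    c_lo, c_hi = float(values.min()), float(values.max())
    for _ in range(n_iter):
        split = 0.5 * (c_lo + c_hi)
        low, high = values[values <= split], values[values > split]
        if low.size == 0 or high.size == 0:
            break
        new_lo, new_hi = float(low.mean()), float(high.mean())
        if new_lo == c_lo and new_hi == c_hi:
            break  # cluster assignments no longer change
        c_lo, c_hi = new_lo, new_hi
    return 0.5 * (c_lo + c_hi)


def relative_flair_map(flair, tissue_mask, lr_axis=0):
    """Divide each tissue voxel by its mirror-image voxel across the midline.

    Assumption: the midsagittal plane lies at the centre of `lr_axis`.
    Voxels whose own or mirrored position is non-tissue are left as NaN.
    """
    mirrored = np.flip(flair, axis=lr_axis)
    valid = tissue_mask & np.flip(tissue_mask, axis=lr_axis) & (mirrored > 0)
    rel = np.full(flair.shape, np.nan)
    rel[valid] = flair[valid] / mirrored[valid]
    return rel


def classify_dwi_flair(adc, b0, flair, brain_mask, lr_axis=0):
    """Return (median rSIR, (Q1, Q3), "match" | "mismatch") for one coregistered case."""
    infarct = brain_mask & (adc < ADC_THRESHOLD)
    # CSF is bright on the T2-weighted b=0 image, so tissue lies at or below the threshold
    tissue = brain_mask & (b0 <= two_means_threshold(b0[brain_mask]))
    rel = relative_flair_map(flair, tissue, lr_axis)
    values = rel[infarct & ~np.isnan(rel)]
    if values.size == 0:
        raise ValueError("no valid voxels inside the infarction mask")
    median = float(np.median(values))
    q1, q3 = np.percentile(values, [25, 75])
    label = "match" if median >= RSIR_MATCH_CUTOFF else "mismatch"
    return median, (float(q1), float(q3)), label
```

On a synthetic volume in which the infarct is 20% brighter on FLAIR than the mirrored hemisphere, the median rSIR is 1.2 and the case is labeled "match"; at 5% brighter it falls below 1.15 and is labeled "mismatch".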
Two board-certified neuroradiologists (with 10 and 18 years of experience), blinded to the TSS and to the results of automated analysis, independently assessed the MR imaging studies and classified the DWI-FLAIR mismatch status for each patient. Mismatch was assigned when there was reduced diffusion on DWI without corresponding signal on FLAIR, and match was assigned when corresponding FLAIR signal was present along the infarction territory. All disagreements were subsequently resolved by consensus between the 2 neuroradiologists.
In a follow-up session approximately 4 weeks after the initial readout, the neuroradiologists were instructed to reclassify the DWI-FLAIR mismatch status while using the e-Stroke predictions as a CAT.
The final consensus reads of 2 neuroradiologists were used as the reference standard for final assignment of DWI-FLAIR mismatch status.
Statistical Analysis
Continuous data are presented as mean (SD) or median (interquartile range), and categoric data as counts with relative frequencies (percentages). Receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC), sensitivity, and specificity were calculated for prediction of the TSS and the tissue clock. Interobserver agreement between readers was evaluated using the weighted κ statistic. For prediction of the TSS, DWI-FLAIR mismatch status was compared against stroke-onset time dichotomized as ≤4.5 or >4.5 hours. For prediction of the tissue clock, the consensus reads of the 2 neuroradiologists were used as the reference standard, and the diagnostic performance of the e-Stroke software and of each neuroradiologist before and after using e-Stroke as a CAT was analyzed against these consensus reads. The added value of e-Stroke predictions to the accuracy of each neuroradiologist was evaluated using comparative ROC analysis with the DeLong test. The significance level was set at P < .05. Statistical analyses were performed with MedCalc for Windows (Version 20.008; MedCalc Software).
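The paired AUC comparison described above (a reader with and without the CAT, scored on the same patients) can be sketched in plain Python with the DeLong placement-value method. This is a minimal illustration, not MedCalc's code. It assumes each rating is coded 1 for mismatch and 0 for match, that the consensus read supplies the reference labels (1 = mismatch), and that both readings exist for every patient; the function names and the example data in the test are hypothetical.

```python
import math


def _psi(pos_score, neg_score):
    """Mann-Whitney kernel: 1 if the positive case is rated higher, 0.5 for a tie."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _placements(pos, neg):
    """DeLong structural components (placement values) for one reader."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _sample_var(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_paired(ratings_a, ratings_b, truth):
    """Two-sided DeLong test for two correlated AUCs measured on the same patients.

    ratings_a, ratings_b: per-patient ratings (higher = more likely positive)
    truth: 1 for reference-positive patients, 0 for reference-negative patients
    Returns (auc_a, auc_b, z, p).
    """
    pos = [i for i, t in enumerate(truth) if t == 1]
    neg = [i for i, t in enumerate(truth) if t == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least 2 positive and 2 negative cases")
    v10_a, v01_a = _placements([ratings_a[i] for i in pos], [ratings_a[i] for i in neg])
    v10_b, v01_b = _placements([ratings_b[i] for i in pos], [ratings_b[i] for i in neg])
    auc_a = sum(v10_a) / len(pos)
    auc_b = sum(v10_b) / len(pos)
    # Var(AUC_a - AUC_b) = Var(V10_a - V10_b)/m + Var(V01_a - V01_b)/n
    var_diff = (_sample_var([a - b for a, b in zip(v10_a, v10_b)]) / len(pos)
                + _sample_var([a - b for a, b in zip(v01_a, v01_b)]) / len(neg))
    if var_diff == 0:
        raise ValueError("variance of the AUC difference is zero; z is undefined")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value from the standard normal
    return auc_a, auc_b, z, p
```

With binary ratings, each AUC returned here equals (sensitivity + specificity)/2, and the test statistic accounts for the correlation that arises because both readings come from the same patients.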
RESULTS
Clinical Characteristics of the Patient Population
A total of 157 patients met our inclusion criteria. The mean age was 68.7 (SD, 16.3) years, and 79 (50.3%) patients were women. The median NIHSS score was 10 (interquartile range, 5–16). A total of 151 (96%) patients had an identifiable intracranial arterial occlusion, involving the ICA (n = 18, 11.5%), M1 (n = 100, 63.7%), M2 (n = 19, 12.1%), anterior cerebral artery (n = 2, 1.3%), or posterior cerebral artery (n = 12, 7.6%). Three (2%) patients had lacunar infarction, and the remaining 3 (2%) had multiple small infarcts in >2 vascular territories, likely related to an embolic shower. The mean infarct volume was 18.0 (SD, 25.6) mL. The mean TSS was 267.4 (SD, 269.2) minutes. Using 4.5 hours as the threshold for thrombolytic treatment eligibility, 75 (48%) patients had a TSS of >4.5 hours and 82 (52%) had a TSS of ≤4.5 hours.
Determination of DWI-FLAIR Status
Automated image analysis by e-Stroke software using the FLAIR rSIR classified DWI-FLAIR status as matched in 49 patients and mismatched in 108 patients. Neuroradiologist 1 classified 89 patients as matched and 68 as mismatched, while neuroradiologist 2 classified 63 as matched and 94 as mismatched. The interobserver agreement for determination of DWI-FLAIR mismatch status was moderate (κ = 0.49; 95% CI, 0.36–0.62).
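With only 2 categories (match and mismatch), linear or quadratic agreement weights are 1 on the diagonal and 0 off it, so the weighted κ reduces to Cohen's unweighted κ = (p_o - p_e)/(1 - p_e), where p_o is the observed and p_e the chance-expected agreement. The plain-Python sketch below computes weighted κ from a square cross-tabulation of 2 readers. The function name is illustrative, and the 2 × 2 counts in the usage note are hypothetical because the readers' cross-tabulation is not reported.

```python
def weighted_kappa(table, weighting="linear"):
    """Weighted kappa for a k x k table (rows: reader 1, columns: reader 2), k >= 2."""
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]                             # reader 1 marginals
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # reader 2 marginals

    def weight(i, j):
        # Agreement weights: 1 on the diagonal, decreasing with category distance
        d = abs(i - j) / (k - 1)
        return 1 - d if weighting == "linear" else 1 - d * d

    p_obs = sum(weight(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    p_exp = sum(weight(i, j) * row_p[i] * col_p[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)
```

For the hypothetical table [[40, 10], [5, 45]] (rows: reader 1 match/mismatch; columns: reader 2 match/mismatch), p_o = 0.85, p_e = 0.50, and κ = 0.70 under either weighting.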
Following consensus reads between the 2 readers, 76 patients were classified as matched and 81 as mismatched.
Diagnostic accuracy (AUC/sensitivity/specificity) of e-Stroke software for the determination of the DWI-FLAIR mismatch against consensus reads was 0.72/90.0/53.9 (P < .001) (Table).
Method | Matched by consensus (n = 76) | Mismatched by consensus (n = 81) | AUC/Sensitivity/Specificity | P Value |
---|---|---|---|---|
e-Stroke | 41 (53.9%) | 73 (90.1%) | 0.72/90.0/53.9 | <.001 |
R1 | 64 (84.2%) | 56 (69.1%) | 0.76/69.1/84.2 | <.001 |
R1-CAT | 66 (86.8%) | 64 (79.0%) | 0.83/79.0/86.8 | <.001 |
R2 | 56 (73.7%) | 74 (91.4%) | 0.82/91.4/73.7 | <.001 |
R2-CAT | 65 (85.5%) | 75 (92.6%) | 0.89/92.6/85.5 | <.001 |
Table: Correctly classified matched and mismatched cases (relative to the consensus interpretation) and diagnostic performance of e-Stroke software and of each neuroradiologist alone (R1, R2) and with e-Stroke predictions as a CAT (R1-CAT, R2-CAT).
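Because every rating in the Table is binary, its metrics can be recomputed from the counts alone. The plain-Python sketch below (names are illustrative) treats "mismatch" as the positive class, so sensitivity is the fraction of consensus-mismatched patients called mismatch and specificity is the fraction of consensus-matched patients called match. A binary rating yields a single ROC operating point, so the trapezoidal AUC reduces to (sensitivity + specificity)/2. This is a consistency check on the reported figures, not the MedCalc computation; it reproduces them to within 0.01 for AUC and 0.15 percentage point for sensitivity and specificity.

```python
# Correctly classified cases per method, copied from the Table:
# (consensus-matched called "match", consensus-mismatched called "mismatch")
TABLE_COUNTS = {
    "e-Stroke": (41, 73),
    "R1": (64, 56),
    "R1-CAT": (66, 64),
    "R2": (56, 74),
    "R2-CAT": (65, 75),
}
N_MATCHED, N_MISMATCHED = 76, 81  # consensus reference standard


def binary_metrics(correct_matched, correct_mismatched):
    """Sensitivity, specificity, and AUC for a binary rating, with mismatch as positive."""
    sensitivity = correct_mismatched / N_MISMATCHED
    specificity = correct_matched / N_MATCHED
    # One ROC operating point: the trapezoidal area is the mean of Se and Sp
    auc = (sensitivity + specificity) / 2
    return sensitivity, specificity, auc


if __name__ == "__main__":
    for name, counts in TABLE_COUNTS.items():
        se, sp, auc = binary_metrics(*counts)
        print(f"{name:8s} AUC={auc:.3f} Se={100 * se:.1f}% Sp={100 * sp:.1f}%")
```

For example, the e-Stroke row gives AUC = 0.720, sensitivity = 90.1%, and specificity = 53.9%.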
For neuroradiologist 1, the diagnostic performance (AUC/sensitivity/specificity) of the initial interpretation was 0.76/69.1/84.2, which improved significantly (P = .003) to 0.83/79.0/86.8 in the second interpretation, when e-Stroke predictions were used as a CAT (Table and Fig 1).
Fig 1. Comparative analysis of ROC curves for the diagnostic performance of each neuroradiologist alone (R1, R2) and in conjunction with the predicted results of e-Stroke software used as a CAT (R1-CAT, R2-CAT). Evaluated against the consensus interpretations, the diagnostic performance of both neuroradiologists in determining DWI-FLAIR status improved significantly with the CAT.
For neuroradiologist 2, the diagnostic performance (AUC/sensitivity/specificity) of the initial interpretation was 0.82/91.4/73.7, which improved significantly (P = .005) to 0.89/92.6/85.5 after e-Stroke predictions were used as a CAT (Table and Fig 1).
The interrater agreement for determination of DWI-FLAIR status also improved modestly, to κ = 0.57 (95% CI, 0.44–0.72), after use of the CAT. In a subanalysis of diagnostic performance for determination of the TSS (≤4.5 or >4.5 hours), the AUC/sensitivity/specificity values were 0.63/81.7/45.3 (P < .001) for e-Stroke software, 0.67/57.3/76.0 (P < .001) for neuroradiologist 1, and 0.70/69.5/70.7 (P < .001) for neuroradiologist 2. There was no statistically significant difference between the neuroradiologists and e-Stroke software in prediction of the TSS (DeLong test: P = .51 for e-Stroke software versus neuroradiologist 1, P = .13 for e-Stroke software versus neuroradiologist 2, and P = .38 between the 2 neuroradiologists).
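The TSS subanalysis dichotomizes onset time at 4.5 hours (270 minutes), treats the early group (TSS ≤4.5 hours) as positive, and scores each predicted mismatch against it. A minimal plain-Python sketch follows; the function name is illustrative, and counting exactly 270 minutes as early follows the ≤4.5-hour definition in the Statistical Analysis. As a worked check, the reported e-Stroke sensitivity and specificity of 81.7% and 45.3% correspond to 67 of 82 early and 34 of 75 late patients, and (67/82 + 34/75)/2 ≈ 0.635, close to the reported AUC of 0.63.

```python
TSS_CUTOFF_MIN = 4.5 * 60  # 270 minutes; TSS <= cutoff is the "early" (positive) class


def tss_metrics(tss_minutes, predicted_mismatch, cutoff_min=TSS_CUTOFF_MIN):
    """Sensitivity, specificity, and single-point AUC of mismatch for TSS <= cutoff."""
    tp = fn = tn = fp = 0
    for tss, mismatch in zip(tss_minutes, predicted_mismatch):
        early = tss <= cutoff_min
        if early and mismatch:
            tp += 1  # early stroke correctly flagged as mismatch
        elif early:
            fn += 1  # early stroke called match
        elif mismatch:
            fp += 1  # late stroke called mismatch
        else:
            tn += 1  # late stroke correctly called match
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2
```

For example, onset times of 100, 200, 270, 300, 400, and 500 minutes with predictions mismatch, mismatch, match, mismatch, match, match give sensitivity, specificity, and AUC of 2/3 each.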
Figure 2 shows an example of a patient with a TSS of <4.5 hours who was correctly classified as having a DWI-FLAIR mismatch by both neuroradiologists and by e-Stroke software. Figure 3 shows an example of a patient with weak FLAIR signal in the infarct region, which led to discrepant interpretations of DWI-FLAIR status by the 2 neuroradiologists during the initial assessment. After e-Stroke predictions were used as a CAT, this case was reclassified in agreement with the consensus reads.
Fig 2. A 73-year-old man with a left MCA-M1 occlusion who presented 117 minutes after stroke onset. There is infarction involving the left frontal lobe, opercular region, and insula, with reduced diffusion that is negative on FLAIR (ie, DWI-FLAIR mismatch). The infarction was automatically segmented by e-Stroke software (highlighted in purple), and the relative signal intensity ratio of the infarction bed, calculated from the corresponding FLAIR images, was 1.03 (below the 1.15 match cutoff). The resulting DWI-FLAIR mismatch classification was concordant with both neuroradiologists and with the TSS.
Fig 3. A 90-year-old woman with a right ICA occlusion who presented 190 minutes after stroke onset. The infarction was automatically segmented by e-Stroke software (highlighted in purple). The signal intensity ratio calculated automatically by e-Stroke software was 1.14, just below the 1.15 match cutoff, yielding the correct assignment of DWI-FLAIR mismatch. The weak FLAIR signal in the infarct region resulted in discrepant interpretations between the 2 neuroradiologists. However, the neuroradiologist who initially classified this case as a match changed his interpretation to mismatch after using the e-Stroke software as a CAT, which was concordant with the consensus read.
DISCUSSION
Our results show that automated image analysis, enabled by advanced and streamlined image-segmentation techniques that are now commercially available, can provide results similar to those of human experts in determining the DWI-FLAIR mismatch as a biomarker of the tissue clock. We highlight 2 major findings.
Our first finding is that e-Stroke software, when used in conjunction with human interpreters, improved diagnostic accuracy and interrater agreement for determination of the tissue clock. Assessment of the DWI-FLAIR mismatch is a difficult task that requires extensive training. Because the DWI-FLAIR mismatch is reported as binary (negative or positive), current human assessment does not capture the wide range of signal intensities on FLAIR images. Heterogeneous FLAIR signal change across the infarction bed is a major contributor to inconsistency and disagreement in interpreting DWI-FLAIR mismatch status.4,5 This limitation is reflected in the modest interobserver agreement in our study (κ = 0.49), similar to previously reported values of 0.4 to 0.6 among human observers.4,5 After e-Stroke predictions were used as a CAT, interrater agreement improved to κ = 0.57. Furthermore, use of the CAT significantly improved the diagnostic accuracy of DWI-FLAIR mismatch (tissue clock) assignment, increasing sensitivity by approximately 10 percentage points for one neuroradiologist (69.1% to 79.0%) and specificity by approximately 12 percentage points for the other (73.7% to 85.5%).
Measured against the consensus reads of 2 expert neuroradiologists, the diagnostic accuracy of e-Stroke software in determining the tissue clock (DWI-FLAIR mismatch) was comparable with that of the individual experts. This highlights the potential of this solution to aid thrombolytic decision-making and to supplement human interpretation when used as a decision support tool. The benefit of such automated analysis may be even greater in settings that lack neuroimaging expertise, where it could help ensure efficient and consistent assessment for treatment decisions in all patients.
Our second finding is that e-Stroke software provided results comparable with those of expert neuroradiologists in the prediction of the TSS using a 4.5-hour cutoff. Prior reports have shown that approximately 27%–50% of patients with stroke have positive FLAIR findings within 3 hours and 93% at >6 hours.14–16 Our results are concordant with prior reports showing only modest sensitivity, in the range of 60%, for TSS prediction by human observers.15,17,18 Although the overall diagnostic performance of e-Stroke software in predicting the TSS was comparable with that of expert neuroradiologists, e-Stroke had higher sensitivity (81.7% versus 57.3% and 69.5%) at the cost of lower specificity (45.3% versus 76.0% and 70.7%).
Although the 4.5-hour TSS cutoff remains a thrombolytic eligibility criterion, there is now a transition toward using tissue status rather than the TSS alone for thrombolytic decision-making, at least for patients with an unknown TSS or wake-up stroke.3 In addition, some patients with stroke become FLAIR-positive in <4.5 hours, while others remain FLAIR-negative even after 6 hours. Therefore, classification of the TSS on the basis of a 4.5-hour cutoff is imperfect19 and of waning relevance.
Advanced image-processing techniques and artificial intelligence have shown promise for more consistent prediction of the TSS and DWI-FLAIR status while mitigating variability among human observers.6–8 However, these algorithms have yet to become commercially available for broad clinical use. Because the e-Stroke solution is now commercially available, its automated image processing and segmentation could be used routinely to support treatment decisions if its potential is confirmed in broader clinical settings.
Our study has several limitations. First, it was a retrospective study, which may introduce unknown bias. Second, this was a single-institution study with MR images from a limited number of scanners; multicenter data with greater variability in image-acquisition parameters and MR imaging scanners will be required to generalize our results. Third, our retrospective design did not allow us to test how e-Stroke software could affect treatment decisions; in our cohort, the decision for thrombolysis was based solely on the TSS, which was determined at the time of patient presentation. Fourth, we did not screen for patients with extensive white matter disease, and underlying leukoaraiosis could plausibly confound quantitative assessment of signal intensity ratios in a subset of our patients. Although the software algorithm accounts for non-normal voxels such as CSF and older white matter lesions, this potential mitigating effect was not tested systematically in patients with substantial white matter disease. Last, the criterion standard for ischemic brain tissue status was the consensus read of the DWI-FLAIR mismatch by 2 neuroradiologists. This is less than ideal, but it was the best practical reference standard for our study because the DWI-FLAIR mismatch has been used as a surrogate for the tissue clock.
CONCLUSIONS
Our study demonstrates the potential diagnostic utility of a fully automated quantitative approach provided by e-Stroke software to assess the DWI-FLAIR mismatch in patients with AIS. We showed that the automated software provides diagnostic accuracies comparable with those of expert neuroradiologists. Most important, when used by neuroradiologists as a CAT, the automated software significantly improved the diagnostic performance of neuroradiologists for more accurate classification of the DWI-FLAIR mismatch as a surrogate for tissue clock.
Footnotes
Disclosure forms provided by the authors are available with the full text and PDF of this article at www.ajnr.org.
References
- Received September 14, 2023.
- Accepted after revision January 12, 2024.
- © 2024 by American Journal of Neuroradiology