Original research
Automatic radiomic feature extraction using deep learning for angiographic parametric imaging of intracranial aneurysms
  1. Alexander R Podgorsak1,2,3,
  2. Ryan A Rava2,3,
  3. Mohammad Mahdi Shiraz Bhurwani2,3,
  4. Anusha R Chandra2,3,
  5. Jason M Davies3,4,5,
  6. Adnan H Siddiqui3,4,
  7. Ciprian N Ionita1,2,3
  1. 1 Department of Medical Physics, University at Buffalo, State University of New York, Buffalo, New York, USA
  2. 2 Department of Biomedical Engineering, University at Buffalo, State University of New York, Buffalo, New York, USA
  3. 3 Canon Stroke and Vascular Research Center, Buffalo, New York, USA
  4. 4 Department of Neurosurgery, University at Buffalo, State University of New York, Buffalo, New York, USA
  5. 5 Department of Biomedical Informatics, University at Buffalo, State University of New York, Buffalo, New York, USA
  1. Correspondence to Dr Ciprian N Ionita, Department of Biomedical Engineering, University at Buffalo, State University of New York, Buffalo, NY 14260, USA; cnionita{at}buffalo.edu

Abstract

Background Angiographic parametric imaging (API) is an imaging method that uses digital subtraction angiography (DSA) to characterize contrast media dynamics throughout the vasculature. This requires manual placement of a region of interest over a lesion (eg, an aneurysm sac) by an operator.

Objective The purpose of our work was to determine if a convolutional neural network (CNN) was able to identify and segment the intracranial aneurysm (IA) sac in a DSA and extract API radiomic features with minimal errors compared with human user results.

Methods Three hundred and fifty angiographic images of IAs were retrospectively collected. The IAs and surrounding vasculature were manually contoured and the masks put to a CNN tasked with semantic segmentation. The CNN segmentations were assessed for accuracy using the Dice similarity coefficient (DSC) and Jaccard index (JI). Area under the receiver operating characteristic curve (AUROC) was computed. API features based on the CNN segmentation were compared with the human user results.

Results The mean JI was 0.823 (95% CI 0.783 to 0.863) for the IA and 0.737 (95% CI 0.682 to 0.792) for the vasculature. The mean DSC was 0.903 (95% CI 0.867 to 0.937) for the IA and 0.849 (95% CI 0.811 to 0.887) for the vasculature. The mean AUROC was 0.791 (95% CI 0.740 to 0.817) for the IA and 0.715 (95% CI 0.678 to 0.733) for the vasculature. All five API features measured inside the predicted masks were within 18% of those measured inside manually contoured masks.

Conclusions CNN segmentation of IAs and surrounding vasculature from DSA images is non-inferior to manual contours of aneurysms and can be used in parametric imaging procedures.

  • standards

Key points

  • Quantitative assessment of aneurysms assists in surgical planning.

  • Current methods are too time consuming and require a manual operator.

  • A neural network was used to automate and speed up the quantitative imaging process.

  • The performance of the trained network is non-inferior to a manual operator.

  • Such a network may help to streamline the clinical workflow of aneurysm treatment.

Introduction

Digital subtraction angiography (DSA) is often used to evaluate the structural basis of neurovascular disease such as stenosis, arteriovenous malformations (AVMs), and intracranial aneurysms (IAs).1–3 In addition to the geometry and connectivity shown by standard DSA, angiographic parametric imaging (API) has helped clinicians evaluate the functional nature of these lesions. This may be useful in understanding natural history, predicting disease progression, or planning interventions. API uses contrast media flow characteristics in arteries and perfused tissue to synthesize a regional time density curve.4 By parametrization of the curve, features such as mean transit time and time to peak can be extracted. These parameters have been shown to relate to different physiological conditions such as AVMs,5 carotid stenting,6 and vasospasm.7
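As an illustration of how a time density curve can be reduced to a handful of parameters, the following is a minimal sketch rather than the API software used in this work. The function name `tdc_features` and the `bat_fraction` threshold are hypothetical, and the definitions used for bolus arrival time (first crossing of a fraction of the peak) and mean transit time (normalized first moment) are assumptions, since definitions vary between implementations.

```python
import numpy as np

def tdc_features(time_s, density, bat_fraction=0.1):
    """Extract illustrative API features from one time density curve (TDC).

    time_s: sample times in seconds; density: contrast density at each time.
    bat_fraction is a hypothetical threshold, not a value from the paper.
    """
    t = np.asarray(time_s, dtype=float)
    d = np.asarray(density, dtype=float)

    def integrate(y):
        # Trapezoidal rule over the sampled curve
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

    peak_index = int(np.argmax(d))
    ph = float(d[peak_index])          # peak height
    ttp = float(t[peak_index])         # time to peak
    auc = integrate(d)                 # area under the TDC
    # Assumption: BAT is the first time the curve reaches a fraction of PH.
    bat = float(t[np.argmax(d >= bat_fraction * ph)])
    # Assumption: MTT is the normalized first moment of the curve.
    mtt = integrate(t * d) / auc
    return {"PH": ph, "TTP": ttp, "AUC": auc, "BAT": bat, "MTT": mtt}
```

In a targeted measurement, the input curve would be the mean density inside the region of interest at each frame time.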

Until recently, API was used only as a semi-qualitative tool, visualized as color maps overlain on DSA images, primarily due to an incomplete understanding of how the image features correlated with complex blood flow conditions and treatment outcomes. Studies have indicated that this tool may enable precise quantitative measurements and treatment assessments.4 8 The current workflow for targeted measurement of API features requires a user to manually outline the inlet region of the contrast media to the targeted vasculature and the IA. Manual contouring is poorly suited to a clinical environment or to large-scale studies because it is time consuming, but the process can be automated.

The use of convolutional neural networks (CNNs) has gained popularity in numerous applications including facial recognition, language processing, drug discovery, and even the game of checkers. This is possible due to the network’s ability to analyze connections and patterns between sets of image data at a scale that humans cannot. With the advent of parallel processing and inexpensive graphical processing units, CNNs have been applied on a timescale that is more realistic for clinical applications. CNNs are capable of entire-image classification9 10 and more local classification tasks such as bounding box detection.11–13 Further improvements that have increased the resolution to the individual pixel scale have opened the door to pixel-by-pixel classification, essentially carrying out a semantic segmentation process with the network.14

We studied whether a CNN trained with angiography data could automatically extract blood flow-related radiomic features with the same accuracy as that of a human user. Such a process may increase the clinical usability of temporal angiographic data using API processing, and may allow its use in support of clinical decisions.

Methods

Image collection

Institutional review board approval was obtained and informed consent was waived for this Health Insurance Portability and Accountability Act-compliant retrospective study. The complete inclusion flowchart is shown in figure 1. Inclusion criteria were image data of adult patients (between 18 and 90 years of age) scheduled for treatment for or treated for a saccular aneurysm with an endovascular coil at our institute’s hospital. DSA image acquisition was carried out using a Canon Infinix-i Biplane angiography unit (Canon Medical Systems Corporation, Otawara, Japan) using iohexol contrast. After removing poor quality image sequences, 350 DSA acquisitions remained and were included in our study, 192 prior to and 158 following aneurysm treatment. There were 313 aneurysms visible in the DSAs, 165 on the internal carotid artery (9 C2, 18 C4, 46 C5, 65 C6, and 27 C7), 98 on the anterior cerebral artery, 35 on the middle cerebral artery, 13 on the basilar artery, and 2 on the posterior cerebral artery. Note that some acquisitions contained multiple aneurysms. In such cases, we treated each aneurysm as a separate lesion within the same image acquisition. Due to the field of view or viewing angle selected, 45 DSAs did not have the aneurysm visible and were used as true-negative examples. This study tasked the network to identify aneurysms prior to treatment, aneurysms partially occluded due to intervention, and aneurysms completely occluded due to the coiling process.

Figure 1

Study inclusion flowchart schema. DSA, digital subtraction angiography.

Ground-truth label creation

The IAs and surrounding vasculature were hand-contoured by users using a custom algorithm developed in the Laboratory Virtual Instrument Engineering Workbench (LabVIEW) (National Instruments, Austin, Texas, USA) programming environment. The users had a 1 year minimum exposure to evaluation of neurovascular angiograms at our institute. To reduce the background image noise, thus enhancing the vascular signal, the frames in the DSA sequences were averaged together when arteries and IAs were fully opacified to create a single frame. The averaged frame was passed through a two-dimensional median filter sized 3×3 pixels to attenuate background structure caused by motion artifacts. Then, an intensity threshold was applied to separate the background from the vasculature. Background pixels were assigned a value of 0; pixels within the vasculature class were assigned a value of 1. The aneurysm sac was separated from the surrounding vasculature using hand-contouring. Pixels within the aneurysm sac were assigned a value of 2. This entire process took approximately 2 min for each DSA sequence.
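The labeling steps described above can be sketched in Python, assuming NumPy and SciPy are available. This is not the LabVIEW tool used in the study: the function name `make_label_mask`, the `vessel_threshold` value, and the boolean `aneurysm_contour_mask` standing in for the hand-drawn contour are all hypothetical, and the sketch assumes opacified vessels are brighter than the background.

```python
import numpy as np
from scipy.ndimage import median_filter

def make_label_mask(frames, vessel_threshold, aneurysm_contour_mask):
    """Build a 0/1/2 label mask from fully opacified DSA frames.

    frames: array (n_frames, H, W) of frames with vessels fully opacified.
    vessel_threshold: hypothetical intensity cutoff (no value is given in the text).
    aneurysm_contour_mask: boolean (H, W) array standing in for the hand contour.
    """
    averaged = frames.mean(axis=0)              # temporal average reduces noise
    filtered = median_filter(averaged, size=3)  # 3x3 two-dimensional median filter
    # Assumption: vessels are brighter than background in the averaged frame.
    vessel = filtered > vessel_threshold
    labels = np.zeros(filtered.shape, dtype=np.uint8)  # background = 0
    labels[vessel] = 1                                  # vasculature = 1
    labels[vessel & aneurysm_contour_mask] = 2          # aneurysm sac = 2
    return labels
```

Restricting class 2 to thresholded pixels inside the contour is a design choice of this sketch; a tool could equally label every pixel inside the contour as aneurysm.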

Data augmentation

The 350 image frames were randomly split into two cohorts, 250 for training and 100 for testing. CNNs require large datasets15 to optimize the weights at each network layer. We used a data augmentation scheme created with Python (Python Software Foundation, Wilmington, Delaware, USA) to increase the size of our training dataset.16 The training and testing sets were expanded using a combination of rotations and zoom settings. Each averaged frame from the DSA sequences was rotated 90°, 180°, and 270° and zoomed in to 200% of its original size. The augmentation process expanded the training set to 1500 examples. The process of zooming in on regions of the image data during augmentation created new true-negative examples where no aneurysm was present, improving the trained network generalizability.
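A rotation and zoom augmentation of this kind might look like the following sketch. It is not the authors' script: the function name `augment` is hypothetical, the 200% zoom is assumed to be a central crop enlarged back to the original size, and the sketch yields four augmented copies per frame rather than reproducing the exact scheme behind the reported training set size.

```python
import numpy as np
from scipy.ndimage import zoom

def augment(frame, labels):
    """Return rotated and zoomed copies of a frame and its label mask."""
    pairs = []
    for k in (1, 2, 3):  # 90, 180, and 270 degree rotations
        pairs.append((np.rot90(frame, k), np.rot90(labels, k)))
    # Assumption: 200% zoom means enlarging the central half of the image.
    h, w = frame.shape
    top, left = h // 4, w // 4
    crop_frame = frame[top:top + h // 2, left:left + w // 2]
    crop_labels = labels[top:top + h // 2, left:left + w // 2]
    zoomed_frame = zoom(crop_frame, 2, order=1)    # linear interpolation for intensities
    zoomed_labels = zoom(crop_labels, 2, order=0)  # nearest neighbour keeps class IDs intact
    pairs.append((zoomed_frame, zoomed_labels))
    return pairs
```

The image and its mask must always receive the same transform, and nearest-neighbour interpolation is used for the mask so that no fractional class values are created.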

Machine learning

CNN architecture was created within Keras.17 This architecture utilized a Visual Geometry Group-16 (VGG-16) encoder, two fully convolutional layers with rectified linear unit activation and 50% dropout, and a decoder consisting of transposed convolutional layers to up-sample the image data back to the original input image size. We tasked this network with a pixel-by-pixel semantic segmentation problem, where each pixel was classified as a member of one of three groups: background, vasculature, or aneurysm.

We trained the model on an Nvidia Quadro K2200 GPU (Nvidia Corporation, Santa Clara, California, USA). Our network used the ADADELTA optimizer, an adaptive method18 for gradient descent which adapts the learning rate over time such that network learning continues even after many epochs. The sum of the Dice loss (the complement of the Dice similarity coefficient (DSC))19 20 and the binary cross-entropy was used to compute the loss between the network’s predicted mask and the ground-truth mask in the training cohort following each training epoch and was used to steer the gradient descent during training. Following the network weight optimization, the model was assessed using the testing cohort.
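To make the combined loss concrete, here is a minimal NumPy sketch of a Dice loss added to a binary cross-entropy. The function name `dice_bce_loss` and the smoothing constant `eps` are hypothetical, and computing one soft Dice score across all classes jointly is an assumption; the text does not say whether the Dice term was computed per class or how it was smoothed.

```python
import numpy as np

def dice_bce_loss(y_true, y_pred, eps=1e-7):
    """Sum of Dice loss and binary cross-entropy.

    y_true: one-hot ground truth, shape (H, W, C).
    y_pred: predicted class probabilities, shape (H, W, C).
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    intersection = np.sum(y_true * y_pred)
    # Soft Dice coefficient over all pixels and classes (assumption)
    dice = (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)
    # Binary cross-entropy averaged over every pixel and class channel
    bce = -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float((1.0 - dice) + bce)
```

The Dice term rewards overlap directly, which helps with small classes such as the aneurysm sac, while the cross-entropy term gives a smooth per-pixel gradient.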

Network training relies on the ground-truth label accuracy for proper weight optimization. Two users created the training labels, so this process may be subject to interobserver variability. This variability was assessed by computing the percent overlap between the aneurysm and vasculature contours hand-segmented by the two users in the 350 image frames.

Quantitative analysis

Trained network predictive capability was assessed through multiple metrics. The area under the receiver operating characteristic curve21 (AUROC) was computed for the predicted masks using Python’s scikit-learn version 0.20. The agreement between the predicted masks and the hand-contoured ground-truth labels was computed using the Jaccard index (JI)22 and the DSC to measure similarity between datasets. These metrics were computed and averaged for the aneurysm and vasculature predictions over the entire testing cohort.
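For reference, the two overlap metrics can be computed per class from a pair of label masks as in the sketch below. The function name `jaccard_and_dice` is hypothetical, and treating two empty masks as perfect agreement is an assumption about how true-negative frames would be scored.

```python
import numpy as np

def jaccard_and_dice(pred_labels, true_labels, class_id):
    """Return (JI, DSC) for one class between predicted and ground-truth masks."""
    pred = pred_labels == class_id
    true = true_labels == class_id
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Assumption: the class is absent from both masks, scored as agreement.
        return 1.0, 1.0
    ji = intersection / union
    dsc = 2.0 * intersection / (pred.sum() + true.sum())
    return float(ji), float(dsc)
```

For a single image the two metrics are related by DSC = 2·JI / (1 + JI), so they rank segmentations identically; the DSC simply weights the overlap more heavily.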

To assess the similarities between the API features computed within the network predicted aneurysm region and those from the ground-truth aneurysm region in the API software, a mean percent difference over the testing cohort for each API feature was computed. Additionally, Pearson correlation coefficients23 24 were calculated to measure the correlation between the ground-truth API feature values and the network-extracted API feature values.
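The agreement statistics can be sketched as follows. The function names are hypothetical, and expressing the difference relative to the manually derived value is an assumption, since the text does not state which value served as the denominator.

```python
import numpy as np

def mean_abs_percent_difference(network_values, manual_values):
    """Mean absolute percent difference, relative to the manual values (assumption)."""
    network_values = np.asarray(network_values, dtype=float)
    manual_values = np.asarray(manual_values, dtype=float)
    percent = np.abs(network_values - manual_values) / np.abs(manual_values) * 100.0
    return float(percent.mean())

def pearson_r(x, y):
    """Pearson correlation coefficient between two feature vectors."""
    return float(np.corrcoef(x, y)[0, 1])
```

Each function would be called once per API feature, with one value per case in the testing cohort.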

All analyses were repeated considering subgroupings of pre- and post-coiled aneurysm cases to assess any network sensitivity to the type of input image data. Significance (p<0.05) of differences between subgroups was assessed with a one-tailed heteroscedastic t-test.
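A one-tailed heteroscedastic (Welch) t-test can be run with SciPy as in this sketch, which assumes SciPy 1.6 or later for the `alternative` argument. The function name `one_tailed_welch` and the direction of the alternative hypothesis are assumptions; the text does not state which tail was tested.

```python
import numpy as np
from scipy import stats

def one_tailed_welch(sample_a, sample_b):
    """Test whether mean(sample_a) > mean(sample_b) without assuming equal variances."""
    result = stats.ttest_ind(
        sample_a, sample_b,
        equal_var=False,        # Welch's test: unequal variances allowed
        alternative="greater",  # one-tailed; direction is an assumption here
    )
    return float(result.statistic), float(result.pvalue)
```

Here `sample_a` and `sample_b` would hold a per-case metric (for example the DSC) for the two subgroups being compared.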

Results

The entire training process took approximately 24 hours for the network weights to converge to optimal values. Each example in the testing cohort was labeled in approximately 5 s by the network, and radiomic feature extraction using the API software within the aneurysm sac took an additional 5 s.

Segmentation accuracy

The mean JI over the testing cohort was 0.823 (95% CI 0.783 to 0.863) for the IA and 0.737 (95% CI 0.682 to 0.792) for the vasculature. The mean DSC over the testing cohort was 0.903 (95% CI 0.867 to 0.937) for the IA and 0.849 (95% CI 0.811 to 0.887) for the vasculature. Considering only the pre-coiled cases in the testing cohort, the mean JI was 0.826 (95% CI 0.788 to 0.862) for the IA and 0.731 (95% CI 0.727 to 0.735) for the vasculature. Mean DSC over the pre-coiled cases in the testing cohort was 0.904 (95% CI 0.861 to 0.947) for the IA and 0.845 (95% CI 0.821 to 0.866) for the vasculature. Considering only the post-coiled cases in the testing cohort, the mean JI was 0.811 (95% CI 0.784 to 0.838) for the IA and 0.740 (95% CI 0.688 to 0.792) for the vasculature. Mean DSC over the post-coiled cases in the testing cohort was 0.891 (95% CI 0.869 to 0.913) for the IA and 0.851 (95% CI 0.811 to 0.891) for the vasculature. The differences in segmentation accuracy between the pre- and post-coiled examples in the testing cohort were not significant.

Diagnostic accuracy

The mean AUROC over the testing cohort was 0.791 (95% CI 0.740 to 0.817) for the IA and 0.715 (95% CI 0.678 to 0.733) for the vasculature. Considering only the pre-coiled cases in the testing cohort, the mean AUROC was 0.793 (95% CI 0.744 to 0.842) for the IA and 0.716 (95% CI 0.680 to 0.752) for the vasculature. Considering only the post-coiled cases in the testing cohort, the mean AUROC was 0.787 (95% CI 0.738 to 0.821) for the IA and 0.711 (95% CI 0.689 to 0.733) for the vasculature. The differences in the diagnostic accuracy between the pre- and post-coiled examples in the testing cohort were not significant. Receiver operating characteristic curves of the network’s segmentation predictions using the entire testing cohort and the sub-groupings in the testing cohort are shown in figure 2.

Figure 2

(A) Receiver operating characteristic curves for the predicted segmentation of the aneurysm sac (solid line) and surrounding vasculature (dashed line) using the entire testing cohort. (B) Receiver operating characteristic curves for the predicted segmentation of the aneurysm sac and surrounding vasculature using the diagnostic cases in the testing cohort. (C) Receiver operating characteristic curves for the predicted segmentation of the aneurysm sac and surrounding vasculature using the coiled cases in the testing cohort. AUROC, area under the receiver operating characteristic curve.

Agreement analysis

Qualitatively, the segmented masks for the aneurysm sac had notable overlap with the manual aneurysm labels, indicating that good agreement should be expected between API values extracted within the two masks. Quantitatively, figure 3 shows box-and-whisker plots detailing the absolute average percent difference between five selected API features computed within the predicted aneurysm masks and the same features computed within hand-contoured masks, over the entire testing cohort as well as for the pre- and post-coiled subgroups. The average percent differences were below 18% for all parameters when the entire testing cohort was considered. Also included on the box-and-whisker plots is the Pearson correlation coefficient between the API features extracted using the trained network-segmented masks and those of the manual user for all API features. The Pearson correlation coefficient averaged over the features was 0.906 when the entire testing cohort was considered, indicating a linear correlation between the API feature values from the manually contoured masks and those from the network-segmented masks.

Figure 3

(A) Box-and-whisker plots showing the mean percent difference between aneurysmal radiomic features computed within network IA predictions and those computed within human-contoured IA regions over the entire test cohort. X represents mean values for each feature. All five network-computed feature values had an average of <18% difference from those computed in the human-contoured regions. Pearson correlation coefficients (ρ) for each feature are shown. (B) Box-and-whisker plots showing the mean percent difference between aneurysmal radiomic features computed within network IA predictions and those computed within human-contoured IA regions considering just the diagnostic cases in the testing cohort. (C) Box-and-whisker-plots showing the mean percent difference between aneurysmal radiomic features computed within network IA predictions and those computed within human-contoured IA regions considering just the coiled cases in the testing cohort. AUC, area under the (time density) curve; BAT, bolus arrival time; IA, intracranial aneurysm; MTT, mean transit time; PH, peak height (of the time density curve); TTP, time to peak.

Considering the diagnostic subgroup in the testing cohort, all API features had Pearson correlation coefficients greater than 0.86 indicating a linear correlation between the manually contoured masks and network segmented masks. The average percent differences were below 15% for all parameters. Considering the coiled subgroup in the testing cohort, all API features except bolus arrival time had Pearson correlation coefficients greater than 0.74, indicating a linear correlation between the manually contoured masks and network segmented masks for those features. There was a significant difference between the percent difference of the bolus arrival time API feature computed within the network and hand-contoured aneurysm regions considering the coiled and diagnostic examples in the test cohort. Figure 4 shows qualitative agreement between the aneurysm network segmentation and the manual aneurysm segmentation.

Figure 4

(A) Single averaged digital subtraction angiography frame showing an intracranial aneurysm on the posterior communicating artery. (B) Corresponding angiographic parametric imaging feature map showing manual contour of aneurysm region by the human user. (C) Network segmentation overlaid onto image data in (A) with the aneurysm shown in blue and the vasculature shown in red, indicating agreement with the manually contoured mask shown in (B).

Interobserver variability

Over all image frames, the percent overlap between the ground-truth contours created by the two users was 95.5%, indicating little variability between the users during the label creation process.

Discussion

We presented a machine learning framework designed to automatically segment saccular IAs and surrounding vasculature from DSAs. The generated masks were used for a targeted API acquisition where radiomic features were extracted for quantitative lesion assessment. Pearson correlation coefficients indicated a strong association (ρ >0.78) between computed API feature values generated with the CNN or manual contouring. To our knowledge, this is one of the first studies to compare the capacity of a CNN-based algorithm to automatically extract API features from within an aneurysm sac against the current standard method of hand-contouring.

The use of the VGG-16 architecture for the network encoder comes at the cost of computational time compared with shallower network architectures, necessitating a training time of approximately 24 hours for complete model weight convergence. When compared with other architectures for segmentation such as U-Net,25 which has over 31 million tunable parameters, the network used in this work has fewer parameters and can optimize the network weights more quickly and with fewer data.

Superior network performance was observed when segmenting the surrounding vasculature from the background in regions close to the aneurysm sac. As the contrast decreases towards the peripheral regions of the DSA frames, there is degraded segmentation accuracy. If the quantitative analysis of the segmentation accuracy were to be constrained to regions close to the aneurysm sac, the performance would improve.

When considering the diagnostic and coiled subgroups in the testing cohort, there was no significant difference in network segmentation accuracy for the aneurysm and vasculature classes indicating network independence to the input data in terms of treatment status. When comparing percent differences of specific API features between the diagnostic and coiled cases in the test cohort, only the bolus arrival time parameter had a significant increase in the percent difference. This parameter is prone to artifacts when devices such as catheters cause overestimation of bolus arrival time. If those regions are included in the aneurysm segmentation in either the network or hand-contoured labels, there will be a large mismatch in the estimation of bolus arrival time.

There were cases where the network missed a lesion that the human user found. This generally occurred with pre-coiled aneurysms <3 mm in size located off the internal carotid artery. Flattening the DSA sequences was suboptimal in the visualization of these aneurysms because it limited the contrast between them and the background. Moving forward, the use of class-activation maps26 to assess which aneurysm and vascular features the network finds most salient will be important for input data optimization. The visualization of these features could be enhanced in relation to other less salient features in the image data which would allow better network detection and segmentation of the IA and surrounding vasculature.

Other metrics such as classification accuracy were not considered for this work due to the imbalanced size of the aneurysm sac class compared with the size of the vasculature and background class. The number of background pixels is large compared with the number of pixels within a typical aneurysm sac, leading to an artificially high accuracy measurement regardless of how the aneurysm sac was segmented.
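The effect of class imbalance on pixel accuracy can be shown with a small synthetic example. The image size matches the 1024×1024 pixel frames mentioned in this article, but the 30×30 pixel aneurysm and the all-background prediction are hypothetical values chosen only for illustration.

```python
import numpy as np

# Synthetic ground truth: one small aneurysm sac (class 2) on a large background
true_labels = np.zeros((1024, 1024), dtype=np.uint8)
true_labels[500:530, 500:530] = 2  # hypothetical 30x30 pixel sac

# A prediction that misses the aneurysm entirely
all_background = np.zeros_like(true_labels)

# Pixel accuracy stays above 0.999 because background pixels dominate
pixel_accuracy = float((all_background == true_labels).mean())

# The Jaccard index for the aneurysm class exposes the failure
pred_sac = all_background == 2
true_sac = true_labels == 2
jaccard_sac = float(np.logical_and(pred_sac, true_sac).sum() / np.logical_or(pred_sac, true_sac).sum())
```

Overlap metrics such as the JI and DSC are computed only over pixels that belong to the class in at least one of the two masks, which is why they remain informative here.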

This study had limitations. The amount of data used for the network training is small for such a deep network architecture when compared with other computer vision applications. To address this concern, we augmented the training dataset by a factor that was empirically set to six. Augmenting a small dataset in this way could raise concerns about network overfitting to the training data presented; however, the model’s performance on the testing cohort suggests that the network does not suffer from overfitting. There are published works in the context of medical machine learning applications that have training datasets (n=200),27 (n=30),28 (n=376)29 on a similar scale to ours. Additionally, our network task uses image data that are not as diverse as those in other computer vision applications. Our image data are grayscale and have similar shapes and intensities, reducing the task complexity. Another limitation is that the only type of aneurysm considered was the saccular IA and endovascular coiling was the only treatment method used. Other types of aneurysms (fusiform or dissecting) were not included due to low incidence.

Our use of true-negative cases where the aneurysm was not visible may have introduced bias to the data, as these are not actual true-negative cases where there is no aneurysm at all. With this work, we did not attempt to create an aneurysm identifier that will work to diagnose the presence of an aneurysm. We aimed to use this tool when the presence of an aneurysm had been confirmed and treatment was being planned. We hope that such a tool will be useful in the context of aneurysm treatment planning, not necessarily diagnosis.

Our results indicate that the network performed non-inferiorly to a human user and required less manual input, which was more time efficient. The current standard method of performing API feature extraction can take up to 20 min on a 1024×1024 pixels image compared with approximately 10 s using the trained network. We imagine a clinical workflow where, following the placement of an angiographic device, API can be carried out and the radiomic features computed using a trained network such as the one proposed in this work. Next, the extracted API values can be passed to a clinical outcome predictor tool as demonstrated previously in the literature.8 This may provide the possibility for an informed decision by the neurosurgeon to revise the surgical procedure (eg, add an extra coil) while the patient is still on the table or opt for closer observation. Overall, such a data-driven image data analysis using these radiomic features may lead to better occlusion rates and fewer complications following the procedure.

Conclusion

Our results show that a CNN can be trained to automatically segment saccular aneurysms (pre- or post-coiling) and surrounding vasculature from DSA images and that it is non-inferior at computing API features within the aneurysm sac compared with the standard method of hand-contouring. These trained CNN models may help to streamline clinical workflow and enable the use of more quantitative assessments of angiographic imaging methods.

Acknowledgments

The authors thank W Fawn Dorr for editorial assistance and Paul H Dressel for preparation of the illustration.

References

Footnotes

  • Contributors ARP, MSB, and CNI conceived and designed the research. AHS, JMD, ARC, RAR, and ARP collected and reviewed the data. ARP analyzed the data and performed statistical analysis. CNI handled funding and supervision. ARP drafted the manuscript. All authors made critical revisions of the manuscript and reviewed the final version.

  • Funding This project was partially supported by Canon Medical Systems and the Cummings Foundation.

  • Competing interests CNI: Equipment grant from Canon Medical Systems, support from the Cummings Foundation. JMD: Research grant: National Center for Advancing Translational Sciences of the National Institutes of Health under award number KL2TR001413 to the University at Buffalo; Speakers’ bureau: Penumbra; Honoraria: Neurotrauma Science; shareholder/ownership interests: RIST Neurovascular. AHS: Research grant: NIH/NINDS 1R01NS091075 as a co-investigator for ‘Virtual Intervention of Intracranial Aneurysms’; financial interest/investor/stock options/ownership: Amnis Therapeutics, Apama Medical, Blink TBI, Buffalo Technology Partners, Cardinal Consultants, Cerebrotech Medical Systems, Cognition Medical, Endostream Medical, Imperative Care, International Medical Distribution Partners, Neurovascular Diagnostics, Q’Apel Medical, Rebound Therapeutics Corp, Rist Neurovascular, Serenity Medical, Silk Road Medical, StimMed, Synchron, Three Rivers Medical, Viseon Spine; Consultant/advisory board: Amnis Therapeutics, Boston Scientific, Canon Medical Systems USA, Cerebrotech Medical Systems, Cerenovus, Corindus, Endostream Medical, Guidepoint Global Consulting, Imperative Care, Integra LifeSciences Corp, Medtronic, MicroVention, Northwest University–DSMB Chair for HEAT Trial, Penumbra, Q’Apel Medical, Rapid Medical, Rebound Therapeutics Corp, Serenity Medical, Silk Road Medical, StimMed, Stryker, Three Rivers Medical, VasSol, W L Gore & Associates; Principal investigator/steering committee of the following trials: Cerenovus LARGE and ARISE II; Medtronic SWIFT PRIME and SWIFT DIRECT; MicroVention FRED & CONFIDENCE; MUSC POSITIVE; and Penumbra 3D Separator, COMPASS, and INVEST.

  • Ethics approval Institutional review board approval was obtained and informed consent was waived for this Health Insurance Portability and Accountability Act-compliant retrospective study. IRB Study Number: MOD00005495.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Patient consent for publication Not required.