Abstract
Background Visual perception of catheters and guidewires on x-ray fluoroscopy is essential for neurointervention. Endovascular robots with teleoperation capabilities are being developed, but they cannot ‘see’ intravascular devices, which precludes artificial intelligence (AI) augmentation that could improve precision and autonomy. Deep learning has not been explored for neurointervention, and prior work in cardiovascular scenarios is inadequate because it segments only device tips, while neurointervention requires segmentation of the entire device structure due to the use of coaxial devices. Therefore, this study develops an automatic and accurate image-based catheter segmentation method for cerebral angiography using deep learning.
Methods Catheters and guidewires were manually annotated on 3831 fluoroscopy frames collected prospectively from 40 patients undergoing cerebral angiography. We proposed a topology-aware geometric deep learning method (TAG-DL) and compared it with state-of-the-art deep learning segmentation models: UNet, nnUNet, and TransUNet. All models were trained on frontal view sequences and tested on both frontal and lateral view sequences from unseen patients. Results were assessed with centerline Dice score and tip-distance error.
Results The TAG-DL and nnUNet models outperformed TransUNet and UNet. The best performing model was nnUNet, achieving a mean centerline-Dice score of 0.98±0.01 and a median tip-distance error of 0.43 (IQR 0.88) mm. Incorporating digital subtraction masks, with or without contrast, significantly improved performance on unseen patients, further enabling exceptional performance on lateral view fluoroscopy despite the models not being trained on this view.
Conclusions These results are the first step towards AI augmentation for robotic neurointervention that could amplify the reach, productivity, and safety of a limited neurointerventional workforce.
- angiography
- catheter
- navigation
- technology
- technique
WHAT IS ALREADY KNOWN ON THIS TOPIC
Endovascular robots cannot ‘see’ intravascular devices on x-ray fluoroscopy, hindering the potential of artificial intelligence (AI) augmentation to enhance precision and autonomy in neurointervention. Prior work in cardiovascular device segmentation is inadequate because it segments only device tips, while neurointervention requires segmentation of the entire device structure due to the use of coaxial devices.
WHAT THIS STUDY ADDS
This study shows that deep learning methods are successful in complete catheter segmentation and tip tracking for cerebral angiography. It also shows that incorporating digital subtraction angiography (DSA) masks enables deep learning methods to obtain exceptional results on both unseen patients and unseen (lateral) views.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
By enabling computers to ‘see’ catheters and track their position, this work is the first step towards automation in neurointervention that could augment physician operators and improve patient access to expert life-saving care.
Introduction
Visualization of catheters and guidewires is essential to neurointervention. However, the limitations of intraoperative x-ray fluoroscopy, including poor contrast, image noise, and unexpected device movements, compromise procedure efficacy and pose serious risks to patients, physicians, and staff, such as increased radiation exposure, eyestrain, and orthopedic injuries.1–4 Endovascular robotics offers improved safety, consistency, and patient access in cerebrovascular interventions, with remote intervention already validated for cardiovascular and peripheral vascular indications.3 Telerobotic networks could disseminate life-saving stroke care and systems augmented with artificial intelligence (AI) could enhance precision and autonomy by observing expert interventionalists, similar to how self-driving cars can learn from expert drivers.1 5 6 However, the inability of these robots to ‘see’ and track catheters on natural fluoroscopy has limited progress to phantoms and other idealized models.2 7
Accurate and automatic frame-by-frame segmentation of the catheter and guidewire is the critical first step towards translating this technology to clinical use. In the context of this task, we refer to the combined catheter/guidewire structure as ‘catheter’ for simplicity throughout this paper. Catheter segmentation in fluoroscopy is challenging due to the low signal-to-noise ratio of fluoroscopy images, confounding by other wire-like structures, and the sparsity of catheter pixels (<2%) compared with background pixels.8 Some groups have used external signals to sense catheter position, such as optical, piezoelectric, and magnetic trackers.9 While effective in theory, these approaches do not generalize well to the existing workflow of interventional suites, which simultaneously use wires and catheters from multiple vendors without aftermarket modification. Consequently, detecting the catheter shape and position directly from fluoroscopy images is desirable.10
Recent work has used deep learning (DL) methods, particularly convolutional neural networks and vision transformer architectures, which have demonstrated superiority over conventional image processing methods.8 9 11–14 Through an iterative process supervised by human-annotated data, DL methods learn to optimally transform raw input data to a desired output by modulating millions of numerical weights in multilayered interconnected ‘neurons’.15 However, DL methods have not been developed for catheter segmentation in neurointervention. Most prior work in cardiovascular, peripheral vascular, or phantom data does not visualize the complete catheter structure, which is critical in neurointerventions where devices are coaxially passed over each other.9 11–13 Manual annotation of the complete catheter structure is considerably more laborious than annotating the tip alone, which has been a significant barrier to generating suitable datasets. Additionally, potential strategies to improve segmentation performance remain unexplored, including incorporating data from DSA roadmapping and exploiting geometric symmetries of the catheter. Furthermore, the centerline Dice is a topology-aware metric that rewards segmentation results that preserve connectivity, making it more suitable for catheter segmentation than the conventional Dice score used in most prior work, as illustrated in figure 1C.16 A summary of relevant previous work is provided in online supplemental table 1.
Therefore, this study investigates DL methods for catheter segmentation in fluoroscopy data from cerebrovascular interventions and evaluates the impact of application-specific and geometric considerations on performance. The study develops a topology-aware geometric deep learning (TAG-DL) method and compares it against state-of-the-art segmentation methods, UNet, nnUNet, and TransUNet (a hybrid convolutional neural network/vision transformer architecture), to determine the optimal model architectures, inputs, and performance metrics for catheter segmentation. Currently, in the neurointerventional realm, robotics has largely been ‘robotic-assisted’ rather than truly automated.4 17 18 The technology presented here is an important step towards complete automation, the ultimate goal in the field of robotics.
Materials and methods
All the work presented here involved chart review of the patients’ imaging data and was approved by the institutional review board.
Patient selection and data collection
Due to the lack of publicly available data, we developed a novel annotated fluoroscopy dataset to support this study. The fluoroscopy videos in this dataset were collected prospectively from procedures performed in the neurointerventional radiology department at our hospital. We define a fluoroscopy sequence as a set of s 2D images, {I1, I2, …, Is}, where consecutive frames are separated by 0.1 s in time, corresponding to videos acquired at 10 frames/s.12 We included fluoroscopy sequences based on the following criteria: (1) use of DSA roadmapping; (2) containing only a 0.035 inch guidewire and/or 5 Fr diagnostic catheter in-frame; and (3) catheter or guidewire tip at or proximal to the intracranial internal carotid artery. These criteria reflect the conditions of diagnostic cerebral angiography, which may be performed either as a standalone procedure or as the initial component of a therapeutic procedure.19
Manual annotation of fluoroscopy video sequences
The combined structure of the catheter and guidewire was manually segmented as a single mask in frames of fluoroscopy sequences using MRIcron (NITRC, Washington, USA) by an expert MD/PhD student (6 years of experience) under the supervision of an experienced imaging scientist (15 years of experience). The catheter and guidewire are two distinct devices, the guidewire being thinner than the hollow catheter through which it travels. Since either the catheter or the guidewire may be the most distal intravascular device used for navigation, we treat them as a single segmentation mask, referred to as ‘catheter’, in which catheter pixels have a value of 1 and background pixels a value of 0.
Deep learning model architecture and input
The proposed TAG-DL deep learning model was built on a rotation-reflection equivariant UNet.20–22 This network structure guarantees that the output transforms identically to the input under the eight symmetries formed by four 90° rotations, each with and without reflection.21 In addition to an equivalent UNet without grouped convolutions, we implemented two other state-of-the-art models for medical image segmentation, nnUNet and TransUNet.23–25 The nnUNet is a self-configuring UNet pipeline that determines the optimal model architecture and hyperparameters based on characteristics of the input data.24 The TransUNet model incorporates vision transformers, which can model global context, into the UNet structure.23 Prior work has used variants of UNet and TransUNet for catheter segmentation, but TAG-DL and nnUNet have not been explored for this task. A visual representation of the implementation of TAG-DL for catheter segmentation is shown in figure 1A.
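To make the equivariance constraint concrete, the minimal sketch below (our own illustration, not the TAG-DL implementation, which builds equivariance into each layer via grouped convolutions) makes an arbitrary image-to-image network exactly equivariant to the eight transformations of the dihedral group D4 by group averaging:

```python
import torch
import torch.nn as nn

# The 8 elements of the dihedral group D4: 4 rotations x (reflection or not).
D4 = [(k, flip) for k in range(4) for flip in (False, True)]

def apply_d4(x, k, flip):
    """Act on a (B, C, H, W) tensor: optionally reflect, then rotate k*90 deg."""
    if flip:
        x = torch.flip(x, dims=[-1])
    return torch.rot90(x, k, dims=[-2, -1])

def invert_d4(x, k, flip):
    """Inverse action: rotate back by -k*90 deg, then undo the reflection."""
    x = torch.rot90(x, -k, dims=[-2, -1])
    if flip:
        x = torch.flip(x, dims=[-1])
    return x

class D4Averaged(nn.Module):
    """Make any image-to-image network exactly D4-equivariant by group
    averaging: out(x) = mean over g in D4 of g^-1(net(g(x)))."""
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, x):
        outs = [invert_d4(self.net(apply_d4(x, k, f)), k, f) for k, f in D4]
        return torch.stack(outs).mean(dim=0)

# Sanity check: rotating the input by 90 degrees rotates the output identically.
model = D4Averaged(nn.Conv2d(1, 1, kernel_size=3, padding=1))
x = torch.randn(1, 1, 64, 64)
rot = lambda t: torch.rot90(t, 1, dims=[-2, -1])
assert torch.allclose(model(rot(x)), rot(model(x)), atol=1e-5)
```

Group averaging requires eight forward passes; building equivariance into every layer, as in TAG-DL, instead shares weights across the group within each convolution.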
Five different input schemes were tested to assess the impact of the DSA mask frame on the performance of the models on unseen cases: (1) current frame only; (2) current frame with a non-contrast mask frame as a second channel; (3) current frame with a non-contrast mask frame subtracted prior to input as one channel; (4) current frame with a contrast mask frame as a second channel; and (5) current frame with a contrast mask frame subtracted prior to input as one channel. An example of the contrast and non-contrast mask frames, along with the result of subtracting each from the current frame, is shown in figure 1B. Prior to input, all frames and masks were resized to 512×512 pixels and intensities were scaled from 0 to 1.
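As a concrete illustration, the preprocessing for the five schemes could be assembled as follows. This is a hedged sketch: the function names are ours, and the exact ordering of operations (resize, then scale, then stack or subtract) is an assumption rather than the study's documented pipeline.

```python
import numpy as np
from skimage.transform import resize

def scale01(img):
    """Min-max scale intensities to [0, 1]."""
    img = img.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

def build_input(frame, mask_frame=None, scheme=1):
    """Assemble the network input for one of the five schemes. `mask_frame`
    is the (contrast or non-contrast) DSA roadmapping mask for schemes 2-5.
    All images are resized to 512x512 before use."""
    frame = scale01(resize(frame, (512, 512), preserve_range=True))
    if scheme == 1:                       # (1) current frame only
        return frame[None]                # shape (1, 512, 512)
    mask = scale01(resize(mask_frame, (512, 512), preserve_range=True))
    if scheme in (2, 4):                  # (2)/(4) mask as a second channel
        return np.stack([frame, mask])    # shape (2, 512, 512)
    if scheme in (3, 5):                  # (3)/(5) mask subtracted, one channel
        return scale01(frame - mask)[None]
    raise ValueError(f"unknown input scheme: {scheme}")
```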
Deep learning model training
For training and validation we used 37 sequences (2860 frames) from 26 patients. For UNet, TransUNet, and TAG-DL, 10 sequences (600 frames) from this set were used for model validation; in the nnUNet pipeline, 20% of the training data is automatically selected for validation.24 All training and validation sequences are frontal view; no lateral view sequences were included. The validation data do not directly train the network; rather, in each pass through the training data (or epoch), the model’s performance is measured on the validation dataset. All models were trained on either NVIDIA Titan RTX or NVIDIA RTX A6000 graphics processing units (GPUs). Further details of network hyperparameters are given in online supplemental table 2.
Evaluation metrics
To assess the generalizability of the methods to unseen data, the trained models were tested on 17 sequences from 14 unseen patients (971 frames). Eleven sequences are in frontal view (655 frames) and six are in lateral view (316 frames), on which the models were not trained. The models output a continuous value between 0 and 1 for each pixel, and a threshold of 0.5 was used to generate a binary segmentation mask from the raw predictions. Each predicted binary segmentation mask was evaluated against the human-annotated label with the centerline Dice score, Dice score, and tip-distance error. Equations for the centerline Dice and Dice scores are given in online supplemental figure 1A,B.
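For reference, the centerline Dice of a binary prediction can be computed in a few lines following the clDice definition of reference 16; this sketch is ours, not the study's evaluation code:

```python
import numpy as np
from skimage.morphology import skeletonize

def cl_dice(pred, gt):
    """Centerline Dice (clDice): harmonic mean of topology precision (fraction
    of the predicted skeleton lying inside the ground-truth mask) and topology
    sensitivity (fraction of the ground-truth skeleton lying inside the
    predicted mask). `pred` and `gt` are boolean 2D arrays."""
    s_pred, s_gt = skeletonize(pred), skeletonize(gt)
    tprec = (s_pred & gt).sum() / max(s_pred.sum(), 1)
    tsens = (s_gt & pred).sum() / max(s_gt.sum(), 1)
    return 2 * tprec * tsens / max(tprec + tsens, 1e-8)
```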
The superior quantification of topology by the centerline-Dice compared with the conventional Dice score is shown in figure 1C. The conventional Dice score assumes that all segmented pixels are of equal importance. This is not true for catheter segmentation: sacrificing edge pixels to preserve a connected centerline structure produces a more acceptable result, yet it could receive a similar score to a thicker but broken segmentation that excludes multiple sections of the catheter. The tip-distance error per frame was calculated as the Euclidean distance between the coordinates of the ground truth tip and the predicted tip. The tip coordinate was derived from the predicted catheter segmentation mask by skeletonization, followed by detection of all endpoints and selection of the endpoint furthest from the edge of the frame for frontal sequences, or closest to the top edge for lateral sequences.12 Predicted masks were dilated for three iterations prior to skeletonization to remove potential artifactual branches, and all pixels not part of the longest connected skeleton were then removed.
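A sketch of this tip-extraction procedure, using SciPy and scikit-image primitives, might read as follows; the function names are ours, and we approximate the 'longest connected skeleton' by the largest connected component:

```python
import numpy as np
from scipy.ndimage import binary_dilation, convolve, label
from skimage.morphology import skeletonize

def extract_tip(mask, view="frontal"):
    """Recover the tip coordinate (row, col) from a binary catheter mask."""
    # Dilate to suppress artifactual branches before skeletonization.
    mask = binary_dilation(mask, iterations=3)
    skel = skeletonize(mask)
    # Keep only the largest 8-connected skeleton component.
    labels, n = label(skel, structure=np.ones((3, 3)))
    if n > 1:
        largest = 1 + np.argmax(np.bincount(labels.ravel())[1:])
        skel = labels == largest
    # Endpoints are skeleton pixels with exactly one 8-connected neighbor
    # (the convolution counts the pixel itself plus its neighbors, hence == 2).
    counts = convolve(skel.astype(int), np.ones((3, 3), int), mode="constant")
    endpoints = np.argwhere(skel & (counts == 2))
    if len(endpoints) == 0:
        return None
    h, w = mask.shape
    if view == "frontal":
        # Endpoint farthest from any edge of the frame.
        dist = np.minimum.reduce([endpoints[:, 0], h - 1 - endpoints[:, 0],
                                  endpoints[:, 1], w - 1 - endpoints[:, 1]])
        return tuple(endpoints[np.argmax(dist)])
    # Lateral view: endpoint closest to the top edge.
    return tuple(endpoints[np.argmin(endpoints[:, 0])])
```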
Results
Catheters and guidewires were manually segmented on 3831 fluoroscopy frames from 54 distinct fluoroscopy sequences collected prospectively from 40 patients undergoing diagnostic and therapeutic neurointerventions at our hospital. Demographic characteristics of the patients included are shown in table 1. Catheter pixels represented 0.12–1.5% of the total pixels in each frame. All guidewires in our dataset are 0.035 inch Glidewire (Terumo, GR3508). Diagnostic catheters represented in our dataset are 5 Fr Angled Taper Glidecath (Terumo, CG508), 5 Fr Beacon Tip DAV (Cook Medical, G08699), 5 Fr Simmons/Sidewinder 2 (Terumo, CG511), and 5 Fr Beacon Tip Sim2 (Cook Medical, G08422). These devices have been used in both conventional and robotic cerebral angiography, although current robotic systems are limited to a 0.018 inch guidewire.26 All fluoroscopy sequences are derived from videos taken at 10 frames/s.
Catheter segmentation and tip position tracking
In the set of unseen patients in frontal view sequences, the best overall performing network, nnUNet, achieved a mean centerline-Dice score of 0.98±0.01 on fluoroscopy sequences (n=11 patients) with a median tip-distance error of 0.43 (IQR 0.88) mm (n=655 frames). This was better than the TAG-DL method, which achieved a mean centerline-Dice score of 0.95±0.05 (p<0.05) and a median tip-distance error of 0.51 (IQR 1.36) mm (p<0.01) on the same dataset.
In the lateral view, TAG-DL achieved the best segmentation performance with a mean centerline-Dice score of 0.92±0.03 on unseen fluoroscopy sequences (n=6 patients). This was not significantly better than nnUNet which achieved a mean centerline-Dice score of 0.91±0.02 (p=0.52). In the lateral view, nnUNet achieved the best median tip-distance error of 0.38 (IQR 1.88) mm (n=316 frames), which was not significantly better than TAG-DL which achieved a median tip-distance error of 0.59 (IQR 3.13) mm (p=0.97). A summary of comparative model performance is shown in table 2. Boxplots of the performance distribution for all models and frame-by-frame performance over time for nnUNet and TAG-DL are shown in online supplemental figures 2–7.
A notable finding is that all networks except the plain UNet gained a substantial performance boost from inclusion of a DSA roadmapping mask in the input compared with the current frame alone. This was true for both unseen frontal and lateral view sequences, with the magnitude of improvement being greater for the unseen lateral view sequences (p<0.001). Additionally, the pure convolutional neural networks, TAG-DL and nnUNet, outperformed TransUNet (p<0.001). The plain UNet, which has an equivalent structure to TAG-DL but without rotation-reflection equivariant convolutions, was unable to learn to segment the catheter and labeled the entire frame as ‘catheter’ in all scenarios.
A visual representation of model performance on tip tracking in three unseen patients including one lateral view is shown in figure 2. On visual inspection, the most common failure mode was false negative detections breaking the connected catheter structure with occasional false positive detections of contrast and other wire-like structures, which is shown in four unseen patients in online supplemental figure 8. Since not all cases can be displayed in the figure, the full frame-by-frame visualization of nnUNet and TAG-DL on all unseen frontal and lateral test sequences is shown in online supplemental video 1. Additionally, a data efficiency study of TAG-DL is reported in online supplemental table 3.
Discussion
Our study shows that deep learning (DL) methods can generate highly accurate catheter segmentation results, which can subsequently be used to precisely track the tip position during cerebral angiography. This work is significant as it opens a new paradigm of innovation in neurointervention, enabling automation that was previously hindered by the inability of computers to ‘see’ catheters in real-world fluoroscopy. Automation has been shown to make even robotic-assisted procedures safer and is critical to advancing the field.27 Our dataset of complete catheter and guidewire annotations is one of the largest reported in the literature and, to our knowledge, the first in neurointervention. The strategies we demonstrated for improving performance leverage intraoperative data and exploit the geometric properties of the catheter, and are translatable to other catheter segmentation scenarios. Surprisingly, these methods perform well in the lateral view, despite not having been trained on this view, highlighting their generalizability and suggesting invariance to specific anatomy. This is crucial for the translation of AI-augmented robotic navigation, which will require further research and development in animal models.28 Even for human operators, highlighting the catheter and its tip could help reduce eye strain and provide helpful feedback to trainees.
The integration of DSA roadmapping masks into DL models significantly boosted performance on unseen data, likely because subtraction simplifies the learning problem. This follows intuition, as the subtraction of bony features in routine DSA roadmapping accentuates the catheter for human operators. In the context of DL, it reduces the space of features that could confound the learning process. Additionally, it provides some temporal information, which has been shown to be helpful in prior work, without propagating error over time, allowing more reliable performance that can recover from temporary lapses.12 13 Furthermore, models without this information performed significantly worse in the unseen lateral view, suggesting that subtraction can be leveraged by these networks for strong out-of-domain performance that is agnostic to specific anatomy. Notably, the contrast DSA mask, which highlights arterial anatomy, did not improve performance relative to the non-contrast mask. Since the catheter is located intra-arterially, we expected that the contrast-filled DSA mask would compensate for the inherent lack of attention in convolutional neural networks, which can be a major limitation in segmentation, especially given the sparsity of the catheter (0.12–1.5% of pixels) compared with the background. However, the lack of benefit may reflect variations in contrast injection and final roadmap quality. Nevertheless, the lack of reliance on contrast is preferable given both this variability and potential contrast toxicity.
The TAG-DL method, which combines rotation-reflection equivariance and a topology-aware centerline-Dice loss, outperformed its corresponding plain UNet by a large margin at this dataset size by leveraging inherent symmetries in catheter movement. Catheters can rotate arbitrarily and enter the frame from different directions, orientations that standard DL models require many parameters to learn, whereas humans perceive the catheter equally well regardless of its orientation.20 Incorporating this property into neural networks is part of the larger paradigm of geometric DL, which allows for more generalizable learning on smaller datasets.15 However, the base nnUNet performed best overall, suggesting that automated optimization of the UNet structure and hyperparameters can achieve similar performance. TransUNet outperformed the plain UNet, demonstrating some benefit of the long-range dependency modeling introduced by the transformer architecture.14 23 However, it performed significantly worse than TAG-DL and nnUNet in this study. This is likely due to the lack of inductive bias in transformers, which require significantly more data to meet or exceed the performance of convolutional neural networks.14
Incorporating the topology-aware centerline Dice score both improved segmentation outcomes and provided a more meaningful assessment of them. The centerline Dice and its corresponding loss, which preferentially reward topologically accurate segmentations, align better with the goals of catheter segmentation than the conventional Dice score used in prior literature.16 This minimizes the need for post-processing repair with error-prone morphological operations. Nevertheless, even the centerline-Dice has limitations in this setting. It assumes that all regions of the centerline are equally important while, in practice, the tip is more relevant than the rest of the catheter body. Future work that incorporates tip-distance error into the loss function could reward accurate tip segmentation and further improve tip-tracking accuracy.
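For readers interested in the loss side, a differentiable (soft) form of the centerline Dice, following the soft-skeletonization of reference 16, could be sketched as below; this is our illustration under stated assumptions, not the study's training code, and a tip-distance term could be added to it as suggested above:

```python
import torch
import torch.nn.functional as F

def soft_erode(img):
    """Differentiable erosion via min-pooling (2D, NCHW probability maps)."""
    p1 = -F.max_pool2d(-img, (3, 1), (1, 1), (1, 0))
    p2 = -F.max_pool2d(-img, (1, 3), (1, 1), (0, 1))
    return torch.min(p1, p2)

def soft_dilate(img):
    """Differentiable dilation via max-pooling."""
    return F.max_pool2d(img, (3, 3), (1, 1), (1, 1))

def soft_skel(img, n_iter=10):
    """Differentiable (soft) skeletonization by iterated morphological opening."""
    skel = F.relu(img - soft_dilate(soft_erode(img)))
    for _ in range(n_iter):
        img = soft_erode(img)
        delta = F.relu(img - soft_dilate(soft_erode(img)))
        skel = skel + F.relu(delta - skel * delta)
    return skel

def soft_cldice_loss(y_pred, y_true, n_iter=10, smooth=1.0):
    """Soft clDice loss: 1 minus the harmonic mean of topology precision
    and sensitivity computed on soft skeletons; inputs lie in [0, 1]."""
    s_pred, s_true = soft_skel(y_pred, n_iter), soft_skel(y_true, n_iter)
    tprec = ((s_pred * y_true).sum() + smooth) / (s_pred.sum() + smooth)
    tsens = ((s_true * y_pred).sum() + smooth) / (s_true.sum() + smooth)
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens)
```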
There are limitations to this work. Although our dataset is the largest reported for this application, it is small by deep learning standards for non-medical tasks, with state-of-the-art commercial and foundation computer vision models being trained on millions of images.14 29 Additionally, our dataset is limited to a single center and fluoroscopy vendor, and exclusively reflects diagnostic cerebral angiography. The devices used in this setting are fully radio-opaque and can thus be manually annotated. While this provides a consistent space to test the feasibility of DL in this application, further work is needed to improve generalization to additional devices, especially therapeutic devices that may be partially radiolucent. Regardless, diagnostic angiography comprises the largest share of neurointerventional volume, and automation in this space would have a significant impact.30 Furthermore, the majority of navigation and selective catheterization is done with the devices included in our dataset, while larger therapeutic devices are typically passed over the guidewire once it has reached its target destination.30
Other failure modes of our methods included false-positive detections of contrast and other wire-like structures appearing near the catheter and failure to detect large sections of catheter body. Adding more data, especially cases with many extraneous implants and lines, would incrementally improve the ability of the model to filter out non-catheter devices. This would also support more sophisticated post-processing methods, which could potentially remove artifacts and join the broken catheter pieces into a single structure through spline fitting. Nevertheless, the excellent performance of our DL methods in this limited application justifies larger scale development that integrates additional devices, scenarios, and model organisms, such as pigs, which have been used in prior work testing robotic neurointervention.28 Pre-training on external data would likely improve data efficiency and generalizability to additional scenarios. Future extensions of this work to other scenarios could test this hypothesis by fine-tuning our model with a relatively small amount of new data.
Conclusions
The successful development of DL methods for catheter segmentation and tip position tracking in cerebral angiography has significant clinical relevance. By enabling computers to ‘see’ catheters and track their position, this work is the first step towards a new paradigm of automation in neurointervention that was previously out of reach. Future AI-augmented systems empowered with computer vision could ‘watch’ expert neurointerventional physicians and learn to autonomously navigate, analogous to how self-driving cars learn from expert drivers. This could make neurointerventions less labor-intensive and more efficient, reducing stress on neurointerventional physicians and improving patient outcomes.
The dataset used for this study is one of the largest sets of complete catheter and guidewire annotations in the literature and, to our knowledge, the first in neurointervention. Additionally, the study demonstrated strategies for improving performance that leverage intraoperative data and exploit the catheter’s geometric properties, which are translatable to other scenarios. The generalizability of the proposed methods to unseen patients and views suggests that they can be fine-tuned for other settings, such as animal models for preclinical development and evaluation. Ongoing studies evaluating the performance of these methods on various fluoroscopy systems, including those integrated with robotic systems and haptic feedback, are an important step towards realizing the full potential of AI in neurointervention.
Data availability statement
Data are available upon reasonable request. The code and data that support the findings of this study are available from the corresponding author upon reasonable request, with consideration given to the sensitive clinical nature of the data.
Ethics statements
Patient consent for publication
Ethics approval
This study involves imaging data from human participants and was approved by Houston Methodist Office of Research Protections, IRB0607-0094. Participants gave informed consent to participate in the study before taking part.
Acknowledgments
We acknowledge the Houston Methodist Neurointerventional Radiology technologists (Karanathu P Mathew, Daniel A Nugent, Gustavo Valencia, and Juan Rivera) for aiding in fluoroscopy data recording and transfer and Xiaohui Yu for maintaining the data archive.
Footnotes
Twitter @ghoshrx
Contributors STCW, KW, and RG had full access to all data in the study and take responsibility for the data integrity and accuracy of the analysis. RG, KW, and STCW participated in the concept and design. RG, KW, GWB, and YJZ participated in the acquisition, analysis, or interpretation of data, as well as review of testing data. All authors were involved in drafting and critical revision of the article for important intellectual content. RG performed the statistical analysis. STCW obtained the funding. KW and STCW participated in administrative, technical, or material support. STCW, KW, and GWB acted as supervisors. STCW and KW are guarantors of this work.
Funding This work was supported by the Ting Tsung & Wei Fong Chao Center for BRAIN (STCW), the John S Dunn Research Foundation (STCW).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.