Background Endovascular thrombectomy (ET) is the standard of care for treatment of acute ischemic stroke (AIS) secondary to large vessel occlusion. The elderly population has been under-represented in clinical trials on ET, and recent studies have reported higher morbidity and mortality in elderly patients than in their younger counterparts.
Objective To use machine learning algorithms to develop a clinical decision support tool that can be used to select elderly patients for ET.
Methods We used a retrospectively identified cohort of 110 patients undergoing ET for AIS at our institution to train a regression tree model that can predict 90-day modified Rankin Scale (mRS) scores. The identified algorithm, termed SPOT, was compared with other decision trees and regression models, and then validated using a prospective cohort of 36 patients.
Results When predicting rates of functional independence at 90 days, SPOT showed a sensitivity of 89.36% and a specificity of 89.66% with an area under the receiver operating characteristic curve of 0.952. Performance of SPOT was significantly better than results obtained using National Institutes of Health Stroke Scale score, Alberta Stroke Program Early CT score, or patients’ baseline deficits. The negative predictive value for SPOT was >95%, and in patients who were SPOT-negative, we observed higher rates of symptomatic intracerebral hemorrhage after thrombectomy. For mRS score prediction, the mean absolute error for SPOT was 0.82.
Conclusions SPOT is designed to aid the clinical decision of whether elderly patients should undergo ET. Our data show that SPOT is a useful tool for determining which patients to exclude from ET, and it has been implemented in an online calculator for public use.
Following the success of several randomized controlled trials (RCTs) on endovascular thrombectomy (ET) for acute ischemic stroke (AIS), ET has become a standard of care for management of patients with a stroke.1–5 However, owing to the large effect size of ET with a number needed to treat near 2.6,6 it has been suggested that trial criteria could have excluded a wider population of patients who might benefit from ET. Multiple centers nationally and internationally have expanded the indication of ET to include patients with a low National Institutes of Health Stroke Scale (NIHSS) score, with more distal occlusions, with lower Alberta Stroke Program Early CT (ASPECT) scores, elderly age, or patients with posterior circulation strokes.7–10
Recently, the efficacy and safety of ET in an elderly population (that is, patients aged ≥80 years) have been investigated, and results have shown a lower efficacy of ET in this population7 9 11–14; however, a potential benefit compared with controls was largely dependent on the selection criteria, with the DAWN trial being one example.15 Therefore, there has been a need to define optimal selection criteria for patients in this population in the absence of data from RCTs and in view of the high likelihood of benefit for a subset of patients. Selection criteria for ET in real-world practice have been variable and follow institution-specific guidelines.
With the accumulation of large cohorts of retrospective data, we investigated a new approach, namely using machine learning to predict outcomes of elderly patients undergoing ET and to provide a preintervention prognosis that can be used to guide the decision of whether to intervene or manage medically in the presence of AIS. We used a retrospective cohort of patients with AIS presenting to a high-volume stroke center to train and develop an optimal prognostic model. The model, called SPOT (Stroke Prognosis in Octogenarians undergoing Thrombectomy), was then validated using an independent prospective cohort of patients presenting to the same institution.
Materials and methods
We retrospectively reviewed a prospectively maintained database of patients with AIS to select elderly patients (defined in this study as ≥80 years of age) who presented with AIS of the anterior or posterior circulation and underwent thrombectomy using a direct aspiration first pass technique (ADAPT) between January 2013 and September 2017. This cohort constitutes the training and validation set. We also prospectively followed up elderly (≥80 years old) patients who underwent ADAPT thrombectomy for anterior and posterior AIS between October 2017 and April 2018. Prospective data were used for external validation. The study was approved by the institutional review board at the Medical University of South Carolina.
Patients were selected for ADAPT thrombectomy of the anterior circulation if CT perfusion imaging showed a mismatch (presumed penumbra) between relative cerebral blood volume and blood flow that correlated with their presenting NIHSS score. Patients with a posterior circulation stroke underwent ADAPT thrombectomy if they presented within 10 hours of onset or if MR diffusion-weighted imaging (DWI) demonstrated restricted diffusion (infarction) involving less than half of the brain stem on any axial slice. Patients were treated with ADAPT thrombectomy regardless of whether intravenous tissue-type plasminogen activator (IV tPA) was administered on admission. A detailed description of ADAPT thrombectomy has been previously recorded.16 17
Patient demographics, comorbidities, and pre-stroke functional status were collected by chart review and included age, gender, race, pre-stroke modified Rankin Scale (mRS) score, and the presence of diabetes, hyperlipidemia, hypertension, or atrial fibrillation. Both the NIHSS score and IV tPA administration information were collected from admission notes, and admission CT scans were reviewed by a board-certified neuroradiologist (ARC) for ASPECT scores. Outcomes at 90 days were defined by the mRS scores. For both the retrospective and prospective cohorts, 90-day mRS scores were collected by a stroke neurologist during a routine visit at 90 days (±14 days) after the stroke. Phone calls were used to contact nursing homes to obtain mRS scores or confirm mortality in patients who did not attend their 90-day visit. Only patients with documented mRS scores at 90 days were included in the study.
It is important to note that all available information is used in the model building process without filtering out features based on expert knowledge. In practice, such feature selection is needed when the number of features is relatively large compared with the number of examples used in training. For prediction of mRS scores, the number of features (12) is small compared with the number of patients used in the training process (110). With this setup, any irrelevant feature included will be ignored by the model during training. In other words, the noise introduced by the irrelevant features will average out in the presence of a large sample size.
Data from each variable were first normalized before use in model training and validation. Dichotomized variables, including gender (female, not female), race (white, not white), diabetes (yes, no), hypertension (yes, no), hyperlipidemia (yes, no), atrial fibrillation (yes, no), and use of IV tPA (yes, no), were normalized to ‘0’ or ‘1’ values. To normalize onset-to-groin time, any value >1440 min was coded as 1, and the range (0, 1440 min) was scaled down to (0.0, 1.0] as fractions of 1440. NIHSS scores, baseline mRS scores and ASPECT scores were normalized so that each variable was presented as (0.0, 1.0) by dividing by the maximum score for each variable (44 for NIHSS, six for mRS, and 10 for ASPECT).
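The normalization rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and dictionary names are hypothetical, and the divisors are taken as written in the text.

```python
# Sketch of the normalization described above. All names are illustrative.
MAX_ONSET_TO_GROIN_MIN = 1440  # values above this are coded as 1
MAX_SCORE = {"nihss": 44, "baseline_mrs": 6, "aspect": 10}  # divisors as stated in the text


def encode_binary(flag: bool) -> int:
    """Dichotomized variables (e.g. female, diabetes, IV tPA) become 0 or 1."""
    return 1 if flag else 0


def normalize_onset_to_groin(minutes: float) -> float:
    """Express onset-to-groin time as a fraction of 1440 min, capped at 1."""
    return min(minutes / MAX_ONSET_TO_GROIN_MIN, 1.0)


def normalize_score(name: str, value: float) -> float:
    """Divide a clinical score by the maximum used for that variable."""
    return value / MAX_SCORE[name]
```

For example, an onset-to-groin time of 720 min maps to 0.5 and an ASPECT score of 8 maps to 0.8.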
Model selection and training
The decision-making process in SPOT comprises two main steps: (i) predicting the individual mRS score (0–6) based on patient information input by the user and (ii) judging the binary output (good [mRS score 0–2] or poor [mRS score 3–6]) based on the predicted mRS score. The latter step can be easily achieved by comparing the predicted mRS score with a threshold, but the mRS prediction step requires a statistical regression model to be built through a supervised learning scheme. A subset of identified elderly patients reviewed retrospectively constitutes a training dataset that is used to build the predictive model. An independent subset of prospectively reviewed patients serves as the validation set.
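The second step is a simple comparison, sketched below. The default cut-off of 3.15 is the value reported in the results section of this paper. The function name and the string labels are assumptions made for illustration.

```python
# Step (ii): turn a predicted mRS score into a binary prognosis.
# 3.15 is the cut-off reported in the results; the names are illustrative.
GOOD_OUTCOME_CUTOFF = 3.15


def classify_outcome(predicted_mrs: float, cutoff: float = GOOD_OUTCOME_CUTOFF) -> str:
    """Return 'good' when the predicted mRS score is below the cut-off, else 'poor'."""
    return "good" if predicted_mrs < cutoff else "poor"
```

A predicted score of 2.4 would be labeled good, whereas a predicted score of 3.15 or higher would be labeled poor.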
Various models can be used to build a regression predictor using a set of training data. However, the choice of model for a specific task is highly dependent on the characteristics of the variables. For the mRS prediction task, the variables are of two types: continuous (eg, NIHSS score) and categorical (eg, gender). The continuous variables are typical in most regression problems and linear regression can deal successfully with such variables in capturing the correlations between them and the desired output. On the other hand, categorical variables are more challenging and require special handling because they can reflect different behavior based on categorical labeling. To elaborate on this, we consider a situation in which the correlation between mRS and NIHSS is different for patients from two categories (eg, [male] and [female]). Technically, a linear regression model is unable to learn such a model because it can learn only one correlation metric for NIHSS score.
Conventionally, decision trees are used to handle categorical variables, especially when the output to be predicted can take one out of a fixed set of values (eg, in the case of binary classification with output [0, 1]). For mRS prediction, standard decision trees cannot be used because mRS values can take any number between 0 and 6. Hence, the optimal choice is to use a regression decision tree, specifically M5P, which combines the best of both worlds: linear regression and decision tree. In M5P, categorical variables appear typically in internal nodes where branches from these nodes define a specific category. Each leaf node of these branches houses a linear function that captures the correlation between the variables (mostly the continuous variables) and the mRS score. Returning to the NIHSS correlation stated above, an M5P tree can model such a scenario using a single internal node whose branching decision is based on gender, and two leaf nodes each containing a linear model with different correlation coefficients for NIHSS.
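The gender and NIHSS scenario can be made concrete with a toy model tree. The sketch below is not the trained SPOT model. The class names are hypothetical and the intercepts and coefficients are invented solely to show how one internal node can route patients to two different linear models.

```python
from dataclasses import dataclass
from typing import Dict, Union

Features = Dict[str, float]


@dataclass
class LinearLeaf:
    """Leaf node holding a linear model over the (normalized) features."""
    intercept: float
    weights: Dict[str, float]

    def predict(self, x: Features) -> float:
        return self.intercept + sum(w * x[name] for name, w in self.weights.items())


@dataclass
class SplitNode:
    """Internal node that branches on one feature."""
    feature: str
    threshold: float
    low: "Node"   # taken when x[feature] <= threshold
    high: "Node"  # taken otherwise

    def predict(self, x: Features) -> float:
        branch = self.low if x[self.feature] <= self.threshold else self.high
        return branch.predict(x)


Node = Union[LinearLeaf, SplitNode]

# Made-up coefficients: a different NIHSS slope for each gender branch.
toy_tree = SplitNode(
    feature="female",
    threshold=0.5,
    low=LinearLeaf(intercept=1.0, weights={"nihss": 4.0}),
    high=LinearLeaf(intercept=2.0, weights={"nihss": 2.0}),
)
```

With these illustrative numbers, the same normalized NIHSS value of 0.25 yields a prediction of 2.0 in one branch and 2.5 in the other, which a single linear model with one NIHSS coefficient could not express.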
Detailed description of the M5P model
Regression trees model a non-linear continuous function by a continuous piecewise linear function. Each leaf node in the decision tree houses a linear function that predicts the value of the function. Each internal node is governed by a predicate that makes a branching decision to select a node from the lower level. The tree inducer starts by splitting the instances into nodes such that the variation in the class value is minimal down each branch. The intuition is that a minimal variation allows a better constant or linear representation.18 The second phase of the algorithm prunes the tree via backtracking from each leaf. This leads to replacing the constant values at the leaf nodes with planar (linear) values at the replacement nodes that were originally interior nodes. The predicates that govern splitting into the pruned leaf nodes decide the coefficients and the intercept values at the replacement node. Finally, smoothing is applied to compensate for sharp discontinuities that may occur between adjacent linear models when these models house a small subset of training instances. The splitting criterion for the M5 algorithm is based on the variable that maximizes the expected error reduction. The SD reduction (SDR) is given by the formula:
$$\mathrm{SDR}_i = \mathrm{sd}(T) - \sum_{j} \frac{|T_{ij}|}{|T|}\,\mathrm{sd}(T_{ij})$$
where $n$ is the number of variables, $i$ ($1 \le i \le n$) is the variable in question, $T$ is the set of training instances that reach the node, $T_{ij}$ are the sets of training instances resulting from splitting the node according to variable $i$, and sd denotes standard deviation. Splitting terminates when the number of instances remaining in the node falls to a preset bound, or when all the instances that reach a node vary very slightly.
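As a worked illustration of the splitting criterion, the sketch below computes the SD reduction for one candidate split. It assumes the standard M5 form, in which the SD of the class values at the node is reduced by the size-weighted SDs of the subsets produced by the split. The function name is hypothetical, and the population SD is used as a simplifying assumption.

```python
from statistics import pstdev
from typing import Sequence


def sd_reduction(parent: Sequence[float], subsets: Sequence[Sequence[float]]) -> float:
    """SD of the class values at a node minus the size-weighted SDs of its split subsets.

    `parent` holds the class values (here, mRS scores) of all instances reaching
    the node; `subsets` holds the same values partitioned by the candidate split.
    """
    total = len(parent)
    weighted = sum(len(s) / total * pstdev(s) for s in subsets)
    return pstdev(parent) - weighted


# A split that separates the scores perfectly removes all variation:
# sd([0, 0, 6, 6]) is 3 and both subsets have an SD of 0.
perfect = sd_reduction([0, 0, 6, 6], [[0, 0], [6, 6]])
# A split that leaves each subset as mixed as the parent gains nothing.
useless = sd_reduction([0, 0, 6, 6], [[0, 6], [0, 6]])
```

The tree inducer would evaluate this quantity for every candidate variable and choose the split with the largest reduction.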
Pruning estimates the error for non-training instances from the average of the absolute differences between the predicted and actual values for the training instances matching each node in the tree. Because this average underestimates the error on unseen instances, it is multiplied by a compensating factor $(|T| + \mu)/(|T| - \mu)$, where $T$ is the set of instances matching a node, and $\mu$ is the number of variables referenced in the model to decide the class value at the node.
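The error estimate used during pruning can be sketched as follows, assuming the usual M5 compensation in which the mean absolute residual at a node is multiplied by (n + v)/(n - v), with n the number of matching training instances and v the number of variables in the node's model. The function name and argument names are illustrative.

```python
from typing import Sequence


def estimated_error(abs_residuals: Sequence[float], n_model_variables: int) -> float:
    """Mean absolute residual at a node, inflated to account for model complexity."""
    n = len(abs_residuals)
    if n <= n_model_variables:
        raise ValueError("need more matching instances than model variables")
    mean_abs = sum(abs_residuals) / n
    return mean_abs * (n + n_model_variables) / (n - n_model_variables)
```

With four matching instances, each with an absolute residual of 1.0, and a model that uses two variables, the estimate is 1.0 × 6/2 = 3.0. Dropping a variable shrinks the multiplier, which is why removing terms can lower the estimated error even when the raw residuals rise slightly.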
The M5P algorithm prunes the tree by computing a linear regression model for the interior node using only the variables that are tested in the subtree below the node. Estimated error reduction happens by dropping terms one by one to reduce the multiplication factor.
Finally, M5P performs a smoothing process to compensate for the sharp discontinuities that may occur, especially for pruned leaves with small numbers of matching instances. The prediction made at a leaf is passed back up to the root one node at a time, and at each node it is combined with the value predicted by that node's own linear model. The compensation is given by $p' = \frac{np + kq}{n + k}$, where p' is the smoothed prediction at the upper node, p is the prediction passed from below, q is the prediction at the upper node, n is the number of training instances matching the node below and k is a constant parameter with default value 15.
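The smoothing step is a weighted average of two predictions and can be written directly from the definitions above. The function name is hypothetical, and the default k = 15 is the value stated in the text.

```python
def smooth_prediction(p: float, q: float, n: int, k: float = 15.0) -> float:
    """Blend the prediction from below (p) with the upper node's prediction (q).

    n is the number of training instances matching the node below; a small n
    pulls the result toward q, a large n keeps it close to p.
    """
    return (n * p + k * q) / (n + k)
```

For example, with p = 2.0, q = 4.0 and n = 15, both predictions carry equal weight and the smoothed value is 3.0.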
Comparison with decision trees and regression models
We used a panel of decision trees and regression analysis to assess how predictions of the SPOT M5P model compare with these algorithms. We used the area under the receiver operating characteristic (ROC) curve predicting good functional outcome (mRS score 0–2) to compare models. The output was used as binary (mRS 0–2 vs mRS 3–6) for models that fit only binary outputs (such as logistic regression) or numeric (0, 1, 2, 3, 4, 5, 6) for models that accept non-binary outputs. In either case, ROC curves for good outcome (mRS score 0–2) were computed after the prediction and comparisons made accordingly.
Retrospective evaluation of SPOT
Parameter optimization for the M5P model described above was performed using retrospective data from 110 elderly patients who underwent ADAPT thrombectomy. Since the predictions of SPOT are for individual mRS scores, a threshold scheme was used, obtained based on the ROC curve, to define the final binary output according to the predictions of the M5P model. The resulting scheme was then referred to as SPOT. The sensitivity, specificity, and ROC curve area under the curve (AUC) for SPOT were then computed and compared with other models as described above, and with individual variables such as the NIHSS, baseline mRS, and ASPECT scores.
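The evaluation metrics named above can be computed from predicted and observed mRS scores as sketched below. This is an illustration with hypothetical function names, not the analysis code used in the study. Good outcome (observed mRS 0–2) is treated as the positive class, and the AUC is computed with the rank-based (Mann-Whitney) formulation, in which a lower predicted mRS score counts as stronger evidence of a good outcome.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted_mrs: Sequence[float], observed_mrs: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity for predicting good outcome (observed mRS 0-2)."""
    tp = fn = tn = fp = 0
    for pred, obs in zip(predicted_mrs, observed_mrs):
        predicted_good = pred < cutoff
        if obs <= 2:
            tp += predicted_good
            fn += not predicted_good
        else:
            fp += predicted_good
            tn += not predicted_good
    return tp / (tp + fn), tn / (tn + fp)


def auc_good_outcome(predicted_mrs: Sequence[float], observed_mrs: Sequence[int]) -> float:
    """Probability that a good-outcome patient has a lower predicted mRS than a poor-outcome one."""
    good = [p for p, o in zip(predicted_mrs, observed_mrs) if o <= 2]
    poor = [p for p, o in zip(predicted_mrs, observed_mrs) if o > 2]
    wins = sum(1.0 if g < b else 0.5 if g == b else 0.0 for g in good for b in poor)
    return wins / (len(good) * len(poor))
```

Sweeping the cut-off across the range of predicted scores traces the ROC curve, and the threshold can then be chosen from that curve.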
Prospective clinical validation
SPOT was then evaluated using a prospective cohort of 36 elderly patients. Patients' outcomes were predicted on admission using SPOT, while undergoing thrombectomy based on the current selection criteria used at our institution. Providers were not made aware of the SPOT prediction before the procedure. Patients' outcomes were evaluated as described above at the 90-day visit to assess the validity of the prediction.
Computational analyses were performed as described above. Statistical analyses were performed using GraphPad Prism 7. ROC curves were used to compare the different predictive scores with SPOT. A χ² analysis was used to compare proportions. Comparison of the AUC between different ROC curves was performed using the approach described by DeLong et al.19
Following prospective validation, SPOT was implemented in an online tool that allows for real-time rapid prediction of expected outcomes of octogenarians undergoing thrombectomy.20
Description of patient population
In the retrospective training cohort, 110 patients met the criteria for undergoing ADAPT thrombectomy and were aged ≥80 years at the time of the procedure. The mean age for this group was 85±4 years, and 65% were female. In the prospective validation cohort, a total of 36 patients were included with a mean age of 86±4 years, and 54% were female. Baseline characteristics of both groups are shown in table 1.
Evaluation of predictive model
After training the M5P regression model using retrospective patient data, the mean absolute error was 0.83 when computed using a 10-fold cross-validation scheme predicting the mRS score (table 2). The trained M5P model was eventually combined with a thresholding scheme to form a binary classifier, which is referred to as the SPOT model. The ROC curve of SPOT for predicting 90-day good outcome (mRS score 0–2) was compared with predictions based on multiple decision trees and regression models and with admission NIHSS, baseline mRS, or ASPECT scores, independently (table 2). SPOT had the highest AUC and better sensitivity and specificity for predicting good mRS scores and the lowest mean absolute error for mRS scores. Using the optimal cut-off value for SPOT as a score below 3.15 for good outcome, SPOT showed a sensitivity of 89.36% and a specificity of 89.66% (AUC=0.925; p<0.0001, table 2). Using non-parametric ROC comparison,19 SPOT had a higher AUC than decision trees and linear and logistic regression models (table 2). We therefore refer to patients predicted by SPOT to have good or poor outcomes as SPOT-positive or SPOT-negative, respectively.
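The cross-validated mean absolute error can be illustrated with a generic k-fold loop. The sketch below is not the authors' pipeline. It uses deterministic interleaved folds rather than shuffled ones, and a trivial mean-of-training-set predictor stands in for the M5P learner. All names are hypothetical.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Split sample indices into k interleaved folds (a simple deterministic scheme)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def cross_validated_mae(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    fit: Callable[[list, list], Predictor],
    k: int = 10,
) -> float:
    """Mean absolute error over all held-out predictions across k folds."""
    abs_errors = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        abs_errors.extend(abs(model(X[i]) - y[i]) for i in held_out)
    return sum(abs_errors) / len(abs_errors)


def fit_mean_baseline(X_train: list, y_train: list) -> Predictor:
    """Stand-in learner: always predicts the mean mRS score of the training fold."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean
```

Any learner that returns a callable predictor can be passed as `fit`, so the same loop could wrap a model tree in place of the baseline.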
Retrospective evaluation of SPOT
We applied SPOT to the retrospective training cohort of elderly patients who underwent thrombectomy between January 2013 and July 2017 to assess the utility of SPOT for selecting the most appropriate therapeutic pathway using a 10-fold cross-validation approach. The standard-of-care selection criteria used at our institution (described in methods) resulted in 20% of patients having good outcomes (mRS 0–2) at 90 days (figure 1). However, when SPOT was retrospectively applied, the rate of good outcome was significantly higher at 72.4% among SPOT-positive patients compared with the full cohort (p<0.001). Among SPOT-negative patients, only 2.5% would have had a good outcome (figure 1). Among the 110 patients evaluated, 45 patients developed postprocedural hemorrhage, and eight patients developed parenchymal hematoma 2 (PH2)-type symptomatic intracerebral hemorrhage after the procedure. Around 85% (n=38) of patients who developed postprocedural hemorrhage, and 100% (n=8) of patients who developed PH2-type symptomatic intracerebral hemorrhage were predicted by SPOT not to achieve a good outcome.
Prospective evaluation of SPOT
We then assessed the utility of SPOT as a selection tool for elderly patients with CT perfusion mismatch in a prospective cohort of patients who underwent thrombectomy between August 2017 and November 2017. Of all elderly patients who underwent the procedure irrespective of SPOT prediction, the rate of good outcomes at 90 days in our prospective sample was 36.2% (figure 2). For SPOT-positive patients, the rate of good outcome was 60%, compared with a 4.8% rate of good outcome (only one misclassified subject) in patients who were SPOT-negative (figure 2). Using these prospective data, the negative predictive value of SPOT was 95.2%.
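The reported negative predictive value can be checked with simple arithmetic. One misclassified patient corresponding to a 4.8% rate of good outcome implies about 21 SPOT-negative patients. That count is inferred from the percentages rather than stated in the text, so it is treated as an assumption in the sketch below.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """Share of SPOT-negative patients who indeed did not reach a good outcome."""
    return true_negatives / (true_negatives + false_negatives)


# Assumed counts: 21 SPOT-negative patients, 1 of whom had a good outcome.
spot_negative_total = 21
misclassified = 1
npv = negative_predictive_value(spot_negative_total - misclassified, misclassified)
print(f"NPV = {npv:.1%}")  # prints "NPV = 95.2%"
```

Under that assumption, 20 of 21 predictions are correct, which matches the 95.2% quoted above, and 1/21 rounds to the 4.8% rate of good outcome among SPOT-negative patients.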
Implementation of SPOT
Following our prospective and retrospective validation, SPOT was launched as a publicly available online tool.20 In elderly patients presenting with AIS being considered for ET, SPOT provides rapid prediction of patient outcome and can now be easily used by investigators for multicenter validation.
Patient selection criteria for ET after acute ischemic stroke (AIS) are evolving, and the significantly high effect size seen in RCTs warrants expanding the indications for ET beyond existing trial criteria. Designing new RCTs that alter the selection criteria used in previous highly successful RCTs has ethical and practical constraints. Experience from multiple centers with selection criteria for ET that are broader than those in the recent RCTs provides surrogate measures of efficacy in select understudied patient populations, such as the elderly. We used an approach that benefits from machine learning algorithms to optimize patient selection for ET in the elderly, to enhance clinical outcomes and reduce the rate of unnecessary and potentially risky interventions. The SPOT prognostic tool was developed based on retrospective data, and we show that it has significantly high sensitivity and specificity for predicting functional independence at 90 days after stroke in patients aged ≥80 years. The performance of SPOT was significantly better than using common regression models or decision trees.
The variable degree of reported single and multicenter trial efficacy of ET in the elderly can probably be attributed to differences in patient selection criteria (discussed by Alawieh et al9). However, these studies have consistently shown that after ET, the elderly have lower rates of functional independence, higher morbidity and mortality, and a higher risk of hemorrhage, irrespective of their baseline level of functioning before stroke, than younger patients.9 The use of SPOT may provide guidance in this area of clinical equipoise. With its high negative predictive value (>95%), SPOT might help to exclude patients unlikely to benefit from ET and reduce the higher rate of morbidity and mortality in this patient group. This is especially apparent given the high rate of hemorrhagic complications present in SPOT-negative patients who underwent ET.
In addition to the proposed model in this paper, prior work has been described using predictive models for outcomes after ET using data from RCTs, MR Predicts being one example.21 MR Predicts uses a logistic regression model for prediction of 90-day mRS with a reported ROC AUC of 0.69–0.73.21 Logistic regression models were used in our work; however, a lower discriminative effect for good outcomes was seen in those models compared with the use of our M5P algorithm. M5P models performed better than logistic regression models because both linear and logistic regressions (logistic being in essence a linear regression mapped to a 0–1 scale using the logistic function) can fail to capture the mRS behavior since they cannot handle multiple categorical inputs easily, and cannot provide multiple decision-making processes as in decision trees. Interestingly, the M5P algorithm was initially designed to predict continuous classes rather than discrete classes where objects fall into categories. However, it was still applicable in this situation since our target output (mRS score), although discretely encoded where the lowest score is 0 and highest is 6, can still be viewed as continuous between 0 and 6. The combination of decision trees and linear functions implemented in the M5P algorithm as described in the 'Methods' section provided optimal prediction in our dataset that combines categorical and continuous data.
Considerable progress is currently being achieved in using machine learning for AIS and ET, specifically using computer vision and deep-learning algorithms to extract predictive features in radiological data, such as head CT and CT angiograms. SPOT was deliberately designed without incorporating the processing of radiological studies to avoid being limited by the associated computational expense of convolutional neural networks. SPOT uses clinical information available on admission and a quickly interpreted ASPECT score based on non-contrast CT. Thus it can be used as a readily accessible assessment tool at spoke sites in stroke networks to determine the need to transfer patients to thrombectomy-capable and comprehensive stroke centers.
Based on the high sensitivity of SPOT, the proposed use for this model is to determine when not to offer ET for elderly patients presenting with AIS. It is important to mention that although SPOT was externally validated using prospective data, it was based on data from a single center. Future evaluation of SPOT using multicenter data will further strengthen the utility of SPOT as a clinical decision tool. In addition, although the focus of this study was an elderly population, due to the recent controversy on outcomes in this specific population and the need to optimize patient selection, a similar approach can be used for all thrombectomy patients—a future direction of this work.
Contributors Each author listed above should receive authorship credit based on the material contribution to this article, their revision of this article, and their final approval of this article for submission to this journal.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests AS: Penumbra, consulting, honorarium, speaker bureau; Pulsar Vascular, consulting, honorarium, speaker bureau; Microvention, consulting, honorarium, speaker bureau, research; Stryker, consulting, honorarium, speaker bureau.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.