“Evidence based medicine has contributed to the development of a rigid hierarchy of research design that underestimates the limitations of randomized controlled trials”1
A Randomized Trial of Unruptured Brain Arteriovenous Malformations
In May 2013, the Data Safety Monitoring Board of A Randomized Trial of Unruptured Brain Arteriovenous Malformations (ARUBA) halted enrollment due to excess morbidity in the interventional group. This result was not surprising to the medical community, as prior editorials attest.2 Until complete trial data are available, it is impossible to determine why interventional treatment was deemed inferior to medical management. However, the trial had major inherent limitations that portended failure from the outset.
Brain arteriovenous malformations (AVMs) are congenital lesions, which frequently present in young patients. The goal of treatment is a complete and durable cure, with limited neurological morbidity and mortality, and a better quality of life, for the rest of that patient's life. Unfortunately, the logistics of any trial, particularly a National Institutes of Health funded trial, initially will only accommodate follow-up for a finite period of time—5 years in the case of ARUBA. With this study design, patients allocated to intervention were exposed to the entire upfront risk of the interventional procedure(s) to achieve curative treatment of the lesion. While most patients allocated to intervention presumably underwent definitive treatment and were cured of the AVM during the trial, the benefit of this curative treatment was only realized for the relatively brief period of follow-up. At the same time, patients allocated to the medical management arm, who remain at a lifelong risk for neurological morbidity, were only tracked for a small fraction of their ‘at risk’ period. For this reason, the study design heavily favored medical management. In other words, patients undergoing treatment face the full risk posed by that treatment during the study while those allocated to medical management are exposed only to that fraction of their lifetime risk which is accumulated over the study period. Similarly, the chosen primary endpoint of any symptomatic stroke or death further exaggerates this bias. When faced with the lifetime risk of stroke posed by an AVM, exchanging minor and sometimes transient neurologic morbidity for long term protection against hemorrhage can be a bargain.
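The follow-up argument above can be illustrated with simple arithmetic. The sketch below assumes a constant, independent annual event risk for an untreated AVM and a one-time upfront risk for a fully curative treatment. The 2% annual rate, the 15% upfront risk, and the function names are hypothetical values chosen for illustration, not ARUBA data.

```python
def cumulative_risk(annual_rate: float, years: int) -> float:
    """Cumulative probability of at least one event over `years`,
    assuming a constant and independent annual risk (a simplification)."""
    return 1 - (1 - annual_rate) ** years


def break_even_years(annual_rate: float, upfront_risk: float, max_years: int = 100):
    """First whole year at which cumulative untreated risk exceeds a one-time
    upfront treatment risk. Assumes treatment is fully curative (hypothetical).
    Returns None if no crossover occurs within `max_years`."""
    for year in range(1, max_years + 1):
        if cumulative_risk(annual_rate, year) > upfront_risk:
            return year
    return None


if __name__ == "__main__":
    annual = 0.02   # hypothetical annual event rate, not taken from the trial
    upfront = 0.15  # hypothetical one-time treatment risk, not taken from the trial
    print(f"5-year untreated risk:  {cumulative_risk(annual, 5):.1%}")
    print(f"40-year untreated risk: {cumulative_risk(annual, 40):.1%}")
    print(f"Break-even year:        {break_even_years(annual, upfront)}")
```

Under these illustrative inputs, the untreated risk does not overtake the upfront treatment risk until after a 5 year observation window has closed, which is the asymmetry the paragraph describes.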
Second, unruptured AVMs are perhaps the most heterogeneous and complex of all neurovascular lesions, with highly variable anatomy, natural history and associated treatment risk. Significant heterogeneity inherent within the study group presents a formidable challenge to trial design and execution by creating a tremendous susceptibility to enrollment bias. Specifically, managing physicians might have felt that there was inadequate equipoise to support the enrollment of younger patients with the most treatable AVMs. At the same time, investigators might have been more apt to enroll patients with more complex lesions which were perceived to carry a more ambiguous risk:benefit ratio for treatment. For a trial to be free of such enrollment bias, participating centers would have had to agree to exclusively offer treatment for all potentially eligible patients with unruptured AVMs within the context of the trial. Without this safeguard against enrollment bias, it is impossible to ensure that the cohort studied represented a reasonable sampling of those unruptured AVMs which met the eligibility criteria. Until the percentage of treated AVM patients who were enrolled in the study at the participating centers is known, it will be difficult to ignore this possible source of bias.
This same heterogeneity within the disease process means that the ARUBA results lack generalizability. It would be an error to apply the ARUBA results to all unruptured AVMs. The variation and anatomical complexity of AVMs make comparison across the cohorts extremely challenging and limit the application of the ARUBA results to those AVMs that are most frequently treated (grades 1 and 2). Furthermore, subcategorization of AVMs is challenging, and once subcategorized for analysis, the small numbers in each group reduce the signal to noise ratio in the data. The original study was not powered to evaluate differences between these subgroups, and since enrollment was halted early (after only 223 patients), it is unlikely that any meaningful subset analyses will be possible.
Finally, the difficulty in interpreting the ARUBA results is further amplified by the treatment strategies employed in the interventional cohort—embolization, microsurgery, and radiosurgery, either alone or in some combination. These treatment approaches were not standardized within ARUBA to any extent, and were exclusively left to the discretion of the participating physicians. Moreover, there was no credentialing process to ensure that participating physicians were using standardized, generally accepted algorithms. This variance in treatment strategies is probably most important with respect to the application of embolization.
The majority of neurointerventionists view AVM embolization as an adjunct to surgical resection, with the goal of embolization being to reduce the complexity and risk associated with surgical resection.3 This is also reflected in the Food and Drug Administration cleared indications for use for both Onyx (Covidien/eV3, Irvine, California, USA) and n-butyl cyanoacrylate (n-BCA, Codman Neurovascular, Raynham, Massachusetts, USA) which specify that the agents are for the presurgical embolization (or devascularization) of brain AVMs. The best care of patients with AVM, as for other diseases, requires careful and collaborative evaluation and treatment. Embolization should not be used as a blunt instrument; it is one of several tools available for the treatment of AVMs and should be used judiciously and carefully. Some neurointerventionists have recently advocated a newer, albeit controversial, approach of aggressive embolization as a primary prospective treatment strategy with the goal of achieving a complete angiographic obliteration of the AVM.4,5 While preoperative embolization has been validated in controlled clinical trials, the strategy of standalone embolization (particularly for larger AVMs) has not.6,7 Moreover, longitudinal data with respect to the natural history of AVMs after ‘curative embolization’ are lacking.3 These two embolization strategies are likely associated with drastically different complication rates.
One could easily see how the participation of investigators who routinely employ such an unproven and highly controversial treatment strategy could have heavily influenced the data set and made the final results inapplicable to standard routine clinical practice (that is, presurgical embolization followed by resection). While standardization of AVM treatment is certainly challenging, a complete lack of any prospectively standardized approach (and without active surveillance of the operators) seems irresponsible.
Despite these flaws, the results of ARUBA are likely to be erroneously generalized by some to the treatment of all unruptured AVMs.
Randomized clinical trials in neurovascular intervention
Well designed, well executed randomized clinical trials (RCTs) clearly yield the highest quality of scientific evidence. The premise that such data are critical to the advancement of the neurovascular field remains unassailable. However, just as well designed and executed RCTs will nurture and sustain this nascent field, suboptimally designed and executed RCTs could dismantle it.
RCTs are particularly challenging in the neurovascular field because the disease processes have low prevalence and are highly heterogeneous. By contrast, other disease based fields of study (eg, atherosclerotic coronary artery disease) have a far greater prevalence and a much more homogeneous population to study.
Clinical trialists frequently view one, or even a series of, failed or negative RCTs as an inconsequential and necessary stepping stone toward the next RCT. However, while this view is arguably valid in device or drug trials in high prevalence disease processes, in the neurovascular field, things are very different. There are fewer patients; trials take much longer to complete; and the relative expenses of the RCTs in comparison with the overall size of the population at risk are high. Moreover, there is a scarcity of RCT data supporting neurointervention. This creates a scenario in which the magnitude and duration of the impact of any negative RCT are amplified. A negative clinical trial can lead to an abrupt inability to offer neurovascular therapies to patients, thereby arresting further iterative advancement of existing technology and obviating the infrastructure development necessary for further RCT performance. A procession of negative RCTs could certainly result in a rapid and marked contraction of the discipline. If this failure were the result of solid clinical science, then this type of a contraction would be warranted. However, if denial of treatment is based on RCTs with significant flaws in design and/or execution, then it would lower the standard of care and harm patients.
It is essential to analyze what has evolved over the past year and determine how we can do better in future trials. This starts with an assessment of why various RCTs have failed, and how this might have been predicted and avoided. Several basic themes are evident when looking back at ARUBA and other recent neurovascular trials:
Practical limitations with respect to funding or structure prohibit adequate longitudinal follow-up. Many of the disease processes which we treat electively (eg, unruptured aneurysms, AVMs, asymptomatic carotid stenosis) place patients at a continuous and cumulative lifelong risk for potentially catastrophic neurological morbidity and mortality. As such, any trials comparing interventional therapy with conservative or medical management must be designed with a duration of clinical follow-up that is adequate to reflect the cumulative risk associated with the natural history of the disease. Failure to do this will create an inherent bias against curative treatments.
Practical limitations with respect to enrollment, funding or structure limit sample size and may result in effect size overestimation. In some cases, investigators may be tempted to overestimate the potential effect size for an intervention so that the sample size can be reduced. This is particularly the case in neurointervention because, given the low disease prevalence, there is a legitimate concern that enrollment will take excessively long and thereby render any conclusions irrelevant to modern care once the study is completed. Overestimating effect size is a hazardous maneuver that can cause an effective intervention to fail to meet predetermined criteria for success. A potential example of this phenomenon may be the ongoing mechanical thrombectomy trial, ESCAPE (Endovascular Treatment for Small Core and Proximal Occlusion Ischemic Stroke). While the ESCAPE investigators are to be congratulated on their ability to establish an RCT for such an important topic, and we acknowledge there are a number of excellent facets to the trial's design, we have some concern regarding their statistical design and estimated effect size. The investigators have designed their trial based upon an ambitious 20% absolute benefit of interventional therapy. While this premise limits the required patient enrollment to approximately 250 patients, it represents a tremendous potential hazard because a clinically effective procedure could easily fail due to the very high bar set for success. This seems particularly concerning given the results of several other recent stroke intervention RCTs, which have collectively created a very real challenge to existing care networks established to provide acute stroke thrombectomy.
We would caution that instead of defaulting to high thresholds for success (and as a result lowering enrollment targets), the neurointerventional community should instead identify the appropriate minimal reasonable clinical benefit as the effect size and then power each study accordingly. If ESCAPE eventually demonstrates an absolute improvement in dichotomized outcome of 16%, most physicians would believe that such a result should support intra-arterial intervention; however, because of the excessively high estimated effect size and resultant lower n, the trial would have failed to reach significance and would be interpreted as a failure. This would be a disappointing result from what is otherwise a well designed and conceived trial.
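The relationship between assumed effect size and required enrollment discussed above can be sketched with the standard normal-approximation formula for comparing two proportions. The 40% control-arm good-outcome rate, the 90% power target, the two-sided alpha of 0.05, and the function names are assumptions made for illustration only; they are not taken from the ESCAPE protocol.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def n_per_arm(p_control: float, p_treated: float,
              alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate patients needed per arm to detect p_treated vs p_control
    with a two-sided test (unpooled normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_treated - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


def approx_power(p_control: float, p_treated: float, n: int,
                 alpha: float = 0.05) -> float:
    """Approximate power with n patients per arm when the true rates are
    p_control and p_treated (same normal approximation as above)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sqrt((p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n)
    return _STD_NORMAL.cdf(abs(p_treated - p_control) / se - z_alpha)


if __name__ == "__main__":
    control = 0.40  # hypothetical control-arm rate of good outcome
    n_20 = n_per_arm(control, control + 0.20)  # powered for a 20% absolute benefit
    n_16 = n_per_arm(control, control + 0.16)  # powered for a 16% absolute benefit
    print(f"Per-arm n for 20% benefit: {n_20}")
    print(f"Per-arm n for 16% benefit: {n_16}")
    print(f"Power if true benefit is 16% but n was sized for 20%: "
          f"{approx_power(control, control + 0.16, n_20):.2f}")
```

Under these assumed inputs, sizing the trial for the smaller but still clinically meaningful benefit requires substantially more patients per arm, and a trial sized for the larger benefit has a noticeably lower chance of detecting the smaller one.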
Slow execution and enrollment make a trial obsolete before or during enrollment. It is a daunting and time consuming effort to design, secure funding for, and then to promptly execute an RCT. In a rapidly evolving field like neuroendovascular therapeutics, this is a particular problem and the comparatively small number of eligible patients further exacerbates the situation. The stringent inclusion and exclusion criteria typical of RCTs shrink this pool further, often reducing enrollment to a trickle. Unfortunately, during the long period of time required for this undertaking, the standard of care can evolve dramatically, to the point where the original trial design becomes obsolete. For example, in the Interventional Management of Stroke III (IMS-III) Trial, at the time of trial design, CT angiography was not used at many institutions for the evaluation of patients with suspected large vessel occlusion. When IMS-III started enrollment, CT angiography was standard of care at most active interventional stroke treatment centers. Moreover, during the more than 8 years required to complete the trial, the available interventional devices evolved considerably, and the superior efficacy and safety of the newer devices and treatment strategies were demonstrated in concurrent RCTs. These protocol issues were recognized early by many interventionists practicing at high volume institutions, and dampened enthusiasm for participation, further slowing the trial's progress. By the time the trial was halted for futility, the neuroimaging evaluation, devices, and treatment strategies used for the vast majority of the enrolled cases were no longer relevant to clinical practice and had not been for several years.8 Despite this, the IMS-III trial results continue to be inappropriately generalized to contemporary practice and have had a direct impact on the availability of these procedures for patients.9
Suboptimal trial design does not reflect ongoing clinical practice. The initial RCTs must be structured to validate the most fundamental principles for the interventional therapy rather than to investigate alternative exploratory treatment paradigms. For example, in patients with early acute stroke secondary to large vessel occlusion, a comparison of standard of care therapy (intravenous tissue plasminogen activator (IV tPA)) to standard of care in addition to interventional therapy (thrombectomy plus IV tPA) represents an important and relevant question. Only after this concept is validated would it make sense to evaluate more aggressive early acute stroke protocols. In some cases, investigators have sought to evaluate hypothetical treatment algorithms, which are not at all reflective of typical clinical practices. Unfortunately, this type of exploratory design does not prevent the eventual results from being extrapolated as a surrogate for current practice. This was the case in the SYNTHESIS-Expansion trial in which investigators chose to compare intra-arterial stroke intervention alone with IV tPA in IV tPA eligible patients without any verification of a large vessel occlusion on imaging or any minimum National Institutes of Health Stroke Scale requirement. This trial predictably demonstrated no difference between the two treatment arms.10 While anyone with experience in the neurovascular field would quickly acknowledge that this exploratory protocol is completely foreign to any standard treatment strategy, the results are often not presented or acknowledged as such. In fact, these data were widely presented as ‘yet another negative intra-arterial stroke trial’, adding to the existing impression that interventional stroke therapies, as they are currently being practiced, are ineffective.
Poor operator selection and inadequate contemporaneous quality control result in poor performance. While it is critical that an RCT treatment paradigm reflects current practice, when evaluating the efficacy of any intervention it is equally critical to control the quality of proceduralists. It has been argued that this works against the generalizability of the RCT results; however, the purpose of early RCTs is to determine the potential efficacy of a treatment, not to assure that all centers across the country can perform the treatment paradigm at a competent level. This paradigm has been applied repeatedly in many successful cerebrovascular trials (eg, NASCET (North American Symptomatic Carotid Endarterectomy Trial), CREST (Carotid Revascularization Endarterectomy versus Stenting Trial)). This applies to negative interventional trials as well. Many criticized the rigorous medical arm of SAMMPRIS (Stenting and Aggressive Medical Management for Preventing Recurrent Stroke in Intracranial Stenosis) as not being applicable to general practice, but in reality it set the appropriate bar for the future of medical therapy for intracranial atherosclerosis, and it is up to us to raise our standards to the trial's level.11 Therefore, as these examples demonstrate, it is an absolute requirement that RCTs ensure that those with appropriate expertise perform the evaluated treatment. This is best done by establishing a formal credentialing committee and contemporaneously monitoring procedural conduct and outcomes throughout the study period.11 Even with these precautions, quality control is challenging. In the absence of adequate quality control measures, it is possible that procedural quality within the trial is poor enough to call the overall result into question.
For example, the MR RESCUE (Mechanical Retrieval and Recanalization of Stroke Clots Using Embolectomy) trial was designed to assess the efficacy of physiological imaging to select ischemic stroke patients for interventional therapies. Operators in this trial were able to achieve successful revascularization of the large vessel occlusions in only 27% of cases, a level of performance far below that of other concurrent core laboratory adjudicated clinical trials of stroke intervention.12–14 As such, the primary conclusion of the study, that penumbral imaging is not efficacious to select patients for stroke intervention, is invalidated by the dismal success rates of the interventions performed within the trial.
To move forward successfully, we need to proceed with a strategy of carefully designed and well executed RCTs to establish a ‘beachhead position’ for beneficial neurointerventional therapies as well as to appropriately identify treatments which are ineffective.
Initial trials must be structured to validate those core interventions which are most standardized and which we most strongly believe to be beneficial, but for which RCT data are yet unavailable. We should not design trials which incorporate non-standardized interventional procedures if the core interventional strategies have yet to be validated.
Trials must be adequately powered to reflect pragmatic and clinically relevant effect sizes with interim analyses built in (if required) to avoid unnecessary over enrollment.
Trials must have reasonable and clinically relevant primary endpoints selected to have adequate sensitivity to demonstrate the potential benefit of a therapy (eg, modified Rankin Scale shift analysis vs binary modified Rankin Scale outcomes).
Trials must assess relevant longitudinal outcomes that realistically reflect the true ‘at risk’ period for the disease process.
Trials must carefully stratify and enroll a study population that has the index disease process and is as homogeneous as possible.
Neurointerventionists recognize the critical role of evidence based medicine. The blinded RCT represents the pinnacle of the evidentiary summit. In recent years, neurointerventional specialists have seen a host of negative RCTs published in some of the world's most powerful journals. These include INVEST (International Verapamil SR Trandolapril Study), Vertebroplasty for Painful Osteoporotic Fractures, SYNTHESIS-Expansion, MR RESCUE, and IMS-III.8,10,12,15,16 Once these studies are published, post hoc criticisms of incorrect observations have relatively little effect on their impact and on their subsequent interpretation by the medical community, not to mention the lay public and media. It is far better to participate prospectively in designing appropriate randomized trials so that in the end, interventionalists can have confidence in the results. As these recent challenges indicate, RCTs can be a double edged sword, and enrollment for the sake of enrollment with a blind allegiance to the concept of the ‘sanctity’ of the RCT, any RCT, is hazardous. Neurointerventionists, at the individual and society levels, have a responsibility to review and thoroughly understand RCT protocols prior to enrolling patients.
Contributors All authors contributed sufficiently for inclusion in the manuscript.
Competing interests None.
Provenance and peer review Commissioned; not externally peer reviewed.