Original research
Evaluation of previously embolized intracranial aneurysms: inter- and intra-rater reliability among neurosurgeons and interventional neuroradiologists
  1. Scott L Zuckerman1,
  2. Nikita Lakomkin2,
  3. Jordan A Magarik1,
  4. Jan Vargas3,
  5. Marcus Stephens4,
  6. Babatunde Akinpelu5,
  7. Alejandro M Spiotta3,
  8. Azam Ahmed6,
  9. Adam S Arthur7,
  10. David Fiorella8,
  11. Ricardo Hanel9,
  12. Joshua A Hirsch10,
  13. Ferdinand K Hui11,
  14. Robert F James12,
  15. David F Kallmes13,
  16. Philip M Meyers14,
  17. David B Niemann4,
  18. Peter Rasmussen15,
  19. Raymond D Turner3,
  20. Babu G Welch16,
  21. J Mocco2
  1. 1 Department of Neurosurgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  2. 2 Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, USA
  3. 3 Department of Neurosurgery, Medical University of South Carolina, Charleston, South Carolina, USA
  4. 4 Department of Neurosurgery, University of Arkansas, Little Rock, Arkansas, USA
  5. 5 Department of Radiology, University of Washington, Washington, USA
  6. 6 Department of Neurosurgery, University of Wisconsin, Madison, Wisconsin, USA
  7. 7 Department of Neurosurgery, Semmes-Murphey Clinic, University of Tennessee Health Sciences Center, Memphis, Tennessee, USA
  8. 8 Department of Neurosurgery, Cerebrovascular Center, Stony Brook University Medical Center, Stony Brook, New York, USA
  9. 9 Department of Lyerly Neurosurgery, Baptist Neurological Institute, Jacksonville, Florida, USA
  10. 10 Neurointerventional Service, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
  11. 11 Department of Radiology, Johns Hopkins Hospital, Baltimore, Maryland, USA
  12. 12 Department of Neurosurgery, University of Louisville School of Medicine, Louisville, Kentucky, USA
  13. 13 Department of Neurosurgery, Mayo Clinic, Rochester, Minnesota, USA
  14. 14 Columbia University Medical Center, Departments of Neurosurgery and Radiology, New York, USA
  15. 15 Department of Neurosurgery, Cleveland Clinic, Cleveland, Ohio, USA
  16. 16 Department of Neurological Surgery, University of Texas Southwestern Medical Center, Dallas, Texas, USA
  1. Correspondence to J Mocco, Department of Neurological Surgery, Mount Sinai Health System, NY 10029, USA; j.mocco{at}


Background The angiographic evaluation of previously coiled aneurysms can be difficult yet remains critical for determining re-treatment.

Objective The main objective of this study was to determine the inter-rater reliability for both the Raymond Scale and per cent embolization among a group of neurointerventionalists evaluating previously embolized aneurysms.

Methods A panel of 15 neurointerventionalists examined 92 distinct cases of immediate post-coil embolization and 1 year post-embolization angiograms. Each case was presented four times throughout the study, with altered patient demographics, in order to evaluate intra-rater reliability. All respondents were asked to provide the per cent embolization (0–100%) and Raymond Scale grade (1–3) for each aneurysm. Inter-rater reliability was evaluated by computing weighted kappa values (for the Raymond Scale) and intraclass correlation coefficients (ICC) for per cent embolization.

Results 10 neurosurgeons and 5 interventional neuroradiologists evaluated 368 simulated cases. The agreement among all readers employing the Raymond Scale was fair (κ=0.35) while concordance in per cent embolization was good (ICC=0.64). Clinicians with fewer than 10 years of experience demonstrated a significantly greater level of agreement than the group with greater than 10 years (κ=0.39 and ICC=0.70 vs κ=0.28 and ICC=0.58). When the same aneurysm was presented multiple times, clinicians demonstrated excellent consistency when assessing per cent embolization (ICC=0.82), but moderate agreement when employing the Raymond classification (κ=0.58).

Conclusions Identifying the per cent embolization in previously coiled aneurysms resulted in good inter- and intra-rater agreement, regardless of years of experience. The strong agreement among providers employing per cent embolization may make it a valuable tool for embolization assessment in this patient population.

  • coil
  • aneurysm
  • angiography


The clinical utility of endovascular coiling for the treatment of ruptured and unruptured intracranial aneurysms has been demonstrated in a variety of robust, prospective studies with lower reported rates of complications compared with traditional clipping.1–5 Despite these favorable results, one disadvantage of coiling is the potential for aneurysm recanalization.1 2 6 Several studies have reported that the combined recurrence rate of embolized ruptured and unruptured aneurysms exceeds 20%.1 6 7 This recanalization is commonly related to compression/migration of the coils or regrowth of the aneurysm,6 and factors such as aneurysm size, neck morphology, initial level of occlusion, and ruptured presentation have been identified as significant predictors of recurrence.7 8

When recanalization is identified on follow-up angiograms, the interventionalist must determine whether re-treatment is indicated to mitigate the risk of rupture.9 Although only approximately half of these recanalized aneurysms undergo re-treatment,9 10 there is substantial variability in the decision to proceed with additional intervention.11 12 Criteria to re-treat include enlarging remnant, uncovering of the aneurysmal sac with a >2 mm recurrence, and worsening symptoms.6 10 13 14 The question of the need for re-treatment has become even more complex as embolization technology and techniques develop.

Possible methods of evaluating aneurysm occlusion and residual filling include the Raymond–Roy Scale (hereafter referred to as the Raymond Scale) and the total percentage of the extent of embolization.15 To date, relatively few studies have determined the inter-rater reliability for these measurements in the context of aneurysm treatment.11 12 16 As endovascular treatment of brain aneurysms represents the cross section of several medical disciplines, the relationship between specialty and years of experience to the assessment of the radiographic adequacy of aneurysm occlusion remains unclear. With an increasing emphasis on evidence based practice and the formulation of standardized treatment guidelines, exploration of the utility of standardized techniques in clinical assessment is particularly relevant.

The purpose of this study was to evaluate the inter-rater reliability for both Raymond classification and the estimated percentage of aneurysm embolization for previously coiled aneurysms among a diverse set of neurointerventionalists. In addition, variables such as years of experience were selected a priori to examine their effect on rating agreement.


Study design

Institutional review board approval was obtained for the current survey based cross sectional study. To assess inter-rater reliability for the Raymond classification (table 1) and the estimated per cent embolization, a panel of 15 experienced neurointerventionalists was assembled to examine 92 simulated cases of immediate post-embolization and 1 year post-embolization angiograms. Images were identified in consecutive fashion, with the intent to obtain a range of nearly occluded and fully occluded aneurysms across the selected locations. All cases were obtained from two institutions. A total of 92 distinct pairs of aneurysm images were utilized.

Table 1

Raymond–Roy classification

Two images were shown on a single PowerPoint slide for each case: one working view at the end of the initial treatment and a second of the same aneurysm, from the same working view, at follow-up. For the purposes of study standardization, the follow-up image was presented as being obtained at 1 year following the initial treatment. Each distinct aneurysm image set was presented four separate times, spread across the total 368 case cohort, with each of the four presentations shown alongside a different clinical history (such as age, smoking status, and rupture history) to allow for an assessment of consistency for each individual physician. Raters were told that the aneurysm underwent no additional intervention between the two images. Aneurysms were selected from anterior communicating, posterior communicating, ophthalmic, and basilar artery locations. All cases consisted of aneurysms that had been coiled, with possible stent assisted or balloon assisted coiling. Four example cases are seen in figure 1A–D, with an anterior communicating (A), posterior communicating (B), ophthalmic (C), and basilar (D) case shown.

Figure 1

Immediate post-coiling and 1 year post-coiling images of anterior communicating (A), posterior communicating (B), ophthalmic (C), and basilar (D) aneurysms.

Data collection

Clinicians provided demographic information regarding years of experience and the geographic location of their practice. All respondents were asked to provide the following information for each case: per cent embolization (0–100%) and Raymond Scale for each aneurysm (1–3).

Data collection was completed, in a blinded fashion, during a 3 hour session at a national meeting, in order to establish consistent practice and timing of review. Remaining questions that were not finished during the session were answered individually using a time regulated review module. An honorarium was provided to participants as compensation for their time.

Statistical analysis

Inter-rater reliability of the Raymond Scale was assessed using a weighted Fleiss’ kappa coefficient analysis. The Raymond Scale was explored as a three level categorical variable. Inter-rater reliability using per cent embolization was assessed with the intraclass correlation coefficient (ICC). Per cent embolization was a continuous variable from 0 to 100, although in the significant majority of cases this number was clustered from 70 to 100.
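As a rough illustration of the categorical agreement statistic named above, the sketch below computes the standard unweighted Fleiss' kappa in plain Python. The study reports a weighted variant computed in SPSS, and the weighting scheme is not described, so it is not reproduced here. The function name `fleiss_kappa` and the toy ratings are assumptions made for illustration and are not study data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories=(1, 2, 3)):
    """Unweighted Fleiss' kappa (not the weighted variant used in the study).

    ratings: list of cases; each case is a list with one category label
    per rater. Every case must have the same number of ratings.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number of ratings")

    # counts[i][c] = number of raters who assigned case i to category c
    counts = [Counter(case) for case in ratings]

    # Observed agreement for each case, then averaged over cases
    per_case = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall share of each category
    shares = [
        sum(c[cat] for c in counts) / (n_cases * n_raters) for cat in categories
    ]
    p_chance = sum(s * s for s in shares)

    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical Raymond grades (1-3) from three raters on four cases
toy_grades = [[1, 1, 2], [2, 2, 2], [3, 2, 3], [1, 1, 1]]
print(round(fleiss_kappa(toy_grades), 3))
```

When every rater assigns the same grade to each case and more than one grade is used overall, the function returns 1; when all ratings fall in a single category, chance agreement equals 1 and the statistic is undefined, which this sketch does not guard against.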

Ninety-five per cent bias corrected bootstrap confidence intervals were computed for each coefficient. Overlaps between the confidence intervals of the coefficients associated with different groups were examined to determine significant differences in concordance between the raters. Kappa coefficients were interpreted in ranges, with 0.01–0.20 indicating slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.0 almost perfect agreement.17 ICC values were assessed using commonly employed cut-offs, with <0.40 indicating poor agreement, 0.40–0.59 demonstrating fair concordance, 0.60–0.74 indicating good agreement, and 0.75–1.0 showing excellent reliability.18 An additional sub-analysis for inter-rater reliability among those with greater or less than 10 years of experience following training was performed for both techniques. Intra-rater reliability values were subsequently computed for all raters’ evaluation of the same distinct aneurysm over the course of multiple assessments. A potential interaction between specialty and years of experience was also examined. Experience was examined as both a binary and continuous variable, employing a χ2 analysis and a Student’s t test, respectively. Excluding rare exceptions, each interventionalist evaluated the same aneurysm four different times. The mean ICC or Fleiss’ kappa coefficients were calculated for each subgroup, as appropriate. Outliers, defined as reliability values that fell more than three times the IQR below the first quartile, were excluded. All statistical analysis was performed in IBM SPSS Statistics 22 (SPSS Inc, Armonk, New York, USA).
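The interpretation rules in this paragraph can be sketched as a few small helpers. This is an illustrative sketch only: the function names are hypothetical, the 0.60–0.74 ICC band is labeled "good" (the term the results use for ICC=0.64), values that fall between two printed kappa bands are assigned to the higher band, and the quartiles use Python's "inclusive" method, which may differ from the rule applied by SPSS.

```python
import statistics

# Upper bounds of the kappa bands quoted in the text
KAPPA_BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def kappa_label(kappa):
    """Map a kappa coefficient to the verbal band used in the text."""
    if kappa < 0.01:
        return "below the listed bands"
    for upper, label in KAPPA_BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


def icc_label(icc):
    """Map an ICC to the cut-offs quoted in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


def intervals_overlap(ci_a, ci_b):
    """True when two (lower, upper) confidence intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def lower_outlier_fence(values):
    """Threshold lying three IQRs below the first quartile.

    Quartile method is an assumption; SPSS may compute quartiles differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1 - 3 * (q3 - q1)


print(kappa_label(0.35), icc_label(0.64))
```

Under these helpers, the headline coefficients reported in the results (κ=0.35, ICC=0.64) map to "fair" and "good", and a reliability value would be excluded as an outlier only if it fell below `lower_outlier_fence(...)` for its group.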


Case features

A total of 92 separate aneurysms were presented to the study participants, each four different times, with varied associated clinical scenarios, resulting in a total of 368 analyzed cases. Of the unique aneurysm cases that were evaluated, 21 (22.8%) were classified as anterior communicating, 23 (25.0%) as posterior communicating, 19 (20.7%) as ophthalmic, and 29 (31.5%) as basilar. A total of 49% were large (>10 mm) and 51% small (0–10 mm). These data are presented in table 2.
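The case counts and percentages in this paragraph follow directly from the per-location numbers. A minimal check, using only the figures stated above (the variable names are illustrative):

```python
# Unique aneurysms per location, as stated in the text
location_counts = {
    "anterior communicating": 21,
    "posterior communicating": 23,
    "ophthalmic": 19,
    "basilar": 29,
}
PRESENTATIONS_PER_ANEURYSM = 4  # each aneurysm was shown four times

unique_aneurysms = sum(location_counts.values())
total_cases = unique_aneurysms * PRESENTATIONS_PER_ANEURYSM

# Share of unique aneurysms at each location, to one decimal place
location_pct = {
    name: round(100 * count / unique_aneurysms, 1)
    for name, count in location_counts.items()
}

print(unique_aneurysms, total_cases)
print(location_pct)
```

The totals come to 92 unique aneurysms and 368 presented cases, matching the figures reported in this section.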

Table 2

Radiographic characteristics of simulated cases

Rater characteristics

Fifteen neurointerventionalists were recruited for the study. Of these, 10 (66.7%) were neurosurgeons (NS) while five (33.3%) were interventional neuroradiologists (INR). The combined cohort of readers had been in practice for a mean of 11.1 years following completion of training. Seven (46.7%) had been in practice for greater than 10 years while eight (53.3%) were practicing for fewer than 10 years. Of the NS, seven were in practice for fewer than 10 years, and three for 10 years or greater. Of the INR, one was in practice for fewer than 10 years, and four for 10 years or greater. The demographic and training characteristics of the study participants are reported in table 3.

Table 3

Demographics of participating interventionalists

Inter-rater reliability

Primary analysis

Inter-rater reliability was calculated using weighted Fleiss’ kappa coefficients for the Raymond Scale and ICC for the estimated per cent embolization. Agreement among all readers using the Raymond Scale was fair (κ=0.35) while concordance in per cent embolization was good (ICC=0.64). This trend remained consistent throughout the subanalyses by both training and experience.

Secondary analysis

The combined cohort of clinicians with fewer than 10 years of experience following training demonstrated a significantly greater level of agreement than the group with greater than 10 years (κ=0.39 and ICC=0.70 vs κ=0.28 and ICC=0.58). The non-overlapping confidence intervals for both the Raymond Scale and per cent embolization demonstrate a significantly greater degree of concordance among those with fewer than 10 years of experience than among those practicing for more than 10 years. These data are depicted in table 4.

Table 4

Inter-rater reliability for aneurysm embolization

Intra-rater reliability

Among all readers who assessed each unique aneurysm on separate occasions, the mean intra-rater reliability coefficients were ICC=0.82 for per cent embolization and κ=0.58 for Raymond Scale. Intra-rater reliability in assessing per cent embolization was consistently strong across groups stratified by experience. These values are presented in table 5. In terms of a possible interaction between years of experience and specialty, when years of experience was dichotomized into greater or less than 10 years, χ2 testing revealed no significant interaction (χ2=3.348, p=0.12). Similarly, when years of experience was treated as a continuous variable, a Student’s t test also revealed no significant relationship (p=0.356, 95% CI (0.208 to 0.321)).
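To make the intra-rater ICC concrete, the sketch below computes a one-way random-effects, single-measure ICC for one rater's repeated per cent embolization estimates. The source does not state which ICC form was used in SPSS, so this choice is an assumption, as are the function name `icc_oneway` and the toy values, which are not study data.

```python
def icc_oneway(scores):
    """One-way random-effects, single-measure ICC (an assumed ICC form).

    scores: list of targets (aneurysms); each target is a list of k
    repeated ratings, for example one rater's four estimates.
    """
    n = len(scores)
    k = len(scores[0])
    if any(len(row) != k for row in scores):
        raise ValueError("every target needs the same number of ratings")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-target and within-target mean squares from one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical per cent embolization estimates: three aneurysms, each
# rated four times by the same reader
toy_estimates = [
    [90, 95, 90, 95],
    [70, 75, 70, 75],
    [100, 95, 100, 95],
]
print(round(icc_oneway(toy_estimates), 3))
```

The coefficient approaches 1 when a reader's repeated estimates for the same aneurysm vary little relative to the differences between aneurysms, which is the pattern the ICC of 0.82 reported above describes.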

Table 5

Mean intra-rater reliability for identical aneurysms


Recanalization remains an important concern following endovascular coiling of intracranial aneurysms.8 While clinicians vary substantially in their decision to re-treat recurrent aneurysms,11 12 the decision process begins with the perception of aneurysm recurrence or increase in fundal filling. As such, assessing the completeness of embolization plays an important role in evaluating previously treated aneurysms. In this study, incorporating experienced neurointerventionalists, 92 separate aneurysms, and 368 simulated cases, Raymond Scale classification was found to result in fair inter-rater reliability (κ=0.35). Meanwhile, assigning an estimated per cent embolization was associated with good inter-observer agreement (ICC=0.64). This relative relationship between the two methods remained consistent for all cohorts of raters, regardless of years in practice, as well as for intra-rater reliability in the assessment of the same aneurysm.

The Raymond–Roy classification is frequently used in the characterization of coiled aneurysms. Designed to identify the degree of occlusion but not predict recurrence,19 the Raymond Scale is assigned as a categorical variable between 1 and 3. Despite its importance and widespread use in the clinical assessment of coiled aneurysms, relatively few studies have examined the associated inter-rater reliability, and even fewer have performed stringent inter- and intra-rater evaluations. In determining the observer agreement for several different scales, Cloft et al reported moderate inter-rater reliability using the Raymond Scale (κ=0.50 for 4 point and κ=0.54 for 3 point).16 20 That study included 83 angiograms and utilized two experienced observers, both of whom were INR.

In perhaps the most notable study to date, McDonald et al enlisted seven endovascular therapists (4 INR, 3 NS) to review 66 cases of previously coiled aneurysms.12 The authors assessed rater agreement by ICC using a 5 point rating scale recommending whether or not to re-treat a previously coiled aneurysm. Overall agreement was moderate, with an ICC of 0.50, and consistency in selecting treatment type was greater among less experienced raters (ICC=0.46) than among those with more experience (ICC=0.14). This study laid the groundwork for the organization of the current study. The present analysis incorporated 92 distinct cases that were each repeated four times, in variable order, and reviewed by 15 neurointerventionalists who represented a range of experiences and training pathways. Interestingly, we also found significantly increased agreement among less experienced raters, in this case for both per cent embolization and Raymond Scale, compared with their more experienced counterparts. Furthermore, the data demonstrated that inter-rater reliability in assigning Raymond Scale consistently ranged from fair to slight, regardless of years of experience (κ <0.40 for all groups). Among all readers with greater than 10 years of experience, the Fleiss’ kappa coefficient was 0.28.

These findings initially appear to be unexpected when presented alongside the strong reliability values of per cent embolization assessment, particularly given the categorical nature of Raymond Scale classification. However, Fleiss’ kappa coefficients, which are used in the reliability assessment of categorical variables, are not identical to ICC values computed using continuous variables and are, as such, difficult to compare in terms of score utility. While the kappa values for this series were <0.60, they are not outliers in the context of the existing literature, where prior studies employing the 3 point Raymond Scale have reported kappa values of 0.28,21 0.50,22 and 0.54.16 Cloft et al previously noted that observer variability was greater for scales that offer fewer responses, and our data support this conclusion. Interestingly, all of the scales evaluated to date were categorical (ranging from 2 to 5 point responses), which facilitates direct comparison to one another, rather than a comparison of continuous and categorical scales.

The good inter- and intra-rater reliability noted in this study using per cent embolization may be attributable to several factors. First, there may be disagreement among clinicians regarding the definition and application of Raymond Scale criteria, potentially contributing to the lower (<0.8) kappa values reported across the literature utilizing this scale. To limit this effect, the Raymond Scale, with accompanying illustrations, was reviewed before the initiation of this study. Second, although per cent embolization is theoretically a continuous variable and every rater was explicitly instructed that any integer could be employed, in practice it is frequently reported as a round number in multiples of 5 (ie, 90%, 95%, 100%), thus potentially improving inter- and intra-rater reliability. However, the distribution of values precluded their assessment as categorical variables. These findings have implications for the potential utility of per cent embolization in the assessment of previously coiled aneurysms, since continuous variables tend to have greater power and discriminative capacity.23 While the assessment of per cent embolization is often used in the evaluation of previously coiled aneurysms, the authors are unaware of any studies in the literature that report inter-rater agreement among multiple readers examining the per cent of aneurysm embolization. Further evaluation of how per cent embolization naturally groups among evaluators, as well as the relationship between per cent embolization and rupture risk, would be a worthwhile future endeavor.

In addition to determining the inter-rater reliability for these scales in the context of assessing previously embolized aneurysms, a secondary analysis examining the a priori selected variable of experience was performed. Interestingly, readers with greater experience were significantly less concordant in determining Raymond Scale and per cent embolization than those who had spent fewer years in practice. While more experienced providers have been previously shown to be less concordant regarding the selection of re-treatment type,12 this relationship has not been previously assessed in the context of embolization assessment. These findings may reflect a trend in which a younger generation of neurointerventionalists, trained in a climate of rapidly evolving technologies, evaluates aneurysms more uniformly than their more senior colleagues do. However, this is mere speculation.

These findings must be examined in the context of the limitations of the study. First, this analysis represents a retrospective review of previously collected and assembled angiograms and is thus at inherent risk for unintended selection bias. We attempted to mitigate this by recruiting a wide array of interventionalists and examining a large number of angiogram studies from two different institutions. Second, the number of NS in the study outnumbered those from an INR background, and the number of recruited neurointerventionalists was limited by inherent limitations in time and funding. Third, only one DSA view was shown to the group, which does not represent actual practice, but is often comparable to views available to a core lab for clinical trial endpoint analysis. The volume calculations performed in this analysis as well as in similar studies are often rudimentary, thus emphasizing the importance of developing new tools to facilitate rapid and accurate volume assessment. Novel three-dimensional imaging techniques have been reported to provide superior visualization of intracranial aneurysms24 25 and similar techniques have the potential to facilitate post-embolization assessment compared with two-dimensional representations. Fourth, this imaging dataset primarily consisted of minimally recanalized aneurysms in order to assess various clinicians’ approaches to these lesions. It is possible that these findings would have been different for massively recanalized aneurysms. Finally, this study was solely performed to assess the inter- and intra-rater reliability of Raymond Scale and per cent embolization. Conclusions regarding outcomes following endovascular coiling or treatment decisions between groups cannot be made. Future studies are needed to examine the role that these factors play in recommending the re-treatment of aneurysms and assess how this may differ between providers with various years of experience.


Overall, this study reports the largest series of angiographic analyses assessing the inter- and intra-rater reliability of Raymond Scale and per cent embolization. In a query of 15 neurointerventionalists from varying training backgrounds and levels of experience, assessment of per cent embolization was found to be associated with good inter- and intra-rater reliability. This finding was consistent in subcohort analyses of clinician experience. In evaluating the same aneurysm, raters demonstrated excellent consistency when assessing per cent embolization and moderate agreement when employing Raymond classification. Further studies are needed to determine the role that assessing per cent embolization may play in evaluating recurrent, previously treated aneurysms. However, it appears that the use of per cent embolization may result in strong inter- and intra-rater reliability among clinicians assessing previously coiled aneurysms.


The authors would like to acknowledge the hard work and dedication of Michael C Dewan and Peter J Morone.



  • Contributors SLZ designed the study, collected the data, wrote the statistical analysis plan, edited the manuscript, and is the guarantor. NL wrote the statistical analysis plan, cleaned the data, and drafted and edited the manuscript. JAM, JV, MS, BA, AMS, AA, ASA, DF, RH, JAH, FKH, RFJ, DFK, PMM, DBN, PR, RDT, and BGW collected the data and edited the manuscript. JM designed the study, collected the data, and edited the manuscript.

  • Funding This work was supported by Codman & Shurtleff, Inc.

  • Competing interests ASA is a consultant for Leica, Medtronic, Microvention, Penumbra, Siemens, and Stryker; has received research support from Microvention, Penumbra, and Siemens; and is a shareholder at Bendit, Cerebrotech, Serenity, and Synchron. BGW is a consultant for Stryker and Medtronic. PR is a member of the speaker’s bureau for Stryker Neurovascular; and a shareholder at Perflow Medical and Neurvana Medical. RDT has relationships with Codman, Medtronic, Penumbra, Microvention, Stryker, and Blockade Medical. JM is a consultant for Rebound Medical, Endostream, Synchron, and Cerebrotech; and an investor in Apama, The Stroke Project, Endostream, Synchron, Cerebrotech, NeurVana, and NeuroTechnology Investors.

  • Ethics approval Institutional review board approval was obtained for the current survey based cross sectional study.

  • Provenance and peer review Not commissioned; externally peer reviewed.
