ONLINE SUPPLEMENT
Machine Learning Improves Prediction of Delayed Cerebral Ischemia in Patients With Subarachnoid Hemorrhage
Table I - Hyper-parameters used for SVM

Classifier | Kernel type | Penalty parameter C | Kernel coefficient | Degree of the polynomial kernel
SVM | Linear | [0.001, 0.01, 0.1, 1, 10, 100] | n.a. | n.a.
SVM | Radial basis function | [0.001, 0.01, 0.1, 1, 10, 100] | [1, 0.1, 0.01, 0.001, 0.0001] | n.a.
SVM | Polynomial | [0.001, 0.01, 0.1, 1, 10, 100] | [1, 0.1, 0.01, 0.001, 0.0001] | [1, 2, 3, 4]
Table II - Hyper-parameters used for RFC and MLP

Classifier | Parameter name | Parameter value
RFC | Number of trees | [100, 200, 400, 600, 800]
RFC | Max features for split | auto, sqrt and log2
RFC | Quality of split | Gini or Entropy
MLP | Hidden layer sizes | [50,25], [60,30], [60,40,20], [50,30,10], [70,40,20], [70,30], [80,50,30], [80,60,30,10]
MLP | Regularization parameter | [0.1, 0.01, 0.001, 0.0001]
MLP | Batch size | [64, 128]
MLP | Learning rate | [0.01, 0.05, 0.001, 0.005, 0.0001]
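The grids in Tables I and II define an exhaustive search over every parameter combination. A minimal sketch of how such a grid expands into candidate configurations is shown below, using only the Python standard library; the dictionary names are illustrative, not the authors' actual code.

```python
# Enumerate every combination in the SVM hyper-parameter grids (Table I).
from itertools import product

svm_grids = {
    "linear": {"C": [0.001, 0.01, 0.1, 1, 10, 100]},
    "rbf": {"C": [0.001, 0.01, 0.1, 1, 10, 100],
            "gamma": [1, 0.1, 0.01, 0.001, 0.0001]},
    "poly": {"C": [0.001, 0.01, 0.1, 1, 10, 100],
             "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
             "degree": [1, 2, 3, 4]},
}

def expand(grid):
    """Yield one parameter dict per combination of grid values."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

n_candidates = sum(len(list(expand(g))) for g in svm_grids.values())
# 6 (linear) + 6*5 (rbf) + 6*5*4 (poly) = 156 candidate SVM configurations
print(n_candidates)
```

The RFC and MLP grids in Table II expand the same way, one candidate per combination of parameter values.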
Table III - Hyper-parameters used for the auto-encoder

Patch size | Conv layer | Max pool | Conv layer | Max pool | Conv layer | Max pool
128x128x19 | 7x7x7, 16 feature maps | 2x2x2, stride (2,2,1) | 5x5x5, 16 feature maps | 2x2x2, stride (2,2,2) | 3x3x3, 32 feature maps | 2x2x2, stride (2,2,1)
128x128x19 | 5x5x5, 16 feature maps | 2x2x2, stride (2,2,1) | 3x3x3, 16 feature maps | 2x2x2, stride (2,2,2) | 3x3x3, 32 feature maps | 2x2x2, stride (2,2,1)
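A small helper can trace how a 128x128x19 patch shrinks through the encoder's pooling stages. This sketch assumes 'same'-padded convolutions (so only the max-pool strides change the shape) and floor division at each pooling step; the authors' exact padding scheme is not stated in the table.

```python
# Trace patch shape through the pooling strides listed in Table III,
# assuming 'same' convolutions leave the spatial shape unchanged.
def pool(shape, strides):
    """Floor-divide each dimension by the corresponding pool stride."""
    return tuple(s // st for s, st in zip(shape, strides))

shape = (128, 128, 19)                  # input patch size from Table III
for strides in [(2, 2, 1), (2, 2, 2), (2, 2, 1)]:
    shape = pool(shape, strides)        # conv layers skipped ('same' padding)
print(shape)
```

Under these assumptions, the in-plane dimensions halve at every stage while the slice dimension halves only once, giving a compact bottleneck representation.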
Table IV - Patient characteristics.

Variable | All (317) | no DCI (220) | DCI (97) | Missing (%) | p-value
Age (mean/SD) | 57.66 (12.1) | 57.68 (10.9) | 57.62 (12.6) | 0 (0.0) | 0.964
Female sex (%) | 211 (66.6) | 143 (70.1) | 68 (65.0) | 0 (0.0) | 0.448
History of aneurysmal subarachnoid hemorrhage (%) | 5 (1.6) | 4 (1.0) | 1 (1.8) | 31 (9.8) | 1.0
History of intracerebral hemorrhage (%) | 2 (0.6) | 2 (0.0) | 0 (0.9) | 31 (9.8) | 1.0
History of cardiovascular disorder (%) | 58 (18.3) | 43 (15.5) | 15 (19.5) | 24 (7.6) | 0.50
History of diabetes mellitus (%) | 21 (6.6) | 15 (6.2) | 6 (6.8) | 29 (9.1) | 0.939
History of hypertension (%) | 104 (32.8) | 75 (29.9) | 29 (34.1) | 27 (8.5) | 0.521
History of hypercholesterolemia (%) | 53 (16.7) | 35 (18.6) | 18 (15.9) | 32 (10.1) | 0.617
History of smoking (%) | | | | 83 (26.2) | 0.628
  No | 55 (17.4) | 37 (18.6) | 18 (16.8) | |
  Yes, but stopped | 60 (18.9) | 44 (16.5) | 16 (20.0) | |
  Yes, still smokes | 119 (37.5) | 79 (41.2) | 40 (35.9) | |
History of alcohol use (%) | 134 (42.3) | 89 (46.4) | 45 (40.5) | 83 (26.2) | 0.349
Previous mRS (%) | | | | 89 (28.1) | 0.741
  0 | 158 (49.8) | 108 (51.5) | 50 (49.1) | |
  1 | 46 (14.5) | 30 (16.5) | 16 (13.6) | |
  2 | 16 (5.0) | 11 (5.2) | 5 (5.0) | |
  3 | 6 (1.9) | 4 (2.1) | 2 (1.8) | |
  4 | 1 (0.3) | 0 (1.0) | 1 (0.0) | |
  5 | 1 (0.3) | 1 (0.0) | 0 (0.5) | |
Patient sedated (%) | 64 (20.2) | 43 (21.6) | 21 (19.5) | 112 (35.3) | 0.631
Glasgow coma scale on admission (mean/SD) | 13.17 (3.2) | 13.17 (2.9) | 13.14 (3.3) | 99.00 (31.2) | 0.946
WFNS on admission (%) | | | | 79 (24.9) | 0.391
  1 | 118 (37.2) | 86 (33.0) | 32 (39.1) | |
  2 | 55 (17.4) | 33 (22.7) | 22 (15.0) | |
  3 | 9 (2.8) | 5 (4.1) | 4 (2.3) | |
  4 | 32 (10.1) | 23 (9.3) | 9 (10.5) | |
  5 | 24 (7.6) | 15 (9.3) | 9 (6.8) | |
Hunt and Hess score (%) | | | | 7 (2.2) | 0.238
  1 | 55 (17.4) | 37 (18.6) | 18 (16.8) | |
  2 | 96 (30.3) | 71 (25.8) | 25 (32.3) | |
  3 | 56 (17.7) | 33 (23.7) | 23 (15.0) | |
  4 | 24 (7.6) | 15 (9.3) | 9 (6.8) | |
  5 | 79 (24.9) | 59 (20.6) | 20 (26.8) | |
Fisher score (%) | | | | 1 (0.3) | 0.228
  1 | 11 (3.5) | 10 (1.0) | 1 (4.5) | |
  2 | 18 (5.7) | 14 (4.1) | 4 (6.4) | |
  3 | 44 (13.9) | 27 (17.5) | 17 (12.3) | |
  4 | 243 (76.7) | 168 (77.3) | 75 (76.4) | |
Modified Fisher score (%) | | | | 1 (0.3) | 0.317
  0 | 12 (3.8) | 11 (1.0) | 1 (5.0) | |
  1 | 17 (5.4) | 13 (4.1) | 4 (5.9) | |
  2 | 2 (0.6) | 2 (0.0) | 0 (0.9) | |
  3 | 69 (21.8) | 45 (24.7) | 24 (20.5) | |
  4 | 216 (68.1) | 148 (70.1) | 68 (67.3) | |
Presence of intraventricular hemorrhage (%) | 214 (67.5) | 148 (68.0) | 66 (67.3) | 88 (27.8) | 0.782
Presence of intraparenchymal hemorrhage (%) | 83 (26.2) | 54 (29.9) | 29 (24.5) | 172 (54.3) | 0.434
Presence of subdural hemorrhage (%) | 19 (6.0) | 13 (6.2) | 6 (5.9) | 209 (65.9) | 0.979
Total hemorrhage volume (mean/SD) | 37.22 (30.3) | 34.60 (31.0) | 43.22 (29.6) | 2.00 (0.6) | 0.023
Time from ictus to admission (mean/SD) | 30.93 (107.0) | 31.60 (58.6) | 29.39 (122.4) | 30.93 (107.0) | 0.829
Number of aneurysms (mean/SD) | 1.28 (0.7) | 1.31 (0.6) | 1.23 (0.7) | 0.00 (0.0) | 0.292
Height of aneurysm (mean/SD) | 6.82 (5.4) | 6.93 (3.8) | 6.58 (6.0) | 6.00 (1.9) | 0.539
Width of aneurysm (mean/SD) | 5.51 (4.3) | 5.54 (3.4) | 5.45 (4.6) | 7.00 (2.2) | 0.855
Side of aneurysm (%) | | | | 3 (0.9) | 0.021
  Left | 156 (49.2) | 99 (58.8) | 57 (45.0) | |
  Right | 131 (41.3) | 94 (38.1) | 37 (42.7) | |
  Middle | 27 (8.5) | 24 (3.1) | 3 (10.9) | |
Shape of aneurysm (%) | | | | 5 (1.6) | 0.573
  Saccular | 284 (89.6) | 195 (91.8) | 89 (88.6) | |
  Non-saccular (fusiform/ruptured) | 26 (8.2) | 20 (6.2) | 6 (9.1) | |
  Other | 2 (0.6) | 1 (1.0) | 1 (0.5) | |
Aneurysm treatment (%) | | | | 0 (0.0) | 0.002
  No | 48 (15.1) | 44 (4.1) | 4 (20.0) | |
  Coiling | 212 (66.9) | 144 (70.1) | 68 (65.5) | |
  Clipping | 54 (17.0) | 30 (24.7) | 24 (13.6) | |
  Coiling plus stent | 2 (0.6) | 1 (1.0) | 1 (0.5) | |
  Flow diversion | 1 (0.3) | 1 (0.0) | 0 (0.5) | |
Rebleed number (%) | | | | 91 (28.7) | 0.628
  0 | 178 (56.2) | 120 (59.8) | 58 (54.5) | |
  1 | 34 (10.7) | 26 (8.2) | 8 (11.8) | |
  2 | 12 (3.8) | 7 (5.2) | 5 (3.2) | |
  3 | 1 (0.3) | 1 (0.0) | 0 (0.5) | |
  5 | 1 (0.3) | 1 (0.0) | 0 (0.5) | |
Treatment for rebleed (%) | | | | 0 (0.0) | 0.998
  No | 254 (80.1) | 176 (80.4) | 78 (80.0) | |
  Yes, based on both CT blood increase and clinical deterioration | 27 (8.5) | 19 (8.2) | 8 (8.6) | |
  Yes, based on blood increase on CT scan | 6 (1.9) | 4 (2.1) | 2 (1.8) | |
  Yes, based on clinical deterioration | 30 (9.5) | 21 (9.3) | 9 (9.5) | |
Location of aneurysm (%) | | | | 0 (0.0) | 0.044
  Posterior circulation | 66 (20.8) | 53 (24.1) | 13 (13.4) | |
  Anterior circulation | 251 (79.2) | 167 (75.9) | 84 (86.6) | |
Auto-encoder implementation
A stacked convolutional auto-encoder (SCAE) is a typical unsupervised feature-learning algorithm that scales well to high-dimensional inputs and is robust to noise and variations. This method learns features (characteristics) from the image by first encoding the input into a lower-dimensional space using convolutional and pooling layers, and then reconstructing it using the inverse operations (deconvolution and unpooling) [1]. The weights of this network are trained based on the difference between the input image and the reconstructed output image. The features learned by the auto-encoder are used to reconstruct the image. These features are usually an average representation of the images, which is unlikely to yield a more useful representation than the input image itself. To solve this problem, we used the approach from [2], which consists of applying noise to the input image and trying to reconstruct the normal scan using the SCAE. This approach, called a Stacked Denoising Convolutional Auto-encoder (SDCAE), forces the auto-encoder to extract more robust features.
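The denoising training signal described above can be sketched in a few lines of numpy. The encoder/decoder `f` below is a placeholder (the identity), not the actual convolutional auto-encoder; the point is only that the loss compares the reconstruction of the *noisy* input against the *clean* scan.

```python
import numpy as np

# Minimal sketch of the SDCAE objective: corrupt the input, reconstruct,
# and score the reconstruction against the clean scan.
rng = np.random.default_rng(0)
clean = rng.random((8, 8, 4))                       # toy "scan"
noisy = clean + rng.normal(0.0, 0.1, clean.shape)   # corrupted input

def f(x):
    """Placeholder for encoder -> decoder; identity for illustration."""
    return x

reconstruction = f(noisy)
mse = np.mean((reconstruction - clean) ** 2)        # loss driving the weights
```

Minimizing this loss over many corrupted copies is what pushes the encoder toward features that survive the noise, rather than an average of the inputs.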
To speed up the training process (and allow the use of more samples per mini-batch), the images were downscaled by a factor of 4, resulting in scans of size 128x128x20. To account for variations in the data and increase the number of samples, we performed data augmentation using label-preserving transformations (translation, rotation and reflection), following the approach used in [3].
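The label-preserving transformations above can be illustrated on a 3-D array with plain numpy; the shift amounts, rotation count, and flip axis below are arbitrary illustrative choices, and `np.roll` is a circular shift used as a simple stand-in for translation.

```python
import numpy as np

# Sketch of label-preserving augmentation (translation, rotation, reflection).
def augment(scan, shift=(2, 2, 0), k_rot=1, flip_axis=0):
    """Return translated, rotated and reflected copies of `scan`."""
    translated = np.roll(scan, shift, axis=(0, 1, 2))  # circular translation
    rotated = np.rot90(scan, k=k_rot, axes=(0, 1))     # in-plane 90-degree rotation
    reflected = np.flip(scan, axis=flip_axis)          # reflection
    return translated, rotated, reflected

scan = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
trans, rot, refl = augment(scan)
# Each transform rearranges voxels without changing their values,
# so the DCI label associated with the scan remains valid.
```

Because only voxel positions change, each augmented copy can reuse the original scan's label, which is what makes these transformations "label-preserving".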
Figure 1 - Feature importance for the RF classifier. Top: using only the clinical data; bottom: using a combination of clinical and image features.
LIME explanation and examples
Machine learning (ML) methods are often seen as black boxes, since explaining their predictions is usually not a trivial task. In order to build trust, it is important to visualize which features influenced the model's prediction. LIME is a tool that can be used to locally explain the predictions of a given model. The explanation is based on visual representations that provide a qualitative understanding of the model. While a model's decision boundary can be very complex globally, the vicinity around a particular sample is often much easier to interpret. That sample is perturbed, and a sparse linear model is built around it and used for the explanation. In summary, LIME creates an explanation by locally approximating a black-box model with a more interpretable one.
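The perturb-and-fit idea behind LIME can be sketched with numpy alone. The black-box `predict` function, kernel width, and perturbation scale below are illustrative stand-ins, not the study's model: a sample is perturbed, the black box is queried, perturbations are weighted by proximity, and a weighted linear surrogate is fitted whose coefficients act as the local explanation.

```python
import numpy as np

# Toy LIME-style local surrogate around a single sample.
rng = np.random.default_rng(42)

def predict(X):
    """Stand-in black-box model (illustrative, not the study's classifier)."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

x0 = np.array([0.5, 1.0])                        # sample to explain
Z = x0 + rng.normal(0.0, 0.3, size=(500, 2))     # local perturbations
y = predict(Z)                                   # black-box responses

dist2 = np.sum((Z - x0) ** 2, axis=1)
w = np.exp(-dist2 / 0.25)                        # proximity kernel weights

# Weighted least squares: fit [intercept, feature 0, feature 1] locally.
A = np.hstack([np.ones((len(Z), 1)), Z]) * np.sqrt(w)[:, None]
beta, *_ = np.linalg.lstsq(A, y * np.sqrt(w), rcond=None)
# beta[1:] approximates the model's local behavior around x0.
```

The fitted coefficients play the role of the bar lengths in the LIME plots of Figures 2.1 and 2.2: each one says how strongly, and in which direction, a feature moves the prediction near this particular patient.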
More examples of LIME explanations are shown in Figures 2.1 and 2.2. In Figure 2.1 (top), a patient who developed DCI received a low-risk prediction when using only clinical features and a Random Forest model, even though some variables, such as total blood volume, strongly suggest a higher risk of DCI. After including image features (Figure 2.1, bottom), the predicted risk of DCI increased (from 0.24 to 0.71), and most of the image features suggested a higher risk of DCI. In Figure 2.2 (top), a patient who did not develop DCI was assessed with LIME. Initially, the predicted risks for DCI and no DCI are similar (top). After including the image features (Figure 2.2, bottom), some image features strongly suggest a lower risk of DCI, which reduces the overall risk predicted by the ML model.
Figure 2.1 - LIME model explanation of a DCI-positive patient. The model built using the clinical features suggests a lower risk of DCI (top). After including the image features (bottom), the model suggests a higher risk of DCI.
Figure 2.2 - LIME model explanation of a DCI-negative patient. The model built using the clinical features suggests a higher risk of DCI (top). After including the image features (bottom), the model suggests a lower risk of DCI.
Bibliography

1. Masci J, Meier U, Cireşan D, Schmidhuber J. Stacked convolutional auto-encoders for hierarchical feature extraction. Lecture Notes in Computer Science. 2011;6791:52-59.
2. Du B, Xiong W, Wu J, Zhang L, Zhang L, Tao D. Stacked convolutional denoising auto-encoders for feature representation. IEEE Transactions on Cybernetics. 2017;47(4):1017-1027.
3. Miki Y, Muramatsu C, Hayashi T, Zhou X, Hara T, Katsumata A, Fujita H. Classification of teeth in cone-beam CT using deep convolutional neural network. Computers in Biology and Medicine. 2017;80:24-29.