3. #2 Data: to find Brachial Plexus (BP)
– 420x580 resolution
– 5635 train images with masks, 5508 test;
– ~120 images for each of 47 patients
– 47% of the images don’t have a mask;
– result in RLE encoding
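The RLE submission format above can be sketched as follows; this is a minimal NumPy version assuming the competition's column-major (top-to-bottom, then left-to-right), 1-indexed pixel numbering:

```python
import numpy as np

def rle_encode(mask):
    """Run-length encode a binary mask as 'start length start length ...'.

    Pixels are numbered 1..N in column-major order (top-to-bottom,
    then left-to-right), as in the competition's submission format.
    """
    pixels = mask.flatten(order="F")                   # column-major scan
    padded = np.concatenate([[0], pixels, [0]])
    runs = np.where(padded[1:] != padded[:-1])[0] + 1  # 1-indexed run borders
    runs[1::2] -= runs[::2]                            # length = end - start
    return " ".join(str(x) for x in runs)

mask = np.array([[0, 1],
                 [1, 1]])
print(rle_encode(mask))  # column-major scan 0,1,1,1 -> "2 3"
```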
4. #3 Data: mistakes in the ground truth
– 45 known labeling errors among near-duplicate images
– metric is sensitive to nerve presence mistakes
5. #4 Evaluation
Peculiarities
– (!) Mask presence mistake leads to zero score
– Needs smoothing in the denominator
Loss functions
– 1 – dice
– -dice
– weighted cross entropy (2 classes, per pixel prediction)
Mean mask
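The loss functions above hinge on the smoothed Dice score; a minimal NumPy sketch (the Keras version in the solution operates on tensors, but the arithmetic is the same):

```python
import numpy as np

def dice_coef(y_true, y_pred, smooth=1.0):
    """Soft Dice coefficient with a smoothing term in numerator and denominator.

    The smooth term keeps the score defined (and equal to 1) when both masks
    are empty, which matters here since ~47% of images have no nerve.
    """
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    # either "1 - dice" or "-dice" works as a minimization target
    return 1.0 - dice_coef(y_true, y_pred)
```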
6. #5 Baselines
Score | Description                                | Framework  | Author
----- | ------------------------------------------ | ---------- | ------
0.51  | Empty submission                           | –          | –
0.00  | Top-left pixel                             | –          | –
0.57  | U-Net, at the beginning of the competition | Keras code | Marko Jocic, Kaggler
0.62  | U-Net, almost at the end                   | Torch code | Qure.ai (host)
7. #6 What is U-Net
Overview
– (May 2015 article) “U-Net: Convolutional Networks for Biomedical Image Segmentation”
– Winner of “Grand Challenge for Computer-Automated Detection of Caries in Bitewing
Radiography at ISBI 2015”
– Encoder-decoder architecture with skip connections between matching levels
– Fully convolutional, Drop-out in the middle
– Augmentation: “Smooth deformations using random displacement vectors on a coarse 3 by 3
grid. The displacements are sampled from a Gaussian distribution with 10 pixels standard
deviation.”
8. #7 Another approach, FCN
Overview
– (20 May 2016, article) “Fully Convolutional Networks for Semantic Segmentation”
– VGG-16
– Segmentation prediction on different layers of the net, +upsampling
– Average predictions
9. #8 Starting point, Marko Jocic’s solution
Overview
– Classic U-Net: VGG-like
– Very simple Keras code
– Image resize to 64x80, bicubic interpolation
– Loss = −dice coefficient, averaged per batch, smooth = 1
– Training on whole dataset, no validation
– RLE-encoding function
– Adam optimizer
Training
– 20 epochs, ~30 seconds on a Titan X, memory footprint 800 MB
– Overfits: 0.68 on training -> 0.57 on LB
10. #9 Aspects of the solution: basics
Overfitting basics (+2%)
– Split train/valid, 20% and early stopping patience=5 epochs
• used a random split instead of the more appropriate per-patient split (due to a subtle bug)
– Dropout after each conv layer
General enhancements
– Resolution 64x80 -> 80x112 (+1%)
– ELU instead of ReLU -> faster convergence
11. #10 Aspects of the solution: augmentation
Augmentation*
– flip x, y
– random rotate (5)
– random zoom (0.9, 1.1)
– random channel shift (5.0)
*all transformations should be done with a mask too
All transformations can be applied on the fly with a generator (chosen randomly), but this didn’t
improve results.
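The footnote above, that every geometric transform must be applied to the mask as well, can be sketched as a minimal NumPy example (flips only; the other transforms follow the same pattern):

```python
import numpy as np

def random_flip(image, mask, rng):
    """Apply the same random flips to image and mask so they stay aligned."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]   # flip along x
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]   # flip along y
    return image, mask

rng = np.random.default_rng(0)
img = np.arange(12).reshape(3, 4)
msk = (img > 5).astype(np.uint8)
img2, msk2 = random_flip(img, msk, rng)
# mask still marks the same pixels of the (flipped) image
assert (msk2 == (img2 > 5)).all()
```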
Elastic transform:
random displacement fields smoothed by convolving with a Gaussian
Result: no added effect
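The elastic transform above can be sketched as follows; a minimal version in the style of Simard et al., with illustrative `alpha`/`sigma` values that are not taken from the slides:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_transform(image, alpha=34.0, sigma=4.0, rng=None):
    """Elastic deformation: random displacement fields smoothed by a Gaussian.

    alpha scales the displacement magnitude, sigma is the Gaussian std.
    The same (dx, dy) must be reused to warp the corresponding mask.
    """
    rng = rng if rng is not None else np.random.default_rng()
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(image.shape[0]),
                       np.arange(image.shape[1]), indexing="ij")
    coords = np.array([y + dy, x + dx])
    return map_coordinates(image, coords, order=1, mode="reflect")
```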
12. #11 Aspects of the solution: blocks
Modifications of U-Net
– two 3x3 convolutions -> Inception-v3 block
– BNA (batch norm + activation) after each convolution
– BNA after summation
– nxn -> 1xn + nx1 factorized convolutions
Results:
– fewer parameters (1M)
– faster convergence
– LB: +2%
(figures: Inception-v3 block; v3 block with split 1xn/nx1 convolutions)
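The nxn -> 1xn + nx1 factorization above trades one square kernel for two thin ones; a quick parameter count shows the saving (biases ignored, channel count is an example, not from the slides):

```python
def conv_params(kh, kw, c_in, c_out):
    """Number of weights in a conv layer with a kh x kw kernel (no biases)."""
    return kh * kw * c_in * c_out

c = 64  # example channel count
full = conv_params(3, 3, c, c)                             # one 3x3 conv
split = conv_params(1, 3, c, c) + conv_params(3, 1, c, c)  # 1x3 then 3x1
print(full, split)  # 36864 24576 -> a third fewer weights
```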
13. #12 Aspects of the solution: 2nd head, postfilter
2nd head
– mask-presence branch in the middle of the NN (after the encoder part)
• Conv 1x1, sigmoid
• FC=1, sigmoid
– leads to better convergence
Post filter
– presence prob < 0.5 or sum(pixels) < 3000 -> empty mask (+4.5%)
– in the end: combining p_nerve = (p_score + p_segment)/2 -> +0.5%
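The post-filter above can be sketched as follows; threshold values are the ones quoted on the slide, the function name is hypothetical:

```python
import numpy as np

def postfilter(mask, p_score, p_segment, prob_thr=0.5, area_thr=3000):
    """Zero out a predicted mask when the nerve is probably absent.

    p_score comes from the presence head, p_segment from the segmentation
    output; averaging them gave a further +0.5% per the slides.
    """
    p_nerve = (p_score + p_segment) / 2.0
    if p_nerve < prob_thr or mask.sum() < area_thr:
        return np.zeros_like(mask)
    return mask
```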
14. #13 Aspects of the solution: other
Modifications
– Skip connection with Residual blocks (+1%)
– Max pool -> Conv 3x3 with stride=2 (+BNA)
– Ensemble (+1%)
• k-fold 5,6,7,8, average
– Prediction on augmented versions of test images (averaging)
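The test-time augmentation in the last bullet can be sketched as averaging over a flip (a minimal version with one horizontal flip; `predict_fn` is a placeholder for the model's predict call):

```python
import numpy as np

def predict_tta(predict_fn, image):
    """Average predictions over horizontal-flip test-time augmentation."""
    p = predict_fn(image)
    p_flip = predict_fn(image[:, ::-1])[:, ::-1]  # flip back before averaging
    return (p + p_flip) / 2.0
```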
Final result:
– single model 0.694 score
– ensemble 0.70399 (hour before the competition’s end)
– the last submission was human-verified ;) but it didn’t help
code: https://github.com/EdwardTyantov/ultrasound-nerve-segmentation
16. #15 What didn’t help
– Inception Resnet v4
– sequential training of decoder, encoder parts
– more or fewer layers/blocks/n_filters
– pixel clustering
– higher or lower resolution
– dropout, different probs
– Torch version
– deconv layers instead of upsampling
– weight decay for layers
– FCN
– Deepmask architecture
17. #16 Technical
– Ubuntu 14 or 16, CUDA 8, cuDNN 5, latest Keras and Torch
– batch_size = 64 or 128 (depending on GPU memory)
– single model: 2-3 hours on a Titan X / GTX 1080
– ensemble: ~24 hours
18. #17 Other competitors
– train dataset: re-labeling or zeroing out the erroneous masks
– FCN with several heads at different resolutions (regularization)
– post-processing: fit the mask to an ellipse, fill holes
– separate training: mask / no mask
– crop images, super-resolution
– models at different resolutions
– higher resolution
– loss
– smart post-processing
• which obviously led to overfitting on the public leaderboard
– replication padding instead of zero-padding
19. #18 Deepmask (FB)
– no low-level features
– CNN with two heads: mask and score
– training set: (patch, mask, y_k), where y_k = 1 if an object is centered and fully contained in the patch
• mask pixel = 1 if it belongs to the object in the center
– VGG, 8-layers, can be trained
– Training
• joint training, score loss weighted by 1/32
• first (mask) branch trained on positives only
• augmentation: shifts up to 16 px, slight scaling, horizontal flip
– Evaluation
• full image: sliding window with a 16-pixel stride