2. Introduction
• Julian de Wit
• Freelance software / machine learning engineer
• MSc Software Engineering
• Love biologically inspired computing
• Last few years: the neural net “revolution”
• Turn academic ideas into practical apps
• Documents, plant/fruit grading, medical, radar
3. Agenda
1. Diagnose heart disease challenge
2. Deep learning
3. Solution discussion
4. Results
5. Some extra slides
6. Feel free to ask questions during the talk!
4. Challenge
• Second national data science bowl
• Kaggle.com / Booz Allen Hamilton
• Automate a manual 30-minute clinical procedure
• Ca. 500,000 cases/year in the USA
• Estimate heart volume based on MRIs
• The systole/diastole volume ratio (ejection fraction) is the ‘health’ predictor
• 750 teams
• $200,000 prize money
5. Challenge
• Kaggle.com
• Competition platform for ‘data scientists’
• Challenges hosted for companies
• Prize money and exposure
• 400,000+ registered users
• Lesson: there is always someone smarter than you!
• Today’s state of the art is tomorrow’s baseline!
7. Deep learning
• Image data → Deep Learning (CNN)
• Neural networks 2.0
• Don’t believe ALL the hype
• Structured data → feature engineering + tree/linear models
• Great when “perception” data is involved
• Spectacular results with image analysis
• My take: “Super human” with a twist
8. Solution • Step 1: Preprocessing
• Use DICOM info to make images uniform
• Crop 180×180 around the heart (fewer distractions)
• For my solution: less class imbalance
• Local contrast enhancement (CLAHE)
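The preprocessing step can be sketched in a few lines of numpy: crop a fixed window around an assumed heart center and stretch the contrast. CLAHE itself would be applied with e.g. OpenCV’s `cv2.createCLAHE`; plain min-max stretching stands in here so the sketch stays dependency-free, and the function name and center argument are illustrative.

```python
import numpy as np

def preprocess_slice(img, center, size=180):
    """Crop a size x size window around an (assumed) heart center
    and stretch contrast to [0, 1]. Real CLAHE would replace the
    min-max stretch below."""
    cy, cx = center
    half = size // 2
    # Pad so the crop never falls outside the image.
    padded = np.pad(img, half, mode="edge")
    crop = padded[cy:cy + size, cx:cx + size].astype(np.float32)
    lo, hi = crop.min(), crop.max()
    return (crop - lo) / (hi - lo + 1e-8)

patch = preprocess_slice(np.random.rand(256, 256), center=(128, 128))
print(patch.shape)  # (180, 180)
```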
9. Solution
• Step 2: Train deep neural net
• Standard option: Regression with ‘Vanilla’ architecture.
• Approach used by most teams (e.g. #2, Ghent University)
• Input slices, regress on provided volumes
10. Solution • Less publicized approach (mine): Segment images.
• Integrate estimated areas into volume using metadata.
• Problem: no annotations provided → used the Sunnybrook dataset + hand labeling
11. Solution • Segmentation : Traditional architecture bad fit
• Every layer gives higher-level features but less spatial info (bag-of-words-like)
• Per-pixel classification possible, but coarse due to the spatial loss
• Cumbersome! H × W × 300,000 classifications.
12. Solution • Segmentation : Fully convolutional architecture + upscale
• Efficient. Classify all pixels at once
• Still a problem: spatial bottleneck at the bottom → coarse output
13. Solution • Segmentation : U-net architecture
• Skip connections give more detail in the segmentation output
• Author works at Deepmind health now
• ResNet-like?!
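The skip-connection idea can be sketched with numpy: the coarse decoder feature map is upsampled and concatenated with the matching encoder feature map, so fine spatial detail reaches the segmentation output. Array shapes and names are illustrative.

```python
import numpy as np

def skip_merge(encoder_feat, decoder_feat):
    """U-net style merge: nearest-neighbour upsample the coarse
    decoder features 2x, then concatenate with the same-resolution
    encoder features along the channel axis."""
    up = decoder_feat.repeat(2, axis=0).repeat(2, axis=1)  # (H, W, C2)
    return np.concatenate([encoder_feat, up], axis=-1)     # (H, W, C1+C2)

enc = np.random.rand(64, 64, 32)   # fine, detailed features
dec = np.random.rand(32, 32, 64)   # coarse, semantic features
merged = skip_merge(enc, dec)
print(merged.shape)  # (64, 64, 96)
```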
14. Solution
• Segmentation results impressive.
• Machine did exactly what it was told.
• Confused by uncommon examples (< 1%)
• Remedy: active learning
• Nice property : brightness == (un)certainty
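The brightness/certainty property follows from the net outputting a per-pixel probability: pixels near 0 or 1 render as confidently dark or bright, while pixels near 0.5 look dim and uncertain. A minimal sketch of that mapping (the example probabilities are made up):

```python
import numpy as np

# Per-pixel probabilities from the segmentation net; values near
# 0.5 are the uncertain ones, values near 0 or 1 are confident.
probs = np.array([0.02, 0.48, 0.51, 0.97])
uncertainty = 1.0 - np.abs(2.0 * probs - 1.0)  # 0 = certain, 1 = unsure
print(uncertainty.round(2))  # [0.04 0.96 0.98 0.06]
```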
15. Solution • Last step: Integrate to volume.. should be simple
• Devil was in the details
[Diagram: n slices → n per-pixel left-ventricle segmentation overlays (Y/N) → sum all pixels and use DICOM info to convert to ml (e.g. 100 ml)]
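The integration in the diagram amounts to: count left-ventricle pixels per slice, convert to area with the DICOM pixel spacing, and sum area × slice spacing over slices. A minimal sketch, with made-up spacing values (the DICOM attributes involved are PixelSpacing and the inter-slice spacing):

```python
import numpy as np

def masks_to_ml(masks, pixel_spacing_mm, slice_spacing_mm):
    """masks: (n_slices, H, W) binary left-ventricle segmentations.
    Pixel area (mm^2) * slice spacing (mm) gives mm^3; 1000 mm^3 = 1 ml."""
    px_area = pixel_spacing_mm[0] * pixel_spacing_mm[1]        # mm^2 per pixel
    areas = masks.reshape(masks.shape[0], -1).sum(axis=1) * px_area
    return areas.sum() * slice_spacing_mm / 1000.0             # -> ml

masks = np.zeros((10, 180, 180), dtype=np.uint8)
masks[:, 60:120, 60:120] = 1                 # 3600 LV pixels per slice
vol = masks_to_ml(masks, pixel_spacing_mm=(1.2, 1.2), slice_spacing_mm=10.0)
print(round(vol, 1))  # 518.4
```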
16. Solution
• Devil in the details: a LOT of data cleaning
• Slice order
• Missing slices
• Out of bound slices
• Wrong orientation
• Missing frames
• BAD ground truth volumes
• Gradient boosting “calibration” procedure
• Not relevant in a real setting: just rescan the MRI.
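Two of the cleaning problems above (slice order, missing slices) can be sketched in pure Python: sort slices by their DICOM location and flag spacing gaps that indicate a missing slice. The records are hypothetical stand-ins for values read from DICOM headers.

```python
# Hypothetical (slice_location_mm, image_id) pairs as they might come
# out of DICOM headers: arbitrary order, one slice (at 30 mm) missing.
slices = [(40.0, "d"), (0.0, "a"), (20.0, "c"), (50.0, "e"), (10.0, "b")]

# Cleaning step 1: restore slice order by location.
slices.sort(key=lambda s: s[0])

# Cleaning step 2: flag gaps larger than the typical spacing,
# which indicate missing slices.
spacings = [b[0] - a[0] for a, b in zip(slices, slices[1:])]
typical = sorted(spacings)[len(spacings) // 2]        # median spacing
gaps = [i for i, d in enumerate(spacings) if d > 1.5 * typical]
print(typical, gaps)  # 10.0 [2]  -> gap between the 3rd and 4th slice
```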
17. Results
• Result:
• 3rd place
• Only 1 model. No ensemble.
• Sub-10 ml MAE → clinically significant
• Many improvements possible :
• More, cleaner train data
• Expert annotations
• Active learning
18. Appendix 1.
• Other approaches
• #1: Similar approach + 9 extra models
Segmentation, age, 4-chamber view, regression on images, etc.
• #2: Traditional approach, 250(!) models
Dynamic ensemble per patient
“Cool” end-to-end model
19. Appendix 2.
• U-nets and state of the art
• Potential successor: dilated convolutions.
• No more bottleneck.
• Somewhat easier to use.
• Small improvements on a personal project.
• Jury is still out.
• Kaggle: Ultrasound nerve segmentation
• U-nets was baseline and best solution.
• FCN also worked.
• No significant “discoveries”
• Dilated convolutions did not seem to work.
20. Appendix 3.
• Medical image challenges
• Deep learning => success
• Example: Kaggle retinopathy challenge
• As good as a doctor (better in combination)
• Google DeepMind (Jeffrey De Fauw = Kaggler)
• Many other companies “copied” the solution
26. My background
• Julian de Wit
• Freelance software / machine learning engineer
• Delft University of Technology: Software Engineering
• Biologically inspired computing / AI
• Heavily re-interested in neural nets since 2006
• Looking for opportunities to test them and bring them into practice
27. Approach
[Diagram: n slices → n per-pixel left-ventricle segmentation overlays (Y/N) → clean data & sum → calibrate against provided volumes (e.g. 110 ml)]
28. Calibration
• Use provided volumes to calibrate
• Remove systematic errors
• Use Gradient Booster on residuals
• Top 5 -> top 3
• Beware of overfitting
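The calibration step can be sketched with numpy: fit a model to the residuals (prediction minus provided volume) on the training cases, then subtract the predicted residual from new predictions. The talk used a gradient booster on the residuals; a linear fit stands in here to keep the sketch dependency-free, and all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up training data: predictions with a systematic bias
# that grows slightly with volume.
true_vol = rng.uniform(50, 400, size=200)
pred_vol = true_vol + 5.0 + 0.02 * true_vol + rng.normal(0, 3, size=200)

# Fit residual = f(prediction); the talk used gradient boosting here.
# Fit on held-out data in practice -- beware of overfitting.
coeffs = np.polyfit(pred_vol, pred_vol - true_vol, deg=1)

def calibrate(pred):
    return pred - np.polyval(coeffs, pred)

before = np.mean(np.abs(pred_vol - true_vol))
after = np.mean(np.abs(calibrate(pred_vol) - true_vol))
print(after < before)  # True: systematic error removed
```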
29. Approach
• Every pixel: Left ventricle Yes/No
• Use convolutional neural network
• Sunnybrook too simplistic
• Train with hand-labeled segmentations
• Reverse engineer how to label
• Fix systematic errors with calibration against provided volumes.
38. Submission
• CRPS
• Uncertainty based on the stdev of the error as a function of size.
• Model-provided uncertainty.
• However, this does not account for uncertainty in the labels.
• Example: patient 429, an error of 89 ml!
• The provided label was wrong…
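The CRPS used for submissions scores a predicted cumulative distribution over volumes (0–599 ml in this competition) against the step function at the true volume, which is why a point prediction has to be widened by its uncertainty. A minimal sketch, using a sigmoid in place of a proper Gaussian CDF to stay dependency-free (the 110 ml / 8 ml numbers are made up):

```python
import numpy as np

def crps(cdf, actual_ml):
    """cdf[v] = predicted P(volume <= v ml) for v = 0..599."""
    v = np.arange(600)
    heaviside = (v >= actual_ml).astype(float)
    return np.mean((cdf - heaviside) ** 2)

v = np.arange(600)
pred, sigma = 110.0, 8.0
cdf = 1.0 / (1.0 + np.exp(-(v - pred) / sigma))  # smooth CDF around pred

sharp = crps(cdf, actual_ml=110)   # correct, confident -> low score
wrong = crps(cdf, actual_ml=199)   # 89 ml off (cf. patient 429) -> high
print(sharp < wrong)  # True
```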