Kaggle is a community of almost 400K data scientists who have built almost 2MM machine learning models to participate in our competitions. Data scientists come to Kaggle to learn, collaborate and develop the state of the art in machine learning. This talk will cover some of the lessons we have learned from the Kaggle community.
2. GE Flight Quest 2
Optimize flight routes based on weather & traffic
$250,000, 122 teams
Hewlett Foundation: Automated Essay Scoring
Develop an automated scoring algorithm for student-written essays
$100,000, 155 teams
Allstate Purchase Prediction Challenge
Predict a purchased policy based on transaction history
$50,000, 1,570 teams
Merck Molecular Activity Challenge
Help develop safe and effective medicines by predicting molecular activity
$40,000, 236 teams
Higgs Boson Machine Learning Challenge
Use the ATLAS experiment to identify the Higgs boson
$13,000, 1,302 teams
3. The Kaggle Approach
Training Data
Age   Income    Default
58    $95,824   True
73    $20,708   False
59    $82,152   False
66    $25,334   True
Test Data
Age   Income    Default
73    $53,445
61    $36,679
47    $90,422
44    $79,040
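The training/test split on this slide can be sketched in a few lines of Python. This is only an illustration: the rows are the ones shown on the slide, the test rows' Default column is presumably held back by the competition host, and the 1-nearest-neighbour rule and the names `make_scaler` and `predict_default` are assumptions chosen for brevity, not a method the talk describes.

```python
from math import hypot

# Labelled training rows from the slide: (age, income, defaulted)
train = [
    (58, 95824, True),
    (73, 20708, False),
    (59, 82152, False),
    (66, 25334, True),
]

# Test rows from the slide: (age, income); the Default column is not given
test = [(73, 53445), (61, 36679), (47, 90422), (44, 79040)]


def make_scaler(rows):
    """Return a function that min-max scales (age, income) using the training ranges."""
    ages = [r[0] for r in rows]
    incomes = [r[1] for r in rows]
    age_lo, age_span = min(ages), max(ages) - min(ages)
    inc_lo, inc_span = min(incomes), max(incomes) - min(incomes)

    def scale(age, income):
        return ((age - age_lo) / age_span, (income - inc_lo) / inc_span)

    return scale


scale = make_scaler(train)


def predict_default(age, income):
    """Hypothetical stand-in model: copy the label of the nearest training row."""
    x = scale(age, income)

    def distance(row):
        y = scale(row[0], row[1])
        return hypot(x[0] - y[0], x[1] - y[1])

    nearest = min(train, key=distance)
    return nearest[2]


# One prediction per test row; this list is what a competitor would submit
predictions = [predict_default(age, income) for age, income in test]
print(predictions)
```

A real entry would use far more data and a stronger model; the point is only that the model is fitted on labelled rows and then asked to fill in the missing column for the unlabelled ones.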
4.
5. Mapping Dark Matter: Competition Progress
[Chart: accuracy (lower is better) from Week 1 to End; y-axis ticks at .0150 and .0170. Labelled entrant: Martin O’Leary, PhD student in Glaciology, Cambridge U]
6. “In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”
“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”
7. Mapping Dark Matter: Competition Progress
[Chart: accuracy (lower is better) from Week 1 to End; y-axis ticks at .0150 and .0170. Labelled entrants:]
Martin O’Leary: PhD student in Glaciology, Cambridge U
Marius Cobzarenco: Grad student in computer vision, UC London
Ali Hassaïne & Eu Jin Lok: Signature Verification, Qatar U & Grad Student @ Deloitte
deepZot (David Kirkby & Daniel Margala): Particle Physicist & Cosmologist
Other
8. We’ve worked with many of the world’s largest companies
Sectors: Healthcare & Pharma, Consumer Internet, Finance, Industrial, Consumer Marketing, Oil & Gas
Clients: $50b+ Beverage Co., Global Bank, Top Credit Card Issuer, Top 5 E&P, Top 20 E&P
9.
10. That submit over 100K machine learning models per month
[Chart: Monthly Submissions to Kaggle Competitions, May-10 to May-15; y-axis from 0 to 160,000]
People don’t come to us with churn or cross-sell problems; they typically come to us with their hardest problems, and I’ll talk more about this soon.
It’s for these reasons that we continue to invest in the competition platform. It’s a very efficient operation: it currently runs with a headcount of 4, and we believe 6 is the right long-term number of people to invest in competitions.
We decided to focus on Oil & Gas because, after working with ~25 Fortune 500 companies across 12 industries, we believe it’s the biggest opportunity for machine learning and the most ripe for disruption. Specifically:
Greatest value add: there is a huge gap between what they’re doing and what’s possible.
Shale is disruptive: the industry is looking for new ideas, making it a good environment to sell into.
We score their solutions in real time.
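As a rough illustration of what scoring submissions as they arrive could involve, the sketch below compares each team's predictions with labels only the host can see and ranks the teams. Everything in it is hypothetical: the hidden labels, the team names, and the choice of plain accuracy as the metric are assumptions for the example, not details from the talk.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the held-back labels."""
    if len(predicted) != len(actual):
        raise ValueError("submission has the wrong number of rows")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical labels held back by the host (made up for this example)
hidden_labels = [True, False, False, True]

# Hypothetical submissions, one list of predictions per team
submissions = {
    "team_a": [True, False, True, True],
    "team_b": [False, False, False, False],
}

# Score every submission and rank teams from best to worst
leaderboard = sorted(
    ((accuracy(preds, hidden_labels), team) for team, preds in submissions.items()),
    reverse=True,
)

for score, team in leaderboard:
    print(f"{team}: {score:.2f}")
```

Because the score depends only on the submitted predictions and the hidden labels, it can be computed the moment a file is uploaded, which is what makes a live leaderboard feasible.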
Kaggle Competitions – breakeven business
Access to most advanced and proven techniques
Recruiting the very best of a scarce resource
C-level access from leadership positioning in media