Why Data is becoming a competitive advantage in all verticals.
Introduction to Data Science, given to the ESCP Europe Master 2 class in February 2015.
Martin DANIEL - @martindaniel4
Introduction to Data Science - ESCP Europe
1. Introduction to Data Science
MS - ESCP Europe - 14/02/15
Martin Daniel
@martindaniel4
Source : www.d3js.org
2. Entrepreneurship
Media solutions
Head of Data Science
Founder (100 million texts processed)
Lecturer 1 week - data science Bootcamp
Founder
Organizer d3js / Data For Good
2010
2011
2015
8. Percentage of proprietary statin prescribing by CCG, UK
NHS, prescribinganalytics.com, 2011 - 2012
Savings of more than £200 million / year
# Healthcare
9. Google Flu Tracker vs CDC Official Report
Google.org, 2011 - 2012
t(Google) = t(CDC) - 2 weeks
# Healthcare
Source : www.google.org/flutrends/
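The relation t(Google) = t(CDC) - 2 weeks says that the search-based signal leads the official report by about two weeks. As a minimal sketch of how such a lead could be estimated, the code below shifts one weekly series against the other and keeps the shift with the highest correlation. The two series are synthetic curves invented for illustration (not Flu Trends or CDC data), and the helper names `pearson` and `best_lead` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lead(early, late, max_lag):
    """Return the lag (in samples) by which `early` best anticipates `late`."""
    scores = {}
    for lag in range(max_lag + 1):
        # Align early[t] with late[t + lag].
        a = early[:len(early) - lag]
        b = late[lag:]
        scores[lag] = pearson(a, b)
    return max(scores, key=scores.get)

# Synthetic weekly epidemic curves (assumption: same shape, shifted by 2 weeks).
weeks = range(30)
search_signal = [math.exp(-((w - 12) ** 2) / 18) for w in weeks]
official_report = [math.exp(-((w - 14) ** 2) / 18) for w in weeks]

lead = best_lead(search_signal, official_report, max_lag=6)
print(lead)  # estimated lead of the search signal, in weeks
```

On real data the peak correlation would be less clear-cut, so the estimated lead should be read as an approximation rather than a fixed offset.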
11. « Machine Learning is the field of study that
gives computers the ability to learn without
being explicitly programmed. »
Arthur Samuel, 1959
12. More Data Usually Beats Better Algorithms
Banko & Brill, Microsoft Research, 2001
e.g.: Choose between {to, two, too}
For breakfast I ate __ eggs
Source : Michele Banko, Eric Brill, Scaling to Very Very Large Corpora for Natural Language Disambiguation (2001)
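The {to, two, too} task can be illustrated with a deliberately simple sketch: count which neighbouring words appear around each candidate in training text, then pick the candidate whose counts best match the blank's context. This is not the experimental setup of Banko & Brill; the toy corpus, the feature choice (previous and next word) and the function names `train` and `predict` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

CONFUSION_SET = {"to", "two", "too"}

def train(sentences):
    """Count previous-word and next-word features around each candidate."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok in CONFUSION_SET:
                prev_w = tokens[i - 1] if i > 0 else "<s>"
                next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
                counts[tok]["prev=" + prev_w] += 1
                counts[tok]["next=" + next_w] += 1
    return counts

def predict(counts, prev_w, next_w):
    """Pick the candidate seen most often with this left/right context."""
    def score(word):
        return counts[word]["prev=" + prev_w] + counts[word]["next=" + next_w]
    # sorted() makes tie-breaking deterministic.
    return max(sorted(CONFUSION_SET), key=score)

# Hypothetical toy corpus; a real experiment would use millions of words.
toy_corpus = [
    "I ate two eggs",
    "she bought two apples",
    "we ate two pancakes",
    "I want to eat",
    "he went to school",
    "that is too much",
    "it was too late",
]

model = train(toy_corpus)
guess = predict(model, prev_w="ate", next_w="eggs")
print(guess)
```

Even this crude counter improves as the corpus grows, because more contexts are observed for each candidate, which is the intuition behind the slide's title.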
14. In what order should products
be displayed in a list?
Source : fifty-five.com
15. Click-through rate by position and by list
June 2013
• List = audience hub
• Manual or generic merchandising
• Broad problem, all sectors
Source : fifty-five.com
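Click-through rate by position is the number of clicks divided by the number of impressions at each rank in a list. A minimal sketch of that computation follows; the event format (a `(position, clicked)` pair per impression) and the sample values are hypothetical, not the fifty-five.com data.

```python
from collections import defaultdict

def ctr_by_position(events):
    """Return {position: clicks / impressions} from (position, clicked) pairs."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for position, clicked in events:
        impressions[position] += 1
        if clicked:
            clicks[position] += 1
    return {pos: clicks[pos] / impressions[pos] for pos in sorted(impressions)}

# Hypothetical impressions: one tuple per product shown in a list.
events = [
    (1, True), (1, True), (1, False), (1, False),
    (2, True), (2, False), (2, False), (2, False),
    (3, False), (3, False),
]

rates = ctr_by_position(events)
print(rates)
```

Running the same function per list (for example, by grouping events on a list identifier first) would give the "by position and by list" breakdown shown on the slide.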
16. Cumulative add-to-cart rates, dresses 1 / dresses 2
3S – Nov 2013 – Feb 2014
Cumulative pass-through rates, dresses 1 / dresses 2
3S – Nov 2013 – Feb 2014
Source : fifty-five.com
19. « Husky »
{freq: 15,
aff: 0.4,
cat: abc,
res: 10,
loc: 1,
typo: What,
etc.}
Modeling
Ok
What > Who
Who > What
Logs Processing Visualisation
Training
Training
Production