Future of Data Science
as a profession
Jose Quesada, Director, Data Science Retreat
@datascienceret
http://datascienceretreat.com/
The promise
The machine learning promise
People should be able to predict:
• Which employee will leave in the next 6 months
• Which electric generator is likely to die in the next 2 weeks
• Which sales lead has the highest potential to close in the next 3
months
• What each new website visitor is likely to buy based on past visitors
http://www.slideshare.net/bigml/the-past-present-and-future-of-machine-learning-apis
Jao. The Past, Present, and Future of Machine Learning APIs
http://www.enlitic.com/healthcare.html
Smile detection
Example Graduate portfolio project from DSR
03. Smile detection on video streams. Works
reliably with multiple people on cam.
Applications: evaluating funny YouTube videos
Data analysis has become super easy.
But has it?
• Great libraries exist with every algorithm under the sun
The machine learning promise
(Anyone who can turn on a computer) should be able
to predict:
• Which employee will leave in the next 6 months
• Which electric generator is likely to die in the next 2 weeks
• Which sales lead has the highest potential to close in the next 3 months
• What each new website visitor is likely to buy based on past visitors
Future of data science as a profession
Paco Nathan: Data Science in future tense
Why data analysis is still
hard, after all the libraries
and APIs
Andreas Mueller’s map
Trent McConaghy’s riff on Andy
http://trent.st/ffx/
Two machine learners, two maps
Andreas Mueller, PhD
Andy is an Assistant Research Scientist
at the NYU Center for Data Science,
building a group to work on open
source software for data science.
Previously he was a Machine Learning
Scientist at Amazon, working on
computer vision and forecasting
problems. He is one of the core
developers of the scikit-learn machine
learning library, and has maintained
it for several years.
Authored the now-famous model
picker image from scikit-learn
Trent McConaghy, PhD
Trent is co-founder & CTO of ascribe,
which uses modern crypto, ML, and
big data to tackle challenges in digital
property ownership. His two startups
applied ML in the enterprise
semiconductor space: ADA was acquired in
2004 and Solido is going strong. His
interests include large scale
regression, automating creativity,
anything labeled "impossible", and
thousand-fold improvements. He was
raised on a pig farm in Canada.
Why data analysis is still hard, after
all the libraries and APIs
• It’s too easy to lie to yourself about it working
• It’s very hard to tell whether it could work if it doesn’t
• There is no free lunch
http://blog.mikiobraun.de/2014/02/data-analysis-hard-parts.html
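The first bullet can be made concrete with a small sketch. It is not from the talk: it assumes a toy 1-nearest-neighbour classifier and made-up data whose labels are coin flips, so there is nothing to learn. Scored on its own training data the model looks perfect; scored on held-out data it is at chance.

```python
import random

def nearest_label(train, x):
    # Label of the training point closest to x (1-nearest neighbour).
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, data):
    # Fraction of points in `data` whose label the model gets right.
    hits = sum(nearest_label(train, x) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)
# Hypothetical data: one random feature, labels are coin flips (no signal).
points = [(rng.random(), rng.randint(0, 1)) for _ in range(400)]
train, held_out = points[:200], points[200:]

train_acc = accuracy(train, train)        # scored on data the model has seen
held_out_acc = accuracy(train, held_out)  # scored on data it has not seen

print(f"train accuracy:    {train_acc:.2f}")
print(f"held-out accuracy: {held_out_acc:.2f}")
```

Only the held-out number says anything about whether the model works.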
No free lunch theorem
• There is no universally optimal learning algorithm as
shown by the No Free Lunch Theorem: There is no
algorithm which is better than all the rest for all kinds
of data.
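A toy illustration of the idea, not a proof and not from the talk: enumerate every possible labeling of all 3-bit inputs, train on five inputs, and score on the three unseen ones. The two learners are arbitrary choices made for this sketch. Averaged over all possible target functions, any deterministic learner ends up with the same accuracy on unseen inputs.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=3))      # all 8 possible 3-bit inputs
TRAIN_X, TEST_X = INPUTS[:5], INPUTS[5:]      # 5 seen inputs, 3 unseen

def majority_learner(train):
    # Always predict the most common training label (ties go to 1).
    ones = sum(label for _, label in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess

def nearest_neighbour_learner(train):
    # Predict the label of the closest training input in Hamming distance.
    def predict(x):
        def dist(row):
            return sum(a != b for a, b in zip(row[0], x))
        return min(train, key=dist)[1]
    return predict

def average_unseen_accuracy(learner):
    correct = total = 0
    # Every possible target function: 2**8 = 256 labelings of the inputs.
    for labels in product([0, 1], repeat=len(INPUTS)):
        target = dict(zip(INPUTS, labels))
        model = learner([(x, target[x]) for x in TRAIN_X])
        correct += sum(model(x) == target[x] for x in TEST_X)
        total += len(TEST_X)
    return correct / total

print(average_unseen_accuracy(majority_learner))           # 0.5
print(average_unseen_accuracy(nearest_neighbour_learner))  # 0.5
```

An algorithm only beats another when the data has structure that matches its assumptions, which is why picking one still takes judgment.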
“Toolified”
• As more and more ML techniques become "toolified" the
problem is that the business doesn't understand that the
hard work is still ahead of them.
• Home Depot sells hammers and lumber, and while some
people have the skill and dedication to build their own
house, most folks are smart enough to hire someone that
knows what they're doing so the thing doesn't fall in and kill
their family.
• Blind faith in the power of tools is not helpful
80% data wrangling, 20% building & testing
models
Is model building automatable?
How about the data wrangling part? It’s actually the larger chunk.
Automating the data
scientist
Machine learning APIs
Machine learning for data wrangling
• Zoubin Ghahramani, Automatic statistician
• It's easy to shoot yourself in the foot with automated
tools — and convince yourself that the results are
meaningful when they're not
Alternative:
interfaces that draw
the most useful
information out of
people
Aka ‘The Luis von Ahn trick’.
Human computation: combine
human brainpower with computers
to solve problems that neither could
solve alone.
ReCAPTCHA: Computer-generated
tests that humans are routinely able
to pass but that computers have not
yet mastered.
Actionable advice for
individuals
Goal
• Become a full-stack problem solver
• AKA the unicorn data scientist
How to get there
• Focus on delivering business value
How to get there
Only after the business side is covered: focus on the tech
stack.
• Machine learning
• Big data/ engineering
• When to use ML at scale, when to sample and run on a single
machine
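For the "sample and run on a single machine" option, one possible approach (an assumption of this sketch, not something the talk prescribes) is reservoir sampling: a single pass that keeps a uniform random sample of fixed size from a stream too large to hold in memory. The data source below is a hypothetical stand-in.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Uniform random sample of up to k items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                sample[j] = item       # keep item with probability k / (i + 1)
    return sample

# Hypothetical stand-in for a data set too large to load at once.
rows = (n * n for n in range(100_000))
subset = reservoir_sample(rows, k=1000, seed=42)
print(len(subset))
```

The sample then fits in memory, so the usual single-machine libraries can be tried before reaching for a cluster.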
Constant learning
• The field changes faster than any other in technology
• If you are not willing to allocate ‘time outside work’ to
learn new things you will stagnate fast
Not being the equivalent to a code
monkey
• MOOCs have lowered the barrier to entry to machine
learning.
• Nowadays, you cannot be ‘the guy who knows how to
run (insert off-the-shelf-algo-here)’. In dataland, that’s
the equivalent of being a code monkey. MOOCs and
superb libraries (scikit-learn, R’s ecosystem) made sure
there are plenty of people who can throw, say, a random
forest at a problem. In the modern world, this does not
add that much value.
Picking problems to add the most
value
• Sometimes beating what the company is already doing
(often, nothing) offers a lot of value. Detecting fraud
poorly is better than not detecting fraud
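A back-of-the-envelope version of the fraud argument. Every figure here is hypothetical, chosen only to show the arithmetic; none comes from the talk.

```python
# All numbers are made up for illustration.
frauds = 1_000          # fraudulent transactions per month
loss_per_fraud = 200    # cost of each fraud that goes undetected
review_cost = 5         # cost of manually reviewing one flagged transaction

caught = 300            # a weak detector catches only 30% of the fraud
flagged = 1_500         # and flags 1,500 transactions to find those 300

loss_doing_nothing = frauds * loss_per_fraud
loss_with_detector = (frauds - caught) * loss_per_fraud + flagged * review_cost

print(loss_doing_nothing)                       # 200000
print(loss_with_detector)                       # 147500
print(loss_doing_nothing - loss_with_detector)  # 52500
```

Under these assumptions even a detector that misses 70% of the fraud pays for itself; the comparison that matters is against the current baseline, not against a perfect model.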
Data Science will continue to be
democratized
• There’s no shortage of data
scientists.
• 1900: people predicted that the
number of cars on the road would
be limited by the supply of
trained chauffeurs.
Machine learning can very quickly get
you, say, 80% of the way to solving just
about any (real world) problem
You want to apply ML to contexts that are fault tolerant:
• Online ad targeting
• Ranking search results
• Recommendations
• Spam filtering
ML quickly hits a point of
diminishing returns
“The gain is not worth the pain."
Actionable advice for
companies
Talent: invest in it
• The hunt for the 10x programmer continues (although
few companies succeed)
• In data science, the equivalent is the unicorn data
scientist
• A unicorn data scientist should generate more business
value than a 10x programmer
• The market agrees: supersalaries of >200k are common for
unicorn data scientists
Talent: beware of the fake data
scientist
• Each LinkedIn job ad for a data scientist gets ~150
applications
• Often from people who just rebranded themselves but have no
real experience
• Very common among people leaving academia
• HR managers cannot tell the difference
• It’s a common mistake to hire one, and never be able to
produce business value
Talent: easier to find than you may
think
• Online courses have raised the bar
• Intensive bootcamps do work, as long as people have
built something at the end
• You will still get 150 fake data scientists for each decent
one
A future where ML has
been popular for years.
What does it look like?
Next 3 years
• ML APIs will enable people with less and less skill to run
quite sophisticated analyses
• Startups doing ML as a service will grow up, then
contract. ML will stop being a key competitive
advantage in most (not all) domains
• Blind faith in the power of tools will lead to wrong
decisions, which will lead to a backlash
Next 10 years
• Prediction: C-level people will be data scientists in the
future
• Product managers will become data scientists, or get
replaced by one
DS is a chaotic field and
people don’t really know
what they want (much less
what they need)
Interested in Data Science Retreat?
Apply to either of our two tracks
http://datascienceretreat.com/
Thank You!
Jose Quesada, PhD
Director, Data Science Retreat
@datascienceret
me@josequesada.com
References
• Paco Nathan. Data science in future tense
• Chris Dixon. Machine learning is really good at partially
solving just about any problem
• Jao. The Past, Present, and Future of Machine Learning
APIs

More Related Content

What's hot

How to build a data science team 20115.03.13v6
How to build a data science team 20115.03.13v6How to build a data science team 20115.03.13v6
How to build a data science team 20115.03.13v6Zhihao Lin
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsHugo Bowne-Anderson
 
Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Edureka!
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data ScienceSanghamitra Deb
 
Supporting decisions with ML
Supporting decisions with MLSupporting decisions with ML
Supporting decisions with MLMegan Neider
 
Who is a data scientist
Who is a data scientist  Who is a data scientist
Who is a data scientist prateek kumar
 
Operationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the EnterpriseOperationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the Enterprisemark madsen
 
What is a Data Scientist
What is a Data Scientist What is a Data Scientist
What is a Data Scientist Experian_US
 
How to Build Data Science Teams
How to Build Data Science TeamsHow to Build Data Science Teams
How to Build Data Science TeamsGanes Kesari
 
Implementing Data Science
Implementing Data ScienceImplementing Data Science
Implementing Data ScienceNathan Watson
 
H2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.ioH2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.ioSri Ambati
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in dataDavid Rostcheck
 
Data Scientist: The Sexiest Job in the 21st Century
Data Scientist: The Sexiest Job in the 21st CenturyData Scientist: The Sexiest Job in the 21st Century
Data Scientist: The Sexiest Job in the 21st CenturyLyn Fenex
 
Best Practices for Scaling Data Science Across the Organization
Best Practices for Scaling Data Science Across the OrganizationBest Practices for Scaling Data Science Across the Organization
Best Practices for Scaling Data Science Across the OrganizationChasity Gibson
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkVivian S. Zhang
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 

What's hot (20)

How to build a data science team 20115.03.13v6
How to build a data science team 20115.03.13v6How to build a data science team 20115.03.13v6
How to build a data science team 20115.03.13v6
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientists
 
Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?Is Data Scientist the Sexiest Job of the 21st century?
Is Data Scientist the Sexiest Job of the 21st century?
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Supporting decisions with ML
Supporting decisions with MLSupporting decisions with ML
Supporting decisions with ML
 
Who is a data scientist
Who is a data scientist  Who is a data scientist
Who is a data scientist
 
Operationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the EnterpriseOperationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the Enterprise
 
What is a Data Scientist
What is a Data Scientist What is a Data Scientist
What is a Data Scientist
 
How to Build Data Science Teams
How to Build Data Science TeamsHow to Build Data Science Teams
How to Build Data Science Teams
 
The Big Data Dream Team
The Big Data Dream TeamThe Big Data Dream Team
The Big Data Dream Team
 
Implementing Data Science
Implementing Data ScienceImplementing Data Science
Implementing Data Science
 
H2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.ioH2O World - What you need before doing predictive analysis - Keen.io
H2O World - What you need before doing predictive analysis - Keen.io
 
New professional careers in data
New professional careers in dataNew professional careers in data
New professional careers in data
 
Data Scientist: The Sexiest Job in the 21st Century
Data Scientist: The Sexiest Job in the 21st CenturyData Scientist: The Sexiest Job in the 21st Century
Data Scientist: The Sexiest Job in the 21st Century
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
Data Analytics Career Paths
Data Analytics Career PathsData Analytics Career Paths
Data Analytics Career Paths
 
Best Practices for Scaling Data Science Across the Organization
Best Practices for Scaling Data Science Across the OrganizationBest Practices for Scaling Data Science Across the Organization
Best Practices for Scaling Data Science Across the Organization
 
AskAndy Anything 2016
AskAndy Anything 2016AskAndy Anything 2016
AskAndy Anything 2016
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 

Viewers also liked

R for the semantic web, Quesada useR 2009
R for the semantic web, Quesada useR 2009R for the semantic web, Quesada useR 2009
R for the semantic web, Quesada useR 2009Jose Quesada
 
Wave Hackathon Intro
Wave Hackathon IntroWave Hackathon Intro
Wave Hackathon IntroJose Quesada
 
A quick overview of the available reference managers2010
A quick overview of the available reference managers2010A quick overview of the available reference managers2010
A quick overview of the available reference managers2010Jose Quesada
 
Irmles2010 Random indexing spaces to bridge the human and data webs
Irmles2010 Random indexing spaces to bridge the human and data websIrmles2010 Random indexing spaces to bridge the human and data webs
Irmles2010 Random indexing spaces to bridge the human and data websJose Quesada
 
Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future TensePaco Nathan
 
#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on MesosPaco Nathan
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingPaco Nathan
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?Paco Nathan
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataPaco Nathan
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapePaco Nathan
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsPaco Nathan
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MorePaco Nathan
 
Microservices, Containers, and Machine Learning
Microservices, Containers, and Machine LearningMicroservices, Containers, and Machine Learning
Microservices, Containers, and Machine LearningPaco Nathan
 

Viewers also liked (20)

R for the semantic web, Quesada useR 2009
R for the semantic web, Quesada useR 2009R for the semantic web, Quesada useR 2009
R for the semantic web, Quesada useR 2009
 
Wave Hackathon Intro
Wave Hackathon IntroWave Hackathon Intro
Wave Hackathon Intro
 
A quick overview of the available reference managers2010
A quick overview of the available reference managers2010A quick overview of the available reference managers2010
A quick overview of the available reference managers2010
 
Irmles2010 Random indexing spaces to bridge the human and data webs
Irmles2010 Random indexing spaces to bridge the human and data websIrmles2010 Random indexing spaces to bridge the human and data webs
Irmles2010 Random indexing spaces to bridge the human and data webs
 
#BigDataCanarias: "Big Data & Career Paths"
#BigDataCanarias: "Big Data & Career Paths"#BigDataCanarias: "Big Data & Career Paths"
#BigDataCanarias: "Big Data & Career Paths"
 
Data Science in Future Tense
Data Science in Future TenseData Science in Future Tense
Data Science in Future Tense
 
#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos#MesosCon 2014: Spark on Mesos
#MesosCon 2014: Spark on Mesos
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Big Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely headingBig Data is changing abruptly, and where it is likely heading
Big Data is changing abruptly, and where it is likely heading
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?
 
GalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About DataGalvanizeU Seattle: Eleven Almost-Truisms About Data
GalvanizeU Seattle: Eleven Almost-Truisms About Data
 
How Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscapeHow Apache Spark fits in the Big Data landscape
How Apache Spark fits in the Big Data landscape
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analytics
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
Microservices, Containers, and Machine Learning
Microservices, Containers, and Machine LearningMicroservices, Containers, and Machine Learning
Microservices, Containers, and Machine Learning
 

Similar to Future of data science as a profession

EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session Steve Ardire
 
(In)convenient truths about applied machine learning
(In)convenient truths about applied machine learning(In)convenient truths about applied machine learning
(In)convenient truths about applied machine learningMax Pagels
 
Machine Learning for SEOs - SMXL
Machine Learning for SEOs - SMXLMachine Learning for SEOs - SMXL
Machine Learning for SEOs - SMXLBritney Muller
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)Julien SIMON
 
AI in Business: Opportunities & Challenges
AI in Business: Opportunities & ChallengesAI in Business: Opportunities & Challenges
AI in Business: Opportunities & ChallengesTathagat Varma
 
The Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninThe Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninInside Analysis
 
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019Dhiana Deva
 
Industrial revolution 4.0
Industrial revolution 4.0 Industrial revolution 4.0
Industrial revolution 4.0 Aditya Randika
 
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Agence du Numérique (AdN)
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Matt Stubbs
 
Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Diego Oppenheimer
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk Vijay Ganti
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk Vijay Ganti
 
Why analytics projects fail
Why analytics projects failWhy analytics projects fail
Why analytics projects failDr. Bülent Dal
 
WHY DO SO MANY ANALYTICS PROJECTS STILL FAIL?
WHY DO SO MANY ANALYTICS PROJECTS STILL FAIL?WHY DO SO MANY ANALYTICS PROJECTS STILL FAIL?
WHY DO SO MANY ANALYTICS PROJECTS STILL FAIL?Haluk Demirkan
 
Career options in Artificial Intelligence : 2020
Career options in Artificial Intelligence : 2020Career options in Artificial Intelligence : 2020
Career options in Artificial Intelligence : 2020Venkatarangan Thirumalai
 
CWIN17 san francisco-ai implementation-pub
CWIN17 san francisco-ai implementation-pubCWIN17 san francisco-ai implementation-pub
CWIN17 san francisco-ai implementation-pubCapgemini
 
SkillsFuture Festival at NUS 2019- Artificial Intelligence for Everyone - A P...
SkillsFuture Festival at NUS 2019- Artificial Intelligence for Everyone - A P...SkillsFuture Festival at NUS 2019- Artificial Intelligence for Everyone - A P...
SkillsFuture Festival at NUS 2019- Artificial Intelligence for Everyone - A P...NUS-ISS
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesUpXAcademy
 

Similar to Future of data science as a profession (20)

EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session
 
(In)convenient truths about applied machine learning
(In)convenient truths about applied machine learning(In)convenient truths about applied machine learning
(In)convenient truths about applied machine learning
 
Machine Learning for SEOs - SMXL
Machine Learning for SEOs - SMXLMachine Learning for SEOs - SMXL
Machine Learning for SEOs - SMXL
 
The Future of AI (September 2019)
The Future of AI (September 2019)The Future of AI (September 2019)
The Future of AI (September 2019)
 
AI in Business: Opportunities & Challenges
AI in Business: Opportunities & ChallengesAI in Business: Opportunities & Challenges
AI in Business: Opportunities & Challenges
 
The Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninThe Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine Learnin
 
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
 
Industrial revolution 4.0
Industrial revolution 4.0 Industrial revolution 4.0
Industrial revolution 4.0
 
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
 
Salesforce Einstein: Use Cases and Product Features
Salesforce Einstein: Use Cases and Product FeaturesSalesforce Einstein: Use Cases and Product Features
Salesforce Einstein: Use Cases and Product Features
 
Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk
 
Why analytics projects fail
Why analytics projects failWhy analytics projects fail
Why analytics projects fail
 
WHY DO SO MANY ANALYTICS PROJECTS STILL FAIL?
WHY DO SO MANY ANALYTICS PROJECTS STILL FAIL?WHY DO SO MANY ANALYTICS PROJECTS STILL FAIL?
WHY DO SO MANY ANALYTICS PROJECTS STILL FAIL?
 
Career options in Artificial Intelligence : 2020
Career options in Artificial Intelligence : 2020Career options in Artificial Intelligence : 2020
Career options in Artificial Intelligence : 2020
 
CWIN17 san francisco-ai implementation-pub
CWIN17 san francisco-ai implementation-pubCWIN17 san francisco-ai implementation-pub
CWIN17 san francisco-ai implementation-pub
 
SkillsFuture Festival at NUS 2019- Artificial Intelligence for Everyone - A P...
SkillsFuture Festival at NUS 2019- Artificial Intelligence for Everyone - A P...SkillsFuture Festival at NUS 2019- Artificial Intelligence for Everyone - A P...
SkillsFuture Festival at NUS 2019- Artificial Intelligence for Everyone - A P...
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
 

Future of data science as a profession

  • 1. Future of Data Science as a profession Jose Quesada, Director, Data Science Retreat @datascienceret http://datascienceretreat.com/
  • 3. The machine learning promise People should be able to predict: • Which employee will leave in the next 6 months • Which electric generator is likely to die in the next 2 weeks • Which sales lead has the highest potential to close in the next 3 months • What each new website visitor is likely to buy based on past visitors
  • 6. Smile detection Example Graduate portfolio project from DSR 03. Smile detection on video streams. Works reliably with multiple people on cam. Applications: YouTube funny video evaluation
  • 7. Data analysis has become super easy. But has it? • Great libraries exist with every algorithm under the sun
  • 8. The machine learning promise (Anyone who can turn on a computer) should be able to predict: • Which employee will leave in the next 6 months • Which electric generator is likely to die in the next 2 weeks • Which sales lead has the highest potential to close in the next 3 months • What each new website visitor is likely to buy based on past visitors
  • 10. Paco Nathan: Data Science in future tense
  • 12. Why data analysis is still hard, after all the libraries and APIs
  • 14. Trent McConaghy’s riff on Andy http://trent.st/ffx/
  • 15. Two machine learners, two maps Andreas Mueller, PhD Andy is an Assistant Research Scientist at the NYU Center for Data Science, building a group to work on open source software for data science. Previously he was a Machine Learning Scientist at Amazon, working on computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and has maintained it for several years. He authored the now-famous model picker image from scikit-learn. Trent McConaghy, PhD Trent is co-founder & CTO of ascribe, which uses modern crypto, ML, and big data to tackle challenges in digital property ownership. His two startups applied ML in the enterprise semiconductor space: ADA was acquired in 2004 and Solido is going strong. His interests include large scale regression, automating creativity, anything labeled "impossible", and thousand-fold improvements. He was raised on a pig farm in Canada.
  • 16. Why data analysis is still hard, after all the libraries and APIs • It’s too easy to lie to yourself about it working • It’s very hard to tell whether it could work if it doesn’t • There is no free lunch http://blog.mikiobraun.de/2014/02/data-analysis-hard-parts.html
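The "too easy to lie to yourself" point on slide 16 can be made concrete with a minimal sketch. It assumes a toy dataset of pure noise (the labels are coin flips) and a hand-written 1-nearest-neighbour classifier; the helper names `nearest_label` and `accuracy` are hypothetical and not taken from the talk. Scored on its own training data the model looks perfect, while held-out data shows it has learned nothing.

```python
import random


def nearest_label(train, x):
    """Return the label of the training point closest to x (1-nearest-neighbour)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(train, rows):
    """Fraction of rows whose label matches the 1-NN prediction."""
    hits = sum(nearest_label(train, x) == y for x, y in rows)
    return hits / len(rows)


rng = random.Random(0)
# Pure noise: each label is a coin flip, unrelated to the feature.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(400)]
train, held_out = data[:200], data[200:]

train_acc = accuracy(train, train)        # every point is its own nearest neighbour
held_out_acc = accuracy(train, held_out)  # should hover around chance level

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {held_out_acc:.2f}")
```

The gap between the two numbers is the self-deception: only the held-out score says anything about whether the model works.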
  • 17. No free lunch theorem • There is no universally optimal learning algorithm as shown by the No Free Lunch Theorem: There is no algorithm which is better than all the rest for all kinds of data.
  • 18. “Toolified” • As more and more ML techniques become "toolified" the problem is that the business doesn't understand that the hard work is still ahead of them. • Home Depot sells hammers and lumber, and while some people have the skill and dedication to build their own house, most folks are smart enough to hire someone that knows what they're doing so the thing doesn't fall in and kill their family. • Blind faith in the power of tools is not helpful
  • 19. 80% data mangling, 20% building & testing models. Is model building automatable? How about the data wrangling part? It’s actually the larger chunk
  • 22. Machine learning for data wrangling
  • 23. • Zoubin Ghahramani, Automatic statistician • It's easy to shoot yourself in the foot with automated tools — and convince yourself that the results are meaningful when they're not
  • 24. Alternative: interfaces that draw the most useful information out of people Aka ‘The Luis von Ahn trick’. Human computation: combine human brainpower with computers to solve problems that neither could solve alone. ReCAPTCHA: Computer-generated tests that humans are routinely able to pass but that computers have not yet mastered.
  • 26. Goal • Become a full-stack problem solver • AKA the unicorn data scientist
  • 27. How to get there • Focus on delivering business value
  • 28. How to get there Only after the business side is covered: focus on the tech stack. • Machine learning • Big data/ engineering • When to use ML at scale, when to sample and run on a single machine
  • 29. Constant learning • The field changes faster than any other in technology • If you are not willing to allocate ‘time outside work’ to learn new things you will stagnate fast
  • 30. Not being the equivalent of a code monkey • MOOCs have lowered the barrier to entry to machine learning. • Nowadays, you cannot be ‘the guy who knows how to run (insert off-the-shelf-algo-here)’. In dataland, that’s the equivalent of being a code monkey. MOOCs and superb libraries (scikit-learn, R’s ecosystem) made sure there are plenty of people who can throw, say, a random forest at a problem. In the modern world, this is not adding that much value.
  • 31. Picking problems to add the most value • Sometimes beating what the company is already doing (often, nothing) offers a lot of value. Detecting fraud poorly is better than not detecting fraud
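The claim on slide 31 can be checked with back-of-the-envelope arithmetic. The sketch below is a hypothetical cost model: every number in it (transaction volume, fraud rate, loss per fraud, recall, false-positive rate, review cost) is an illustrative assumption, not data from the talk.

```python
def expected_fraud_cost(n_tx, fraud_rate, loss_per_fraud,
                        recall=0.0, false_positive_rate=0.0, review_cost=0.0):
    """Expected cost = losses from missed fraud + cost of reviewing flagged transactions."""
    frauds = n_tx * fraud_rate
    legit = n_tx - frauds
    missed_loss = frauds * (1 - recall) * loss_per_fraud
    flagged = frauds * recall + legit * false_positive_rate
    return missed_loss + flagged * review_cost


# Hypothetical numbers: 100,000 transactions, 1% fraudulent, 200 lost per fraud.
baseline = expected_fraud_cost(100_000, 0.01, 200.0)  # no detection at all
weak = expected_fraud_cost(100_000, 0.01, 200.0,
                           recall=0.3,                # catches only 30% of fraud
                           false_positive_rate=0.02,  # flags 2% of legitimate traffic
                           review_cost=5.0)           # cost of one manual review

print(f"no detection:  {baseline:,.0f}")
print(f"weak detector: {weak:,.0f}")
print(f"savings:       {baseline - weak:,.0f}")
```

Under these assumptions, even a detector that misses 70% of fraud cuts the expected cost by roughly a quarter. The result depends on the assumed review cost and false-positive rate, so the same arithmetic is worth rerunning with real figures before committing to a project.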
  • 32. Data Science will continue to be democratized • There’s no shortage of data scientists. • Early 1900s: people thought the number of cars on the road would be limited by the supply of trained chauffeurs.
  • 33. Machine learning can very quickly get you, say, 80% of the way to solving just about any (real world) problem. You want to apply ML to contexts that are fault tolerant: • Online ad targeting • Ranking search results • Recommendations • Spam filtering
  • 34. ML quickly hits a point of diminishing returns “The gain is not worth the pain."
  • 36. Talent: invest in it • The hunt for the 10x programmer continues (although few companies succeed) • In data science, the equivalent is the unicorn data scientist • A unicorn data scientist should generate more business value than a 10x programmer • Market agrees: supersalaries of >200k are common for unicorn data scientists
  • 37. Talent: beware of the fake data scientist • Each LinkedIn job ad for a data scientist gets ~150 applications • Often people who just rebranded themselves but have no real experience • Very common in guys bailing out of academia • HR managers cannot tell the difference • It’s a common mistake to hire one, and never be able to produce business value
  • 38. Talent: easier to find than you may think • Online courses have raised the bar • Intensive bootcamps do work, as long as people have built something at the end • You will still get 150 fake data scientists for each decent one
  • 39. A future where ML has been popular for years. What does it look like?
  • 40. Next 3 years • ML APIs will enable people with less and less skill to run quite sophisticated analyses • Startups doing ML as a service will grow up, then contract. ML will stop being a key competitive advantage in most (not all) domains • Blind faith in the power of tools will lead to wrong decisions, which will lead to a backlash
  • 41. Next 10 years • Prediction: C-level people will be data scientists in the future • Product managers will become data scientists, or get replaced by them
  • 42. DS is a chaotic field and people don’t really know what they want (much less what they need)
  • 43. Interested in Data Science Retreat? Apply to either of our two tracks http://datascienceretreat.com/
  • 45. Thank You! Jose Quesada, PhD Director, Data Science Retreat @datascienceret me@josequesada.com
  • 46. References • Paco Nathan. Data science in future tense • Chris Dixon. Machine learning is really good at partially solving just about any problem • Jao. The Past, Present, and Future of Machine Learning APIs

Editor's Notes

  1. It was almost a joke. Too much email asking the ‘When to do what’ question.
  2. If you thought scikit-learn was convenient
  3. What is business value? If you have been in academia or away from a customer-facing role most of your career, you probably don’t have good intuitions about this. A sure-fire way to learn is to start a business. Or take a customer-facing role. Even so, it may take years to know your market.
  4. What is business value? If you have been in academia or away from a customer-facing role most of your career, you probably don’t have good intuitions about this. A sure-fire way to learn is to start a business. Or take a customer-facing role. Even so, it may take years to know your market.
  5. The discussion about the shortage of data scientists reminds me that in the early 1900s people thought that the number of cars on the road would be limited by the supply of trained chauffeurs. Then Henry Ford and others built cars that owners could drive themselves. New tools are going to be available that business owners can use themselves without needing data scientists.
  6. You need to apply ML to contexts that are fault tolerant: online ad targeting, ranking search results, recommendations, spam filtering.