Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science Processes

Ali ALKAN
Co-Founder & Principal Data Scientist
ADVANCETICS B.V.
ali.alkan@advancetics.com
Twitter / Ali_Alkan
7 December 2018
Agenda
Machine Learning, Artificial Intelligence, and Data Science
Phases of Data Science Projects and CRISP-DM
Guided Analytics Approach for Data Science Processes
A Guided Analytics Application with KNIME Analytics Platform
Q&A Session
ML vs. AI vs. DS?
Data Science produces insights
Machine Learning produces predictions
Artificial Intelligence produces actions
What is Artificial Intelligence?
• Artificial Narrow Intelligence (ANI): Machine
intelligence that equals or exceeds human
intelligence or efficiency at a specific task.
• Artificial General Intelligence (AGI): A machine
with the ability to apply intelligence to any
problem, rather than just one specific problem
(human-level intelligence).
• Artificial Superintelligence (ASI): An intellect that
is much smarter than the best human brains in
practically every field, including scientific
creativity, general wisdom and social skills.
Machine Learning | Introduction
• Machine Learning is a type of Artificial Intelligence that provides
computers with the ability to learn without being explicitly programmed.
• Provides various techniques that can learn from and make predictions on
data.
Machine Learning | Learning Approaches
Supervised Learning: Learning with a labeled
training set
• Example: email spam detector with training set
of already labeled emails
Unsupervised Learning: Discovering patterns
in unlabeled data
• Example: cluster similar documents based on
the text content
Reinforcement Learning: learning based on
feedback or reward
• Example: learn to play chess by winning or
losing
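The first two learning approaches can be illustrated with a toy sketch that uses only the Python standard library. It assumes a simple bag-of-words representation. The supervised part is a nearest-centroid classifier trained on labelled emails, a swapped-in technique chosen for brevity rather than the method any particular spam filter uses. The unsupervised part is a plain k-means clustering of unlabelled documents. All function names and example documents are hypothetical.

```python
import math
from collections import Counter

# Toy bag-of-words helpers (illustrative only, not a production text pipeline).
def vectorize(doc):
    """Map a document to its word counts."""
    return Counter(doc.lower().split())

def distance(a, b):
    """Euclidean distance between two sparse word-count vectors."""
    return math.sqrt(sum((a.get(w, 0.0) - b.get(w, 0.0)) ** 2 for w in set(a) | set(b)))

def mean_vector(vectors):
    """Average several sparse vectors (a centroid)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {w: c / len(vectors) for w, c in total.items()}

# Supervised: learn from a labelled training set (nearest-centroid classifier).
def train_classifier(docs, labels):
    by_label = {}
    for doc, label in zip(docs, labels):
        by_label.setdefault(label, []).append(vectorize(doc))
    return {label: mean_vector(vs) for label, vs in by_label.items()}

def classify(model, doc):
    v = vectorize(doc)
    return min(model, key=lambda label: distance(v, model[label]))

# Unsupervised: discover groups in unlabelled documents (k-means).
def kmeans(docs, k, iterations=10):
    vectors = [vectorize(d) for d in docs]
    centroids = [dict(v) for v in vectors[:k]]  # naive initialisation: first k documents
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its nearest centroid.
        assignment = [min(range(k), key=lambda j: distance(v, centroids[j])) for v in vectors]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = mean_vector(members)
    return assignment

# Hypothetical usage.
model = train_classifier(
    ["win money now", "cheap money offer", "meeting agenda attached", "project meeting notes"],
    ["spam", "spam", "ham", "ham"],
)
print(classify(model, "free money offer"))  # spam
print(kmeans(["money offer now", "cheap money offer", "meeting notes", "project meeting notes"], k=2))
# [1, 1, 0, 0]  (cluster ids are arbitrary; the two topics land in different clusters)
```

The classifier needs the labels up front, while k-means only receives the raw documents and has to find the grouping itself. That difference is exactly the one between the first two approaches on the slide.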
Outlook | Traditional Programming
Outlook | Machine Learning
Outlook | Goal-based AI
CRISP-DM | Cross-Industry Standard Process for Data Mining
The CRISP-DM methodology provides a
structured approach to planning a data mining
project.
It is a robust and well-proven methodology.
It is powerful, practical, flexible, and useful
when using analytics to solve business issues.
This model is an idealised sequence of events.
In practice, many of the tasks can be performed
in a different order, and it will often be
necessary to backtrack to previous tasks and
repeat certain actions.
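The backtracking described above can be made concrete by encoding the phase transitions as a small graph. The sketch below assumes the arrows as they are commonly drawn in the CRISP-DM process diagram; it is an illustrative reading of that diagram, not a normative part of the standard, and `CRISP_DM_TRANSITIONS` and `is_valid_history` are hypothetical names.

```python
# Arrows as commonly drawn in the CRISP-DM process diagram (illustrative reading).
CRISP_DM_TRANSITIONS = {
    "Business Understanding": {"Data Understanding"},
    "Data Understanding": {"Business Understanding", "Data Preparation"},
    "Data Preparation": {"Modeling"},
    "Modeling": {"Data Preparation", "Evaluation"},
    "Evaluation": {"Business Understanding", "Deployment"},
    # Outer cycle: lessons learned after deployment can start a new project cycle.
    "Deployment": {"Business Understanding"},
}

def is_valid_history(phases):
    """Return True if a recorded sequence of phases only follows arrows of the diagram."""
    if not phases or phases[0] != "Business Understanding":
        return False
    return all(nxt in CRISP_DM_TRANSITIONS.get(cur, set())
               for cur, nxt in zip(phases, phases[1:]))

# Backtracking (Data Understanding -> Business Understanding,
# Modeling -> Data Preparation) is allowed in this reading of the diagram.
history = [
    "Business Understanding", "Data Understanding", "Business Understanding",
    "Data Understanding", "Data Preparation", "Modeling", "Data Preparation",
    "Modeling", "Evaluation", "Deployment",
]
print(is_valid_history(history))                               # True
print(is_valid_history(["Business Understanding", "Modeling"]))  # False
```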
CRISP-DM | Definition
CRISP-DM | Business Understanding
The first stage of the CRISP-DM process
is to understand what you want to
accomplish from a business
perspective.
The goal of this stage of the process is to
uncover important factors that
could influence the outcome of the
project.
Neglecting this step can mean that a
great deal of effort is put into producing
the right answers to the wrong questions.
CRISP-DM | Data Understanding
The second stage of the CRISP-DM
process requires you to acquire the data
listed in the project resources.
This initial collection includes data loading,
if this is necessary for data understanding.
• For example, if you use a specific tool for
data understanding, it makes perfect
sense to load your data into this tool.
• If you acquire multiple data sources, you
need to consider how and when you're
going to integrate them.
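A first pass at data understanding can be sketched with the Python standard library. The sketch assumes two hypothetical CSV sources (customers and orders) that share a `customer_id` key. It loads each source, counts missing values per column, and checks how well the join keys overlap, which informs how and when the sources can be integrated. The data and function names are invented for illustration.

```python
import csv
import io

def load_csv(text):
    """Load CSV text into a list of row dictionaries (one dict per record)."""
    return list(csv.DictReader(io.StringIO(text)))

def profile(rows):
    """First look at a source: number of rows and missing (empty) values per column."""
    columns = list(rows[0].keys()) if rows else []
    missing = {c: sum(1 for r in rows if not (r[c] or "").strip()) for c in columns}
    return {"rows": len(rows), "missing": missing}

def key_overlap(left, right, key):
    """Before integrating two sources, check how well their join keys match."""
    left_keys = {r[key] for r in left}
    right_keys = {r[key] for r in right}
    return {
        "both": len(left_keys & right_keys),
        "only_left": len(left_keys - right_keys),
        "only_right": len(right_keys - left_keys),
    }

# Hypothetical example data: two small sources sharing a customer_id key.
customers = load_csv("customer_id,country,age\n1,NL,34\n2,TR,\n3,DE,51\n")
orders = load_csv("order_id,customer_id,amount\n10,1,25.0\n11,1,\n12,4,40.5\n")
print(profile(customers))  # {'rows': 3, 'missing': {'customer_id': 0, 'country': 0, 'age': 1}}
print(key_overlap(customers, orders, "customer_id"))  # {'both': 1, 'only_left': 2, 'only_right': 1}
```

Here one order references an unknown customer, which is the kind of finding that should be known before choosing a join strategy in data preparation.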
CRISP-DM | Data Preparation
All steps from the raw data to the final dataset.
Final dataset:
• used for statistical modeling
• sometimes called ADS (analytical dataset)
Includes or can include:
• data source selection and loading
• table selection and loading
• joining data sources
• data cleaning (missing values, outliers, ...)
• feature generation and data transformation
• taking samples of data
• …
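Several of the data preparation steps listed above can be chained into a small sketch that builds an analytical dataset. It assumes hypothetical customer and order records (lists of dictionaries) and shows joining, missing-value imputation with the median, one derived feature, and reproducible sampling. Outlier handling and the other listed steps are left out to keep it short, and the imputation and feature choices are illustrative, not recommendations.

```python
import random
import statistics

def build_ads(customers, orders, sample_size, seed=42):
    """Turn raw orders + customers into a small analytical dataset (ADS)."""
    # Joining data sources: inner join of orders to customers on customer_id.
    by_id = {c["customer_id"]: c for c in customers}
    ads = [{**o, **by_id[o["customer_id"]]} for o in orders if o["customer_id"] in by_id]
    # Data cleaning: impute missing amounts with the median of the observed amounts.
    observed = [float(r["amount"]) for r in ads if r["amount"]]
    median = statistics.median(observed) if observed else 0.0
    for r in ads:
        r["amount"] = float(r["amount"]) if r["amount"] else median
        # Feature generation: flag orders above the median amount.
        r["large_order"] = r["amount"] > median
    # Taking samples: a reproducible random sample of the prepared rows.
    return random.Random(seed).sample(ads, min(sample_size, len(ads)))

# Hypothetical raw data; order 13 references an unknown customer and is dropped by the join.
customers = [{"customer_id": "1", "country": "NL"}, {"customer_id": "2", "country": "TR"}]
orders = [
    {"order_id": "10", "customer_id": "1", "amount": "25.0"},
    {"order_id": "11", "customer_id": "1", "amount": ""},
    {"order_id": "12", "customer_id": "2", "amount": "40.0"},
    {"order_id": "13", "customer_id": "9", "amount": "5.0"},
]
ads = build_ads(customers, orders, sample_size=10)
print(sorted(r["order_id"] for r in ads))  # ['10', '11', '12']
```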
CRISP-DM | Modeling
CRISP-DM | Evaluation
CRISP-DM | Deployment
CRISP-DM
Cross-Industry Standard Process for Data Mining
80-20 Rule!
Time Consuming: 20%
Success Factor: 80%
Source: Berthold, Borgelt, Höppner, Klawonn: Guide to Intelligent Data Analysis, Springer 2011
Sharing Tools
Sharing Skills
Sharing Responsibility
A new generation of tools
They can build their own reports
A recipe for disaster
Data is viral - everybody wants it
Start small and just do it
Source: Phil Winters
Machine Learning
Guided Analytics
Guided Analytics | Introduction
• Systems that automate the data science cycle
have been gaining a lot of attention recently.
• Those tools often automate only a few phases
of the cycle, have a tendency to consider just a
small subset of available models, and are limited
to relatively straightforward, simple data formats.
• Automation should not result in black boxes,
hiding the interesting pieces from everyone; the
modern data science environment should allow
automation and interaction to be combined
flexibly.
Guided Analytics | Definition
• Enabling data scientists to build
interactive systems that assist the
business analyst in her quest to find new
insights in data and predict future outcomes.
• We explicitly do not aim to replace the
driver (or totally automate the process) but
instead offer assistance and carefully
gather feedback whenever needed
throughout the analysis process.
• To make this successful, the data scientist
needs to be able to easily create powerful
analytical applications that allow
interaction with the business user
whenever their expertise and feedback are
needed.
Guided Analytics | Environments
Openness:
• The environment does not impose restrictions on the tools used. This also simplifies collaboration between scripting experts (for example in R or Python) and others who just want to reuse their expertise without diving into their code.
• Being able to reach out to other tools for specific data types (text, images, …) or to specialized high-performance or big data algorithms (such as H2O or Spark) from within the same environment would be a plus.
Uniformity:
The experts creating data science applications can do it all in the same environment:
• blend data,
• run the analysis,
• mix & match tools,
• build the infrastructure to deploy the result as an analytical application.
Flexibility:
• Underneath the analytical application, we can run simple regression models or orchestrate complex parameter optimization and ensemble models, ranging from one to thousands of models.
Agility:
• Once the application is used in the wild, new demands will arise quickly: more automation here, more consumer feedback there.
• The environment used to build these analytical applications needs to make it intuitive for other members of the data science team to quickly adapt the existing applications to new and changing requirements.
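The Flexibility point, running anything from one model to many under parameter optimization and ensembling, can be sketched in plain Python. This is a swapped-in illustration, not what any particular platform runs: a 1-D k-nearest-neighbour classifier, a grid search over `k` scored on a validation set, and a majority-vote ensemble over the same models. All names and data points are hypothetical.

```python
from collections import Counter

def knn_predict(train, x, k):
    """k-nearest-neighbour vote on a 1-D numeric feature; train is a list of (x, label)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Share of test points whose label the k-NN model predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def grid_search(train, valid, ks):
    """Parameter optimisation: try every k, keep the one with the best validation accuracy."""
    scores = {k: accuracy(train, valid, k) for k in ks}
    best = max(scores, key=scores.get)
    return best, scores

def ensemble_predict(train, x, ks):
    """Simple ensemble: majority vote over the k-NN models with different k."""
    votes = Counter(knn_predict(train, x, k) for k in ks)
    return votes.most_common(1)[0][0]

# Hypothetical data with one mislabelled point (2.4, "b") that fools k=1.
train = [(1, "a"), (2, "a"), (2.4, "b"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]
valid = [(2.5, "a"), (7.5, "b")]
best, scores = grid_search(train, valid, [1, 3, 5])
print(best, scores)                              # 3 {1: 0.5, 3: 1.0, 5: 1.0}
print(ensemble_predict(train, 2.5, [1, 3, 5]))  # a
```

The same loop scales from one candidate to thousands simply by widening the parameter grid or the list of model types; in a guided setting, the analyst could be asked to narrow that grid before the search runs.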
Guided Analytics | Auto-what?
• So how do all of those driverless, automatic, automated AI or
machine learning systems fit into this picture?
• Their goal is either to encapsulate (and hide!) existing expert data
scientists’ expertise or apply more or less sophisticated
optimization schemes to the fine-tuning of the data science tasks.
• This can be useful if no in-house data science expertise is available, but in the end
the business analyst is locked into the pre-packaged expertise and a limited set of
hard-coded scenarios.
• Both data scientist expertise and parameter optimization can easily be part of a
Guided Analytics workflow as well.
• And since automation of any kind tends to miss the important and interesting pieces,
adding a Guided Analytics component makes it even more powerful: we can guide the
optimization scheme and adjust the pre-coded expert knowledge to the new task at hand.
Data Science Project | Roles
www.sistek.com.tr
• Data scientists
– Workflow development
– Data Analysis
– Model Development
• Business analysts
– WebPortal
– Data Analysis
• IT administrators
– Enterprise Architecture Management
– Cloud solution provider
Data Science Project | Data Scientist
Responsible for:
• Creating and updating workflows
• Creating and maintaining metanode
templates
• Building, evaluating, and monitoring data,
and using ad hoc developed workflows
• Developing WebPortal applications
Demo
About KNIME
KNIME is software for fast, easy, and intuitive access to advanced
data science, helping organizations drive innovation.
KNIME Analytics Platform is the leading open solution for data-driven
innovation, designed for discovering the potential hidden in
data, mining for fresh insights, or predicting new futures.
Organizations can take their collaboration, productivity, and
performance to the next level with a robust range of commercial
extensions to the KNIME open source platform.
For over a decade, a thriving community of data scientists in over
60 countries has been working with the KNIME platform on every kind of
data: from numbers to images, molecules to humans, signals to
complex networks, and simple statistics to big data analytics.
KNIME's headquarters are in Zurich, with additional offices
in Konstanz, Berlin, and Austin.
Chicago O'Hare International Airport (ORD)
Guided Analytics | Design
The workflow defines a fully automated, web-based application to
select, train, test, and optimize a number of machine learning
models.
The workflow is designed for business analysts to easily create
predictive analytics solutions by applying their domain knowledge.
Each of the wrapped metanodes outputs a web page with which the
business analyst can interact.
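The design above, automated steps interleaved with pages where the business analyst makes choices, can be sketched as a plain Python function. This is a hypothetical illustration of the pattern, not the KNIME workflow itself. The `ask` callback stands in for a web page produced by a wrapped metanode, and each "model" is a dummy function standing in for a train-and-test step that returns a score. Every name and score is invented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a "page" asks the analyst a question and returns their choice;
# a "model" is a stand-in that trains and tests on the chosen columns and returns a score.
AskAnalyst = Callable[[str, List[str]], List[str]]
TrainAndTest = Callable[[List[str]], float]

def guided_workflow(columns: List[str], models: Dict[str, TrainAndTest],
                    ask: AskAnalyst) -> Tuple[str, Dict[str, float]]:
    # Interaction point 1: the analyst applies domain knowledge to pick input columns.
    features = ask("Select input columns", columns)
    # Interaction point 2: the analyst picks which model types to train.
    chosen = ask("Select models to train", sorted(models))
    # Automated part: train and test every chosen model on the selected columns.
    scores = {name: models[name](features) for name in chosen}
    # Automated part: propose the best model; all scores stay visible for review.
    best = max(scores, key=scores.get)
    return best, scores

# Usage with a scripted "analyst" and dummy scoring functions (illustrative only).
def scripted_analyst(question: str, options: List[str]) -> List[str]:
    if question == "Select input columns":
        return [c for c in options if c != "customer_id"]  # an ID is not a useful feature
    return [m for m in options if m != "decision_tree"]

dummy_models: Dict[str, TrainAndTest] = {
    "decision_tree": lambda cols: 0.70,
    "logistic_regression": lambda cols: 0.78,
    "random_forest": lambda cols: 0.80 + 0.01 * len(cols),
}
best, scores = guided_workflow(["customer_id", "age", "country"], dummy_models, scripted_analyst)
print(best)  # random_forest
```

Keeping the interaction points as explicit parameters mirrors the Guided Analytics idea: the automation does not replace the analyst, it asks for domain knowledge exactly where it is needed.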
Guided Analytics | Application
Sources
• Christian Dietz, Paolo Tamagnini, Simon Schmid, Michael Berthold: Intelligently
Automating Machine Learning, Artificial Intelligence, and Data Science,
https://www.knime.com/blog
• Berthold, Borgelt, Höppner, Klawonn: Guide to Intelligent Data Analysis, Springer 2011
• Michael Berthold: Principles of Guided Analytics, https://www.knime.com/blog
• Ali Alkan: Veri Madenciliği Teknikleri (Data Mining Techniques), Training Notes 2017
• Ali Alkan: İleri Analitik Teknikler Seminerleri 1-2-3-5-6-7 (Advanced Analytics Techniques Seminars 1-2-3-5-6-7), Seminar Notes 2016-17
Ali ALKAN
Twitter @Ali_Alkan
ali.alkan@advancetics.com
Thank you!

Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science Processes

  • 1. Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science Processes Ali ALKAN Co-Founder & Principal Data Scientist ADVANCETICS B.V. ali.alkan@advancetics.com Twitter / Ali_Alkan 7 December 2018
  • 2. Agenda Machine Learning, Artificial Intelligence, and Data Science Phases of Data Science Projects and CRISP-DM Guided Analytics Approach for Data Science Processes A Guided Analytics Application with KNIME Analytics Platform Q&A Session
  • 3. ML vs. AI vs. DS? Data Science produces insights Machine Learning produces predictions
  • 4. ML vs. AI vs. DS? Data Science produces insights Machine Learning produces predictions Artificial Intelligence produces actions
  • 5. What is Artificial Intelligence? • Artificial Narrow Intelligence (ANI): Machine intelligence that equals or exceeds human intelligence or efficiency at a specific task. • Artificial General Intelligence (AGI): A machine with the ability to apply intelligence to any problem, rather than just one specific problem (human-level intelligence). • Artificial Superintelligence (ASI): An intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.
  • 6. Machine Learning | Introduction • Machine Learning is a type of Artificial Intelligence that provides computers with the ability to learn without being explicitly programmed. • It provides various techniques that can learn from data and make predictions on it.
  • 7. Machine Learning | Learning Approaches Supervised Learning: Learning with a labeled training set • Example: email spam detector with a training set of already labeled emails Unsupervised Learning: Discovering patterns in unlabeled data • Example: cluster similar documents based on their text content Reinforcement Learning: Learning based on feedback or reward • Example: learn to play chess by winning or losing
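To make the first two learning approaches concrete, here is a minimal sketch in plain Python on made-up two-dimensional points (hypothetical numeric features of emails, not data from the talk). The supervised part learns one centroid per known label ("ham" or "spam") and classifies new points by the nearest centroid; the unsupervised part runs k-means on the same points without looking at the labels. Reinforcement learning is left out because it needs an environment that hands out rewards.

```python
import math
import random

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit_nearest_centroid(X, y):
    """Supervised: compute one centroid per known label."""
    return {lab: centroid([x for x, t in zip(X, y) if t == lab]) for lab in sorted(set(y))}

def predict_nearest_centroid(model, x):
    """Assign x to the label whose centroid is closest."""
    return min(model, key=lambda lab: math.dist(x, model[lab]))

def kmeans(X, k, iters=20, seed=0):
    """Unsupervised: group points into k clusters without using any labels."""
    rng = random.Random(seed)
    centers = rng.sample(X, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda c: math.dist(x, centers[c]))
            clusters[nearest].append(x)
        # Recompute centers; keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical labeled training set: two features per email.
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
y = ["ham", "ham", "ham", "spam", "spam", "spam"]

model = fit_nearest_centroid(X, y)
print(predict_nearest_centroid(model, (4.9, 5.0)))  # spam

centers, clusters = kmeans(X, k=2)
print(clusters)
```

The classifier can only reproduce labels it was shown, whereas k-means discovers the two groups on its own but cannot name them.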
  • 8. Outlook | Traditional Programming
  • 9. Outlook | Machine Learning
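The two "Outlook" slides are diagrams that did not survive extraction. A common way to illustrate the contrast named in their titles: in traditional programming a person writes the rule, while in machine learning the rule is derived from examples with known answers. The sketch below assumes a hypothetical spam score between 0 and 1 and a handful of made-up labeled examples; it is an illustration, not a reconstruction of the slides.

```python
def rule_based(score):
    # Traditional programming: a human picks the threshold 0.5 by hand.
    return score > 0.5

def learn_threshold(scores, labels):
    """Machine learning (greatly simplified): choose the threshold that
    makes the fewest mistakes on labeled training examples."""
    def errors(t):
        return sum((s > t) != lab for s, lab in zip(scores, labels))
    return min(sorted(set(scores)), key=errors)

def learned_rule(threshold):
    # The "program" is now produced from data rather than written by hand.
    return lambda score: score > threshold

# Hypothetical training examples: spam score and whether the email was spam.
scores = [0.1, 0.3, 0.45, 0.6, 0.7, 0.9]
labels = [False, False, False, False, True, True]

t = learn_threshold(scores, labels)
print(t)                               # 0.6
print(rule_based(0.55))                # True
print(learned_rule(t)(0.55))           # False
```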
  • 11. CRISP-DM | Cross-Industry Standard Process for Data Mining
  • 12. CRISP-DM | Definition The CRISP-DM methodology provides a structured approach to planning a data mining project. It is a robust and well-proven methodology. It is powerful, practical, flexible, and useful when using analytics to solve business issues. This model is an idealised sequence of events. In practice, many of the tasks can be performed in a different order, and it will often be necessary to backtrack to previous tasks and repeat certain actions.
  • 13. CRISP-DM | Business Understanding The first stage of the CRISP-DM process is to understand what you want to accomplish from a business perspective. The goal of this stage of the process is to uncover important factors that could influence the outcome of the project. Neglecting this step can mean that a great deal of effort is put into producing the right answers to the wrong questions.
  • 14. CRISP-DM | Data Understanding The second stage of the CRISP-DM process requires you to acquire the data listed in the project resources. This initial collection includes data loading, if this is necessary for data understanding. • For example, if you use a specific tool for data understanding, it makes perfect sense to load your data into this tool. • If you acquire multiple data sources, then you need to consider how and when you're going to integrate them.
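As a minimal sketch of this first look at the data, the standard-library code below loads two small, made-up CSV sources (a hypothetical customer table and order table held in strings), reports row counts and missing values per column, and checks whether the shared key `customer_id` matches across sources. That last check feeds directly into the question of how and when the sources can be integrated. All contents and column names are assumptions for illustration.

```python
import csv
import io

# Hypothetical source 1: customers (one age is missing).
CUSTOMERS = """customer_id,age,country
1,34,NL
2,,TR
3,51,DE
"""

# Hypothetical source 2: orders (one amount is missing, one unknown customer).
ORDERS = """order_id,customer_id,amount
10,1,25.0
11,1,
12,3,40.5
13,4,12.0
"""

def load(text):
    """Parse CSV text into a list of dicts; empty fields stay as ''."""
    return list(csv.DictReader(io.StringIO(text)))

def profile(rows):
    """Row count and number of missing (empty) values per column."""
    columns = rows[0].keys() if rows else []
    missing = {c: sum(1 for r in rows if r[c] == "") for c in columns}
    return {"rows": len(rows), "missing": missing}

customers = load(CUSTOMERS)
orders = load(ORDERS)
print(profile(customers))
print(profile(orders))

# Keys present in orders but absent from customers would be dropped by an inner join.
unmatched = {o["customer_id"] for o in orders} - {c["customer_id"] for c in customers}
print("unmatched customer_id values:", unmatched)
```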
  • 15. CRISP-DM | Data Preparation Data preparation covers all steps from the raw data to the final dataset. The final dataset, used for statistical modeling, is sometimes called the ADS (analytical dataset). It includes or can include: • data source selection and loading • table selection and loading • joining data sources • data cleaning (missing values, outliers, ...) • feature generation and data transformation • taking samples of data • …
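Several of the listed steps can be chained into a small pipeline. The plain-Python sketch below is a hypothetical illustration rather than the talk's workflow: it joins two made-up sources on an assumed `customer_id` key, fills a missing age with the median, caps an outlier at an assumed business limit, derives a new feature, and draws a reproducible sample of the resulting analytical dataset (ADS).

```python
import random
import statistics

customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},   # missing value
    {"customer_id": 3, "age": 51},
    {"customer_id": 4, "age": 29},
]
orders = [
    {"customer_id": 1, "amount": 25.0},
    {"customer_id": 1, "amount": 15.0},
    {"customer_id": 2, "amount": 40.0},
    {"customer_id": 3, "amount": 9000.0},  # suspicious outlier
    {"customer_id": 4, "amount": 30.0},
]

# 1. Join: total order amount per customer, attached to a copy of each customer row.
totals = {}
for o in orders:
    totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]
ads = [dict(c, total=totals.get(c["customer_id"], 0.0)) for c in customers]

# 2. Missing values: impute age with the median of the known ages.
median_age = statistics.median(r["age"] for r in ads if r["age"] is not None)
for r in ads:
    if r["age"] is None:
        r["age"] = median_age

# 3. Outliers: cap totals at an assumed limit of 1000.
CAP = 1000.0
for r in ads:
    r["total"] = min(r["total"], CAP)

# 4. Feature generation: a hypothetical derived feature.
for r in ads:
    r["spend_per_year_of_age"] = r["total"] / r["age"]

# 5. Sampling: a reproducible random sample of half the rows.
sample = random.Random(42).sample(ads, k=len(ads) // 2)
print(ads)
print(sample)
```

Imputing with the median rather than the mean keeps a single extreme value from distorting the filled-in age; whether capping or removing outliers is appropriate depends on the business understanding from the first phase.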
  • 19. CRISP-DM | Cross-Industry Standard Process for Data Mining 80-20 Rule! Time consuming: 20% Success factor: 80% Source: Berthold, Borgelt, Höppner, Klawonn: Guide to Intelligent Data Analysis, Springer 2011
  • 20. Sharing Tools Sharing Skills Sharing Responsibility A new generation of tools They can build their own reports A recipe for disaster Data is viral - everybody wants it Start small and just do it
  • 23. Guided Analytics | Introduction • Systems that automate the data science cycle have been gaining a lot of attention recently. • These tools often automate only a few phases of the cycle, tend to consider just a small subset of the available models, and are limited to relatively simple data formats. • Automation should not result in black boxes that hide the interesting pieces from everyone; the modern data science environment should allow automation and interaction to be combined flexibly.
  • 24. Guided Analytics | Definition • Guided Analytics allows data scientists to build interactive systems that assist the business analyst in her quest to find new insights in data and predict future outcomes.
  • 25. Guided Analytics | Definition • We explicitly do not aim to replace the driver (or totally automate the process) but instead offer assistance and carefully gather feedback whenever needed throughout the analysis process. • To make this successful, the data scientist needs to be able to easily create powerful analytical applications that allow interaction with the business user whenever their expertise and feedback is needed.
  • 26. Guided Analytics | Environments Openness Uniformity Flexibility Agility
  • 27. Guided Analytics | Environments Openness: • The environment does not impose restrictions in terms of the tools used. This also simplifies collaboration between scripting gurus (working in R or Python, for example) and others who just want to reuse their expertise without diving into their code. • Being able to reach out to other tools for specific data types (text, images, …) or for specialized high-performance or big data algorithms (such as H2O or Spark) from within the same environment would be a plus. Uniformity Flexibility Agility
  • 28. Guided Analytics | Environments Openness Uniformity: The experts creating data science can do it all in the same environment: • blend data, • run the analysis, • mix & match tools, • build the infrastructure to deploy this as an analytical application. Flexibility Agility
  • 29. Guided Analytics | Environments Openness Uniformity Flexibility: • Underneath the analytical application, we can run simple regression models or orchestrate complex parameter optimization and ensemble models – ranging from one to thousands of models. Agility
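As a rough illustration of the range from one model to many, the sketch below fits a single least-squares regression line and, alongside it, a bagging ensemble of 50 such lines trained on bootstrap resamples whose predictions are averaged. The data, the ensemble size, and the choice of bagging as the ensemble method are assumptions for illustration; in a real application each member could be any learner.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def predict(model, x):
    a, b = model
    return a + b * x

def fit_bagged(xs, ys, n_models=50, seed=0):
    """Bagging: fit one line per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # a line needs at least two distinct x values
        models.append(fit_line(bx, [ys[i] for i in idx]))
    return models

def predict_ensemble(models, x):
    """Average the predictions of all ensemble members."""
    return statistics.fmean(predict(m, x) for m in models)

# Hypothetical, roughly linear data.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

single = fit_line(xs, ys)
ensemble = fit_bagged(xs, ys)
print(predict(single, 7), predict_ensemble(ensemble, 7))
```

From the analyst's side both options answer the same question; only the orchestration underneath differs, which is the flexibility the slide describes.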
  • 30. Guided Analytics | Environments Openness Uniformity Flexibility Agility: • Once the application is used in the wild, new demands will arise quickly: more automation here, more consumer feedback there. • The environment that is used to build these analytical applications needs to make it intuitive for other members of the data science team to quickly adapt the existing analytical applications to new and changing requirements.
  • 31. Guided Analytics | Auto-what? • So how do all of those driverless, automatic, automated AI or machine learning systems fit into this picture? • Their goal is either to encapsulate (and hide!) existing expert data scientists’ expertise or apply more or less sophisticated optimization schemes to the fine-tuning of the data science tasks.
  • 32. Guided Analytics | Auto-what? • Obviously, this can be useful if no in-house data science expertise is available, but in the end the business analyst is locked into the pre-packaged expertise and the limited set of hard-coded scenarios. • Both data scientist expertise and parameter optimization can easily be part of a Guided Analytics workflow as well. • And since automation of whatever kind tends to miss the important and interesting pieces, adding a Guided Analytics component makes it even more powerful: we can guide the optimization scheme and adjust the pre-coded expert knowledge to the new task at hand.
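The idea of guiding an optimization scheme can be sketched as an automated search whose candidate set is first narrowed by the business user. In the hypothetical example below, a k-nearest-neighbours regressor is tuned over k using leave-one-out error, and an optional `guidance` callback (a stand-in for an interactive page in a Guided Analytics application) removes candidates the user rules out, for instance values of k considered too sensitive to individual records. The data, the model, and the tuning method are all assumptions for illustration.

```python
import statistics

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)

def loo_error(data, k):
    """Leave-one-out mean squared error for a given k."""
    errs = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errs.append((knn_predict(rest, x, k) - y) ** 2)
    return statistics.fmean(errs)

def guided_search(data, candidates, guidance=None):
    """Automated search over k, optionally restricted by user guidance."""
    if guidance is not None:
        candidates = [k for k in candidates if guidance(k)]
    scores = {k: loo_error(data, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical (x, y) observations.
data = [(1, 1.0), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1), (6, 6.0), (7, 6.8), (8, 8.1)]

# Fully automated: every k from 1 to 5 is considered.
best_auto, scores_auto = guided_search(data, range(1, 6))

# Guided: the user only accepts models that average at least 3 neighbours.
best_guided, scores_guided = guided_search(data, range(1, 6), guidance=lambda k: k >= 3)
print(best_auto, best_guided)
```

The automated part still does the tedious scoring; the user's input changes which models are eligible rather than replacing the search.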
  • 33. Data Science Project | Roles www.sistek.com.tr • Data scientists – Workflow development – Data analysis – Model development • Business analysts – WebPortal – Data analysis • IT administrators – Enterprise architecture management – Cloud solution provider
  • 34. Data Science Project | Data Scientist Responsible for: • Creating and updating workflows • Creating and maintaining metanode templates • Building, evaluating, and monitoring data, and using ad hoc developed workflows • Developing WebPortal applications
  • 35. Demo
  • 36. About KNIME KNIME is software for fast, easy, and intuitive access to advanced data science, helping organizations drive innovation. KNIME Analytics Platform is the leading open solution for data-driven innovation, designed for discovering the potential hidden in data, mining for fresh insights, or predicting new futures. Organizations can take their collaboration, productivity, and performance to the next level with a robust range of commercial extensions to the KNIME open source platform. For over a decade, a thriving community of data scientists in over 60 countries has been working with the KNIME platform on every kind of data: from numbers to images, molecules to humans, signals to complex networks, and simple statistics to big data analytics. KNIME's headquarters are based in Zurich, with additional offices in Konstanz, Berlin, and Austin.
  • 37.
  • 39. Guided Analytics | Design The workflow defines a fully automated, web-based application to select, train, test, and optimize a number of machine learning models. The workflow is designed for business analysts to easily create predictive analytics solutions by applying their domain knowledge. Each of the wrapped metanodes outputs a web page with which the business analyst can interact.
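The design described on this slide, a chain of steps that each run automatically but may expose a page for the analyst, can be sketched in Python as a sequence of stages with optional interaction hooks. This is not KNIME code: the `Stage` class, the `interact` callbacks, the model names, and the dictionary-based state are hypothetical stand-ins for the wrapped metanodes and their web pages.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = dict

@dataclass
class Stage:
    name: str
    run: Callable[[State], State]                         # automated part of the step
    interact: Optional[Callable[[State], State]] = None   # analyst feedback, if any

def run_workflow(stages, state):
    """Run each stage in order; where a stage has an interaction hook,
    let the analyst review and adjust the state before moving on."""
    log = []
    for stage in stages:
        state = stage.run(state)
        if stage.interact is not None:
            state = stage.interact(state)
            log.append(f"{stage.name}: analyst reviewed")
        else:
            log.append(f"{stage.name}: automatic")
    return state, log

def select_models(state):
    # Automated default: offer every available model type.
    return {**state, "models": ["logistic_regression", "decision_tree", "random_forest"]}

def analyst_picks_models(state):
    # Stand-in for a web page: the analyst keeps only easily interpretable models.
    return {**state, "models": [m for m in state["models"] if m != "random_forest"]}

def train_models(state):
    # Placeholder for training and testing; records which models were trained.
    return {**state, "trained": list(state["models"])}

stages = [
    Stage("Model selection", select_models, interact=analyst_picks_models),
    Stage("Training", train_models),
]
final, log = run_workflow(stages, {})
print(final["trained"])  # ['logistic_regression', 'decision_tree']
print(log)
```

Keeping the automated `run` and the optional `interact` separate mirrors the Guided Analytics idea: the workflow can run end to end without input, yet each step can pause for domain knowledge where it matters.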
  • 40. Guided Analytics | Application
  • 41. Sources • Christian Dietz, Paolo Tamagnini, Simon Schmid, Michael Berthold: Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science, https://www.knime.com/blog • Berthold, Borgelt, Höppner, Klawonn: Guide to Intelligent Data Analysis, Springer 2011 • Michael Berthold: Principles of Guided Analytics, https://www.knime.com/blog • Ali Alkan: Veri Madenciliği Teknikleri (Data Mining Techniques), Training Notes 2017 • Ali Alkan: İleri Analitik Teknikler Seminerleri (Advanced Analytics Techniques Seminars) 1-2-3-5-6-7, Seminar Notes 2016-17