© 2019 KNIME AG. All rights reserved.
Practicing Data Science
KNIME: Rosaria.Silipo@knime.com
@KNIME
Asking for Directions in an AI Project
Introduction
This webinar collects the answers to the questions I get every time I start a new data science project.
The Standard DS Process
CRISP-DM (Cross-Industry Standard Process for Data Mining): https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
The Training Process as a Workflow
How Standard is the Standard Process?
• Data Preparation
– Data types (structured vs. unstructured)
– Weird distributions (rare and infrequent classes)
– Model-dependent transformations
• Machine Learning model
– Model yes/no
– Which problem?
– Which model for which problem?
• Deployment
– Reports, dashboards, REST, or just save to DB?
– Scalability
The standard data mining process is not very standard.
Do I need to train a ML model?
• Sometimes a picture is better than 1000 words
– Customer Description
– Money vs. Loyalty
– User Behaviour
– Energy Consumption
• Sometimes we only need KPI measures
– Clickstream Analysis
– Multiple Aggregations
• Sometimes we only need a Data Warehouse
– Diagram: data sources and DBs, ETL, DWH, Business Units
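The "we only need KPI measures" case can be sketched without any model at all. The snippet below is a minimal illustration, assuming a hypothetical clickstream of (user, page, seconds on page) records; the field names and values are invented for the example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical clickstream records: (user_id, page, seconds_on_page).
clicks = [
    ("u1", "home", 12),
    ("u1", "pricing", 40),
    ("u2", "home", 5),
    ("u2", "home", 7),
    ("u3", "pricing", 30),
]

def kpis_per_page(records):
    """Aggregate visits, unique users and mean time on page, per page."""
    visits = defaultdict(int)
    users = defaultdict(set)
    seconds = defaultdict(int)
    for user, page, secs in records:
        visits[page] += 1
        users[page].add(user)
        seconds[page] += secs
    return {
        page: {
            "visits": visits[page],
            "unique_users": len(users[page]),
            "mean_seconds": seconds[page] / visits[page],
        }
        for page in visits
    }

page_kpis = kpis_per_page(clicks)
```

Several aggregations over the same records already answer many business questions; a trained model only becomes necessary when the question is a prediction.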
Classification or Number Prediction?
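• Classification: the target is a class (classes: Red, Blonde, Brown, Black)
• Number prediction: the target is a number (plot: energy usage in kWh over time, from now to Wed 12:00)
• Binning / Discretization: turns a numeric target into classes
• Deep learning network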
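Binning is what connects the two problem types: a numeric target such as energy usage becomes a class label once it is cut into intervals. A minimal sketch follows; the bin edges and the labels "low", "medium", "high" are assumptions made for the example, not values from the slides.

```python
from bisect import bisect_right

# Hypothetical bin edges in kWh and illustrative labels.
# Bins: below 1.0 -> low, 1.0 up to (not including) 3.0 -> medium, 3.0 and above -> high.
EDGES = [1.0, 3.0]
LABELS = ["low", "medium", "high"]

def to_class(kwh, edges=EDGES, labels=LABELS):
    """Map a numeric value to the label of the bin it falls into."""
    return labels[bisect_right(edges, kwh)]

usage_kwh = [0.4, 1.0, 2.7, 3.5]
classes = [to_class(v) for v in usage_kwh]
```

After this step a classifier can be trained on `classes` instead of a regression model on `usage_kwh`, at the cost of losing the resolution inside each bin.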
Number Prediction or Time Series Analysis?
• Linear Regression: y from x1, x2, ..., xn
• Time Series Prediction: x(t) from past x(t-1) ... x(t-n)
– Plot: original vs. predicted values over time
• Deep learning network
Make sure that the future does not mix with the past in data partitioning.
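The partitioning warning can be made concrete with a small sketch. It builds rows of past values x(t-n) ... x(t-1) with target x(t), then splits them in time order without shuffling. The function names, the series values, and the 80/20 split are assumptions for illustration only.

```python
def make_lagged(series, n_lags):
    """Build (past values, next value) pairs: x(t-n) ... x(t-1) -> x(t)."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows

def chronological_split(rows, train_fraction=0.8):
    """Split without shuffling, so every training target precedes every test target."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

series = [10, 12, 13, 15, 14, 16, 18, 17, 19, 21]  # hypothetical readings
rows = make_lagged(series, n_lags=3)
train, test = chronological_split(rows)
```

A random split would place later observations in the training set and earlier ones in the test set, so the model would be evaluated on a past it has effectively already seen.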
Supervised vs. Unsupervised ML Algorithms
• Supervised: Labelled Training Set
– Columns: x1, x2, x3, ..., xn, class
– Columns: x1, x2, x3, ..., xn, y
• Unsupervised: Unlabelled Training Set
– Columns: x1, x2, x3, ..., xn
– DBSCAN
– Fuzzy c-Means
– Hierarchical clustering
• Active Learning
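The difference shows up directly in what each algorithm is given. In the sketch below, a 1-nearest-neighbour classifier (chosen here only as a short supervised example, it is not named on the slide) uses the labels, while a single-linkage hierarchical clustering sees the same points without them. The data points and function names are hypothetical.

```python
import math

def nearest_neighbor_predict(train_rows, query):
    """Supervised: return the label of the closest labelled row (1-NN)."""
    _, label = min(train_rows, key=lambda row: math.dist(row[0], query))
    return label

def single_linkage(points, n_clusters):
    """Unsupervised: merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters

# Labelled training set: features plus class.
labelled = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.1), "B"), ((5.2, 4.8), "B")]
# Unlabelled training set: the same features, class column dropped.
unlabelled = [features for features, _ in labelled]

predicted = nearest_neighbor_predict(labelled, (4.9, 5.0))
clusters = single_linkage(unlabelled, n_clusters=2)
```

The clustering recovers two groups but cannot name them; attaching meaning to each group still requires labels or a domain expert.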
Unevenly Distributed, Infrequent, and Rare Classes
• Unevenly distributed
• Infrequent
• Rare (anomaly): training only on "normal" data
– Auto-encoder (distance)
– Numerical prediction (distance)
– Clustering
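The idea of training only on "normal" data and then measuring a distance can be sketched with a far simpler stand-in than an auto-encoder: a per-feature mean and standard deviation learned from normal rows. The readings and the thresholding rule below are assumptions for illustration, not the method shown in the webinar.

```python
import math
import statistics

def fit_normal_profile(normal_rows):
    """Learn per-feature mean and standard deviation from 'normal' rows only."""
    columns = list(zip(*normal_rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]
    return means, stdevs

def distance(row, profile):
    """Euclidean distance from the normal profile, in standard-deviation units."""
    means, stdevs = profile
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stdevs)))

# Hypothetical sensor readings recorded during normal operation.
normal = [(20.0, 1.0), (21.0, 1.1), (19.0, 0.9), (20.5, 1.0), (19.5, 1.0)]
profile = fit_normal_profile(normal)

# Assumed rule: flag anything farther away than the farthest normal row.
threshold = max(distance(r, profile) for r in normal)

def is_anomaly(row):
    """True when the row lies farther from the profile than any normal row."""
    return distance(row, profile) > threshold
```

No anomaly examples are needed for training, which is the point when the rare class has too few (or zero) recorded cases.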
Structured Data vs. Unstructured Data
• Structured Data
• Unstructured Data: Text, Networks, Images
– Text / Image / Network / Chemistry Extension
– To numbers
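For text, "to numbers" can be as simple as counting words. The following bag-of-words sketch is one common encoding, shown here under the assumption of whitespace tokenization and three invented example texts; it is not a description of what the KNIME extensions do internally.

```python
from collections import Counter

def build_vocabulary(documents):
    """Collect every distinct lower-cased word, in sorted order."""
    return sorted({word for doc in documents for word in doc.lower().split()})

def to_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

docs = ["good product", "bad product", "good good service"]  # hypothetical texts
vocabulary = build_vocabulary(docs)
vectors = [to_vector(d, vocabulary) for d in docs]
```

Each document becomes a fixed-length numeric row, so the structured-data algorithms from the previous slides can be applied to it.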
The Deployment Process as a Workflow
Deployment: REST API, Shiny Dashboards, plain Background Execution
Your workflow as ...
• ... a REST API
• ... Guided Application
Scalability: Spark, Parallel Execution, in-DB Processing
• Spark
• Parallel Execution on Server
• In-database processing
Summary
• Is the standard DS process so standard?
• Do I need a ML model?
• Training
– Classification or Number Prediction?
• Number Prediction or Time Series Analysis?
• Supervised or Unsupervised Learning?
– Unevenly Distributed, Infrequent, and Rare Classes
– Structured vs. Unstructured Data
• Deployment
– REST API, Dashboards, Background Execution
– Scalability Options
KNIME Spring Summit 2019
March 18 – 22 at bcc Berlin Congress Center, Berlin
• Monday & Tuesday: One-day courses
• Wednesday & Thursday: Summit sessions
• Friday: Workshops
Use the code WEBINAR-20 for 20% off tickets!
Register at knime.com/spring-summit2019
Practicing Data Science
Free Copy of “Practicing Data Science” e-Book from KNIME Press
https://www.knime.com/knimepress
with this code: PDS-WEBINAR-0319
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH,
and are registered in the United States. KNIME® is also registered in Germany.
Thank You!
Let’s unroll it!
It always starts with some data …
• Data Preparation
– Data Manipulation
– Data Blending
– Missing Values Handling
– Feature Generation
– Dimensionality Reduction
– Feature Selection
– Outlier Removal
– Normalization
– Partitioning
– …
• Model Training
– Model Training
– Bag of Models
– Model Selection
– Ensemble Models
– Own Ensemble Model
– External Models
– Import Existing Models
– Model Factory
– …
• Model Optimization
– Parameter Tuning
– Parameter Optimization
– Regularization
– Model Size
– No. Iterations
– …
• Model Evaluation
– Performance Measures
– Accuracy
– ROC Curve
– Cross-Validation
– …
• Deployment
– Files & DBs
– Dashboards
– REST API
– SQL Code Export
– Reporting
– …
The many Lives of a Dataset
Original Data Set with Past Observations
• Data Preparation: Partitioning into Training Set, Validation Set, Test Set
• Model Training: Training Set
• Model Optimization: Validation Set
• Model Evaluation: Test Set
• Deployment: New Data from Real World Applications
Data Exploration
• Data Understanding is a Data Exploration phase
• The Data Exploration phase is useful to get to know the data
• KNIME offers a few visualization nodes to build dashboards to explore data
