KNIME Data Science Learnathon: From Raw Data To Deployment - Dublin - June 2019
1. © 2019 KNIME AG. All rights reserved.
From Raw Data to Deployment
KNIMEr: Kathrin.Melcher@knime.com
KNIMEr: Maarit.Widmann@knime.com
@KNIME
2. Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
3. Let’s unroll it!
It always starts with some data …
• Data Preparation
– Data Manipulation
– Data Blending
– Missing Values Handling
– Feature Generation
– Dimensionality Reduction
– Feature Selection
– Outlier Removal
– Normalization
– Partitioning
– …
• Model Training
– Model Training
– Bag of Models
– Model Selection
– Ensemble Models
– Own Ensemble Model
– External Models
– Import Existing Models
– Model Factory
– …
• Model Optimization
– Parameter Tuning
– Parameter Optimization
– Regularization
– Model Size
– No. Iterations
– …
• Model Evaluation
– Performance Measures
– Accuracy
– ROC Curve
– Cross-Validation
– …
• Deployment
– Files & DBs
– Dashboards
– REST API
– SQL Code Export
– Reporting
– …
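To make two of the evaluation items named on this slide concrete (accuracy and cross-validation), here is a minimal sketch in plain Python. It is an illustration only and does not reproduce how any KNIME node works; the function names `accuracy` and `k_fold_indices` are hypothetical.

```python
from typing import List, Sequence, Tuple


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of predictions that match the actual class labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def k_fold_indices(n_rows: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering rows 0..n_rows-1.

    Rows are assigned to folds in contiguous blocks; shuffle the data
    beforehand if the row order carries information.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    base, extra = divmod(n_rows, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional row each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # 0.75
    for train_idx, test_idx in k_fold_indices(10, 3):
        print(len(train_idx), len(test_idx))
```

In cross-validation, a model would be trained on each `train` index set and scored (for example with `accuracy`) on the matching `test` index set, and the k scores averaged.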
4. The many Lives of a Dataset
Data Preparation → Model Training → Model Optimization → Model Evaluation → Deployment
Partitioning:
• Training Set
• Validation Set
• Test Set
Datasets:
• Original Data Set with Past Observations
• Training Set
• Validation Set
• Test Set
• New Data from Real World Applications
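The partitioning into training, validation, and test sets can be sketched in a few lines of Python. This is only an illustration under assumed settings: the 60/20/20 default proportions, the fixed seed, and the function name `partition` are choices made for this sketch and do not come from the slides.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(
    rows: Sequence[T],
    train_frac: float = 0.6,   # assumed proportion, not from the slides
    valid_frac: float = 0.2,   # assumed proportion, not from the slides
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows and split them into training, validation, and test sets.

    Whatever is left after the training and validation shares goes to the
    test set.
    """
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    train, valid, test = partition(list(range(8)), train_frac=0.5, valid_frac=0.25)
    print(len(train), len(valid), len(test))  # 4 2 2
```

The three sets are disjoint, so the validation and test rows are never seen while the model is being trained.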
5. Data Exploration
• Sometimes there is a Data Exploration phase between Data Access and Data Preparation
• The Data Exploration phase is useful for getting to know the data
• KNIME offers a few visualization nodes for building dashboards to explore the data
6. What about Big Data?
• Big Data platforms provide scalability
• The analytics process itself is no different on Big Data
• You need:
– a Big Data platform
– the KNIME Big Data (Spark & Hive) Extension
7. One Example for Every Need – on KNIME EXAMPLES Server
The KNIME EXAMPLES Server
50_Applications
8. Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge: predict departure delays
– If working on the original airline dataset, use only flights from airport ORD
– Output class = “delay” if depdelay > 15 min, otherwise “no delay”
Available features: date, dep time, arr time, carrier, destination, cancelled, …
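The target definition above can be written down as a short Python sketch. The 15-minute threshold, the ORD filter, and the two class names come from the slide; the dictionary keys `Origin` and `DepDelay`, the output key `DelayClass`, and the decision to skip rows without a departure delay (for example cancelled flights) are assumptions made for this illustration.

```python
from typing import Dict, Iterable, List, Optional, Union

DELAY_THRESHOLD_MIN = 15  # from the slide: "delay" if depdelay > 15 min

Flight = Dict[str, Union[str, int, float, None]]


def label_flights(flights: Iterable[Flight], origin: str = "ORD") -> List[Flight]:
    """Keep flights departing from `origin` and add a delay class to each.

    Assumed keys: "Origin" (airport code) and "DepDelay" (minutes).
    Rows with a missing DepDelay are skipped (an assumption of this sketch).
    """
    labeled: List[Flight] = []
    for flight in flights:
        if flight.get("Origin") != origin:
            continue
        dep_delay: Optional[float] = flight.get("DepDelay")  # type: ignore[assignment]
        if dep_delay is None:
            continue
        row = dict(flight)  # copy so the input rows stay unchanged
        row["DelayClass"] = "delay" if dep_delay > DELAY_THRESHOLD_MIN else "no delay"
        labeled.append(row)
    return labeled


if __name__ == "__main__":
    sample = [
        {"Origin": "ORD", "DepDelay": 30},
        {"Origin": "ORD", "DepDelay": 15},   # exactly 15 min is "no delay"
        {"Origin": "JFK", "DepDelay": 60},   # dropped: not from ORD
        {"Origin": "ORD", "DepDelay": None}, # dropped: no delay value
    ]
    for row in label_flights(sample):
        print(row)
```

Note that the comparison is strict, so a flight departing exactly 15 minutes late falls into the “no delay” class.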
9. Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2019.knar into your workspace
10. Group 1. Data Access and Data Preparation
11. Group 2. Model Training & Optimization
12. Group 3. Deployment
• Deployment Options – Multiple challenges:
– Workflow deployment to KNIME Server
– Remote/Scheduled execution from KNIME Server
– KNIME RESTful Web Services
– Build a Composite Interactive Dashboard and make it available on KNIME Web Portal
– Generate a report with BIRT
– Write Prediction Results to a Database
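As a rough illustration of the last option, writing prediction results to a database, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the KNIME workflow itself: the table name `predictions`, its columns, and the function name `write_predictions` are hypothetical, and an in-memory SQLite database stands in for whatever database the challenge actually uses.

```python
import sqlite3
from typing import Iterable, Tuple


def write_predictions(
    conn: sqlite3.Connection,
    predictions: Iterable[Tuple[str, str]],
) -> int:
    """Store (flight_id, predicted_class) pairs and return the row count.

    The table and column names are assumptions made for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "flight_id TEXT PRIMARY KEY, "
        "predicted_class TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO predictions (flight_id, predicted_class) "
            "VALUES (?, ?)",
            predictions,
        )
    return conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # throwaway database for the demo
    rows = [("AA100", "delay"), ("UA200", "no delay")]
    print(write_predictions(connection, rows))  # 2
    connection.close()
```

Using `INSERT OR REPLACE` keyed on the flight identifier means that re-running the deployment overwrites earlier predictions for the same flight instead of duplicating them.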
13. KNIME Fall Summit 2019
November 5 – 8 at AT&T Executive Education and Conference Center, Austin, Texas
• Tuesday & Wednesday: One-day courses
• Thursday & Friday: Summit sessions
Register by October 1 for a 10% Early Bird Discount with this code: LEARNATHON-DUBLIN
Register at knime.com/summits
14. KNIME Beginner’s Luck
Free Copy of KNIME Beginner’s Luck Book from KNIME Press
https://www.knime.com/knimepress
with this code: DUBLIN-0619
15. Stay connected with KNIME
Blog: knime.com/blog
Forum: forum.knime.com
KNIME Hub: hub.knime.com
Follow us on social media:
KNIME E-Learning Course:
www.knime.com/e-learning-course
16. Thank You!
#KNIME
#Learnathon
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.