This presentation summarizes most of the hands-on materials from a course I've been giving over the past few weeks on basic machine learning techniques, with implementations in KNIME.
I hope these materials will be useful to other learners taking their first steps with this great platform.
The course covers:
* basic I/O
* classification
* regression
* prediction
* evaluation
* feature selection
* hyper-parameter optimization
* basic feature extraction
* deep learning basics in the KNIME Analytics Platform
** Note that the live course also included additional theoretical lectures and materials that built a basic understanding of the principles underlying the hands-on content described here
3. outline
Introduction to the KNIME platform
Data IO (reading/writing data from/to files)
Basic data manipulation (data preprocessing)
Basic data exploration and plotting
Initial data modeling – logistic regression
Exploration of model results
Using complex features to improve model results
Aug 4th 2019 Machine learning and data science with KNIME – Nathaniel Shimoni 3
4. The KNIME platform
The KNIME platform is an interactive tool that supports data processing, modeling, visualization, and much more…
The main areas of the KNIME window: KNIME explorer, node repository, work area, execution console, and workflow outline.
5. The KNIME platform – Input and output
Reading and writing data from/to files
1. Go to the node repository and search for “read”
2. Select the CSV reader and drag it to the work area
3. Right-click the node and select Configure
4. Browse to and select the file you would like to open (throughout this tutorial we will use the file “Churn_Modelling.csv” from the course materials folder)
5. We can customize the file separators
6. The KNIME platform – Input and output
Reading and writing data from/to files
6. We can select the number of lines to skip and the number of lines to read
7. We can also customize the encoding (this can be useful for some file types/languages)
7. The KNIME platform – Input and output
Finally, after we are done configuring the node, we can run it and explore the resulting data:
1. Right-click the node
2. Click Execute or press F7 to run the node
3. Once reading is complete, click File Table to view the result – expect to see the following output:
8. The KNIME platform – Input and output
1. Notice the column types (D for double, I for integer, S for string)
2. We can also view the data dimensions (10k rows and 13 columns)
9. The KNIME platform – Input and output
1. We can also get some basic information about the data we loaded, such as the minimal and maximal values
10. The KNIME platform – data manipulation
Filtering
Sometimes we want to view/process/model only part of the data, so we need to filter out the rest
1. Add a row filter node to the workflow
2. Filter out all the rows in which age is lower than 30
3. Note that you can choose whether to include or exclude the rows that satisfy the condition
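The same filtering step can be sketched outside KNIME in plain Python; the rows and the "Age" field below are illustrative assumptions, not the course data:

```python
# Minimal sketch of a row filter: keep only customers aged 30 or older.
rows = [
    {"Surname": "Azar", "Age": 25, "Balance": 1200.0},
    {"Surname": "Boyle", "Age": 42, "Balance": 560.5},
    {"Surname": "Chu", "Age": 31, "Balance": 0.0},
]

# Include rows that satisfy the condition (KNIME lets you include or exclude).
kept = [r for r in rows if r["Age"] >= 30]
print([r["Surname"] for r in kept])  # ['Boyle', 'Chu']
```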
11. The KNIME platform – data manipulation
Filtering (cont.)
We can also use various logical rules for filtering:
1. Add a rule-based row filter node
2. Configure the new node by adding one or more logical rules
3. Add the “=>“ operator followed by either TRUE or FALSE as the output
12. The KNIME platform – data manipulation
Replacing values
Sometimes we would like to replace specific values with an alternative value
This can even be necessary when:
1. We get text error messages within a numerical feature column
2. We have outliers that we want to change to another value
3. We have missing or fixed values (read errors)
13. The KNIME platform – data manipulation
Replacing values
1. Select the string replacer node
2. Replace any name that starts with ‘A’ with ‘Ace’
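The same replacement can be sketched in Python with a regular expression; in KNIME the node uses a wildcard pattern such as "A*", while here an anchored regex plays that role (the names are illustrative assumptions):

```python
import re

# Sketch of the string-replacer step: any name starting with 'A' becomes 'Ace'.
names = ["Adams", "Brown", "Avery", "Clark"]
replaced = [re.sub(r"^A.*$", "Ace", n) for n in names]
print(replaced)  # ['Ace', 'Brown', 'Ace', 'Clark']
```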
14. The KNIME platform – data manipulation
Missing values:
We have several ways to deal with missing values; depending on the type of the feature, we can:
1. Fill in with a specific value
2. Replace with the following/previous good value
3. Fill in based on some statistical value (average, mode, etc.)
4. Fill in with an interpolation or with a moving average
* Use the missing value node to experiment with all of these options
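A few of the strategies above can be sketched in plain Python on a toy numeric column (`None` stands in for a missing cell; the values are made up for illustration):

```python
# Sketch of common missing-value strategies on a numeric column.
values = [10.0, None, 12.0, None, None, 16.0]

# 1. Fill with a specific value.
fixed = [v if v is not None else 0.0 for v in values]

# 2. Replace with the previous good value (forward fill).
ffill, last = [], None
for v in values:
    last = v if v is not None else last
    ffill.append(last)

# 3. Fill with a statistical value (the mean of the observed entries).
observed = [v for v in values if v is not None]
mean = sum(observed) / len(observed)
mean_filled = [v if v is not None else mean for v in values]

print(fixed, ffill, mean_filled)
```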
15. The KNIME platform – data manipulation
Pivoting / unpivoting:
In some cases we would like to aggregate data per feature value; we can use the pivoting node to accomplish this task
In this example we will check the average age and the 90th percentile of age for males/females in each country
1. Add a pivoting node to the workflow
2. Add ‘geography’ in the groups tab
3. Add ‘gender’ in the pivots tab
4. Add ‘age’ to the aggregation tab (twice)
5. Define mean and percentile as the desired aggregation functions
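The group/pivot/aggregate idea can be sketched in Python: group rows by the (Geography, Gender) pair and aggregate the age per cell. The rows and the mean aggregation are illustrative assumptions (the exercise also adds a percentile):

```python
from collections import defaultdict
from statistics import mean

# Sketch of pivoting: mean age per (Geography, Gender) cell.
rows = [
    {"Geography": "France", "Gender": "Male", "Age": 40},
    {"Geography": "France", "Gender": "Female", "Age": 30},
    {"Geography": "Spain", "Gender": "Male", "Age": 50},
    {"Geography": "France", "Gender": "Male", "Age": 20},
]

cells = defaultdict(list)
for r in rows:
    cells[(r["Geography"], r["Gender"])].append(r["Age"])

pivot = {k: mean(v) for k, v in cells.items()}
print(pivot[("France", "Male")])  # 30
```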
16. The KNIME platform – data manipulation
Pivoting / unpivoting:
We get 3 views from this process
1. A pivot table view
2. A group-totals view (summarizes rows)
3. A pivot totals view (summarizes columns)
17. The KNIME platform – data manipulation
Renaming features
Sometimes our column names are tricky to remember or even
distinguish – in such cases it can be a good idea to rename these
columns to contain a more human-friendly name
1. Add a rename node to the workflow
2. Change the names of columns to human-understandable names
18. The KNIME platform – data manipulation
Creating new features
In most cases the data we get when we start our research will not contain the optimal features to begin with, and we will want to create additional features that yield better predictive power
1. Add a formula node
2. Create a new feature that holds the following formula: balance / salary
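The derived feature is a simple ratio; a Python sketch (the zero-salary guard is an added assumption to keep the division safe, not part of the KNIME formula):

```python
# Sketch of the derived feature balance / salary, guarding against zero salary.
def balance_to_salary(balance, salary):
    return balance / salary if salary else None

print(balance_to_salary(5000.0, 50000.0))  # 0.1
```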
19. The KNIME platform – data manipulation
Dealing with textual and categorical features
For most algorithms the data should not contain any textual features. This can be solved using “one-hot encoding”
1. Add a ‘one to many’ node
2. Add the desired string features to the selection
3. Run the node and check the output table
4. You should now see columns that contain binary values for each of the values in these string features
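One-hot encoding itself is easy to sketch: one binary column per distinct category value. The geography column below is an illustrative assumption:

```python
# Sketch of one-hot ("one to many") encoding for a string feature.
geography = ["France", "Spain", "France", "Germany"]
categories = sorted(set(geography))

# One binary indicator per category, per row.
encoded = [[1 if g == c else 0 for c in categories] for g in geography]
print(categories)  # ['France', 'Germany', 'Spain']
print(encoded[0])  # [1, 0, 0]
```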
20. The KNIME platform – data exploration
Before modeling our data we should first get to know it better
This can be done in many ways – using plots, exploring statistical aggregations and extremes of our data, and also via basic modeling and viewing the results and errors
We will start with scatter plots
There are many scatter plot nodes; I recommend using the “2D-3D scatter plot” node for exploration, as its interactive mode is very convenient and fits this stage well.
21. The KNIME platform – data exploration
Scatter plots
1. Add a “2D-3D scatter plot” node
2. Select columns for your plot
(this could be changed later in
the interactive mode)
3. Make sure to adjust number of
rows to display according to
your need
22. The KNIME platform – data exploration
Scatter plots
4. Run the node
5. Select the relevant feature for each of the axes
6. Use the target column for the color values
7. Note that you can filter the presented results using the sliders in the bottom-right corner
8. You can rotate the plot by dragging one of the axes in the desired direction
23. The KNIME platform – basic modeling and concepts
It's finally time to create our model – a logistic regression model
Use the partitioning (a.k.a. train/test split) node to create training and validation sets
This node has 2 output ports:
One for our training set – the data our model will be trained on
And the other for our validation (or test) set – the data the model will not be exposed to, which we will use to test the model's performance
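The partitioning step can be sketched in Python as a seeded random split; the 70/30 ratio and seed 42 are illustrative assumptions, and integers stand in for data rows:

```python
import random

# Sketch of the partitioning node: a seeded random 70/30 train/validation split.
rows = list(range(10))          # stand-in for data rows
rng = random.Random(42)         # fixed seed -> reproducible split
shuffled = rows[:]
rng.shuffle(shuffled)

cut = int(len(shuffled) * 0.7)
train, valid = shuffled[:cut], shuffled[cut:]
print(len(train), len(valid))  # 7 3
```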
24. The KNIME platform – basic modeling and concepts
1. Add a logistic regression learner node to the workflow
2. Set our desired target column
3. Select the desired explanatory variables
25. The KNIME platform – basic modeling and concepts
1. Add a logistic regression predictor to the workflow
2. Make sure to connect the predictor to both the model and the validation partition we created
3. We can define whether predictions will be probabilities or the predicted category
4. The workflow should look like the one below:
26. The KNIME platform – result analysis & error analysis
Checking our results:
1. Add a scorer node
2. Define the target column and the predicted column
3. Run the node
27. The KNIME platform – result analysis & error analysis
Checking our results:
1. We can now view either the confusion matrix or the accuracy results (shown at the bottom of the list)
2. A summarized view of both is available using the view in the middle of the list
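What the scorer computes can be sketched directly; the label vectors below are made up for illustration:

```python
# Sketch of what the scorer node computes: confusion matrix and accuracy.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

accuracy = (tp + tn) / len(actual)
print(tp, tn, fp, fn, accuracy)  # 3 3 1 1 0.75
```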
28. The KNIME platform – complex feature creation
Now we can use everything we have done so far to create some advanced features and improve our results
Your task:
Use everything we have learned so far to improve the model results
You may explore additional exploration and transformation nodes
At this point do not use other learner-predictor nodes
Let the competition begin!
30. outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Hyper-parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
31. The KNIME platform – basic modeling recap
1. Using a logistic regression model we got an initial result of ~80% accuracy, and with some effort we managed to improve it to 83% accuracy while still using a logistic regression model
32. The KNIME platform – basic modeling - feature selection
It turns out that we can get to 83% accuracy with only 7 features
(to improve our score we have to ignore the ‘gender’, ‘hasCrCredit’, and ‘balance’ features)
But, how can we decide which features to use and which to ignore?
33. The KNIME platform – basic modeling - feature selection loop
introducing feature selection loops!
First we run some experiments with different feature combinations
Then we filter in only the relevant columns
Finally we run the model with the best feature combination
34. The KNIME platform – feature selection loop
Now let's do this one step at a time…
First let's reconstruct our experiment:
1. Read the data from the previous exercise into a new workflow (the “churn_modeling.csv” file)
2. Encode the target column as nominal using the ‘number to string’ node
3. Remove outliers from the data using the ‘numeric outliers’ node
4. Fill in missing values using the ‘missing’ node
5. Split the data into training and testing sets using the partitioning node (use the same method and seed from lesson 1 for a valid comparison)
6. Add ‘logistic regression learner’ and ‘logistic regression predictor’ nodes
7. Add a ‘scorer’ node so that we can view the metrics of our model's predictions
35. The KNIME platform – feature selection loop
And… add the feature selection loop:
8. Add a ‘feature selection loop start’ node before the partitioning node
9. Add a ‘feature selection loop end’ node after the scorer node
10. Right-click the scorer node and select “show flow variable ports”; you will notice two red dots above the node
11. Connect the scorer variable output to the feature selection loop end input port (red dots)
12. Add a ‘feature selection filter’ node
36. The KNIME platform – feature selection loop
13. Copy and paste the partitioning, learner, predictor and scorer nodes
14. Connect the new partitioning node to the feature selection filter node
You should now have a workflow similar to the one below:
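The idea behind the feature selection loop can be sketched as a toy backward-elimination search: repeatedly drop the feature whose removal hurts the score least. The `score` function below is a made-up stand-in for the train-then-score body of the KNIME loop:

```python
# Toy sketch of a backward feature-selection loop. The score function is an
# illustrative assumption: three useful features, the rest add small noise.
def score(features):
    useful = {"age": 0.4, "salary": 0.3, "tenure": 0.2}
    return sum(useful.get(f, -0.05) for f in features)

features = ["age", "salary", "tenure", "gender", "balance"]
best = set(features)
improved = True
while improved:
    improved = False
    for f in sorted(best):                 # sorted for deterministic order
        candidate = best - {f}
        if candidate and score(candidate) >= score(best):
            best, improved = candidate, True
            break

print(sorted(best))  # ['age', 'salary', 'tenure']
```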
37. outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Hyper-parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
38. The KNIME platform – random forest model
Now that we are familiar with some tree-based
models we can give them a try…
1. Read the data from the previous exercise to a new
workflow (“churn_modeling.csv” file)
2. Encode the target column to be nominal using
number to string node
3. Add features as you wish
4. Add random forest learner and predictor nodes
5. Tune the number of models, maximal depth, minimum node size, and split criterion as you wish
6. Add a scorer node and ROC-curve node
7. Run the workflow and explore results
39. The KNIME platform – XGBoost model
Now that we are familiar with some tree-based
models we can give them a try…
1. Read the data from the previous exercise to a new
workflow (“churn_modeling.csv” file)
2. Encode the target column to be nominal using
number to string node
3. Add features as you wish
4. Add XGBoost learner and predictor nodes
5. Tune the number of models, maximal depth, minimum node size, and split criterion as you wish
6. Add a scorer node and ROC-curve node
7. Run the workflow and explore results
40. The KNIME platform – comparing XGBoost vs. RF models
• Once we have added the two new models, our workflow should look as follows:
41. outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Hyper-parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
42. The KNIME platform – (hyper)parameter optimization
• Well, the results look great in comparison with our previous logistic regression models, but are they optimal?
• To answer this question we may wish to use parameter optimization loop nodes
43. The KNIME platform – (hyper)parameter optimization
1. To our previous random forest and XGBoost models we will add two parameter optimization loops – the steps below are for the random forest model; please complete the XGBoost one with similar steps
2. Add a parameter optimization loop start node
and connect it to the variable in-port
3. Add a parameter optimization loop end
and connect the variable out-port of the
scorer to its in-port
44. The KNIME platform – (hyper)parameter optimization
4. Configure the parameter optimization loop start node:
• Add the subsample, min_child_weight & max_depth parameters with relevant ranges
• Select the number of iterations for random search or the step size for brute-force search
5. Configure the learner to use the parameters
45. The KNIME platform – (hyper)parameter optimization
6. Configure the parameter optimization loop end:
set the objective function to Accuracy and optimization to maximize
7. Run the loop from the loop end node
8. View the best configuration or all configuration results
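The random-search loop can be sketched as: sample parameter combinations, evaluate an objective, keep the best. The `objective` function below is a made-up stand-in for "train the learner, then read accuracy from the scorer"; the parameter ranges are illustrative assumptions:

```python
import random

# Toy sketch of a parameter-optimization loop using random search.
def objective(max_depth, min_child_weight):
    # Assumed stand-in: peaks at max_depth=6, min_child_weight=2.
    return 0.8 - 0.01 * abs(max_depth - 6) - 0.02 * abs(min_child_weight - 2)

rng = random.Random(0)
best_params, best_score = None, float("-inf")
for _ in range(50):                       # number of random-search iterations
    params = {
        "max_depth": rng.randint(2, 12),
        "min_child_weight": rng.randint(1, 8),
    }
    s = objective(**params)
    if s > best_score:
        best_params, best_score = params, s

print(best_params, round(best_score, 3))
```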
46. outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
47. The KNIME platform – regression
• The main difference between what we have done so far for classification problems and what we will do with regression tasks is the characteristics of the target column
• In classification tasks our target is categorical (represented as a string in KNIME), while in regression tasks our target is numerical
• Pay attention to use the appropriate nodes – both learner and predictor should contain ‘(regression)’ in the node description
• To the right you can find an example of a random-forest-based regression workflow
• Build & run the workflow
• Report the results you got
We will use a numeric scorer, as the metrics for regression differ from classification metrics
The line plot will show the difference between the predicted value and the ground truth
48. The KNIME platform – regression
1. As in our previous tasks we will start by reading the data – insert a csv reader node and read the file ‘steam data - lesson 2.csv’
2. As before, after reading the data let's first understand our goal – we would like to validate the data of the P64TI4332 tag
3. We will do this by using all the other parameters to predict the value of the P64TI4332 tag
49. The KNIME platform – regression
We can now use the things we have learned so far to compare the regression error metrics (MAE, RMSE, R²) among various regression algorithms
We will compare:
• linear regression
• polynomial regression
• random forest
• XGBoost
(no need to panic at the number of nodes – it's going to be quite simple)
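The three metrics being compared are easy to state precisely; a Python sketch on made-up values:

```python
import math

# Sketch of the regression metrics compared in this exercise: MAE, RMSE, R^2.
actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]

n = len(actual)
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

mean_a = sum(actual) / n
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_a) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(round(mae, 3), round(rmse, 3), round(r2, 3))
```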
50. outline
Recap – logistic regression model
Feature selection in Knime
Classification using random forest & XGBoost
Parameter tuning in Knime
Regression using random forest & XGBoost
What to do when data is not IID (Identically independently distributed)
Time series unique characteristics
Mini-Hackathon!
51. Not IID – still OK! just pay attention
• When data is not IID we need to make adjustments to:
• Validation method
• Our feature creation process
• Our target(s)
• Data originating groups
52. Time series data
• Split by time, not randomly
• Use lagged columns for features
• Use time difference based features
• Be aware of these important terms:
• Horizon of forecast
• Lag of available data
• Seasonality (there may be more than one)
• Trend
• Frequency
• Sampling rate and timing
• Prediction vs. backcast vs. current time regression
A random split of the data resembles an imputation task
Time-based separation is appropriate for prediction tasks
53. The KNIME platform – prediction
1. We will now reuse the data from the previous regression task and redefine our problem as a prediction task
2. Given the current (and historical) data, we should predict the temperature 4 hours ahead
3. Use the lag column node to create the appropriate target column
4. Try to come up with effective explanatory features
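The lag-column step amounts to shifting the series so each row is paired with a future value; a Python sketch (hourly samples and the values are illustrative assumptions):

```python
# Sketch of the lag-column idea: shift a series to build a lagged target.
# With hourly samples, a lag of 4 aligns each row with the value 4 hours ahead.
series = [20.0, 21.0, 22.5, 23.0, 24.0, 25.5, 26.0]
horizon = 4

# target[i] is the value "horizon" steps after row i (None where unavailable).
target = series[horizon:] + [None] * horizon
print(target)  # [24.0, 25.5, 26.0, None, None, None, None]
```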
54. outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
56. Class mini-hackathon!
Use everything we have learned so far to create a model that
1. Reads the case study XXX.csv file
2. Predicts the current target using the other variables
3. *Predicts tomorrow's target using today's features (24h horizon)
4. **Classifies whether tomorrow's target will be higher or lower than today's current target
You may explore any additional nodes that you think are relevant
You will probably find nodepit.com useful
Let the competition begin! (guided classroom competition)
58. outline
Recap - What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Lag-column node – the long (and obvious) way to create lagged features
Some more useful loop types:
• Column list loops
• Table row to variable loops
59. Not IID – still OK! just pay attention
• When data is not IID we need to make adjustments to:
• Validation method
• Our feature creation process
• Our target(s)
• Data originating groups
60. Time series data
• Split by time, not randomly
• Use lagged columns for features
• Use time-difference-based features
• Be aware of these important terms:
• Horizon of forecast
• Lag of available data
• Seasonality (there may be more than one)
• Trend
• Frequency
• Sampling rate and timing
• Prediction vs. backcast vs. current time regression
A random split of the data resembles an imputation task
Time-based separation is appropriate for prediction tasks
61. The KNIME platform – prediction
1. We will now reuse the data from the previous regression task and redefine our problem as a prediction task
2. Given the current (and historical) data, we should predict the target a few hours ahead
3. Use the lag column node to create the appropriate feature columns
4. Try to come up with effective explanatory features
63. The KNIME platform – basic modeling recap
1. Based on our last workflow (from HW) we would now like to create a
prediction task that uses the same variables but with lagged features instead
of those from the same timestamp
64. The KNIME platform – basic modeling recap
• While the former example is valid and will work, it will be hard to experiment with (consider changing the lag interval from 5 to 4, or creating a few lag intervals instead of just one)
• Introducing: table row to variable loops
65. The KNIME platform – basic modeling recap
• But what if we want to create a few lag intervals?
• Introducing: column list loops
66. The KNIME platform – basic modeling recap
• Now let’s combine it all together…
68. outline
From “we’re new to KNIME” to “we’ve pushed KNIME to the edge”
Deep Learning within KNIME:
• Constructing the network
• Preparing the data
• Training a learner
• Saving the network and weights
• Predicting new data
Other network types
LSTMs using KNIME
CNNs using KNIME
KNIME forums
69. outline
From “we’re new to KNIME” to “we’ve pushed KNIME to the edge”
Deep Learning within KNIME:
• Constructing the network
• Preparing the data
• Training a learner
• Saving the network and weights
• Predicting new data
Other network types
LSTMs using KNIME
CNNs using KNIME
KNIME forums (you should really write a blogpost on this part of the lesson)
70. The KNIME platform – time series modeling with Deep Learning
We will start with the workflow we created during the last lesson
71. The KNIME platform – Deep Learning in Knime
• The first network we will construct is a simple feed-forward network
• A feed-forward network works similarly to the learners that we have trained before – we just need to define its structure first
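The "structure" being defined is just input size, hidden units with an activation, and an output unit. A minimal pure-Python forward pass makes this concrete; the weights are arbitrary examples (in KNIME the Keras learner node fits them during training):

```python
# Minimal sketch of the structure an FFN defines: 2 inputs -> 2 hidden
# ReLU units -> 1 linear output. Weights are illustrative assumptions.
def relu(x):
    return max(0.0, x)

def forward(features, w_hidden, w_out):
    # Each hidden unit is a weighted sum of the inputs passed through ReLU.
    hidden = [relu(sum(w * x for w, x in zip(ws, features))) for ws in w_hidden]
    # The output unit is a weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(w_out, hidden))

w_hidden = [[0.5, -0.2], [0.1, 0.4]]   # 2 inputs -> 2 hidden units
w_out = [1.0, -1.0]                    # 2 hidden units -> 1 output
print(forward([1.0, 2.0], w_hidden, w_out))
```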
72. The KNIME platform – Deep Learning in Knime
We will use the same preprocessing workflow from the previous lesson
We will define the network architecture using Keras nodes
Externally, the modeling stage is very similar to what we have done before; internally, the configuration differs significantly
The additional preprocessing for an FFN is minimal – we just need to remove all missing values
73. The KNIME platform – Deep Learning in Knime
For input layers we will define the shape of the input; in an FFN this is the number of features we feed the model with
For hidden layers we will select the number of units (neurons) the layer will contain and the activation function that we would like to use
For output layers we will likewise set the number of units (neurons) and the activation function
Note that these are determined by our target rather than selected freely, and the type of this activation is largely responsible for the model's final result
74. The KNIME platform – Deep Learning in Knime
Now let's configure the network learner hyperparameters
Take the target column out of the input columns
Verify that conversion is set to “from number (double)”
Define the target column on the relevant tab
Verify that its conversion is set to “from number (double)” as well
Remember to use a loss function appropriate to the problem you are trying to model!
75. The KNIME platform – Deep Learning in Knime
Now let's configure the network learner hyperparameters
In the options tab define “epochs” – the number of times the model sees all of the training samples
Set the learning rate – note its inverse relation to the number of epochs
76. The KNIME platform – Deep Learning in Knime
After executing the network learner, right-click the learner node and select “View: Learning Monitor”
Let's take a look at the various information this view supplies us with…
It shows accuracy (for classification tasks) or loss (for any task), the training/validation error, and the current epoch & batch
It also lets you stop the learning process (and keep the results!)
77. The KNIME platform – Deep Learning in Knime
We need to configure our predictor (the Keras executor) to yield the results we want, and configure our scorer accordingly
Add an output to the predictor
Select the last layer's output as the predictor's output
Make sure that the output type is numeric so we can use our scorer
Check this box to have the option to run a scorer
78. The KNIME platform – Deep Learning in Knime
Finally, we have trained the network and can save the model weights for later use
Select a file name, click save, and run the Keras network writer node!
79. The KNIME platform – Deep Learning in Knime
Now let's try to read the network we saved and use it for prediction
Connect the predictor node:
• to the network reader port
• to the validation data
Define the predictor output (as before)
And… predict!
80. outline
From “we’re new to KNIME” to “we’ve pushed KNIME to the edge”
Deep Learning within KNIME:
• Constructing the network
• Preparing the data
• Training a learner
• Saving the network and weights
• Predicting new data
Other network types
LSTMs using KNIME
CNNs using KNIME
KNIME forums (you should really write a blogpost on this part of the lesson)
81. The KNIME platform – Deep Learning in Knime
By now we can probably wrap this part up as a meta-node (we can also add more lags)
We will define the new network architecture using LSTM/CuDNN LSTM nodes
The additional preprocessing for using LSTMs is more complex and contains some extra steps
82. The KNIME platform – Deep Learning in Knime
We will define the new network architecture using LSTM/CuDNN LSTM nodes
For input layers we will define the shape of the input; for an LSTM this will be (number of time stamps {3 in our example}) x (number of features {9 in our example})
Check these boxes if the next node is also an LSTM node
Select the number of neurons
For output layers we will set the number of units (neurons) the layer will contain and the activation function that we would like to use
Note that these are determined by our target rather than selected freely
83. The KNIME platform – Deep Learning in Knime
We will usually want to
normalize our data prior to
using it in a NN to help
the model converge.
You may consider normalizing the
target as well, but remember to
de-normalize it after prediction and
before assessing the model results.
We would like the columns to be
ordered by feature and not by time
lag, so let's sort the columns…
Column order before sorting
Column order after sorting
Use the action buttons to create
the sorting order you need
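The normalize/de-normalize round trip can be sketched outside KNIME in a few lines of numpy (the target values below are arbitrary demo data; min-max scaling is one common choice):

```python
import numpy as np

y = np.array([120.0, 80.0, 95.0, 150.0])   # arbitrary target values

y_min, y_max = y.min(), y.max()
y_norm = (y - y_min) / (y_max - y_min)     # normalize to [0, 1] before training

# ... train, then predict in normalized space ...
pred_norm = y_norm.copy()                  # stand-in for model predictions

# De-normalize before assessing model results
pred = pred_norm * (y_max - y_min) + y_min
print(np.allclose(pred, y))                # True
```

Forgetting the last step means evaluating errors on the wrong scale, which is why the slide calls it out.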
84. The KNIME platform – Deep Learning in Knime
Now comes the hacky part…
Use the “Create Collection Column” node to
convert multiple columns into a single
collection column
85. The KNIME platform – Deep Learning in Knime
Now comes hacky part II:
Use the “Data Row to Image” node
to convert the collection column from
the previous stage into a 3D array,
a.k.a. a “tensor”.
(While this stage is not mandatory, it lets
us look into the tensor dimensions
and verify we got the desired result.)
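Conceptually, these two “hacky” steps are just a reshape of the lagged columns into a (time steps) x (features) block per row. A minimal numpy sketch of the same idea, assuming 3 lags and 9 features as in the slides (the values are arbitrary demo data):

```python
import numpy as np

n_samples, n_steps, n_features = 5, 3, 9

# Flat table: columns already sorted by feature (f0_t0, f0_t1, f0_t2, f1_t0, ...)
flat = np.arange(n_samples * n_steps * n_features, dtype=float)
flat = flat.reshape(n_samples, n_steps * n_features)

# The tensor the network actually consumes: one (steps x features) block per sample
tensor = flat.reshape(n_samples, n_features, n_steps).transpose(0, 2, 1)

# Verify we got the desired dimensions, as the slides suggest
print(tensor.shape)  # (5, 3, 9)
```

This also shows why the column sorting on the previous slide matters: with a different column order the same reshape would interleave features and lags incorrectly.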
86. The KNIME platform – Deep Learning in Knime
Finally we need to add the target column
back before training the learner
All other steps are the same
as in the FFN workflow!
87. outline
From “we’re new to Knime” to “we’ve pushed Knime to the edge”
Deep Learning within Knime:
• Constructing the network
• Preparing the data
• Training a learner
• Saving the network and weights
• Predicting new data
Other network types
LSTMs using knime
CNNs using knime
Knime forums (you should really write a blogpost on this part of the lesson)
88. The KNIME platform – Deep Learning in Knime
We will define the new network architecture
using 1D convolution layer nodes, add layers,
and dropout layers
Same additional preprocessing
as for using LSTMs
89. The KNIME platform – Deep Learning in Knime
Select the number of filters to train
Select the filter (kernel) size
Select the stride (step between successive filter positions)
Is padding needed? same => yes; valid => no padding
Creates a skip connection within the graph
* Verify that all input tensors
have the same dimensions
Randomly drops a fraction ({drop rate}) of the neurons during training
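How kernel size, stride, and padding interact can be made concrete with the standard 1D-convolution output-length formulas (these match the Keras conventions for "same" and "valid" padding; the helper function name is ours):

```python
import math

def conv1d_output_length(n, kernel, stride, padding):
    """Output length of a 1D convolution over an input of length n."""
    if padding == "same":     # pad so that output length = ceil(n / stride)
        return math.ceil(n / stride)
    if padding == "valid":    # no padding: filter must fit entirely inside the input
        return math.ceil((n - kernel + 1) / stride)
    raise ValueError(padding)

print(conv1d_output_length(9, 3, 1, "same"))   # 9
print(conv1d_output_length(9, 3, 1, "valid"))  # 7
print(conv1d_output_length(9, 3, 2, "valid"))  # 4
```

Matching output lengths is exactly the “verify that all input tensors have the same dimensions” check needed before an add (skip) layer.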
90. Deep Learning
You have just completed your first
“Deep Learning for multivariate time
series modeling with KNIME”!
Of course, this is merely the
beginning of the journey…
But now comes the great part:
apply everything you have learned to your daily work: you'd be amazed at the things
you can accomplish with your newly acquired skills
91. This content is shared in the
hope of boosting the learning process for
those making their first steps with
the KNIME platform.
Feel free to share your experience and
comments on using it (good or bad)
Mail: nathaniel@post.bgu.ac.il
Or via my LinkedIn page
Editor's Notes
Let's read some data from a file
Now let's save this data to another file
From the file we have read, filter only those rows that hold a specific value, hold a range of values, do not meet a specific condition, or meet a set of conditions
Let's explore the data types we get for each column (convert between different types)
For some of the columns we have undesired values; let's replace them (note that the data type is not changed automatically)
For some of the rows we have missing values – let's first exclude them; sometimes we wouldn't want to omit rows with missing values but rather fill them with values, which is known as data imputation
In some cases it can be a good idea to look at summarized data per category – like in Excel, we call that pivoting – and we can also perform unpivoting
Now rename the column to… XXX
Let's create a new column that holds x-y, x/y,
Let's create a new column by splitting column x
Let's create a new column by combining two or more columns
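Outside KNIME, the same filter/drop/impute steps can be sketched with pandas (the columns and values below are invented demo data loosely echoing the churn dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Geography": ["France", "Spain", "France", None],
    "Age": [42, None, 39, 51],
})

# Row filtering: keep only rows meeting a condition
french = df[df["Geography"] == "France"]

# Missing values: either drop the affected rows...
dropped = df.dropna()

# ...or impute them with a statistic / fixed value (data imputation)
imputed = df.fillna({"Age": df["Age"].mean(), "Geography": "unknown"})

print(len(french), len(dropped))     # 2 2
print(imputed["Age"].isna().sum())   # 0
```

Note that, as in KNIME, replacing values does not change a column's type by itself; conversions are an explicit step (`astype` in pandas).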
Let's start with basic exploration of our data
Plot the histogram of column x
Plot a bar plot of the number of occurrences of value x per category
For each of the features in our data, create a scatter plot with the target on the y axis and the feature on the x axis
Also answer: what is the number of unique values for this feature?
What are the min, max, mean (average), median, and percentiles (10, 20, …, 90)?
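The summary statistics asked for here are a one-liner each in numpy; a small sketch on an invented demo column:

```python
import numpy as np

age = np.array([42, 41, 39, 51, 44, 38, 58, 29, 35, 45])  # arbitrary demo column

print("unique values:", len(np.unique(age)))              # 10
print("min/max:", age.min(), age.max())                   # 29 58
print("mean:", age.mean(), "median:", np.median(age))     # 42.2 41.5
print("percentiles:", np.percentile(age, range(10, 100, 10)))
```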
Now that we know how our data looks, let's create our very first ML model – a logistic regression model
Before we do, we need to decide how we will know that our model is good for unseen data (use partitioning)
Split the data into 70% train and 30% validation:
Use random splitting
Use selection of the first 70% for training
Use selection by another criterion (stratification)
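What KNIME's Partitioning node does with stratified sampling can be sketched in plain Python: split each class separately so the 70/30 ratio holds per class, not just overall. A minimal illustration (the function and data are ours, for demonstration only):

```python
import random
from collections import defaultdict

def stratified_split(rows, labels, train_frac=0.7, seed=42):
    """70/30 split that preserves the class ratio of `labels` (stratification)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, valid_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                     # random splitting within each class
        cut = round(len(idx) * train_frac)
        train_idx += idx[:cut]
        valid_idx += idx[cut:]
    return [rows[i] for i in train_idx], [rows[i] for i in valid_idx]

rows = list(range(100))
labels = [0] * 80 + [1] * 20                 # imbalanced target, e.g. churn
train, valid = stratified_split(rows, labels)
print(len(train), len(valid))                # 70 30
```

For an imbalanced target like churn this matters: a purely random split could leave the validation set with too few positives to evaluate on.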