Machine Learning Course with KNIME
Nathaniel Shimoni – Aug 2019
Machine Learning Course – Day 1
Outline
Introduction to the KNIME platform
Data IO (reading/writing data from/to files)
Basic data manipulation (data preprocessing)
Basic data exploration and plotting
Initial data modeling – logistic regression
Exploration of model results
Using complex features to improve model results
The KNIME platform
The KNIME platform is an interactive tool for data processing, modeling,
visualization, and much more…
The main areas of the KNIME workbench: KNIME Explorer, Node Repository,
Work Area, Execution Console, and Workflow Outline.
The KNIME platform – Input and output
Reading and writing data from/to files
1. Go to the node repository and search for “read”
2. Select the CSV Reader and drag it to the work area
3. Right click the node and select “Configure”
4. Browse and select the file that you would like
to open (throughout this tutorial we will use
the file “Churn_Modelling.csv” in the course
materials folder)
5. We can customize the file separators
The KNIME platform – Input and output
Reading and writing data from/to files
6. We can select the number of lines to skip and the number of lines to read
7. We can also customize the encoding type
(this can be useful for some file types/languages)
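KNIME configures all of this through the node dialog; for reference, here is a rough pandas equivalent of steps 1–7 (a sketch only – the file name is from the course materials, the parameter values are illustrative):

```python
import pandas as pd

# Rough equivalent of the CSV Reader configuration above
df = pd.read_csv(
    "Churn_Modelling.csv",
    sep=",",           # step 5: customize the separator
    skiprows=0,        # step 6: number of lines to skip
    nrows=None,        # step 6: number of lines to read (None = all)
    encoding="utf-8",  # step 7: encoding type
)
print(df.shape)        # data dimensions, e.g. (10000, 13)
```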
The KNIME platform – Input and output
Finally, after we are done configuring the node, we can run it and explore the
resulting data:
1. Right click the node
2. Click “Execute” or press F7 to run the node
3. Once reading is complete, click “File Table” to view
the result – expect to see the following output:
The KNIME platform – Input and output
1. Notice the column types (D for double, I for integer, S for string)
2. We can also view the data dimensions (10k rows and 13 columns)
The KNIME platform – Input and output
1. We can also get some basic information about the data we loaded,
such as the minimum and maximum values
The KNIME platform – data manipulation
Filtering
Sometimes we want to view, process, or model only
part of the data, so we need to filter out the rest:
1. Add a Row Filter node to the workflow
2. Filter out all the rows in which age is lower
than 30
3. Note that you can choose whether to include
or exclude the rows that satisfy the condition
The KNIME platform – data manipulation
Filtering (cont.)
We can also use various logical rules for filtering:
1. Add a Rule-based Row Filter node
2. Configure the new node by adding
one or more logical rules
3. End each rule with the “=>” operator and either
TRUE or FALSE as its output (a sketch of both
filter types follows below)
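For readers who prefer code, a minimal pandas sketch of both filter types (column names such as "Age" and "Geography" are assumed from the churn dataset):

```python
# Row Filter: keep rows with age of at least 30 (include),
# or invert the mask to exclude them instead
adults = df[df["Age"] >= 30]

# Rule-based Row Filter: several conditions combined into one logical rule;
# matching rows evaluate to TRUE and can be included or excluded
mask = (df["Age"] >= 30) & (df["Geography"] == "France")
filtered = df[mask]
```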
The KNIME platform – data manipulation
Replacing values
Sometimes we would like to replace specific values with an alternative
value.
This can be necessary when:
1. We get text error messages within a numerical feature column
2. We have outliers that we want to change to another value
3. We have missing or fixed values (read errors)
The KNIME platform – data manipulation
Replacing values
1. Select the String Replacer node
2. Replace any name that starts with ‘A’ with
‘Ace’
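A rough pandas analogue of this replacement, assuming the names live in a "Surname" column:

```python
# Any name starting with 'A' becomes 'Ace'
df["Surname"] = df["Surname"].str.replace(r"^A.*", "Ace", regex=True)
```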
The KNIME platform – data manipulation
Missing values:
We have several ways to deal with missing values; depending on the type of
the feature we can:
1. Fill in with a specific value
2. Replace with the following/previous good value
3. Fill in based on some statistical value
(average, mode, etc.)
4. Fill in with an interpolation
or with a moving average
* Use the Missing Value node to experiment
with all of these options
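A pandas sketch of the same four strategies ("Balance" is an assumed column name; each call returns a new series):

```python
s = df["Balance"]
s.fillna(0)                                   # 1. a specific value
s.ffill()                                     # 2a. previous good value
s.bfill()                                     # 2b. following good value
s.fillna(s.mean())                            # 3. a statistical value (mean, mode, ...)
s.interpolate()                               # 4a. interpolation
s.fillna(s.rolling(5, min_periods=1).mean())  # 4b. moving average
```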
The KNIME platform – data manipulation
Pivoting / unpivoting:
In some cases we would like to aggregate the data per feature value;
we can use the Pivoting node to accomplish this task.
In this example we will check the average age and the 90th percentile of age
for males/females in each country:
1. Add a Pivoting node to the workflow
2. Add ‘geography’ in the groups tab
3. Add ‘gender’ in the pivots tab
4. Add age to the aggregation tab (twice)
5. Define mean and percentile as the desired
aggregation functions
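A hedged pandas equivalent of this pivot (column names assumed from the churn dataset):

```python
pivot = pd.pivot_table(
    df,
    index="Geography",   # groups tab
    columns="Gender",    # pivots tab
    values="Age",        # aggregation tab (used twice via two functions)
    aggfunc=["mean", lambda x: x.quantile(0.9)],  # mean and 90th percentile
)
print(pivot)
```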
The KNIME platform – data manipulation
Pivoting / unpivoting:
We get 3 views from this process
1. A pivot table view
2. A group-totals view (summarizes rows)
3. A pivot totals view (summarizes columns)
The KNIME platform – data manipulation
Renaming features
Sometimes our column names are tricky to remember or even to
distinguish – in such cases it can be a good idea to rename these
columns to more human-friendly names:
1. Add a rename node to the workflow
2. Change the column names to human-understandable names
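The same in pandas (old/new names are purely illustrative):

```python
df = df.rename(columns={
    "EstimatedSalary": "salary",
    "HasCrCard": "has_credit_card",
})
```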
The KNIME platform – data manipulation
Creating new features
In most cases the data we get when we start our
research process will not contain the optimal features
to begin with, and we will want to create additional
features that yield better predictive power:
1. Add a formula node
2. Create a new feature that holds the following
formula:
balance / salary
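In code, the formula node boils down to one line (column names assumed, reusing the renamed "salary" column from the previous sketch):

```python
# New feature: balance-to-salary ratio
df["balance_to_salary"] = df["Balance"] / df["salary"]
```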
The KNIME platform – data manipulation
Dealing with textual and categorical features
For most algorithms the data should not contain any textual features.
This can be solved using “one-hot encoding”:
1. Add a ‘one to many’ node
2. Add the desired string features
to the selection
3. Run the node and check the
output table
4. You should now see columns that
contain binary values for each
of the values in these string
features
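A pandas sketch of the same encoding (note that get_dummies also drops the original string columns):

```python
# One binary column per value of each selected string feature
df = pd.get_dummies(df, columns=["Geography", "Gender"], dtype=int)
```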
The KNIME platform – data exploration
Before modeling our data we should first get to know it better.
This can be done in many ways – using plots, exploring statistical aggregations
and extremes of our data, and also via basic modeling and viewing the results
and errors.
We will start with scatter plots.
There are many scatter plot nodes; I recommend using the “2D-3D scatter plot”
node for exploration, as its interactive mode is very convenient and fits this
stage well.
The KNIME platform – data exploration
Scatter plots
1. Add a “2D-3D scatter plot” node
2. Select columns for your plot
(this could be changed later in
the interactive mode)
3. Make sure to adjust the number of
rows to display according to
your needs
The KNIME platform – data exploration
Scatter plots
4. Run the node
5. Select the relevant feature
for each of the axes
6. Use the target column in
color values
7. Note that you can filter the
presented results using the
sliders in the bottom right
corner
8. You can rotate the plot by
dragging one of the axes in
the desired direction
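A static matplotlib stand-in for the interactive view (column names assumed; the target column drives the colour):

```python
import matplotlib.pyplot as plt

plt.scatter(df["Age"], df["Balance"], c=df["Exited"], s=5, alpha=0.5)
plt.xlabel("Age")
plt.ylabel("Balance")
plt.show()
```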
The KNIME platform – basic modeling and concepts
It’s finally time to create our model – a logistic regression model.
Use the Partitioning (a.k.a. train-test split) node to create the training and
validation sets.
This node has two output ports:
one for our training set – the data our model
will be trained on –
and the other for our validation (or test) set –
the data that the model will not be exposed to,
and with which we will test the model’s performance.
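A rough sklearn equivalent of the Partitioning node ("Exited" is assumed to be the churn target; the split ratio and seed are placeholders):

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Exited"])
y = df["Exited"]
# One output per port: the training set and the validation (test) set
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```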
The KNIME platform – basic modeling and concepts
1. Add a logistic regression learner node to the workflow
2. Set our desired target column
3. Select the desired explanatory variables
The KNIME platform – basic modeling and concepts
1. Add a logistic regression predictor to the workflow
2. Make sure to connect the predictor to both the model and the validation
partition we created
3. We can define whether the predictions will be probabilities
or the predicted category
4. The flow should look like the one below:
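In code terms, the learner-predictor pair corresponds roughly to this sklearn sketch, using the split from the partitioning step:

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)   # learner: fit on the training port
model.fit(X_train, y_train)
proba = model.predict_proba(X_valid)[:, 1]  # predictions as probabilities...
pred = model.predict(X_valid)               # ...or as the predicted category
```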
The KNIME platform – result analysis & error analysis
Checking our results:
1. Add a scorer node
2. Define the target column and the
predicted column
3. Run the node
The KNIME platform – result analysis & error analysis
Checking our results:
1. We can now view either the confusion
matrix or the accuracy results
(shown at the bottom of the list)
2. A summarized view of both is
available using the view in the middle
of the list
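What the scorer computes, in sklearn terms:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

print(confusion_matrix(y_valid, pred))
print("accuracy:", accuracy_score(y_valid, pred))
```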
The KNIME platform – complex feature creation
Now we can use everything we have done so far to create some advanced
features and improve our results.
Your task:
Use everything we have learned so far to improve the model results.
You may explore additional exploration and transformation nodes.
At this point do not use other learner-predictor nodes.
Let the competition begin!
Machine Learning Course – Day 2
Outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Hyper-parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
The KNIME platform – basic modeling recap
1. Using a logistic regression model we got an initial result of ~80% accuracy,
and with some effort we managed to improve it to 83% accuracy, still using a
logistic regression model
The KNIME platform – basic modeling - feature selection
It turns out that we can get to 83% accuracy with only 7 features
(to improve our score we have to ignore the ‘gender’, ‘hasCrCredit’, and ‘balance’ features)
But, how can we decide which features to use and which to ignore?
The KNIME platform – basic modeling - feature selection loop
Introducing feature selection loops!
First we run some experiments with different feature combinations; then we
filter in only the relevant columns; finally we run the model with the best
feature combination.
The KNIME platform – feature selection loop
Now let’s do this one step at a time…
First, let’s reconstruct our experiment:
1. Read the data from the previous exercise to a new workflow
(“churn_modeling.csv” file)
2. Encode the target column to be nominal using ‘number to string’
node
3. Remove outliers from the data using the
‘numeric outliers’ node
4. Fill in missing values using the ‘Missing Value’ node
5. Split the data to training and testing using partitioning node (use
the same method and seed from lesson 1 for a valid comparison)
6. Add ‘logistic regression learner’ and
‘logistic regression predictor’ nodes
7. Add a ‘scorer’ node so that we can view the metrics of our
model’s predictions
The KNIME platform – feature selection loop
And… add the feature selection loop:
8. Add a ‘feature selection loop start’ node before
the partitioning node
9. Add a ‘feature selection loop end’ after the scorer
node
10. Right click the scorer node and select “Show Flow
Variable Ports”; you will notice two red dots
above the node
11. Connect the scorer variable output
to the feature selection loop end
input port (red dots)
12. Add a ‘feature selection filter’ node
The KNIME platform – feature selection loop
13. Copy and paste the partitioning, learner, predictor and scorer nodes
14. Connect the new partitioning node to the feature selection filter node
You should now have a workflow similar to the one below:
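For intuition, a brute-force sketch of what the loop automates (KNIME's node uses smarter strategies such as forward selection or backward elimination; trying every subset is feasible only for small feature counts):

```python
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def score(cols):
    cols = list(cols)
    m = LogisticRegression(max_iter=1000).fit(X_train[cols], y_train)
    return accuracy_score(y_valid, m.predict(X_valid[cols]))

# Evaluate every 7-feature combination and keep the best one
best = max(combinations(X_train.columns, 7), key=score)
print("best feature set:", best)
```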
Outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Hyper-parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
The KNIME platform – random forest model
Now that we are familiar with some tree-based
models we can give them a try…
1. Read the data from the previous exercise to a new
workflow (“churn_modeling.csv” file)
2. Encode the target column to be nominal using
number to string node
3. Add features as you wish
4. Add random forest learner and predictor nodes
5. Tune the number of models, maximal depth,
minimum node size and split criteria as you wish
6. Add a scorer node and ROC-curve node
7. Run the workflow and explore results
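A hedged sklearn sketch of step 5's tuning knobs (all values are placeholders):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,    # number of models (trees)
    max_depth=10,        # maximal depth
    min_samples_leaf=5,  # minimum node size
    criterion="gini",    # split criterion
    random_state=42,
).fit(X_train, y_train)
```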
The KNIME platform – XGBoost model
Now that we are familiar with some tree-based
models we can give them a try…
1. Read the data from the previous exercise to a new
workflow (“churn_modeling.csv” file)
2. Encode the target column to be nominal using
number to string node
3. Add features as you wish
4. Add XGBoost learner and predictor nodes
5. Tune the number of models, maximal depth,
minimum node size and split criteria as you wish
6. Add a scorer node and ROC-curve node
7. Run the workflow and explore results
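And the analogous xgboost sketch (again, placeholder values):

```python
from xgboost import XGBClassifier

xgb = XGBClassifier(
    n_estimators=100,    # number of models (boosting rounds)
    max_depth=6,         # maximal depth
    min_child_weight=1,  # roughly the minimum node size
    subsample=0.8,       # row subsampling per tree
).fit(X_train, y_train)
```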
The KNIME platform – comparing XGBoost vs. RF models
• Once we have added the two new models, our workflow should look as follows:
Outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Hyper-parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
The KNIME platform – (hyper)parameter optimization
• Well, the results look great in comparison with our previous logistic
regression models, but are they optimal?
• To answer this question we may wish to use the parameter optimization loop nodes
The KNIME platform – (hyper)parameter optimization
1. We will add a parameter optimization loop to each of our previous random
forest and XGBoost models – the steps below are for the random forest
model; please complete the XGBoost one with similar steps
2. Add a parameter optimization loop start node
and connect it to the variable in-port
3. Add a parameter optimization loop end
and connect the variable out-port of the
scorer to its in-port
The KNIME platform – (hyper)parameter optimization
4. Configure the parameter optimization loop start node:
• Add subsample, min_child_weight & max_depth
parameters with relevant ranges
• Select number of iterations for random search
or step-size for brute-force search
5. Configure the learner to use the parameters
The KNIME platform – (hyper)parameter optimization
6. Configure the parameter optimization loop end:
set the objective function to Accuracy and the optimization direction to maximize
7. Run the loop from the loop end node
8. View the best configuration
or all configuration results
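A rough code analogue of the whole loop, using random search over the same three parameters (ranges are placeholders):

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

search = RandomizedSearchCV(
    XGBClassifier(),
    param_distributions={
        "subsample": uniform(0.5, 0.5),  # range [0.5, 1.0]
        "min_child_weight": randint(1, 10),
        "max_depth": randint(2, 12),
    },
    n_iter=25,           # number of random-search iterations
    scoring="accuracy",  # objective function, maximized
).fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```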
Outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
The KNIME platform – regression
• The main difference between what we have
done so far for classification problems and
what we will do for regression tasks lies in the
characteristics of the target column
• In classification tasks our target is categorical
(represented as a string in KNIME), while in
regression tasks our target will be numerical.
• Make sure to use the appropriate nodes –
those containing ‘(regression)’ in the node
description – for both learner and predictor
• To the right you can find an example of a
random forest based regression workflow
• Build & run the workflow
• Report the results you got
We will use a Numeric Scorer node, since the metrics for regression differ from
classification metrics. The line plot will show the difference between the
predicted values and the ground truth.
The KNIME platform – regression
1. As in our previous tasks we
will start with reading the
data – insert a csv reader
node and read the file:
‘steam data - lesson 2.csv’
2. As before, after reading the
data let’s first understand
our goal – we would like to
validate the data of the
P64TI4332 tag.
3. We will do this by using all
other parameters to
predict the value of the
P64TI4332 tag
The KNIME platform – regression
We can now use the things we
have learned so far to compare
the regression error metrics
(MAE, RMSE, R2) across
various regression algorithms.
We will compare:
• linear regression
• polynomial regression
• random forest
• XGBoost
(no need to panic about the number of nodes –
it’s going to be quite simple; a code sketch of
the comparison follows below)
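A compact sketch of that comparison (the train/validation arrays here come from the steam dataset; all names are placeholders):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from xgboost import XGBRegressor

models = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(2), LinearRegression()),
    "random forest": RandomForestRegressor(),
    "xgboost": XGBRegressor(),
}
for name, m in models.items():
    pred = m.fit(X_train, y_train).predict(X_valid)
    print(name,
          "MAE:", mean_absolute_error(y_valid, pred),
          "RMSE:", mean_squared_error(y_valid, pred) ** 0.5,
          "R2:", r2_score(y_valid, pred))
```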
Outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
Not IID – still OK! Just pay attention
• When data is not IID we need to make adjustments to:
• Validation method
• Our feature creation process
• Our target(s)
• Data originating groups
Time series data
• Split by time, not randomly
• Use lagged columns for features
• Use time difference based features
• Be aware of these important terms:
• Horizon of forecast
• Lag of available data
• Seasonality (there may be more than one)
• Trend
• Frequency
• Sampling rate and timing
• Prediction vs. backcast vs. current time regression
A random split of the data resembles an imputation task;
time-based separation is appropriate for prediction tasks.
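A minimal sketch of a time-based split (the "timestamp" column name and the 70/30 ratio are assumptions):

```python
df = df.sort_values("timestamp")
cut = int(len(df) * 0.7)
train, valid = df.iloc[:cut], df.iloc[cut:]  # past for training, future for validation
```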
The KNIME platform – prediction
1. We will now reuse the
data from the previous
regression task and
redefine our problem as a
prediction task
2. Given the current (and
historical) data we should
predict the temperature 4
hours ahead
3. Use the Lag Column node
to create the appropriate
target column (see the
sketch below)
4. Try to come up with
effective explanatory
features
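A sketch of what the Lag Column node produces for the target, assuming one row per hour:

```python
HORIZON = 4  # predict 4 hours ahead
df["target_future"] = df["P64TI4332"].shift(-HORIZON)  # future value becomes the target
df = df.dropna(subset=["target_future"])               # the last rows have no future value
```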
Outline
Recap – logistic regression model
Feature selection in KNIME
Classification using random forest & XGBoost
Parameter tuning in KNIME
Regression using random forest & XGBoost
What to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Mini-Hackathon!
Machine Learning Course – Day 3
Class mini-hackathon!
Use everything we have learned so far to create a model that:
1. Reads the case study XXX.csv file
2. Predicts the current target using the other variables
3. *Predicts tomorrow’s target using today’s features (24h horizon)
4. **Classifies whether tomorrow’s target will be higher or lower than today’s
current target
You may explore any additional nodes that you think are relevant,
and will probably find nodepit.com useful.
Let the competition begin! (guided classroom competition)
Machine Learning Course – Day 4
Outline
Recap – what to do when data is not IID (independent and identically distributed)
Time series unique characteristics
Lag Column node – the long (and obvious) way to create lagged features
Some more useful loop types:
• Column list loops
• Table row to variable loops
Not IID – still OK! Just pay attention
• When data is not IID we need to make adjustments to:
• Validation method
• Our feature creation process
• Our target(s)
• Data originating groups
Time series data
• Split by time, not randomly
• Use lagged columns for features
• Use time-difference-based features
• Be aware of these important terms:
• Horizon of forecast
• Lag of available data
• Seasonality (there may be more than one)
• Trend
• Frequency
• Sampling rate and timing
• Prediction vs. backcast vs. current time regression
A random split of the data resembles an imputation task;
time-based separation is appropriate for prediction tasks.
The KNIME platform – prediction
1. We will now reuse the
data from the previous
regression task and
redefine our problem as a
prediction task
2. Given the current (and
historical) data we should
predict the target a few
hours ahead
3. Use the Lag Column node
to create the appropriate
feature columns
4. Try to come up with
effective explanatory
features
The KNIME platform – basic modeling recap
1. Based on our last workflow (from the HW) we would now like to create a
prediction task that uses the same variables, but with lagged features instead
of those from the same timestamp
The KNIME platform – basic modeling recap
• While the former example is valid and will work, it is hard to experiment
with (consider changing the lag interval from 5 to 4, or creating a few
lag intervals instead of just one)
• Introducing: table row to variable loops
The KNIME platform – basic modeling recap
• But what if we want to create a few lag intervals?
• Introducing: column list loops
The KNIME platform – basic modeling recap
• Now let’s combine it all together…
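In plain code the two loop types collapse into a nested loop over lag intervals and columns (lag values are placeholders):

```python
lags = [1, 2, 3]  # the lag intervals to experiment with
feature_cols = [c for c in df.columns if c != "target_future"]
for lag in lags:              # table-row-to-variable loop
    for col in feature_cols:  # column list loop
        df[f"{col}_lag{lag}"] = df[col].shift(lag)
df = df.dropna()              # drop rows without a full lag history
```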
Machine Learning Course – Day 5
Outline
From “we’re new to KNIME” to “we’ve pushed KNIME to the edge”
Deep Learning within KNIME:
• Constructing the network
• Preparing the data
• Training a learner
• Saving the network and weights
• Predicting new data
Other network types
LSTMs using KNIME
CNNs using KNIME
KNIME forums
Outline
From “we’re new to KNIME” to “we’ve pushed KNIME to the edge”
Deep Learning within KNIME:
• Constructing the network
• Preparing the data
• Training a learner
• Saving the network and weights
• Predicting new data
Other network types
LSTMs using KNIME
CNNs using KNIME
KNIME forums (you should really write a blog post on this part of the lesson)
The KNIME platform – time series modeling with Deep Learning
We will start with the workflow we created during the last lesson
The KNIME platform – Deep Learning in KNIME
• The first network we will construct is a simple feed-forward network
• A feed-forward network works similarly to the learners that we have trained
before – we just need to define its structure first
The KNIME platform – Deep Learning in KNIME
We will use the same preprocessing workflow from the past lesson.
We will define the network architecture using Keras nodes.
Externally, the modeling stage is very similar to what we have done before;
internally, the configuration differs considerably.
Additional preprocessing for an FFN is minimal – we only need to remove all
missing values.
The KNIME platform – Deep Learning in KNIME
For the input layer we will define the shape of the input;
in an FFN this is the number of features we feed the model with.
For hidden layers we will select the number of units (neurons) the layer
will contain and the activation function that we would like to use.
For the output layer we will likewise set the number of units (neurons)
and the activation function.
Note that these are dictated by our target rather than freely chosen,
and that the choice of output activation strongly influences the model’s
final result.
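A Keras sketch of the three layer types just described (unit counts and activations are placeholders; the single linear output unit assumes a regression target):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

n_features = 9  # assumed number of input features
model = Sequential([
    Input(shape=(n_features,)),     # input layer: shape = number of features
    Dense(32, activation="relu"),   # hidden layer: units + activation
    Dense(1, activation="linear"),  # output layer: dictated by the target
])
```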
The KNIME platform – Deep Learning in KNIME
Now let’s configure the network learner:
Take the target column out of the input columns.
Verify that conversion is set to “from number (double)”.
Define the target column on the relevant tab, and verify that its conversion
is also set to “from number (double)”.
Remember to use a loss function appropriate to the problem you are trying to
model!
The KNIME platform – Deep Learning in KNIME
Now let’s configure the network learner hyperparameters:
In the options tab define “epochs” – the number of times the model sees all
of the training samples.
Set the learning rate – note that it has a roughly inverse relation to the
number of epochs: a smaller learning rate generally requires more epochs.
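The same two knobs in Keras terms (values are placeholders; the loss assumes a regression target):

```python
from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=1e-3),  # smaller rate -> usually more epochs
    loss="mse",                          # match the loss to the problem type!
)
model.fit(X_train, y_train, epochs=50, batch_size=32,
          validation_data=(X_valid, y_valid))
```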
The KNIME platform – Deep Learning in KNIME
After executing the network learner, right click the learner node and select
“View: Learning Monitor”. This view supplies us with:
• accuracy (for classification tasks) or loss (for any task)
• training/validation error
• the current epoch & batch
• a button to stop the learning process (and keep the results!)
The KNIME platform – Deep Learning in KNIME
We need to configure our predictor (the Keras executor) to yield the results
we want, and configure our scorer accordingly:
• Add an output to the predictor
• Select the last layer’s output as the predictor’s output
• Make sure that the output type is numeric so we can use our scorer
• Check the relevant box to have the option to run a scorer
The KNIME platform – Deep Learning in KNIME
Finally, we have trained the network and can save the model weights for
later use:
select a file name, click Save, and run the Keras Network Writer node!
The KNIME platform – Deep Learning in KNIME
Now let’s try to read the network we saved and use it for prediction.
Connect the predictor node:
• to the network reader port
• to the validation data
Define the predictor output (as before)
And… predict!
Outline
From “we’re new to KNIME” to “we’ve pushed KNIME to the edge”
Deep Learning within KNIME:
• Constructing the network
• Preparing the data
• Training a learner
• Saving the network and weights
• Predicting new data
Other network types
LSTMs using KNIME
CNNs using KNIME
KNIME forums (you should really write a blog post on this part of the lesson)
The KNIME platform – Deep Learning in KNIME
By now we can probably wrap this part up as a meta-node
(we can also add more lags).
We will define the new network architecture using LSTM / CuDNN LSTM nodes.
The additional preprocessing for using LSTMs is more complex and contains
some extra steps.
The KNIME platform – Deep Learning in KNIME
We will define the new network architecture using LSTM / CuDNN LSTM nodes.
For the input layer we will define the shape of the input; for an LSTM this
is (number of time stamps {3 in our example}) x (number of features {9 in
our example}).
Check the “return sequences” box if the next node is also an LSTM node.
Select the number of neurons.
For the output layer we will set the number of units (neurons) and the
activation function; as before, these are dictated by our target rather than
freely chosen.
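A Keras sketch of this stack (3 timesteps x 9 features as in the example; unit counts are placeholders):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Input

timesteps, n_features = 3, 9
model = Sequential([
    Input(shape=(timesteps, n_features)),  # input shape = timesteps x features
    LSTM(32, return_sequences=True),       # return sequences: next layer is an LSTM too
    LSTM(16),                              # the last LSTM returns a single vector
    Dense(1, activation="linear"),         # output layer dictated by the target
])
```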
The KNIME platform – Deep Learning in KNIME
We will usually want to normalize our data prior to using it in a neural
network, to help the model converge.
You may consider normalizing the target as well, but keep in mind to
de-normalize it after prediction and before assessing the model results.
We would like the columns to be ordered by feature and not by time lag, so
let’s sort the columns – use the action buttons to create the sorting order
you need.
The KNIME platform – Deep Learning in KNIME
Now comes the hacky part…
Use “create column collection” to
convert multiple columns to one
collection column
The KNIME platform – Deep Learning in KNIME
Now comes hacky part II:
use “data row to image” to convert the collection column we got in the
previous stage into a 3D array, a.k.a. a “tensor”.
(While this stage is not mandatory, it will enable us to look into the
tensor dimensions and verify we got the desired result.)
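In raw Python the two "hacky" stages collapse into a plain reshape (this sketch assumes 27 lagged columns sorted by feature, as in the previous step):

```python
from sklearn.preprocessing import MinMaxScaler

# 'features' stands for the 27 lagged feature columns (9 features x 3 lags)
X = MinMaxScaler().fit_transform(features)      # normalize before the network
# With columns sorted by feature (f1_t1, f1_t2, f1_t3, f2_t1, ...) we reshape
# to (samples, features, timesteps) and swap the last two axes:
X = X.reshape(len(X), 9, 3).transpose(0, 2, 1)  # -> (samples, timesteps, features)
```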
The KNIME platform – Deep Learning in KNIME
Finally we need to add the target column
back before training the learner
All other steps are the same
as in the FFN workflow!
Outline
From “we’re new to KNIME” to “we’ve pushed KNIME to the edge”
Deep Learning within KNIME:
• Constructing the network
• Preparing the data
• Training a learner
• Saving the network and weights
• Predicting new data
Other network types
LSTMs using KNIME
CNNs using KNIME
KNIME forums (you should really write a blog post on this part of the lesson)
The KNIME platform – Deep Learning in KNIME
We will define the new network architecture using 1D convolution layer
nodes, add layers, and dropout layers.
The additional preprocessing is the same as for the LSTMs.
The KNIME platform – Deep Learning in KNIME
• Select the number of filters to train
• Select the filter size
• Select the stride size (the step between consecutive filter applications)
• Is padding needed? “same” => yes; “valid” => no padding
• An add layer creates a skip connection within the graph
(* verify that all of its input tensors have the same dimensions)
• Dropout randomly drops a {drop rate} fraction of the neurons during training
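A Keras sketch of one 1D-convolution block with dropout (sizes are placeholders; the add-layer skip connection requires the functional API and is omitted here):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Dropout, Flatten, Input

model = Sequential([
    Input(shape=(3, 9)),               # timesteps x features, as before
    Conv1D(filters=16, kernel_size=2,  # number of filters, filter size
           strides=1, padding="same",  # stride; 'same' pads, 'valid' does not
           activation="relu"),
    Dropout(0.2),                      # drops 20% of the units during training
    Flatten(),
    Dense(1, activation="linear"),     # output layer dictated by the target
])
```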
Deep Learning
You have just completed your first
“Deep Learning for multivariate time
series modeling with KNIME”!!!
Of course, this is merely the
beginning of the journey…
But now comes the great part:
apply all you have learned to your daily work – you’ll be amazed at the things
you can accomplish with your newly acquired skills.
This content was shared in the hope of boosting the learning process for
those taking their first steps with the KNIME platform.
Feel free to share your experience and comments on using it (good or bad):
Mail: nathaniel@post.bgu.ac.il
Or via my LinkedIn page
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

Machine Learning Course Day 1 Recap

• 13. The KNIME platform – data manipulation: Replacing values
1. Select the String Replacer node
2. Replace any name that starts with 'A' with 'Ace'
• 14. The KNIME platform – data manipulation: Missing values
There are several ways to deal with missing values; depending on the type of the feature we can:
1. Fill in a specific value
2. Replace with the next/previous good value
3. Fill in a statistical value (mean, mode, etc.)
4. Fill in an interpolation or a moving average
* Use the Missing Value node to experiment with all of these options (a code analogue follows below)
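For readers who want a code analogue, here is a minimal pandas sketch of the same four imputation strategies; the toy DataFrame and column name are hypothetical, not taken from the course files.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with gaps; stands in for any numeric feature.
df = pd.DataFrame({"age": [31.0, np.nan, 45.0, np.nan, 52.0]})

df["age"].fillna(0)                    # 1. fill in a specific value
df["age"].ffill()                      # 2. previous good value (.bfill() for the next one)
df["age"].fillna(df["age"].mean())     # 3. statistical value (use .mode()[0] for the mode)
df["age"].interpolate()                # 4a. linear interpolation between good values
df["age"].fillna(df["age"].rolling(3, min_periods=1).mean())  # 4b. moving-average fill
```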
• 15. The KNIME platform – data manipulation: Pivoting / unpivoting
In some cases we would like to aggregate data per feature value; the Pivoting node accomplishes this task. In this example we will check the mean age and the 90th percentile of age for males/females in each country:
1. Add a Pivoting node to the workflow
2. Add 'geography' in the Groups tab
3. Add 'gender' in the Pivots tab
4. Add 'age' to the Aggregation tab (twice)
5. Define mean and percentile as the desired aggregation functions
• 16. The KNIME platform – data manipulation: Pivoting / unpivoting (cont.)
The Pivoting node produces three outputs:
1. A pivot table view
2. A group-totals view (summarizes rows)
3. A pivot-totals view (summarizes columns)
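The same aggregation can be sketched in pandas. The column names (Geography, Gender, Age) follow the usual schema of the churn file and should be treated as assumptions; the p90 helper is our own addition.

```python
import pandas as pd

df = pd.read_csv("Churn_Modelling.csv")  # the file used throughout the course

def p90(s):
    """90th percentile, used as a named aggregation function."""
    return s.quantile(0.9)

# Rows grouped by country, columns split by gender, two aggregations of Age.
pivot = pd.pivot_table(df, index="Geography", columns="Gender",
                       values="Age", aggfunc=["mean", p90])

# Rough analogues of the node's two extra outputs:
group_totals = df.groupby("Geography")["Age"].agg(["mean", p90])  # row summary
pivot_totals = df.groupby("Gender")["Age"].agg(["mean", p90])     # column summary
print(pivot)
```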
• 17. The KNIME platform – data manipulation: Renaming features
Sometimes our column names are tricky to remember or even to distinguish; in such cases it is a good idea to rename those columns to something more human-friendly:
1. Add a Column Rename node to the workflow
2. Change the column names to human-understandable names
• 18. The KNIME platform – data manipulation: Creating new features
In most cases the data we get at the start of the research process will not contain the optimal features, and we will want to create additional features that yield better predictive power:
1. Add a formula node
2. Create a new feature that holds the following formula: balance / salary
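In code the same derived feature is a one-liner; the Balance and EstimatedSalary column names again follow the churn file's usual schema and are assumptions.

```python
import pandas as pd

df = pd.read_csv("Churn_Modelling.csv")
# New ratio feature: account balance relative to estimated salary.
df["balance_to_salary"] = df["Balance"] / df["EstimatedSalary"]
```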
• 19. The KNIME platform – data manipulation: Dealing with textual and categorical features
Most algorithms cannot process textual features directly. This can be solved using "one-hot encoding":
1. Add a 'One to Many' node
2. Add the desired string features to the selection
3. Run the node and check the output table
4. You should now see a binary-valued column for each distinct value of these string features
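A minimal one-hot-encoding sketch in pandas, assuming Geography and Gender are the string features we want to encode.

```python
import pandas as pd

df = pd.read_csv("Churn_Modelling.csv")
# One binary column per distinct value, mirroring the 'One to Many' node.
encoded = pd.get_dummies(df, columns=["Geography", "Gender"], dtype=int)
print(encoded.filter(like="Geography_").head())
```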
• 20. The KNIME platform – data exploration
Before modeling our data we should first get to know it better. This can be done in many ways: using plots, exploring statistical aggregations and extremes of the data, and also via basic modeling followed by inspection of the results and errors. We will start with scatter plots. There are many scatter plot nodes; I recommend the "2D-3D scatter plot" node for exploration, as its interactive mode is very convenient and fits this stage well.
• 21. The KNIME platform – data exploration: Scatter plots
1. Add a "2D-3D scatter plot" node
2. Select columns for your plot (this can be changed later in interactive mode)
3. Make sure to adjust the number of rows to display according to your needs
• 22. The KNIME platform – data exploration: Scatter plots (cont.)
4. Run the node
5. Select the relevant feature for each of the axes
6. Use the target column for the color values
7. Note that you can filter the presented results using the sliders in the bottom-right corner
8. You can rotate the plot by dragging one of the axes in the desired direction
• 23. The KNIME platform – basic modeling and concepts
It's finally time to create our model – a logistic regression model. Use the Partitioning (a.k.a. train/test split) node to create the train and validation sets. This node has two output ports: one for our training set (the data our model will be trained on), and the other for our validation (or test) set – the data that the model will not be exposed to, which we will use to test the model's performance.
• 24. The KNIME platform – basic modeling and concepts
1. Add a Logistic Regression Learner node to the workflow
2. Set the desired target column
3. Select the desired explanatory variables
• 25. The KNIME platform – basic modeling and concepts
1. Add a Logistic Regression Predictor to the workflow
2. Make sure to connect the predictor to both the model and the validation partition we created
3. We can choose whether the predictions will be probabilities or the predicted category
4. The flow should look like the one shown on the slide (a code analogue of this train/predict chain follows below)
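For orientation, here is a hedged scikit-learn sketch of the same partition → learner → predictor chain. The 70/30 split follows the editor's notes at the end of this page; the feature list and the Exited target column are assumptions about the churn file's schema.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("Churn_Modelling.csv")
X = df[["CreditScore", "Age", "Tenure", "Balance",
        "NumOfProducts", "EstimatedSalary"]]   # illustrative explanatory variables
y = df["Exited"]                               # assumed churn target column

# Partitioning node: 70% train / 30% validation, fixed seed for repeatability.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learner
pred = model.predict(X_val)                                      # predicted category
proba = model.predict_proba(X_val)[:, 1]                         # or probabilities
```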
• 26. The KNIME platform – result analysis & error analysis
Checking our results:
1. Add a Scorer node
2. Define the target column and the predicted column
3. Run the node
• 27. The KNIME platform – result analysis & error analysis
Checking our results:
1. We can now view either the confusion matrix or the accuracy results (shown at the bottom of the view list)
2. A summarized view of both is available using the view in the middle of the list
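The Scorer's two main outputs have direct scikit-learn counterparts; this continues the hypothetical variables from the previous sketch.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

print(confusion_matrix(y_val, pred))       # rows: actual class, columns: predicted class
print(accuracy_score(y_val, pred))         # the headline accuracy number
print(classification_report(y_val, pred))  # summarized per-class precision/recall view
```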
• 28. The KNIME platform – complex feature creation
Now we can use everything we have done so far to create some advanced features and improve our results.
Your task: use all that we have learned so far to improve the model results. You may explore additional exploration and transformation nodes; at this point do not use other learner/predictor nodes. Let the competition begin!
• 30. Outline: recap – logistic regression model; feature selection in KNIME; classification using random forest & XGBoost; hyperparameter tuning in KNIME; regression using random forest & XGBoost; what to do when data is not IID (independent and identically distributed); time series' unique characteristics; mini-hackathon!
• 31. The KNIME platform – basic modeling recap
Using a logistic regression model we got an initial result of ~80% accuracy, and with some effort we managed to improve it to 83% accuracy – still using a logistic regression model.
• 32. The KNIME platform – basic modeling: feature selection
It turns out that we can get to 83% accuracy with only 7 features (to improve our score we have to ignore the 'gender', 'hasCrCredit', and 'balance' features). But how can we decide which features to use and which to ignore?
• 33. The KNIME platform – basic modeling: feature selection loop
Introducing feature selection loops! First we run some experiments with different feature combinations, then we filter in only the relevant columns, and finally we run the model with the best feature combination.
• 34. The KNIME platform – feature selection loop
Now let's do this one step at a time. First, let's reconstruct our experiment:
1. Read the data from the previous exercise into a new workflow (the "Churn_Modelling.csv" file)
2. Encode the target column as nominal using a 'Number to String' node
3. Remove outliers from the data using the 'Numeric Outliers' node
4. Fill in missing values using the 'Missing Value' node
5. Split the data into training and testing sets using the Partitioning node (use the same method and seed from lesson 1 for a valid comparison)
6. Add 'Logistic Regression Learner' and 'Logistic Regression Predictor' nodes
7. Add a 'Scorer' node so that we can view the metrics of our model's predictions
• 35. The KNIME platform – feature selection loop (cont.)
And now add the feature selection loop:
8. Add a 'Feature Selection Loop Start' node before the Partitioning node
9. Add a 'Feature Selection Loop End' node after the Scorer node
10. Right-click the Scorer node and select "Show Flow Variable Ports"; you will notice two red dots above the node
11. Connect the Scorer's variable output to the Feature Selection Loop End input port (the red dots)
12. Add a 'Feature Selection Filter' node
• 36. The KNIME platform – feature selection loop (cont.)
13. Copy and paste the Partitioning, learner, predictor and Scorer nodes
14. Connect the new Partitioning node to the Feature Selection Filter node
You should now have a workflow similar to the one shown on the slide (a code sketch of the whole loop follows below).
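What the loop does can be expressed as a hedged Python sketch: a simple backward-elimination search that scores each candidate feature subset on a held-out split. This is one plausible reading of a feature selection loop, not KNIME's exact algorithm.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("Churn_Modelling.csv")
y = df["Exited"]                                        # assumed target column
features = ["CreditScore", "Age", "Tenure", "Balance",
            "NumOfProducts", "EstimatedSalary"]         # illustrative starting set

def score(cols):
    """Train and score one feature combination: one iteration of the loop."""
    X_tr, X_va, y_tr, y_va = train_test_split(df[cols], y, test_size=0.3,
                                              random_state=42, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_va, model.predict(X_va))

current = list(features)
while len(current) > 1:
    # Try dropping each remaining feature and keep the most helpful drop.
    trials = {f: score([c for c in current if c != f]) for f in current}
    best_drop, best_acc = max(trials.items(), key=lambda kv: kv[1])
    if best_acc < score(current):
        break                 # no drop improves the score: stop searching
    current.remove(best_drop)
print("selected features:", current)
```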
• 37. Outline: as on slide 30.
• 38. The KNIME platform – random forest model
Now that we are familiar with some tree-based models we can give them a try:
1. Read the data from the previous exercise into a new workflow (the "Churn_Modelling.csv" file)
2. Encode the target column as nominal using a Number to String node
3. Add features as you wish
4. Add Random Forest Learner and Predictor nodes
5. Tune the number of models, maximal depth, minimum node size and split criterion as you wish
6. Add a Scorer node and an ROC curve node
7. Run the workflow and explore the results
• 39. The KNIME platform – XGBoost model
The same recipe works for XGBoost:
1. Read the data from the previous exercise into a new workflow (the "Churn_Modelling.csv" file)
2. Encode the target column as nominal using a Number to String node
3. Add features as you wish
4. Add XGBoost Learner and Predictor nodes
5. Tune the number of models, maximal depth, minimum node size and split criterion as you wish
6. Add a Scorer node and an ROC curve node
7. Run the workflow and explore the results
• 40. The KNIME platform – comparing XGBoost vs. RF models
Once the two new models are added, our workflow should look like the one shown on the slide.
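As a hedged code analogue of the two tree-based learners, using scikit-learn and the xgboost package; the hyperparameter names differ slightly from the KNIME dialogs, and X_train/X_val/y_train/y_val are the splits from the earlier logistic regression sketch.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from xgboost import XGBClassifier  # pip install xgboost

models = {
    # number of models / maximal depth / minimum node size, as on slide 38
    "random_forest": RandomForestClassifier(n_estimators=300, max_depth=8,
                                            min_samples_leaf=5, random_state=42),
    "xgboost": XGBClassifier(n_estimators=300, max_depth=6,
                             learning_rate=0.1, eval_metric="logloss"),
}
for name, m in models.items():
    m.fit(X_train, y_train)
    proba = m.predict_proba(X_val)[:, 1]
    print(name,
          "accuracy:", accuracy_score(y_val, m.predict(X_val)),
          "ROC AUC:", roc_auc_score(y_val, proba))   # ROC curve node analogue
```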
• 41. Outline: as on slide 30.
• 42. The KNIME platform – (hyper)parameter optimization
The results look great in comparison with our previous logistic regression models, but are they optimal? To answer this question we can use the parameter optimization loop nodes.
• 43. The KNIME platform – (hyper)parameter optimization
1. We will add a parameter optimization loop to each of our previous random forest and XGBoost models; the steps below are for the random forest model – complete the XGBoost one with similar steps
2. Add a Parameter Optimization Loop Start node and connect it to the learner's variable in-port
3. Add a Parameter Optimization Loop End node and connect the variable out-port of the Scorer to its in-port
• 44. The KNIME platform – (hyper)parameter optimization
4. Configure the Parameter Optimization Loop Start node:
• Add subsample, min_child_weight & max_depth parameters with relevant ranges
• Select the number of iterations for random search, or the step size for brute-force search
5. Configure the learner to use the loop's parameters
• 45. The KNIME platform – (hyper)parameter optimization
6. Configure the Parameter Optimization Loop End node: set the objective function to Accuracy and the optimization direction to maximize
7. Run the loop from the loop end node
8. View the best configuration, or all configuration results
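The same search expressed with scikit-learn's RandomizedSearchCV, as a sketch; the value ranges are placeholders, the three parameter names follow the slide's XGBoost naming, and X_train/y_train come from the earlier sketch.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_distributions={
        "subsample": uniform(0.5, 0.5),     # sampled from [0.5, 1.0)
        "min_child_weight": randint(1, 10),
        "max_depth": randint(2, 10),
    },
    n_iter=25,            # iterations of the random search
    scoring="accuracy",   # the objective function, maximized by default
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_)   # the best configuration
print(search.best_score_)    # its cross-validated accuracy
```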
• 46. Outline: as on slide 30.
• 47. The KNIME platform – regression
The main difference between what we have done so far for classification problems and what we will do for regression tasks lies in the characteristics of the target column. In classification tasks the target is categorical (represented as a string in KNIME), while in regression tasks the target is numerical. Pay attention to use the appropriate nodes – those containing '(Regression)' in the node description – for both learner and predictor. The slide shows an example of a random-forest-based regression workflow: build and run the workflow, then report the results you got. We use a Numeric Scorer, as regression metrics differ from classification metrics, and a line plot to show the difference between the predicted values and the ground truth.
• 48. The KNIME platform – regression
1. As in our previous tasks we will start by reading the data: insert a CSV Reader node and read the file 'steam data - lesson 2.csv'
2. As before, after reading the data let's first understand our goal: we would like to validate the data of the P64TI4332 tag
3. We will do this by using all the other parameters to predict the value of the P64TI4332 tag
• 49. The KNIME platform – regression
We can now use what we have learned so far to compare the regression error metrics (MAE, RMSE, R²) across several regression algorithms. We will compare:
• linear regression
• polynomial regression
• random forest
• XGBoost
(No need to panic at the number of nodes – it's going to be quite simple.)
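A hedged sketch of the four-way comparison with scikit-learn and xgboost; the target column name is the tag from the slide, treated here as an assumption about the file's schema, and the random split is kept only because the IID caveat arrives a few slides later.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from xgboost import XGBRegressor

df = pd.read_csv("steam data - lesson 2.csv")               # file named on slide 48
y = df["P64TI4332"]                                         # tag to validate (assumed name)
X = df.drop(columns=["P64TI4332"]).select_dtypes("number")  # all other numeric tags
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=300, random_state=42),
    "xgboost": XGBRegressor(n_estimators=300, learning_rate=0.1),
}
for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_va)
    print(name,
          "MAE:", mean_absolute_error(y_va, pred),
          "RMSE:", np.sqrt(mean_squared_error(y_va, pred)),
          "R2:", r2_score(y_va, pred))
```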
• 50. Outline: as on slide 30.
• 51. Not IID – still OK! Just pay attention
When data is not IID we need to adjust:
• the validation method
• our feature creation process
• our target(s)
• the data-originating groups
• 52. Time series data
• Split by time, not randomly
• Use lagged columns as features
• Use time-difference-based features
• Be aware of these important terms: horizon of forecast; lag of available data; seasonality (there may be more than one); trend; frequency; sampling rate and timing; prediction vs. backcast vs. current-time regression
A random split of the data resembles an imputation task; time-based separation is appropriate for prediction tasks.
• 53. The KNIME platform – prediction
1. We will now reuse the data from the previous regression task and redefine our problem as a prediction task
2. Given the current (and historical) data, we should predict the temperature 4 hours ahead
3. Use the Lag Column node to create the appropriate target column
4. Try to come up with effective explanatory features
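A hedged pandas sketch of the Lag Column idea: shifting the series builds the future target and past-value features. The 4-hour horizon follows the slide; one-row-per-hour sampling is an assumption.

```python
import pandas as pd

df = pd.read_csv("steam data - lesson 2.csv")
HORIZON = 4  # rows ahead; equals 4 hours only if the data is sampled hourly

# Future target: the tag's value HORIZON rows ahead.
df["target_plus_4h"] = df["P64TI4332"].shift(-HORIZON)

# Past-value features: lagged copies of the same tag.
for lag in (1, 2, 3):
    df[f"P64TI4332_lag{lag}"] = df["P64TI4332"].shift(lag)

df = df.dropna()  # edge rows have no complete lag window
# Time-based split: first 70% train, last 30% validation (never random here).
cut = int(len(df) * 0.7)
train, valid = df.iloc[:cut], df.iloc[cut:]
```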
• 54. Outline: as on slide 30.
• 56. Class mini-hackathon!
Use all that we have learned so far to create a model that:
1. Reads the case study XXX.csv file
2. Predicts the current target using the other variables
3. *Predicts tomorrow's target using today's features (24h horizon)
4. **Classifies whether tomorrow's target will be higher or lower than today's current target
You may explore any additional nodes that you think are relevant, and will probably find nodepit.com useful. Let the competition begin! (guided classroom competition)
• 58. Outline: recap – what to do when data is not IID (independent and identically distributed); time series' unique characteristics; the Lag Column node – the long (and obvious) way to create lagged features; some more useful loop types: column list loops, and table row to variable loops.
• 59. Not IID – still OK! Just pay attention (recap of slide 51): when data is not IID we need to adjust the validation method, our feature creation process, our target(s), and the data-originating groups.
• 60. Time series data (recap of slide 52): split by time, not randomly; use lagged columns as features; use time-difference-based features; and be aware of the important terms listed on slide 52.
• 61. The KNIME platform – prediction
1. We will now reuse the data from the previous regression task and redefine our problem as a prediction task
2. Given the current (and historical) data, we should predict the target a few hours ahead
3. Use the Lag Column node to create the appropriate feature columns
4. Try to come up with effective explanatory features
• 63. The KNIME platform – basic modeling recap
Based on our last workflow (from the homework) we would now like to create a prediction task that uses the same variables, but with lagged features instead of features from the same timestamp.
• 64. The KNIME platform – basic modeling recap
While the former example is valid and will work, it is hard to experiment with (consider changing the lag interval from 5 to 4, or creating a few lag intervals instead of just one). Introducing: table row to variable loops.
• 65. The KNIME platform – basic modeling recap
But what if we want to create a few lag intervals? Introducing: column list loops.
• 66. The KNIME platform – basic modeling recap
Now let's combine it all together (a code sketch of the looped lag creation follows below).
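What the two loop types buy you, in a hedged Python sketch: the lag intervals live in a small list (the analogue of the table a table-row-to-variable loop iterates over), and an inner loop visits every numeric column (the analogue of a column list loop), so changing the intervals means editing one line.

```python
import pandas as pd

df = pd.read_csv("steam data - lesson 2.csv")
lag_intervals = [1, 3, 5]                          # edit here to experiment
numeric_cols = df.select_dtypes("number").columns  # columns the inner loop visits

for lag in lag_intervals:          # ~ table row to variable loop
    for col in numeric_cols:       # ~ column list loop
        df[f"{col}_lag{lag}"] = df[col].shift(lag)

df = df.dropna()  # drop the first max(lag_intervals) incomplete rows
```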
• 68. Outline: from "we're new to KNIME" to "we've pushed KNIME to the edge". Deep learning within KNIME: constructing the network, preparing the data, training a learner, saving the network and weights, predicting on new data. Other network types: LSTMs in KNIME, CNNs in KNIME, the KNIME forums.
• 69. Outline: as on slide 68, plus a note: you should really write a blog post on this part of the lesson.
• 70. The KNIME platform – time series modeling with deep learning
We will start with the workflow we created during the last lesson.
• 71. The KNIME platform – deep learning in KNIME
The first network we will construct is a simple feed-forward network. A feed-forward network works similarly to the learners we have trained before – we just need to define its structure first.
• 72. The KNIME platform – deep learning in KNIME
We will use the same preprocessing workflow from the last lesson and define the network architecture using Keras nodes. From the outside, the modeling stage is very similar to what we have done before; inside, the configuration differs considerably. The additional preprocessing needed for a FFN is minimal – we just need to remove all missing values.
• 73. The KNIME platform – deep learning in KNIME
For the input layer we define the shape of the input; in a FFN this is the number of features we feed the model. For hidden layers we select the number of units (neurons) the layer will contain and the activation function we would like to use. For the output layer we also set the number of units and the activation function, but note that these are dictated by our target rather than chosen freely, and that the type of this activation has a large effect on the model's final result.
• 74. The KNIME platform – deep learning in KNIME
Now let's configure the network learner: take the target column out of the input columns and verify that the conversion is set to "From Number (double)"; then define the target column on the relevant tab, again verifying that the conversion is set to "From Number (double)". Remember to use a loss function appropriate to the problem you are trying to model!
• 75. The KNIME platform – deep learning in KNIME
Now let's configure the network learner's hyperparameters. In the Options tab define the number of epochs – the number of times the model sees all of the training samples – and set the learning rate, which has a roughly inverse relation to the number of epochs (a smaller learning rate typically calls for more epochs).
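The KNIME Keras layer nodes map closely onto the Keras Python API. Here is a hedged sketch of the same feed-forward regression network; the layer sizes are arbitrary illustrations and the arrays are placeholders for the course data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 9  # input shape = number of features (assumed value)

model = keras.Sequential([
    keras.Input(shape=(n_features,)),      # input layer: the feature count
    layers.Dense(32, activation="relu"),   # hidden layer: units + activation
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="linear"),  # output dictated by the numeric target
])

# Learner configuration: the loss matches the (regression) problem; epochs and
# learning rate trade off roughly inversely.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

X = np.random.rand(100, n_features)  # placeholder training arrays
y = np.random.rand(100)
model.fit(X, y, epochs=50, batch_size=32, validation_split=0.3)
```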
• 76. The KNIME platform – deep learning in KNIME
After executing the network learner, right-click the learner node and select "View: Learning Monitor". This view supplies various information: accuracy (for classification tasks) or loss (for any task), the training/validation error, the current epoch & batch, and a button to stop the learning process (while keeping the results!).
• 77. The KNIME platform – deep learning in KNIME
We need to configure our predictor (the Keras network executor) to yield the results we want, and configure our scorer accordingly: add an output to the predictor, select the last layer's output as the predictor's output, and make sure the output type is numeric so that we can run a scorer on it.
• 78. The KNIME platform – deep learning in KNIME
Finally, having trained the network, we can save the model weights for later use: select a file name, click save, and run the Keras network writer node!
• 79. The KNIME platform – deep learning in KNIME
Now let's read the network we saved and use it for prediction. Connect the predictor node to the network reader port and to the validation data, define the predictor output (as before), and... predict!
• 80. Outline: as on slide 69.
• 81. The KNIME platform – deep learning in KNIME
By now we can probably wrap this part as a meta-node (and we can also add more lags). We will define the new network architecture using LSTM / CuDNN LSTM nodes. The additional preprocessing for LSTMs is more complex and contains some extra steps.
• 82. The KNIME platform – deep learning in KNIME
We define the new network architecture using LSTM / CuDNN LSTM nodes. For the input layer we define the input shape; for an LSTM this is (number of time stamps {3 in our example}) x (number of features {9 in our example}). Check the relevant box if the next node is also an LSTM node, and select the number of neurons. For the output layer we again set the number of units and the activation function, dictated by our target rather than chosen freely.
• 83. The KNIME platform – deep learning in KNIME
We will usually want to normalize our data before feeding it to a neural network, to help the model converge. You may consider normalizing the target as well, but keep in mind to de-normalize it after prediction and before assessing the model's results. We would also like the columns to be ordered by feature and not by time lag, so let's sort the columns, using the action buttons to create the sorting order we need.
• 84. The KNIME platform – deep learning in KNIME
Now comes the hacky part: use "create column collection" to convert multiple columns into one collection column.
• 85. The KNIME platform – deep learning in KNIME
Now comes hacky part II: use "data row to image" to convert the collection column from the previous stage into a 3D array, a.k.a. a "tensor". (While this stage is not mandatory, it enables us to look into the tensor dimensions and verify we got the desired result.)
• 86. The KNIME platform – deep learning in KNIME
Finally we need to add the target column back before training the learner. All the other steps are the same as in the FFN workflow!
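A hedged Keras sketch of the LSTM variant: the (3 time stamps) x (9 features) shape follows slide 82, and the reshape from flat lagged columns into a 3-D tensor plays the role of the collection-column / "data row to image" trick.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, n_features = 3, 9  # {3 time stamps} x {9 features}, as on slide 82

# Placeholder rows of flat lagged columns, ordered feature by feature, reshaped
# into the (samples, timesteps, features) tensor an LSTM expects.
flat = np.random.rand(100, timesteps * n_features)
X = flat.reshape(-1, timesteps, n_features)
y = np.random.rand(100)

model = keras.Sequential([
    keras.Input(shape=(timesteps, n_features)),
    layers.LSTM(32, return_sequences=True),  # next layer is also an LSTM
    layers.LSTM(16),                         # last LSTM returns only its final state
    layers.Dense(1, activation="linear"),    # output dictated by the numeric target
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=30, batch_size=32, validation_split=0.3)
```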
• 87. Outline: as on slide 69.
• 88. The KNIME platform – deep learning in KNIME
We will define the new network architecture using 1D convolution layer nodes, Add layers (for skip connections), and dropout layers. The additional preprocessing is the same as for LSTMs.
• 89. The KNIME platform – deep learning in KNIME
For the convolution layers: select the number of filters to train, the filter (kernel) size, and the stride (the step between successive filter applications), and choose whether padding is needed ("same" => padding; "valid" => no padding). The Add layer creates a skip connection within the graph (verify that all of its input tensors have the same dimensions), and the dropout layer randomly zeroes out {drop rate} of the neurons while training.
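A hedged Keras sketch of the 1-D CNN counterpart, including a skip connection via an Add layer and a dropout layer; the functional API is used because a skip needs a non-sequential graph, and all sizes are illustrative.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, n_features = 3, 9  # same tensor layout as the LSTM example

inp = keras.Input(shape=(timesteps, n_features))
# padding="same" preserves the time dimension so the skip's tensors match.
x = layers.Conv1D(filters=32, kernel_size=2, strides=1,
                  padding="same", activation="relu")(inp)
branch = layers.Conv1D(32, 2, padding="same", activation="relu")(x)
x = layers.Add()([x, branch])  # skip connection: inputs must share dimensions
x = layers.Dropout(0.2)(x)     # randomly zero 20% of activations during training
x = layers.Flatten()(x)
out = layers.Dense(1, activation="linear")(x)

model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(100, timesteps, n_features), np.random.rand(100),
          epochs=10, batch_size=32)
```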
• 90. Deep learning
You have just completed your first "deep learning for multivariate time series modeling with KNIME"! Of course, this is merely the beginning of the journey, but now comes the great part: apply everything you have learned to your daily work – you'll be amazed at the things you can accomplish with your newly acquired skills.
• 91. This content was shared in the hope of boosting the learning process for those taking their first steps with the KNIME platform. Feel free to share your experience and comments on using it (good or bad) by mail: nathaniel@post.bgu.ac.il, or via my LinkedIn page.

Editor's Notes

Notes 1–5 and 28–32: Let's read some data from a file; now let's save this data to another file.
Notes 6–15: From the file we have read, filter only the rows that hold a specific value, fall within a range of values, do not meet a specific condition, or meet a set of conditions. Explore the data types we get for each column (and convert between types). Some columns contain undesired values: replace them (note that the data type is not changed automatically). Some rows have missing values: first exclude them; sometimes, instead of omitting rows with missing values, we fill them in, which is known as data imputation. In some cases it is a good idea to look at summarized data per category, as in Excel; this is called pivoting, and we can also unpivot. Now rename the column to... XXX. Create a new column that holds x-y or x/y, a new column by splitting column x, and a new column by combining two or more columns.
Notes 16–18: Let's start with basic exploration of our data. Plot the histogram of column x, and a bar plot of the number of occurrences of value x per category. For each feature, create a scatter plot with the target on the y axis and the feature on the x axis. Also answer: what is the number of unique values for this feature? What are the min, max, mean (average), median and percentiles (10, 20, ..., 90)?
Notes 19–27 and 33–51: Now that we know what our data looks like, let's create our very first ML model: a logistic regression model. Before we do, we need to decide how we will know that the model is good for unseen data (use partitioning). Split the data into 70% train and 30% validation, using random splitting, selection of the first 70% for training, or selection by another criterion (stratification).
  52. Now that we know how our data looks like lets create our very first ML model – a logistic regression model Before we do we need to decide how will we now that our model is good for unseen data (use partitioning) Split the data to 70% train and 30% validation: Use random splitting Use selection of the first 70% for training Use selection by another criteria (stratification)
  53. Now that we know how our data looks like lets create our very first ML model – a logistic regression model Before we do we need to decide how will we now that our model is good for unseen data (use partitioning) Split the data to 70% train and 30% validation: Use random splitting Use selection of the first 70% for training Use selection by another criteria (stratification)
  54. Now that we know how our data looks like lets create our very first ML model – a logistic regression model Before we do we need to decide how will we now that our model is good for unseen data (use partitioning) Split the data to 70% train and 30% validation: Use random splitting Use selection of the first 70% for training Use selection by another criteria (stratification)
  55. Now that we know how our data looks like lets create our very first ML model – a logistic regression model Before we do we need to decide how will we now that our model is good for unseen data (use partitioning) Split the data to 70% train and 30% validation: Use random splitting Use selection of the first 70% for training Use selection by another criteria (stratification)
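In KNIME the split is done with the Partitioning node and the model with the Logistic Regression Learner. As a point of reference, here is a minimal scikit-learn sketch of the same 70/30 stratified split followed by a logistic regression. The feature and target column names (e.g. "Exited" as the churn label) are assumptions about the course file, not something the slides confirm.

```python
# Minimal sketch of a 70/30 partition plus logistic regression, assuming
# the churn dataset from the course materials with an "Exited" target
# column (column names are assumptions, not confirmed by the slides).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("Churn_Modelling.csv")
X = df[["CreditScore", "Age", "Balance", "EstimatedSalary"]]  # assumed numeric features
y = df["Exited"]                                              # assumed target column

# stratify=y keeps the class proportions equal in both partitions
# (the "stratified sampling" option of the Partitioning node);
# shuffle=False with stratify=None would instead take the first 70% of rows.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", model.score(X_valid, y_valid))
```

Stratification matters here because churn labels are typically imbalanced: a purely random split could leave the validation set with a noticeably different churn rate than the training set, making the validation score a less reliable estimate of performance on unseen data.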