Machine Learning in Finance
Stefan Duprey
Statistical learning scope
Data Mining
• Exploration
o Univariate: pie chart, histogram, etc.
o Multivariate: feature selection and transformation
• Modelling
o Clustering: partitive (K-means, Gaussian mixture model, SOM) and hierarchical
o Classification: discriminant analysis, decision tree, neural network, support vector machine
o Regression
Classifier for Credit Scoring
Decision rule for Support Vector Machines
A quadratic optimization problem !
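The decision rule and the optimisation problem appear as formula images in the original slides; a standard reconstruction for the linearly separable case (a sketch, not the exact slide content):
$$f(x) = \operatorname{sign}\!\left(w^{\top}x + b\right), \qquad \min_{w,b}\ \tfrac{1}{2}\,\|w\|^{2} \quad \text{s.t.} \quad y_i\!\left(w^{\top}x_i + b\right) \ge 1 \ \ \forall i$$
Maximising the margin amounts to minimising $\|w\|^2$ under linear constraints, hence a quadratic optimization problem.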
SVM non-linear case
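In the non-linear case the usual kernel substitution (again a standard reconstruction, not the slide's own formula image) replaces inner products with a kernel K, e.g. the RBF kernel:
$$f(x) = \operatorname{sign}\!\left(\sum_i \alpha_i\, y_i\, K(x_i, x) + b\right), \qquad K(u,v) = \exp\!\left(-\gamma\,\|u-v\|^{2}\right)$$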
SVM summary
 avoids the plague of local minima
 the engineer’s expertise is in choosing the appropriate kernel (beware of overfitting; cross-validate and experiment with your own kernels)
 only classifies between 2 classes (one-vs-all or one-vs-one methodology)
 a reference for use cases in computer vision and bioinformatics
Neural Networks: what are they?
Neural Network summary
Gradient descent algorithms: stochastic, mini-batch, conjugate
Plague of local minima: difficult to calibrate
 the engineer’s expertise is in choosing the appropriate architecture (beware of overfitting; cross-validate and experiment with your own architecture, ‘deeper learning’)
>> t = classregtree(X,Y);   % fit a classification/regression tree
>> Y_pred = t(X_new);       % evaluate the tree on new data
Regression Trees
Forests of Trees
[Diagram: a table of predictors with a categorical up/down response Y, feeding a forest of trees]
>> t = TreeBagger(nb_trees,X,Y);          % bootstrap-aggregated forest of trees
>> [Y_pred,allpred] = predict(t,X_new);   % ensemble prediction and class scores
Splitting criterion: information gain
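The criterion appears as a formula image in the slides; the standard definition of the information gain of splitting a node S into children S_v is:
$$IG(S) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v), \qquad H(S) = -\sum_{c} p_c \log_2 p_c$$
where $p_c$ is the proportion of class $c$ in the node; the split with the largest gain is chosen.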
Why a regression, and what is a regression?
A regression is a model to explain and predict a process: supervised machine learning.
Why regularizing?
• Terms are correlated
• The regression matrix becomes close to singular
• A badly conditioned matrix yields poor numerical results
• Bayesian interpretation: the posterior is proportional to the likelihood times the prior, and the regularisation term plays the role of the log-prior
We rather minimize (the slide’s formula image, reconstructed in standard ridge form):
$$\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\,\|\beta\|_2^2$$
Why Lasso and Elastic Net?
• No method owns the truth
• Reduce the number of predictors in a regression model
• Identify important predictors
• Select among redundant predictors
• Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares (cross-validation)
The penalties (the slide’s formula images, reconstructed in standard form):
Lasso: $\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1$
Elastic Net: $\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\big(\alpha\,\|\beta\|_1 + \tfrac{1-\alpha}{2}\,\|\beta\|_2^2\big)$
Ensemble learning
Why ensemble learning ?
‘melding results from many weak learners into one high-quality ensemble predictor’
Main differences between Bagging and Boosting
Defining features
• BAGGING: built on randomness; each model is trained on a bootstrapped sample; each model must perform well over the whole sample; every model has the same weight.
• BOOSTING: adaptive and deterministic; trained on the complete initial sample; each model has to perform better than the previous one on the hard cases; models are weighted according to their performance.
Advantages and disadvantages
• BAGGING: reduces model variance; not a simple model anymore; can be parallelized; overfits noise less (better than boosting on noisy data); usually more efficient than boosting.
• BOOSTING: variance might rise; not a simple model anymore; cannot be parallelized; models are weighted according to their performance; in specific cases it might achieve far better accuracy.
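As an illustration of the two strategies, a minimal sketch (assuming the Statistics Toolbox function fitensemble, which is not shown in the slides):
>> bagModel = fitensemble(x,y,'Bag',100,'Tree','Type','classification');   % bagging: 100 trees on bootstrapped samples
>> boostModel = fitensemble(x,y,'AdaBoostM1',100,'Tree');                  % boosting (binary): each tree reweights the hard cases
>> y_pred = predict(boostModel,x_new);                                     % x_new: hypothetical new observations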
Big Data
Learning
over
Distributed Data
Distributed memory: MDCS & the map/reduce paradigm
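A minimal sketch of the map/reduce pattern in MATLAB (the mapreduce function and the file prices.csv with a Price column are illustrative assumptions, not part of the slides; the mapper and reducer each live in their own file):
% mapper: emit a partial sum and count for each chunk of the datastore
function meanMapper(data, ~, intermKVStore)
    add(intermKVStore,'PartialMean',[sum(data.Price) numel(data.Price)]);
end
% reducer: combine the partial results into a global mean
function meanReducer(~, intermValIter, outKVStore)
    total = 0; count = 0;
    while hasnext(intermValIter)
        v = getnext(intermValIter);
        total = total + v(1); count = count + v(2);
    end
    add(outKVStore,'Mean',total/count);
end
% driver
>> ds = datastore('prices.csv');
>> result = mapreduce(ds,@meanMapper,@meanReducer);
>> readall(result)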
Big data & Machine learning
“It’s not who has the best algorithm that wins. It’s who has the most data.”
Quick overview
Exploratory analysis
Clustering
Classification
Aims of this presentation
 awareness of the range of methods for
multivariate data
 reasonable understanding of algorithms
Data Mining
• Exploratory Data
Analysis
• Clustering
• Classification
• Regression
[Scatter plot: eight simulated clusters, Group1–Group8]
• Categorical
• Ordinal
• Discontinuous
Exploratory Data Analysis
 Why exploratory analysis? Can be used to:
o Graphical view
o “Pre filtering”: preliminary data trends and behaviour
• Means:
• Multivariate Plots
• Feature transformation: principal component analysis, factor model
• Feature selection: stepwise optimization
Data Exploration: Getting an overview of individual
variables
Basic Histogram
>> hist(x(:,1))
Custom Number of Bins
>> hist(x(:,1),50)
By Group
>> hist(byGroup,20)
Gaussian fit
>> histfit(x(:,2))
3D Histogram
>> hist3(x(:,1:2))
Scatter Plot
>> gscatter(x(:,1),x(:,2),groups)
Pie Chart
>> pie3(proportions,groups)
>> X = [MPG,Acceleration,Displacement,Weight,Horsepower];
Box Plot
>> boxplot(x(:,1),groups)
[Figures: basic and grouped histograms, Gaussian fit, 3-D histogram, scatter plot, pie chart and box plot of the car data]
Data Exploration: Getting an overview of multiple
variables
Plot Matrix by Group
>> gplotmatrix(x,x,groups)
Parallel Coordinates Plot
>> parallelcoords(x,'Group',groups)
Andrews’ Plot
>> andrewsplot(x,'Group',groups)
Glyph Plot
>> glyphplot(x)
Chernoff Faces
>> glyphplot(x,'Glyph','face')
[Figures: plot matrix, parallel coordinates, Andrews’ plot, glyph plot and Chernoff faces for the car data (MPG, Acceleration, Displacement, Weight, Horsepower)]
Principal component analysis
[Figures: scree plot of the variance explained per principal component and 3-D biplot of the DAX stock loadings]
>>[pcs,scrs,variances]=princomp(stocks);
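A short usage note (a sketch; the name pctExplained is introduced here): princomp returns the loadings, the scores and the component variances, from which the percentage of variance explained can be computed:
>> pctExplained = 100*variances/sum(variances);   % per-component % of variance
>> pareto(pctExplained)                           % scree-style plot of the leading components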
Factor model
 Alternative to PCA to improve your components
>> [Lambda,Psi,T,stats,F] = factoran(stocks,3,'rotate','promax');
[Figure: 3-D factor loadings of the DAX stocks after promax rotation]
Paring predictors: stepwise optimization. Some predictors might be correlated, others irrelevant.
 Requires Statistics Toolbox™
>>[coeff,inOut]=stepwisefit(stocks, index);
[Figure: index returns and prices, 2007–2011, original data vs. stepwise fit]
Cloud of randomly generated points
• Each cluster center is randomly chosen inside specified bounds
• Each cluster contains the specified number of points per cluster
• Each cluster point is sampled from a Gaussian distribution
• Multidimensional dataset
>>clusters = 8; % number of clusters.
>>points = 30; % number of points in each cluster.
>>std_dev = 0.05; % common cluster standard deviation
>>bounds = [0 1]; % bounds for the cluster center
>>[x,vcentroid,proportions,groups] =cluster_generation(bounds,clusters,points,std_dev);
[Scatter plot: the eight generated clusters, Group1–Group8]
Clustering Why clustering ?
o Segment populations into natural subgroups
o Identify outliers
o As a preprocessing method – build separate models on each
• Means
• Hierarchical clustering
• Clustering with neural network (self-organizer map, competitive layer)
• Clustering with K-means nearest neighbours
• Clustering with K-means fuzzy logic
• Clustering using Gaussian mixture models
• Predictors: categorical, ordinal, discontinuous
[Scatter plot: input vectors x(1) vs x(2)]
Hierarchical Cluster Analysis – what is it doing?
[Scatter plot: clusters obtained with a dendrogram cut-off of 0.1]
Hierarchical Cluster Analysis – how do I do it ?
• Calculate pairwise distances between points
>> distances = pdist(x)
• Carry out hierarchical cluster analysis
>> tree = linkage(distances)
• Visualise as a dendrogram
>> dendrogram(tree)
• Assign points to clusters
>> assignments = cluster(tree,'cutoff',0.1)
Assessing the quality of a hierarchical cluster
analysis
• The cophenetic correlation coefficient measures how closely the lengths of the tree links match the original distances between points
• How ‘faithful’ the tree is to the original data
• 0 is poor, 1 is good
>> cophenet(tree,distances)
K-Means Cluster Analysis – what is it doing?
1. Randomly pick K cluster centroids
2. Assign points to the closest centroid
3. Recalculate the positions of the cluster centroids
4. Reassign points to the closest centroid
5. Recalculate the positions of the cluster centroids
… repeat until the centroid positions converge
K-Means Cluster Analysis – how do I do it ?
Running the K-means algorithm for a fixed K
>> [memberships,centroids] = kmeans(x,K);
[Scatter plot: K-means cluster assignments]
Evaluating a K-Means analysis and choosing K
• Try a range of different K’s, and
compare the point-centroid
distances for each
>> for K = 3:15
       [clusters,centroids,distances] = kmeans(data,K);
       totaldist(K-2) = sum(distances);
   end
   plot(3:15,totaldist);
• Create silhouette plots
>> silhouette(data,clusters)
Sidebar: Distance Metrics
• Measures of how similar datapoints are – different
definitions make sense for different data
• Many built-in distance metrics, or define your own
>> doc pdist
>> distances = pdist(data,metric); % pdist = pairwise distances
>> squareform(distances)
>> kmeans(data,k,'distance','cityblock') % not all metrics supported
Euclidean Distance
Default
Cityblock Distance
Useful for discrete variables
Cosine Distance
Useful for clustering variables
Fuzzy c-means Cluster Analysis – what is it doing?
• Very similar to K-means
• Samples are not assigned definitively to a cluster, but
have a ‘membership’ value relative to each cluster
 Requires Fuzzy Logic Toolbox™
 Running the fuzzy c-means algorithm for a fixed K
>> [centroids, memberships]=fcm(x,K);
Gaussian Mixture Models
• Assume the data are drawn from a fixed number K of normal distributions
• Fit the mixture parameters (means, covariances, weights) using the EM algorithm
>> gmobj = gmdistribution.fit(x,8);
>> assignments = cluster(gmobj,x);
 Plot the probability density
>> ezsurf(@(x,y)pdf(gmobj,[x y]));
[Surface plot: probability density of the fitted mixture]
Evaluating a Gaussian Mixture Model clustering
• Plot the probability density function of the model
>> ezsurf(@(x,y)pdf(gmobj,[x y]));
• Plot the posterior probabilities of observations
>> p = posterior(gmobj,data);
>> scatter(data(:,1),data(:,2),5,p(:,g)); % Do this for each group g
• Plot the Mahalanobis distances of observations to components
>> m = mahal(gmobj,data);
>> scatter(data(:,1),data(:,2),5,m(:,g)); % Do this for each group g
Choosing the right number of components in a
Gaussian Mixture Model
• Evaluate for a range of K and plot AIC and/or BIC
• AIC (Akaike Information Criterion) and BIC (Bayesian
Information Criterion) are measures of the quality of
the model fit, with a penalty for higher K
>> for K = 3:15
       gmobj = gmdistribution.fit(data,K);
       AIC(K-2) = gmobj.AIC;
   end
   plot(3:15,AIC);
Neural Networks – what are they?
[Diagram: input variables, weights, bias, transfer function, output variable; a two-layer feedforward network; build your architecture]
Self-Organising Map Neural Nets – what are they?
• Start with a regular grid of
‘neurons’ laid over the dataset
• The size of the grid gives the
number of clusters
• Neurons compete to recognise
datapoints (by being close to
them)
• Winning neurons are moved
closer to the datapoints
• Repeat until convergence
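A minimal training sketch (assuming the Neural Network Toolbox function selforgmap, not shown in the slides):
>> net = selforgmap([3 3]);          % 3-by-3 grid of neurons, i.e. up to 9 clusters
>> net = train(net,x');              % selforgmap expects observations in columns
>> assignments = vec2ind(net(x'));   % winning neuron index for each point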
[Figures: SOM weight positions before and after training]
Summary: Cluster analysis
No method owns the truth
Use the diagnostic tools to assess your clusters
Beware of local minima: use global optimization
Classification
 Why classification? Can be used to:
o Learn how to classify from already classified observations
o Classify new observations
• Means:
• Discriminant analysis classification
• Bootstrapped aggregated decision tree classifier
• Neural network classifier
• Support vector machine classifier
[Scatter plot: eight labelled groups, Group1–Group8]
Discriminant Analysis – how does it work?
• Fit a multivariate normal density to each class
• linear — Fits a multivariate normal density to each group,
with a pooled estimate of covariance. This is the default.
• diaglinear — Similar to linear, but with a diagonal
covariance matrix estimate (naive Bayes classifiers).
• quadratic — Fits multivariate normal densities with
covariance estimates stratified by group.
• diagquadratic — Similar to quadratic, but with a diagonal
covariance matrix estimate (naive Bayes classifiers).
• Classify a new point by evaluating its probability for
each density function, and classifying to the highest
probability
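In formula form (a standard reconstruction, not the slide’s own notation), a new point x is assigned to the class with the highest posterior:
$$\hat{k}(x) = \arg\max_{k}\ \pi_k\, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right)$$
where $\pi_k$ is the class prior and $\Sigma_k$ is pooled (linear), diagonal (diaglinear, diagquadratic) or estimated per class (quadratic).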
Discriminant Analysis – how do I do it?
• Linear Discriminant Analysis
>> classes = classify(sample,training,group)
• Quadratic Discriminant Analysis
>> classes = classify(x,x,y,'quadratic')
• Naïve Bayes
>> nbGau= NaiveBayes.fit(x, y);
>> y_pred= nbGau.predict(x);
[Figures: discriminant analysis class boundaries for the eight groups]
Interpreting Discriminant Analyses
• Visualise the posterior probability surfaces
>> [XI,YI] = meshgrid(linspace(4,8),linspace(2,4.5));
>> X = XI(:); Y = YI(:);
>> [class,err,P] = classify([X Y],meas(:,1:2),species,'quadratic');
>> for i = 1:3
       ZI = reshape(P(:,i),100,100);
       surf(XI,YI,ZI,'EdgeColor','none');
       hold on;
   end
Interpreting Discriminant Analyses
• Visualise the probability density of sample observations
• An indicator of the region in which the model has support from training data
>> [XI,YI] = meshgrid(linspace(4,8),linspace(2,4.5));
>> X = XI(:); Y = YI(:);
>> [class,err,P,logp] = classify([X Y],meas(:,1:2),species,'quadratic');
>> ZI = reshape(logp,100,100);
>> surf(XI,YI,ZI,'EdgeColor','none');
Classifying with K-Nearest Neighbours – what does it do?
• One of the simplest classifiers – a sample is classified by taking the K nearest points from the training set and choosing the majority class of those K points
• There is no real training phase – all the work is done during the application of the model
>> classes = knnclassify(sample,training,group,K)
[Scatter plot: K-nearest-neighbour class regions for the eight groups]
Decision Trees – how do they work?
• Threshold value for a variable
that partitions the dataset
• Threshold for all predictors
• Resulting model is a tree where
each node is a logical test on a
predictor (var1<thresh1,
var2>thresh2)
Decision Trees – how do I build them ?
• Build tree model
>> tree = classregtree(x,y);
>> view(tree)
• Evaluate the model on new data
>> tree(x_new)
[Scatter plot: decision tree class regions for the eight groups]
Enhancing the model: bagged trees
• Prune the decision tree
>> [cost,secost,ntnodes,bestlevel] = test(t,'test',x,y);
>> topt = prune(t,'level',bestlevel);
• Bootstrap-aggregated forest of trees
>> forest = TreeBagger(100,x,y);
>> y_pred = predict(forest,x);
• Visualise class boundaries as before
[Scatter plot: bagged tree class regions for the eight groups]
Pattern Recognition Neural Networks – what are they?
• Two-layer (i.e. one-hidden-layer) feed forward neural
networks can learn any input-output relationship
given enough neurons in the hidden layer.
• No restrictions on the predictors
Pattern Recognition Neural Networks – how do I build them?
• Build a neural network model
>> net = patternnet(10);
• Train the net to classify
observations
>> [net,tr] = train(net,x,y);
• Apply the model to new data
>> y_pred = net(x);
[Scatter plot: neural network class regions for the eight groups]
Support Vector Machines – what are they?
• The SVM algorithm finds a boundary between the classes
that maximises the minimum distance of the boundary
to any of the points
• No restrictions on the predictors
• One-vs-all to classify multiple classes
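A minimal one-vs-all sketch (the names classList and models are introduced here for illustration):
>> classList = unique(y);
>> for i = 1:numel(classList)
       models{i} = svmtrain(x,double(y == classList(i)));   % class i vs the rest
   end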
Support Vector Machines – how do I build them?
• Build an SVM model
>> svmmodel = svmtrain(x,y)
• Try different kernel functions
>> svmmodel = svmtrain(x,y,'kernel_function','rbf')
• Apply the model to new data
>> classes = svmclassify(svmmodel,x_new);
[Figure: SVM decision boundary with support vectors highlighted]
Evaluating a Classifying Model
• Three main strategies:
• Resubstitution – test the model on the same data that you trained it with
• Cross-validation
• Holdout – test on a completely new dataset
• Use cross-validation to evaluate model parameters such as the number of leaves for a tree or the number of hidden neurons.
 Apply cross-validation to your classifying model
>> cp = cvpartition(y,'k',10);
>> ldaFun = @(xtrain,ytrain,xtest)(classify(xtest,xtrain,ytrain));
>> ldaCVErr = crossval('mcr',x,y,'predfun',ldaFun,'partition',cp)
Summary: Classification algorithms
No absolute best method
Simple does not mean inefficient
Decision trees and neural networks can overfit the noise: use bootstrapping and cross-validation
Parallelize
Regression
Why regression? Can be used to:
o Learn to model a continuous response from observations
o Predict the response for new observations
• Means:
• Linear regressions
• Non-linear regressions
• Bootstrapped regression tree
• Neural network as a fitting tool
New data set with a continuous response from one
predictor
• Non-linear function to fit
• A continuous response to fit from one continuous predictor
>>[x,t] = simplefit_dataset;
[Plot: the simplefit_dataset target curve]
Linear Regression – what is it?
• A collection of methods that
find the best coefficients b
such that y ≈ X*b
• Best b means minimising
the least squares difference
between the predicted and
actual values of y
• “Linear” means linear in b –
you can include extra
variables to give a nonlinear
relationship in X
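For instance (a minimal sketch, not from the slides), a quadratic fit is still linear in the coefficients b:
>> X = [ones(size(x)) x x.^2];   % nonlinear in x, linear in b
>> b = X\y;                      % least-squares solution via backslash
>> y_fit = X*b;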
Linear Regression – how do I do it ?
>> b = X\y   % the backslash operator solves the least-squares problem
• Linear Regression
>> b = regress(y, [ones(size(X,1),1) x])
>> stats = regstats(y, [ones(size(x,1),1) x])
• Robust Regression – better in the presence of outliers
>> robust_b = robustfit(X,y) %NB (X,y) not (y,X)
• Ridge Regression – better if data is close to collinear
>> ridge_b = ridge(y,X,k) %k is the ridge parameter
• Apply the model to new data
>> y = newdata*b;
Interpreting a linear regression model
• Examine coefficients to see
which predictors have a large
effect on the response
>> [b,bint,r,rint,stats] = regress(y,X)
>> errorbar(1:size(b,1), b, b-bint(:,1), bint(:,2)-b)
• Examine residuals to check for
possible outliers
>> rcoplot(r,rint)
• Examine the R² statistic and p-value to check overall model significance
>> stats(1)*100 %R2 as a percentage
>> stats(3) %p-value
• Additional diagnostics with
regstats
Non-linear curve fitting
Least-squares algorithm
>> model = @(b,x)(b(1)+b(2).*cos(b(3)*x+b(4))+b(5).*cos(b(6)*x+b(7))+b(8).*cos(b(9)*x+b(10)));
>> [ahat,r,J,cov,mse] = nlinfit(x,t,model,a0);   % a0 is the initial guess for the 10 parameters
[Plots: fitted curve vs. data, and residuals]
Fit Neural Networks – what are they?
• Fitting networks are feedforward neural networks used to fit
an input-output relationship.
• This architecture can learn any input-output relationship given
enough neurons.
• No restrictions on the predictors
(categorical,ordinal,discontinuous)
Fit Neural Networks – how do I build them?
• Build a fit neural net model
>> net = fitnet(10);
• Train the net to fit the target
>> [net,tr] = train(net,x,t);
• Apply the model to new data
>> y_pred = net(x);
[Plots: function fit for output element 1, with inputs, targets, outputs and errors]
Regression trees– what are they?
• A decision tree with binary splits for regression. An object
of class RegressionTree can predict responses for new data
with the predict method.
• No restrictions on the predictors
(categorical,ordinal,discontinuous)
Regression trees – how do I use them?
• Build a regression tree model
>> rtree = RegressionTree.fit(x,t);
• Predict the response on the training data
>> y_tree = predict(rtree,x);
• Apply the model to new data
>> y_pred = predict(rtree,x_new);
[Plots: regression tree fit and residuals]
Summary
Data Mining
• Exploration
o Univariate: pie chart, histogram, etc.
o Multivariate: feature selection and transformation
• Modelling
o Clustering: partitive (K-means, Gaussian mixture model, SOM) and hierarchical
o Classification: discriminant analysis, decision tree, neural network, support vector machine
o Regression

More Related Content

What's hot

林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
台灣資料科學年會
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
butest
 

What's hot (20)

Capitalico / Chart Pattern Matching in Financial Trading Using RNN
Capitalico / Chart Pattern Matching in Financial Trading Using RNNCapitalico / Chart Pattern Matching in Financial Trading Using RNN
Capitalico / Chart Pattern Matching in Financial Trading Using RNN
 
Recommendation system using collaborative deep learning
Recommendation system using collaborative deep learningRecommendation system using collaborative deep learning
Recommendation system using collaborative deep learning
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
 
Machine learning the next revolution or just another hype
Machine learning   the next revolution or just another hypeMachine learning   the next revolution or just another hype
Machine learning the next revolution or just another hype
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar Dresses
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)
 
Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)
 
Machine learning on Hadoop data lakes
Machine learning on Hadoop data lakesMachine learning on Hadoop data lakes
Machine learning on Hadoop data lakes
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at Scale
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learning
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
 
Neural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for PhysicistsNeural Networks and Deep Learning for Physicists
Neural Networks and Deep Learning for Physicists
 
Google Big Data Expo
Google Big Data ExpoGoogle Big Data Expo
Google Big Data Expo
 

Viewers also liked

The ai app - introduction
The ai app - introductionThe ai app - introduction
The ai app - introduction
bigbamnetwork
 
Presentation business analytics in finance 16 9-2014
Presentation business analytics in finance 16 9-2014Presentation business analytics in finance 16 9-2014
Presentation business analytics in finance 16 9-2014
GuyVanderSande
 
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Ian Lumb
 

Viewers also liked (20)

Machine learning use cases in finance
Machine learning use cases in financeMachine learning use cases in finance
Machine learning use cases in finance
 
Using Machine Learning & AI to Enhance Fraud Detection
Using Machine Learning & AI to Enhance Fraud DetectionUsing Machine Learning & AI to Enhance Fraud Detection
Using Machine Learning & AI to Enhance Fraud Detection
 
Machine Learning in Customer Analytics
Machine Learning in Customer AnalyticsMachine Learning in Customer Analytics
Machine Learning in Customer Analytics
 
Financial security and machine learning
Financial security and machine learningFinancial security and machine learning
Financial security and machine learning
 
Big data &amp; analytics for banking new york lars hamberg
Big data &amp; analytics for banking new york   lars hambergBig data &amp; analytics for banking new york   lars hamberg
Big data &amp; analytics for banking new york lars hamberg
 
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
 
Machine learning prediction of stock markets
Machine learning prediction of stock marketsMachine learning prediction of stock markets
Machine learning prediction of stock markets
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Detecting Financial Danger Zones with Machine Learning - Marika Vezzoli. Dece...
Detecting Financial Danger Zones with Machine Learning - Marika Vezzoli. Dece...Detecting Financial Danger Zones with Machine Learning - Marika Vezzoli. Dece...
Detecting Financial Danger Zones with Machine Learning - Marika Vezzoli. Dece...
 
The ai app - introduction
The ai app - introductionThe ai app - introduction
The ai app - introduction
 
Equity forecast: Predicting long term stock market prices using machine learning
Equity forecast: Predicting long term stock market prices using machine learningEquity forecast: Predicting long term stock market prices using machine learning
Equity forecast: Predicting long term stock market prices using machine learning
 
SAM: Sympathetic AI Messenger bot &lt;/violence> hackathon 2016
SAM: Sympathetic AI Messenger bot &lt;/violence> hackathon 2016SAM: Sympathetic AI Messenger bot &lt;/violence> hackathon 2016
SAM: Sympathetic AI Messenger bot &lt;/violence> hackathon 2016
 
Attribution Modeling and Big Data, Google
Attribution Modeling and Big Data, GoogleAttribution Modeling and Big Data, Google
Attribution Modeling and Big Data, Google
 
Machine Learning as a Service
Machine Learning as a ServiceMachine Learning as a Service
Machine Learning as a Service
 
Presentation business analytics in finance 16 9-2014
Presentation business analytics in finance 16 9-2014Presentation business analytics in finance 16 9-2014
Presentation business analytics in finance 16 9-2014
 
Behavioral Analytics for Financial Intelligence
Behavioral Analytics for Financial IntelligenceBehavioral Analytics for Financial Intelligence
Behavioral Analytics for Financial Intelligence
 
Enterprise Content Search Paradigms
Enterprise Content Search ParadigmsEnterprise Content Search Paradigms
Enterprise Content Search Paradigms
 
GeekNight 22.0 Multi-paradigm programming in Scala and Akka
GeekNight 22.0 Multi-paradigm programming in Scala and AkkaGeekNight 22.0 Multi-paradigm programming in Scala and Akka
GeekNight 22.0 Multi-paradigm programming in Scala and Akka
 
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
 
Developing for Hybrid Cloud with Bluemix
Developing for Hybrid Cloud with BluemixDeveloping for Hybrid Cloud with Bluemix
Developing for Hybrid Cloud with Bluemix
 

Similar to Machine learning for_finance

Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
South West Data Meetup
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Maxim Kazantsev
 

Similar to Machine learning for_finance (20)

Skytree big data london meetup - may 2013
Skytree   big data london meetup - may 2013Skytree   big data london meetup - may 2013
Skytree big data london meetup - may 2013
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniques
 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
 
Unsupervised Learning.pptx
Unsupervised Learning.pptxUnsupervised Learning.pptx
Unsupervised Learning.pptx
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning
 
Computational decision making
Computational decision makingComputational decision making
Computational decision making
 
Clustering
ClusteringClustering
Clustering
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Data mining
Data mining Data mining
Data mining
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

 
iiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdfiiit delhi unsupervised pdf.pdf
iiit delhi unsupervised pdf.pdf
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
 
My8clst
My8clstMy8clst
My8clst
 
CLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptxCLUSTER ANALYSIS ALGORITHMS.pptx
CLUSTER ANALYSIS ALGORITHMS.pptx
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
multiarmed bandit.ppt
multiarmed bandit.pptmultiarmed bandit.ppt
multiarmed bandit.ppt
 

More from Stefan Duprey

More from Stefan Duprey (16)

Dynamical smart liquidity on decentralized exchanges for lucrative market making
Dynamical smart liquidity on decentralized exchanges for lucrative market makingDynamical smart liquidity on decentralized exchanges for lucrative market making
Dynamical smart liquidity on decentralized exchanges for lucrative market making
 
Smart systematic short strangle
Smart systematic short strangleSmart systematic short strangle
Smart systematic short strangle
 
Short Term Intraday Long Only Crypto Strategies
Short Term Intraday Long Only Crypto StrategiesShort Term Intraday Long Only Crypto Strategies
Short Term Intraday Long Only Crypto Strategies
 
On Chain Weekly Rebal Low Expo Strategy
On Chain Weekly Rebal Low Expo StrategyOn Chain Weekly Rebal Low Expo Strategy
On Chain Weekly Rebal Low Expo Strategy
 
Stable_pool_optimal_allocation.pdf
Stable_pool_optimal_allocation.pdfStable_pool_optimal_allocation.pdf
Stable_pool_optimal_allocation.pdf
 
Curve_fairness_IOUs.pdf
Curve_fairness_IOUs.pdfCurve_fairness_IOUs.pdf
Curve_fairness_IOUs.pdf
 
Financial quantitative strategies using artificial intelligence
Financial quantitative strategies using artificial intelligenceFinancial quantitative strategies using artificial intelligence
Financial quantitative strategies using artificial intelligence
 
Intraday news event_study
Intraday news event_studyIntraday news event_study
Intraday news event_study
 
Multi risk factor model
Multi risk factor model Multi risk factor model
Multi risk factor model
 
Impact best bid/ask limit order execution
Impact best bid/ask limit order executionImpact best bid/ask limit order execution
Impact best bid/ask limit order execution
 
Optimal order execution
Optimal order executionOptimal order execution
Optimal order execution
 
A new axisymmetric finite element
A new axisymmetric finite elementA new axisymmetric finite element
A new axisymmetric finite element
 
Page rank optimization to push successful URLs or products for e-commerce
Page rank optimization to push successful URLs or products for e-commercePage rank optimization to push successful URLs or products for e-commerce
Page rank optimization to push successful URLs or products for e-commerce
 
Compounded autoregressive processes for Credit Risk modelling
Compounded autoregressive processes for Credit Risk modellingCompounded autoregressive processes for Credit Risk modelling
Compounded autoregressive processes for Credit Risk modelling
 
Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
Algorithmic trading
Algorithmic tradingAlgorithmic trading
Algorithmic trading
 

Recently uploaded

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 

Recently uploaded (20)

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 

Machine learning for_finance

  • 2. Statistical learning scope Data Mining Exploration Univariate Pie chart, Histogram, etc… Multivariate Feature selection and transformation Modelling Clustering Partitive K-means Gaussian mixture model SOMHierarchical Classification Discriminant Decision Tree Neural Network Support Vector Machine Regression
  • 4. Decision rule for Support Vector Machines
  • 7. SVM summary  avoid the plague of local minima  the engineer’s expertise is in the appropriate kernel (beware of overfitting, cross-validate and experiment your own kernels)  only classify between 2 class (one vs all or one vs one methodology)  a reference in use cases in computer vision, bio informatics
  • 8. Neural Network : what are they ?
  • 9. Neural Network summary Gradient descent algorithm : stochastic, mini- batch, conjugate plague of local minima : difficult to calibrate  the engineer’s expertise is in the appropriate architecture (beware of overfitting, cross- validate and experiment your own architecture ‘deeper learning’)
  • 10. >> t = classregtree(X,Y); >> Y_pred = t(X_new); Regression Trees
  • 11. Forests of Trees predictors up down down up up up down up down up up . . . response Y >> t = TreeBagger(nb_trees,X,Y); >> [Y_pred,allpred] = predict(t,X_new);
  • 12. Splitting criteria : information gain
  • 13. Why a regression and what is a regression ? A regression is a model to explain and predict a process : supervised machine learning
  • 14. Why regularizing ?• Terms are correlated • The regression matrix becomes close to singular • Badly conditioned matrix yield poor numerical results • Bayesian interpretation Likelihood Regularisation term Posterior Prior We rather minimize
  • 15. Why Lasso and Elastic Net?• No method owns the truth • Reduce the number of predictors in a regression model • Identify important predictors • Select among redundant predictors • Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares (cross validation) Lasso : Elastic Net :
  • 16. Ensemble learning Why ensemble learning ? ‘melding results from many weak learners into one high- quality ensemble predictor’
  • 17. Main differences between Bagging and Boosting BAGGING BOOSTING Bagging is randomness Boosting is adaptative and deterministic Bootstrapped sample Complete initial sample Each model must perform well over the whole sample Each model has to perform better than the previous one on outliers Every model have the same weight Models are weighted according to their performance Defining features Advantages and disadvantages BAGGING BOOSTING Reducing model variance Variance might rise Not a simple model anymore Not a simple model anymore Can be parallelized Can not be parallelized Less noise over fitting : better than boosting when noise Models are weighted according to their performance Bagging is usually efficienter than boosting On specific cases, boosting might achieve a far better accuracy
  • 19. Distributed memory : MDCS & the MAP/REDUCE paradigm
  • 20. Big data & Machine learning “It’s not who has the best algorithm that wins . It’s who has the most data”
  • 22. Aims of this presentation  awareness of the range of methods for multivariate data  reasonable understanding of algorithms
  • 23. Data Mining • Exploratory Data Analysis • Clustering • Classification • Regression -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Group1 Group2 Group3 Group4 Group5 Group6 Group7 Group8 • Categorical • Ordinal • Discontinuous
  • 24. Exploratory Data Analysis  Why exploratory analysis ? Can be used to: o Graphical view o “Pre filtering”: preliminary data trends and behaviour • Means: • Multivariate Plots • Features transformation : principal component analysis, factor model • Features selection : stepwise optimization
  • 25. Data Exploration: Getting an overview of individual variables Basic Histogram >> hist(x(:,1)) Custom Number of Bins >> hist(x(:,1),50) By Group >> hist(byGroup,20) Gaussian fit >> histfit(x(:,2)) 3D Histogram >> hist3(x(:,1:2)) Scatter Plot >>gscatter(x(:,1),x(:,2),groups) Pie Chart >> pie3(proportions,groups) >> X = [MPG,Acceleration,Displacement,Weight,Horsepower]; Box Plot >> boxplot(x(:,1),groups) 5 10 15 20 25 30 35 40 45 50 0 10 20 30 40 50 60 70 80 5 10 15 20 25 30 35 40 45 50 0 5 10 15 20 25 6 8 10 12 14 16 18 20 22 24 26 0 10 20 30 40 50 60 5 10 15 20 25 30 35 40 45 50 8 10 12 14 16 18 20 22 24 26 3 4 5 6 8 10 15 20 25 30 35 40 45 3 4 5 6 8 5 10 15 20 25 30 35 40 45 50 0 5 10 15 20 25 byGroup(:,1) byGroup(:,2) Group6 Group5 Group8 Group3 Group4
  • 26. Data Exploration: Getting an overview of multiple variables Plot Matrix by Group >> gplotmatrix(x,x,groups) Parallel Coordinates Plot >> parallelcoords(x,'Group',groups) Andrews’ Plot >> andrewsplot(x,'Group',groups) Glyph Plot >> glyphplot(x) Chernoff Faces >> glyphplot(x,'Glyph','face') MPG Acceleration Displacement Weight Horsepow er MPGAccelerationDisplacementWeightHorsepower 50 1001502002000 4000200 40010 2020 40 50 100 150 200 2000 4000 200 400 10 20 20 40 MPG Acceleration Displacement Weight Horsepower -3 -2 -1 0 1 2 3 4 CoordinateValue 4 6 8 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -8 -6 -4 -2 0 2 4 6 8 t f(t) 4 6 8 chevrolet chevelle malibu buick skylark 320 plymouth satellite amc rebel sst ford torino ford galaxie 500 chevrolet impala plymouth fury iii pontiac catalina chevrolet chevelle malibubuick skylark 320 plymouth satellite amc rebel sst ford torino ford galaxie 500 chevrolet impala plymouth fury iii pontiac catalina
  • 27. Principal component analysis 1 2 3 4 5 6 7 8 9 10 0 0.005 0.01 0.015 0.02 0.0249 Principal Component VarianceExplained(%) 0% 20% 40% 60% 80% 100% -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 Component 1 CommerzbankDeutscheBank Infineon ThyssenKruppMANDaimlerHeidelbergerAllianzDeutscheBahnBMWSalzgitterSiemensDeutschePostLufthansa BASFAdidasMetroVWLindeEONMunichReBayerRWESAPMRKDeutscheTelekomBeiersdorf Fresenius HenkelFreseniusMedical Component 2 Component3 >>[pcs,scrs,variances]=princomp(stocks); -3 -2 -1 0 1 2 3 -2 0 2 -3 -2 -1 0 1 2 3
  • 28. Factor model  Alternative to PCA to improve your components >>[Lambda,Psi,T,stats,F]=factoran(stocks,3,'rotate','promax); -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 Component 2 DeutscheBank DaimlerAllianzMAN ThyssenKrupp BMWLufthansa Siemens DeutschePost Commerzbank BASF Adidas Linde MunichRe MetroHeidelberger SAP Bayer Salzgitter Infineon DeutscheBahn EONRWE VW DeutscheTelekom BeiersdorfMRKFresenius Henkel FreseniusMedical Component 1 Component3
  • 29. Paring predictors : stepwise optimization Some predictors might be correlated, other irrelevant  Requires Statistics Toolbox™ >>[coeff,inOut]=stepwisefit(stocks, index); 2007 2008 2009 2010 2011 -0.1 0 0.1 0.2 0.3 Returns original data stepwise fit 2007 2008 2009 2010 2011 0.5 1 1.5 Prices
  • 30. Cloud of randomly generated points • Each cluster center is randomly chosen inside specified bounds • Each cluster contains the specified number of points per cluster • Each cluster point is sampled from a gaussian distribution • Multidimensionnal dataset >>clusters = 8; % number of clusters. >>points = 30; % number of points in each cluster. >>std_dev = 0.05; % common cluster standard deviation >>bounds = [0 1]; % bounds for the cluster center >>[x,vcentroid,proportions,groups] =cluster_generation(bounds,clusters,points,std_dev); -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Group1 Group2 Group3 Group4 Group5 Group6 Group7 Group8
  • 31. Clustering Why clustering ? o Segment populations into natural subgroups o Identify outliers o As a preprocessing method – build separate models on each • Means • Hierarchical clustering • Clustering with neural network (self-organizer map, competitive layer) • Clustering with K-means nearest neighbours • Clustering with K-means fuzzy logic • Clustering using Gaussian mixture models • Predictors: categorical, ordinal, discontinuous -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Input Vectors x(1) x(2)
  • 32. Hierarchical Cluster Analysis – what is it doing? -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Cutt-off = 0.1
  • 33. Hierarchical Cluster Analysis – how do I do it ? • Calculate pairwise distances between points >> distances = pdist(x) • Carry out hierarchical cluster analysis >> tree = linkage(distances) • Visualise as a dendrogram >> dendrogram(tree) • Assign points to clusters >> assignments = cluster(tree,‘cutoff',0.1)
  • 34. Assessing the quality of a hierarchical cluster analysis • The cophenetic correlation coefficient measures how closely the length of the tree links match the original distances between points • How ‘faithful’ the tree is to the original data • 0 is poor, 1 is good >> cophenet(tree,distances)
  • 35. K-Means Cluster Analysis – what is it doing? Randomly pick K cluster centroids Assign points to the closest centroid Recalculate positions of cluster centroids Reassign points to the closest centroid Recalculate positions of cluster centroids Repeat until centroid positions converge ………
  • 36. K-Means Cluster Analysis – how do I do it ? Running the K-mean algorithm for K fixed >> [memberships,centroids] = kmeans(x,K); -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
  • 37. Evaluating a K-Means analysis and choosing K • Try a range of different K’s, and compare the point-centroid distances for each >> for K=3:15 [clusters,centroids,distances] = kmeans(data,K); totaldist(K-2)=sum(distances); end plot(3:15,totaldist); • Create silhouette plots >> silhouette(data,clusters)
• 38. Sidebar: Distance Metrics
• Measures of how similar datapoints are – different definitions make sense for different data
• Many built-in distance metrics, or define your own
>> doc pdist
>> distances = pdist(data,metric); % pdist = pairwise distances
>> squareform(distances)
>> kmeans(data,k,'distance','cityblock') % not all metrics supported
• Euclidean distance – the default
• Cityblock distance – useful for discrete variables
• Cosine distance – useful for clustering variables
• 39. Fuzzy c-means Cluster Analysis – what is it doing?
• Very similar to K-means
• Samples are not assigned definitively to a cluster, but have a ‘membership’ value relative to each cluster
 Requires Fuzzy Logic Toolbox™
 Running the fuzzy c-means algorithm for a fixed K
>> [centroids,memberships] = fcm(x,K);
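A small follow-up sketch, assuming `memberships` is the K-by-n matrix of membership values returned by fcm: crisp labels can be recovered by taking, for each sample, the cluster with the largest membership.
>> [~,assignments] = max(memberships); % largest membership value per column (sample)
>> assignments = assignments'; % one crisp cluster label per sample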
• 40. Gaussian Mixture Models
• Assume that the data is drawn from a fixed number K of normal distributions
• Fit their parameters using the EM algorithm
>> gmobj = gmdistribution.fit(x,8);
>> assignments = cluster(gmobj,x);
 Plot the probability density
>> ezsurf(@(x,y)pdf(gmobj,[x y]));
[Figure: surface plot of the fitted mixture density]
• 41. Evaluating a Gaussian Mixture Model clustering
• Plot the probability density function of the model
>> ezsurf(@(x,y)pdf(gmobj,[x y]));
• Plot the posterior probabilities of the observations
>> p = posterior(gmobj,data);
>> scatter(data(:,1),data(:,2),5,p(:,g)); % do this for each group g
• Plot the Mahalanobis distances of the observations to the components
>> m = mahal(gmobj,data);
>> scatter(data(:,1),data(:,2),5,m(:,g)); % do this for each group g
• 42. Choosing the right number of components in a Gaussian Mixture Model
• Evaluate for a range of K and plot the AIC and/or BIC
• The AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are measures of the quality of the model fit, with a penalty for higher K
>> for K=3:15
       gmobj = gmdistribution.fit(data,K);
       AIC(K-2) = gmobj.AIC;
       BIC(K-2) = gmobj.BIC; % the fitted object also exposes the BIC
   end
>> plot(3:15,AIC,3:15,BIC);
• 43. Neural Networks – what are they?
[Diagram: input variables feed through weights and a bias into a transfer function that produces the output variable – a two-layer feedforward network]
• Build your own architecture
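A minimal sketch of building such an architecture (assuming the Neural Network Toolbox; the hidden-layer size of 10 is an arbitrary choice):
>> net = feedforwardnet(10); % two-layer network: a 10-neuron hidden layer plus an output layer
>> view(net); % inspect the architecture diagram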
• 44. Self Organising Maps Neural Net – what are they?
• Start with a regular grid of ‘neurons’ laid over the dataset
• The size of the grid gives the number of clusters
• Neurons compete to recognise datapoints (by being close to them)
• Winning neurons are moved closer to the datapoints
• Repeat until convergence
[Figure: SOM weight positions over the data, before and after training]
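A hedged sketch of training such a map (assuming the Neural Network Toolbox; selforgmap expects observations in columns, hence the transpose of the row-wise dataset used on the earlier slides):
>> net = selforgmap([3 3]); % a 3-by-3 grid of neurons, i.e. up to 9 clusters
>> net = train(net,x'); % competitively move the neurons towards the datapoints
>> assignments = vec2ind(net(x')); % winning neuron (cluster index) for each point
>> plotsompos(net,x'); % visualise the final weight positions over the data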
• 45. Summary: Cluster analysis
No method owns the truth
Use the diagnostic tools to assess your clusters
Beware of local minima : use global optimization, e.g. multiple random restarts (see the sketch below)
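For K-means, for instance, multiple random restarts are a cheap guard against a poor local minimum; a sketch using the built-in 'Replicates' option, which keeps the solution with the lowest total point-centroid distance:
>> [memberships,centroids,distances] = kmeans(x,K,'Replicates',10); % best of 10 random initialisations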
• 46. Classification
 Why classification? Can be used to:
o Learn how to classify from already classified observations
o Classify new observations
• Means:
• Discriminant analysis classification
• Bootstrap-aggregated decision tree classifier
• Neural network classifier
• Support vector machine classifier
[Figure: scatter plot of the training data coloured by Group1–Group8]
• 47. Discriminant Analysis – how does it work?
• Fit a multivariate normal density to each class:
• linear — fits a multivariate normal density to each group, with a pooled estimate of covariance (the default)
• diaglinear — similar to linear, but with a diagonal covariance matrix estimate (naive Bayes classifier)
• quadratic — fits multivariate normal densities with covariance estimates stratified by group
• diagquadratic — similar to quadratic, but with a diagonal covariance matrix estimate (naive Bayes classifier)
• Classify a new point by evaluating its probability under each density, and assigning it to the class with the highest probability
• 48. Discriminant Analysis – how do I do it?
• Linear Discriminant Analysis
>> classes = classify(sample,training,group)
• Quadratic Discriminant Analysis
>> classes = classify(x,x,y,'quadratic')
• Naïve Bayes
>> nbGau = NaiveBayes.fit(x,y);
>> y_pred = nbGau.predict(x);
[Figure: class boundaries of the linear and quadratic discriminants on the eight-group dataset]
• 49. Interpreting Discriminant Analyses
• Visualise the posterior probability surfaces
>> [XI,YI] = meshgrid(linspace(4,8),linspace(2,4.5));
>> X = XI(:); Y = YI(:);
>> [class,err,P] = classify([X Y],meas(:,1:2),species,'quadratic');
>> for i=1:3
       ZI = reshape(P(:,i),100,100);
       surf(XI,YI,ZI,'EdgeColor','none'); hold on;
   end
• 50. Interpreting Discriminant Analyses
• Visualise the probability density of sample observations
• An indicator of the region in which the model has support from the training data
>> [XI,YI] = meshgrid(linspace(4,8),linspace(2,4.5));
>> X = XI(:); Y = YI(:);
>> [class,err,P,logp] = classify([X Y],meas(:,1:2),species,'quadratic');
>> ZI = reshape(logp,100,100);
>> surf(XI,YI,ZI,'EdgeColor','none');
• 51. Classifying with K-Nearest Neighbours – what does it do?
• One of the simplest classifiers – a sample is classified by taking the K nearest points from the training set, and choosing the majority class of those K points
• There is no real training phase – all the work is done during the application of the model
>> classes = knnclassify(sample,training,group,K)
[Figure: K-nearest-neighbour class boundaries on the eight-group dataset]
• 52. Decision Trees – how do they work?
• Find the threshold value for a variable that best partitions the dataset
• Repeat, thresholding over all predictors
• The resulting model is a tree where each node is a logical test on a predictor (var1<thresh1, var2>thresh2, …)
• 53. Decision Trees – how do I build them?
• Build the tree model
>> tree = classregtree(x,y);
>> view(tree)
• Evaluate the model on new data
>> tree(x_new)
[Figure: decision-tree class boundaries on the eight-group dataset]
• 54. Enhancing the model : bagged trees
• Prune the decision tree
>> [cost,secost,ntnodes,bestlevel] = test(t,'test',x,y);
>> topt = prune(t,'level',bestlevel);
• Bootstrap-aggregated forest of trees
>> forest = TreeBagger(100,x,y);
>> y_pred = predict(forest,x);
• Visualise class boundaries as before
[Figure: bagged-tree class boundaries on the eight-group dataset]
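A hedged sketch of assessing the forest without a separate test set, using TreeBagger's out-of-bag error (option and method names as in the Statistics Toolbox of this era; newer releases spell the option 'OOBPrediction'):
>> forest = TreeBagger(100,x,y,'OOBPred','on'); % keep out-of-bag predictions while growing
>> plot(oobError(forest)); % misclassification rate as a function of the number of grown trees
>> xlabel('Number of grown trees'); ylabel('Out-of-bag classification error');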
• 55. Pattern Recognition Neural Networks – what are they?
• Two-layer (i.e. one-hidden-layer) feedforward neural networks can learn any input-output relationship, given enough neurons in the hidden layer
• No restrictions on the predictors
• 56. Pattern Recognition Neural Networks – how do I build them?
• Build a neural network model
>> net = patternnet(10);
• Train the net to classify the observations
>> [net,tr] = train(net,x,y);
• Apply the model to new data
>> y_pred = net(x_new);
[Figure: neural-network class boundaries on the eight-group dataset]
• 57. Support Vector Machines – what are they?
• The SVM algorithm finds a boundary between the classes that maximises the minimum distance of the boundary to any of the points
• No restrictions on the predictors
• One-vs-all to classify multiple classes (the SVM itself is binary; see the sketch below)
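Since svmtrain only separates two classes, here is a hedged sketch of the one-vs-all scheme (assuming `y` holds integer labels 1..K; each binary model is trained on ‘class k vs. the rest’, and ties between claiming models are broken naively in favour of the first):
>> K = max(y);
>> models = cell(K,1);
>> for k = 1:K
       models{k} = svmtrain(x,double(y==k)); % binary problem: class k against all others
   end
>> votes = zeros(size(x_new,1),K);
>> for k = 1:K
       votes(:,k) = svmclassify(models{k},x_new); % 1 where model k claims the point
   end
>> [~,classes] = max(votes,[],2); % label of the (first) claiming model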
• 58. Support Vector Machines – how do I build them?
• Build an SVM model
>> svmmodel = svmtrain(x,y)
• Try different kernel functions
>> svmmodel = svmtrain(x,y,'kernel_function','rbf')
• Apply the model to new data
>> classes = svmclassify(svmmodel,x_new);
[Figure: two-class SVM decision boundary with the support vectors highlighted]
• 59. Evaluating a Classification Model
• Three main strategies:
• Resubstitution – test the model on the same data that you trained it with
• Cross-validation – train and test on complementary folds of the data
• Holdout – test on a completely new dataset
• Use cross-validation to evaluate model parameters such as the number of leaves of a tree or the number of hidden neurons
 Apply cross-validation to your classification model
>> cp = cvpartition(y,'k',10);
>> ldaFun = @(xtrain,ytrain,xtest)(classify(xtest,xtrain,ytrain));
>> ldaCVErr = crossval('mcr',x,y,'predfun',ldaFun,'partition',cp)
• 60. Summary: Classification algorithms
No single best method
Simple does not mean inefficient
Decision trees produce interpretable models but, like neural networks, can overfit the noise : use bootstrapping and cross-validation
Parallelize where you can (see the sketch below)
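As an illustration of the last point, a hedged sketch of growing the bagged trees in parallel (assuming the Parallel Computing Toolbox; the matlabpool/'UseParallel' syntax follows the MATLAB releases of this era):
>> matlabpool open % start a pool of local workers (older releases; newer ones use parpool)
>> paroptions = statset('UseParallel','always'); % ask the ensemble routines to use the pool
>> forest = TreeBagger(100,x,y,'Options',paroptions); % the 100 trees are grown in parallel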
• 61. Regression
Why regression? Can be used to:
o Learn to model a continuous response from observations
o Predict the response for new observations
• Means:
• Linear regressions
• Non-linear regressions
• Bootstrap-aggregated regression trees
• Neural network as a fitting tool
• 62. New dataset with a continuous response from one predictor
• A non-linear function to fit
• A continuous response to fit from one continuous predictor
>> [x,t] = simplefit_dataset;
[Figure: plot of the simplefit dataset]
• 63. Linear Regression – what is it?
• A collection of methods that find the best coefficients b such that y ≈ X*b
• “Best” b means minimising the least-squares difference between the predicted and actual values of y
• “Linear” means linear in b – you can include extra variables in X to give a nonlinear relationship (see the sketch below)
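For instance, a small sketch of fitting a quadratic in a single predictor while remaining linear in the coefficients b (assuming x and y are column vectors; the column of ones carries the intercept):
>> X = [ones(size(x)) x x.^2]; % design matrix: intercept, linear and quadratic terms
>> b = X\y; % still an ordinary linear least-squares problem in b
>> y_pred = X*b;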
• 64. Linear Regression – how do I do it?
• Backslash – the least-squares solution
>> b = X\y
• Linear Regression
>> b = regress(y,[ones(size(x,1),1) x])
>> stats = regstats(y,x) % regstats adds the intercept itself
• Robust Regression – better in the presence of outliers
>> robust_b = robustfit(X,y) % NB (X,y) not (y,X)
• Ridge Regression – better if the data is close to collinear
>> ridge_b = ridge(y,X,k) % k is the ridge parameter
• Apply the model to new data
>> y_pred = newdata*b;
• 65. Interpreting a linear regression model
• Examine the coefficients to see which predictors have a large effect on the response
>> [b,bint,r,rint,stats] = regress(y,X)
>> errorbar(1:size(b,1),b,b-bint(:,1),bint(:,2)-b)
• Examine the residuals to check for possible outliers
>> rcoplot(r,rint)
• Examine the R² statistic and p-value to check overall model significance
>> stats(1)*100 % R² as a percentage
>> stats(3) % p-value
• Additional diagnostics with regstats
• 66. Non-linear curve fitting
Least-squares algorithm
>> model = @(b,x)(b(1)+b(2).*cos(b(3)*x+b(4))+b(5).*cos(b(6)*x+b(7))+b(8).*cos(b(9)*x+b(10)));
>> a0 = ones(10,1); % initial guess for the 10 parameters (assumed here; not specified on the slide)
>> [ahat,r,J,cov,mse] = nlinfit(x,t,model,a0);
[Figure: fitted curve over the data, and the residuals]
• 67. Fit Neural Networks – what are they?
• Fitting networks are feedforward neural networks used to fit an input-output relationship
• This architecture can learn any input-output relationship, given enough neurons
• No restrictions on the predictors (categorical, ordinal, discontinuous)
• 68. Fit Neural Networks – how do I build them?
• Build a fitting neural net model
>> net = fitnet(10);
• Train the net to fit the target
>> [net,tr] = train(net,x,t);
• Apply the model to new data
>> y_pred = net(x_new);
[Figure: function fit for the output element, with targets, outputs and errors]
• 69. Regression trees – what are they?
• A decision tree with binary splits for regression; an object of class RegressionTree can predict responses for new data with the predict method
• No restrictions on the predictors (categorical, ordinal, discontinuous)
• 70. Regression trees – how do I use them?
• Build a regression tree model
>> rtree = RegressionTree.fit(x,t);
• Predict the target over the training data
>> y_tree = predict(rtree,x);
• Apply the model to new data
>> y_pred = predict(rtree,x_new);
[Figure: regression-tree fit over the data, and the residuals]
• 71. Summary
[Diagram: the data-mining taxonomy from the opening slide – Exploration (univariate: pie chart, histogram, etc.; multivariate: feature selection and transformation) and Modelling (clustering: K-means, Gaussian mixture model, SOM, hierarchical; classification: discriminant, decision tree, neural network, support vector machine; regression)]