Intelligent Analysis of Environmental Data

International Workshop:
Intelligent Analysis of Environmental Data

Institute of Geomatics and
Analysis of Risk (IGAR)
University of Lausanne,
Switzerland

Prof. Mikhail Kanevski

M. Kanevski, Palermo 2009 1

Comments and questions to:
• Mikhail.Kanevski@unil.ch
– www.unil.ch/igar
– www.geokernels.org


General Introduction
Typical problems
Approaches
Solutions
Future research


Geo- and Environmental Data
(classes, continuous, images, networks, geomanifolds,…)

• Spatio-temporal
• Multi-scale
• Multivariate
• Highly variable at many scales
• High-dimensional geo-feature spaces
• Uncertainties
• ………….

• In some cases we do have science-based
models: data/knowledge/models integration


Spatio-temporal data in terms of
patterns/structures:

a. pattern recognition (pattern
discovery, pattern extraction),
b. pattern modelling,
c. pattern prediction


Main Topics:
• Review and posing of typical problems.
• From “numbers” to data
• Collection of data: Monitoring networks and data
representativity? Monitoring network optimisation.
• Get more information value from your data –
EXPLORE ! Exploratory spatio-temporal data
analysis (EDA, ESDA).
• Predictions/estimations or simulations? Risk
analysis and mapping
• Let data speak for themselves: learning from data.
Data mining, Machine learning.


Methods:
• Monitoring networks descriptions
• Geostatistics: predictions/simulations
• Machine Learning (neural nets, SLT):
– Neural networks: MLP, PNN, GRNN, RBF, SOM.
ANNEX models. Hybrid models
– Support Vector Machines
• Recent trends in geostatistics: Multiple-points
geostatistics, pattern based geostatistics.
• Bayesian approach for uncertainty assessment,
integration of data and science-based models
(Bayesian Maximum Entropy)


Spatial data analysis: typical tasks
• Predict a value at a given point.
• Build a map (isolines, 3D surfaces,..).
• Estimate prediction error.
• Take into account measurement errors.
• Risk mapping: Uncertainty mapping around unknown
value. Estimate the probability of exceeding a
given decision level.
• Joint predictions of several variables (improve
predictions on primary variable using auxiliary data and
information).
• Optimization of monitoring network (design/ redesign)
• Simulations: modelling of spatial uncertainty and
variability
• Data/Science-based models assimilation/fusion
• Image analysis. Remote sensing
• Spatio-temporal events (forest fires, epidemiology,
crime,…)
• Predictions/simulations in high dimensional spaces
• ………………………………………..


Generic Methodology
[Flowchart; components as labelled on the slide:]
• DATA, Data Base Management System
• Statistical Description, Quick Visualisation, Monitoring Network Analysis
• Variography, Deterministic Interpolations, Monitoring Network Generation, Cross-validation
• Machine Learning Algorithms, Geostatistical Predictions & Simulations
• Decision-oriented Mapping, GIS, Remote Sensing

GEOSTATISTICAL ANALYSIS
• Basic/Naïve statistical analysis. EDA
• ESDA (regionalized EDA)
• Structural analysis. Spatial correlation analysis
(variography)
• Model selection: Cross-validation, jack-knife,…
• Prediction and error mapping for decision
making (family of kriging models)
• Probability and Risk mapping. Conditional
stochastic simulations
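
The structural analysis step (variography) can be illustrated with a minimal sketch of the classical empirical semivariogram: half the mean squared difference between all pairs of measurements whose separation falls in a given lag bin. The function name `empirical_variogram`, the omnidirectional lag binning and the toy transect are assumptions made for illustration, not taken from the workshop material.

```python
import math

def empirical_variogram(coords, values, lag_width, n_lags):
    """Classical semivariogram estimate on omnidirectional lag bins."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])   # separation distance
            b = int(h // lag_width)               # index of the lag bin
            if b < n_lags:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    # gamma(h) = sum of squared increments / (2 * number of pairs);
    # bins without any pair are reported as None
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Toy transect (hypothetical data): values increase linearly along x
coords = [(float(x), 0.0) for x in range(6)]
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
gamma = empirical_variogram(coords, values, lag_width=1.0, n_lags=3)
print(gamma)
```

For a linear trend like this one the semivariogram keeps growing with the lag, which is one of the usual hints of non-stationarity.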


Some Geostatistics
• Exploration of spatial correlations

• Family of kriging models (simple, ordinary,
disjunctive, indicator,…)

• Conditional Stochastic Simulations


Briansk region (radioactivity, Cs137)


Heavy metals, Japan


Switzerland, indoor radon


Measures to characterise MN

• Topological
• Statistical
• Fractal/multifractal
• Lacunarity


Preferential Sampling. Declustering
Problem


Example: geostatistical spatial co-predictions

Sr90 « expensive » information.
Cs137 « cheap » exhaustive information.

(Cross)Variography


Use of Cs137 to
improve Sr90
predictions
(reduced errors
and uncertainty).

Decision-oriented
mapping:
« Thick isolines »


Simulations and Interpolations


Unconditional simulations


SGSim of the precipitation:


Results of the simulations


Post-processing of simulations: mean
and standard deviation


Geostatistics: some comments
• Geostatistics is a powerful and well elaborated
model-dependent approach.
• Geostatistics proposes a variety of models for spatial
data analysis and modeling. It has a long and
successful history of developments and applications.
• Some problems:
Nonlinearity
Non-stationarity
Two-point statistics
Data/models integration
Data mining. Pattern recognition

• Hybrid Models (ANN/SVM + Geostat) can help.

Some useful comments, conclusions
and future research

• Detection of patterns: try k-NN or GRNN
as exploratory tools
• Cross-validation: leave-one-out, leave-k-out,
jackknife, etc. as a control tool
• Model selection and model assessment


k-Nearest Neighbours


k-NN prediction:
NN methods use the k observations in the training data
set T closest in input space to the prediction point x to
estimate Y:

$$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$

where N_k(x) is the neighborhood of x defined by the k
closest points in the training set.
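
The k-NN prediction rule can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance in input space; the function name `knn_predict` and the toy data are hypothetical.

```python
import math

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    # Sort training indices by Euclidean distance to the query point
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    neighbours = order[:k]                      # indices forming N_k(x)
    return sum(train_y[i] for i in neighbours) / k

# Hypothetical 2D training set
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]
y_hat = knn_predict(train_x, train_y, query=(0.2, 0.2), k=3)
print(y_hat)
```

With k=3 the distant point at (5, 5) is excluded, so the prediction is the mean of the three nearby targets.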

k-NN Classifiers
These classifiers are memory-based and do
not require any model to be fit! Given a
query point x, we find the k training points
closest in the distance to x and then
classify using MAJORITY vote among the
k neighbors.
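
A minimal sketch of the majority-vote rule, again assuming Euclidean distance. The name `knn_classify` and the two-class toy data are illustrative assumptions; ties are resolved by whichever label `Counter` happens to list first.

```python
import math
from collections import Counter

def knn_classify(train_x, train_labels, query, k):
    """Return the most frequent label among the k nearest training points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set with two classes, A and B
train_x = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (3.0, 3.0), (3.5, 3.0)]
train_labels = ["A", "A", "B", "B", "B"]
label = knn_classify(train_x, train_labels, query=(0.1, 0.1), k=3)
print(label)
```

No model is fitted: the whole training set is kept in memory and consulted at query time.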


Because it uses only the training point closest to
the query point, the bias of the 1-nn estimate is
often low, but the variance is high.

A famous result of Cover and Hart (1967) shows
that asymptotically the error rate of the 1-nn
classifier is never more than twice the Bayes
rate.

This result can provide a rough idea about the best
performance that is possible in a given problem:
if the 1-nn rule has a 10% error rate, then
asymptotically the Bayes error rate is at least
5%.


Dirichlet cells, Thiessen tessellation,
Voronoï polygons


• How to find k ?

Possible answer:

Cross-validation or leave-one-out
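
One way to make this concrete is a leave-one-out loop: each point is removed in turn, predicted from the remaining ones, and the squared errors are averaged for every candidate k. The sketch below assumes 1D inputs, a squared-error criterion and synthetic data; `knn_predict`, `loo_mse` and the candidate range are hypothetical choices.

```python
import math
import random

def knn_predict(xs, ys, query, k):
    """Mean target of the k training inputs closest to `query` (1D)."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return sum(ys[i] for i in order[:k]) / k

def loo_mse(xs, ys, k):
    """Leave-one-out mean squared error of k-NN regression."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]        # training set without point i
        rest_y = ys[:i] + ys[i + 1:]
        total += (knn_predict(rest_x, rest_y, xs[i], k) - ys[i]) ** 2
    return total / len(xs)

# Synthetic data (assumed): noisy sine curve on a regular grid
rng = random.Random(0)
xs = [0.2 * i for i in range(40)]
ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]

errors = {k: loo_mse(xs, ys, k) for k in range(1, 16)}
best_k = min(errors, key=errors.get)        # k with the lowest LOO error
print(best_k, errors[best_k])
```

Plotting `errors` against k gives the cross-validation error curve; its minimum is the selected number of neighbours.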


k-NN prediction (n=6 ?)
[Figure: a prediction point surrounded by six neighbours, numbered 1 to 6, at distances r1, ..., r6; each neighbour i carries the weight Wi ~ 1/n.]


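Cross-validation
[Figure: a data point surrounded by six neighbours, numbered 1 to 6, at distances r1, ..., r6, each with weight Wi ~ 1/n; the value at the data point is predicted from its neighbours.]

Calculate error = (prediction - data)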


Leave-next-one-out, etc
[Figure: the six-neighbour configuration (distances r1, ..., r6, weights Wi ~ 1/n) with the next data point left out and predicted from the others.]

Calculate error = (prediction - data)

Data and k-nn Cross-
validation error curve


Complete data set and
500 training points linearly interpolated


Cross-validation curve


K-nn predictions


Machine Learning Algorithms
• Machine learning is an area of artificial intelligence
concerned with the development of techniques
which allow computers to "learn".
• More specifically, machine learning is a method
for creating computer programs by the analysis of
data sets. Machine learning overlaps heavily with
statistics, since both fields study the analysis of
data, but unlike statistics, machine learning is
concerned with the algorithmic complexity of
computational implementations. ...


Algorithms
Common algorithm types include:
• supervised learning – where the algorithm generates a function that
maps inputs to desired outputs.
• unsupervised learning – which models a set of inputs: labeled
examples are not available.
• semi-supervised learning – which combines both labeled and
unlabeled examples to generate an appropriate function or classifier.
• reinforcement learning – where the algorithm learns a policy of how to
act given an observation of the world. Every action has some impact in
the environment, and the environment provides feedback that guides
the learning algorithm.
• transduction – similar to supervised learning, but does not explicitly
construct a function: instead, tries to predict new outputs based on
training inputs, training outputs, and new inputs.
• The performance and computational analysis of machine learning
algorithms is a branch of statistics known as
computational learning theory.


ML Topics (short lists)
• Machine learning topics
• Modeling conditional probability density functions,
regression and classification
– Artificial neural networks
– Decision trees
– Gene expression programming
– Genetic Programming
– Gaussian process regression
– Linear discriminant analysis
– k-Nearest Neighbor
– Minimum message length
– Perceptron
– Quadratic classifier
– Radial basis functions
– Support vector machines


ML Topics (continued)
• Modeling probability density functions through generative models:
– Expectation-maximization algorithm
– Graphical models including Bayesian networks and Markov Random Fields
– Generative Topographic Mapping
• Approximate inference techniques:
– Markov chain Monte Carlo method
– Variational Bayes
• Meta-Learning (Ensemble methods):
– Boosting
– Bootstrap Aggregating aka Bagging
– Random forest
– Weighted Majority Algorithm
• Optimization: most of the methods listed above either use optimization or are
instances of optimization algorithms.
• Multi-objective Machine Learning: An approach that addresses multiple, and
often conflicting, learning objectives explicitly using Pareto-based multi-objective
optimization techniques.


Machine Learning
• Artificial Neural Networks
– Multilayer perceptrons (MLP)
– General Regression Neural
Networks (GRNN)
• Statistical Learning Theory
– Support Vector Classification
– Support Vector Regression
– Monitoring Networks Optimization


A Generic Model of
Learning from Data/Examples

Generator Supervisor

Learning
Machine

The Problem of Risk Minimization

In order to choose the best available approximation
to the supervisor’s response, one measures
the LOSS or discrepancy L(y,f(x,α))
between the response y of the supervisor
to a given input x and the response f(x,α)
provided by the learning machine.


Three Main Learning Problems
• Regression Estimation. Let the supervisor’s
answer y be a real value, and let f(x,α), α∈Λ,
be a set of real functions which contains the
regression function

$$f(x, \alpha_0) = \int y \, dF(y \mid x)$$


The Problem of Risk Minimization
Consider the expected value of the loss,
given by the risk functional

$$R(\alpha) = \int L(y, f(x, \alpha)) \, dF(x, y)$$

The goal is to find the function f(x,α0) which minimises
the risk in the situation where the joint probability
distribution F(x,y) is unknown and the only available
information is contained in the training set.
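
Since F(x,y) is unknown, the usual workaround is to replace the integral by an average over the training set (the empirical risk) and minimise that instead. The sketch below assumes a squared-error loss, a one-parameter family f(x,α) = αx and a small grid of candidate α values; all names and numbers are illustrative.

```python
def squared_loss(y, y_hat):
    """L(y, f(x, alpha)) for regression."""
    return (y - y_hat) ** 2

def f(x, alpha):
    """Hypothetical one-parameter family of functions."""
    return alpha * x

def empirical_risk(loss, model, alpha, sample):
    """Average loss over the training pairs (x_i, y_i)."""
    return sum(loss(y, model(x, alpha)) for x, y in sample) / len(sample)

# Hypothetical training set, roughly following y = 2x
sample = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
candidates = [0.5 * j for j in range(9)]            # alpha in 0.0 ... 4.0

risks = {a: empirical_risk(squared_loss, f, a, sample) for a in candidates}
alpha_hat = min(risks, key=risks.get)               # empirical risk minimiser
print(alpha_hat, risks[alpha_hat])
```

The same `empirical_risk` function works with a 0/1 classification loss or a negative log-density loss; only the `loss` argument changes.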


• Classification problem:
[Figure: scatter plot of points belonging to two classes, A and B, to be separated.]


• Pattern Recognition (classification).
y ∈ {0,1}, classification error:

$$L(y, f(x,\alpha)) = \begin{cases} 0, & \text{if } y = f(x,\alpha) \\ 1, & \text{if } y \neq f(x,\alpha) \end{cases}$$


• Regression problem
[Figure: data points generated by an unknown function f(x); the estimate \hat{f}(x) maps an input x to a prediction y.]

• Regression Estimation
It is known that the regression function is the one
which minimizes the risk for the following loss function:

$$L(y, f(x,\alpha)) = (y - f(x,\alpha))^2$$


• Probability density estimation
[Figure: a probability density p(x) plotted against x.]

• Density Estimation. For this problem
we consider the following loss function:

$$L(p(x,\alpha)) = -\log p(x,\alpha)$$
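
As a small worked example of this loss, the sketch below averages -log p(x,α) over a sample for a Gaussian family with α = (mu, sigma). The Gaussian family, the function names and the sample are assumptions for illustration; for this family the sample mean and the (biased) sample standard deviation minimise the average loss.

```python
import math

def neg_log_density(x, mu, sigma):
    """-log p(x, alpha) for a Gaussian density with alpha = (mu, sigma)."""
    return (0.5 * math.log(2 * math.pi * sigma ** 2)
            + (x - mu) ** 2 / (2 * sigma ** 2))

def average_loss(sample, mu, sigma):
    """Empirical risk: mean negative log-density over the sample."""
    return sum(neg_log_density(x, mu, sigma) for x in sample) / len(sample)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]                  # hypothetical data

# Maximum-likelihood parameters of the Gaussian family
mu_ml = sum(sample) / len(sample)
sigma_ml = math.sqrt(sum((x - mu_ml) ** 2 for x in sample) / len(sample))

risk_ml = average_loss(sample, mu_ml, sigma_ml)
risk_other = average_loss(sample, 0.0, 1.0)         # arbitrary alternative
print(risk_ml, risk_other)
```

Minimising this empirical risk is the same as maximising the likelihood of the sample.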


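Inductive, Deductive and Transductive
[Diagram: Induction leads from the training samples (xi, yi) to the dependency F(x,y); Deduction leads from F(x,y) to the new values (xnew, ynew); Transduction leads directly from the training samples to (xnew, ynew).]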

Why Machine Learning algorithms?
• Universal, nonlinear, robust tools
• Data adapted
• Easy data and knowledge integration
• Efficient in high dimensional spaces
• Good generalisation (low prediction
error)
• Input/feature selection


Our experience, some applications
• Hydrogeology, pollution/contamination (soil, water, air,
food chains,…), topo-climatic modelling, geophysics
• Renewable resources – wind fields
• Natural hazards/risks: forest fires, avalanches, indoor
radon,
• Optimization of monitoring networks
• Crime data, epidemiology
• MNL for remote sensing, change detection
• Socio-economic spatio-temporal multivariate data
• Spatial econometrics. Financial data. Econophysics
• Fractals, Chaos, EVT,
• Time series


Model Selection & Model Evaluation


Guillaume d'Occam (1285 - 1349)
“Pluralitas non est ponenda sine
necessitate”

Occam’s razor:
“The simpler explanation of a
phenomenon is more likely to be
correct”

Model Assessment and Model
Selection:
Two separate goals


Model Selection:

Estimating the performance of different
models in order to choose the
(approximate) best one

Model Assessment:
Having chosen a final model, estimating its
prediction error (generalization error) on
new data

If we are in a data-rich situation, the best
solution is to split the data randomly (?) into three parts:

Raw Data

Train: 50% Validation:25% Test:25%
(Train) (test) (validation)


Interpretation

• The training set is used to fit the models

• The validation set is used to estimate prediction
error for model selection (tuning
hyperparameters)

• The test set is used for assessment of the
generalization error of the final chosen model

Elements of Statistical Learning- Hastie, Tibshirani & Friedman 2001
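
A minimal sketch of such a three-way split, using the 50/25/25 proportions from the slide. The purely random shuffle is itself an assumption (the slide flags it with a question mark; clustered spatial data may call for something else), and the function name `split_indices` and the seed are hypothetical.

```python
import random

def split_indices(n, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle 0..n-1 and cut it into train / validation / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # reproducible random permutation
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    train = idx[:n_train]                   # used to fit the models
    val = idx[n_train:n_train + n_val]      # used to tune hyperparameters
    test = idx[n_train + n_val:]            # used once, for final assessment
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The test indices should stay untouched until a final model has been chosen.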


Bias and Variance.
Model’s complexity
[Figure: two fits to the same data, axes unlabelled (horizontal ticks 2 to 10, vertical ticks 0.5 to 3). Panel b: Overfitting. Panel c: Underfitting.]


One of the most serious problems that arises in
connectionist learning by neural networks is
overfitting of the provided training examples.
This means that the learned function fits the
training data very closely but does not
generalise well, that is, it cannot model
unseen data from the same task sufficiently well.
Solution: balance the statistical bias and statistical
variance when doing neural network learning in
order to achieve the smallest average generalization
error.


Bias-Variance Dilemma
Assume that

$$Y = f(X) + \varepsilon$$

where

$$E(\varepsilon) = 0, \quad \mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$$


We can derive an expression for the
expected prediction error of a
regression at an input point X=x0
using squared-error loss:


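$$\begin{aligned}
\mathrm{Err}(x_0) &= E[(Y - \hat{f}(x_0))^2 \mid X = x_0] \\
&= \sigma_\varepsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2 \\
&= \sigma_\varepsilon^2 + \mathrm{Bias}^2(\hat{f}(x_0)) + \mathrm{Var}(\hat{f}(x_0)) \\
&= \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}
\end{aligned}$$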


• The first term is the variance of the target around
its true mean f(x0), and cannot be avoided no
matter how well we estimate f(x0), unless $\sigma_\varepsilon^2 = 0$.
• The second term is the squared bias, the amount
by which the average of our estimate differs from
the true mean.
• The last term is the variance, the expected
squared deviation of $\hat{f}(x_0)$ around its mean.
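
The three terms can be checked numerically on a toy estimator. The sketch below assumes a deliberately biased estimator (a shrunken mean of five noisy observations taken at x0) so that both bias and variance are non-zero and known in closed form; all constants are illustrative assumptions, not values from the slides.

```python
import random

F_X0 = 2.0       # true mean f(x0) (assumed)
SIGMA = 1.0      # noise standard deviation (assumed)
N_TRAIN = 5      # observations per training set
SHRINK = 0.8     # shrinkage factor that makes the estimator biased

# Analytic terms for f_hat = SHRINK * mean(training observations)
bias = SHRINK * F_X0 - F_X0
variance = SHRINK ** 2 * SIGMA ** 2 / N_TRAIN
analytic_err = SIGMA ** 2 + bias ** 2 + variance

# Monte Carlo estimate of E[(Y - f_hat(x0))^2 | X = x0]
rng = random.Random(0)
n_trials = 20000
total = 0.0
for _ in range(n_trials):
    train = [F_X0 + rng.gauss(0.0, SIGMA) for _ in range(N_TRAIN)]
    f_hat = SHRINK * sum(train) / N_TRAIN
    y_new = F_X0 + rng.gauss(0.0, SIGMA)    # independent new observation
    total += (y_new - f_hat) ** 2
mc_err = total / n_trials

print(analytic_err, mc_err)
```

The simulated error should agree with the sum irreducible error + squared bias + variance up to Monte Carlo noise.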


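For the k-NN regression fit

$$\mathrm{Err}(x_0) = E[(Y - \hat{f}(x_0))^2 \mid X = x_0] = \sigma_\varepsilon^2 + \Big[f(x_0) - \frac{1}{k}\sum_{l=1}^{k} f(x_l)\Big]^2 + \frac{\sigma_\varepsilon^2}{k}$$

where the sum runs over the k nearest neighbours x_l of x0.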
Here we assume for simplicity that training
inputs are fixed, and the randomness arises
from the Y. The number of neighbors k is
inversely related to the model complexity

Elements of Statistical Learning. Hastie, Tibshirani & Friedman 2001
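
The effect of k on the k-NN error terms can be tabulated directly from this expression. The sketch assumes fixed 1D training inputs on a regular grid, a known true function (sin) and a known noise variance; these choices, and the name `knn_error_terms`, are illustrative only.

```python
import math

def knn_error_terms(f, train_x, x0, k, sigma2):
    """Return (irreducible error, squared bias, variance) of k-NN at x0."""
    nearest = sorted(train_x, key=lambda x: abs(x - x0))[:k]
    mean_f = sum(f(x) for x in nearest) / k     # average of f over neighbours
    bias2 = (f(x0) - mean_f) ** 2
    variance = sigma2 / k
    return sigma2, bias2, variance

train_x = [0.1 * i for i in range(51)]          # fixed inputs on [0, 5]
terms = {k: knn_error_terms(math.sin, train_x, 1.5, k, 0.25)
         for k in (1, 5, 25)}

for k, (noise, bias2, var) in terms.items():
    print(k, noise, bias2, var, noise + bias2 + var)
```

Small k gives low bias and high variance; increasing k shrinks the variance as 1/k while the squared bias grows, because more distant neighbours are averaged in.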


• A neural network is only as good as the
training data!

• Poor training data inevitably leads to an
unreliable and unpredictable network.

• Exploratory Data Analysis and data
preprocessing are extremely important!!!


• If possible, prior to training, add some
noise or other randomness to your
example (such as a random scaling
factor). This helps to account for noise and
natural variability in real data, and tends to
produce a more reliable network.
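
A minimal sketch of this kind of noise injection: each training example is copied several times, and every copy gets a random scaling factor plus additive Gaussian noise on its inputs. The function name `augment_with_noise`, the noise level and the scaling range are hypothetical and would need tuning to the measurement errors of real data.

```python
import random

def augment_with_noise(samples, n_copies, noise_std,
                       scale_range=(0.95, 1.05), seed=0):
    """Return noisy copies of (features, target) pairs; targets are unchanged."""
    rng = random.Random(seed)
    augmented = []
    for features, target in samples:
        for _ in range(n_copies):
            scale = rng.uniform(*scale_range)       # random scaling factor
            noisy = tuple(scale * v + rng.gauss(0.0, noise_std)
                          for v in features)        # additive Gaussian noise
            augmented.append((noisy, target))
    return augmented

# Hypothetical training examples: (input features, target)
samples = [((1.0, 2.0), 0.5), ((3.0, 4.0), 1.5)]
augmented = augment_with_noise(samples, n_copies=3, noise_std=0.05)
print(len(augmented))
```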


Hybrid Models:
Geostatistics + ML


[Flowchart: NNRK/CK algorithm. Stages as labelled on the slide:]
• Data F1,F2,...,Fn: raw data, statistical description, trend analysis
• Structural analysis: variogram of the raw data (plotted against lag, km), multivariate structural analysis
• Data split for training, validation and testing
• ANN architecture choice, ANN training, validation, testing, accuracy test
• ANN estimates for F1,F2,...,Fn
• ANN residuals for F1,F2,...,Fn: residual variogram (plotted against lag, km), variogram model for residuals, cross-validation
• Cokriging
• Final estimates (ANN + Geostatistics) and error estimates

Model: Neural Network Residual Cokriging

Final estimate of 90Sr with NNRCK =
Artificial Neural Network Estimate +
Geostatistical Estimate of the Residuals
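
The "trend + kriged residuals" idea can be sketched in a simplified, single-variable form. This is not the NNRCK implementation: a fixed linear trend stands in for the trained ANN, simple kriging with an assumed exponential covariance model replaces cokriging, and all names, coefficients and data are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def covariance(h, sill=1.0, corr_range=2.0):
    """Assumed exponential covariance model of the residuals."""
    return sill * math.exp(-h / corr_range)

def trend(point, coef):
    """Stand-in for the ANN estimate: linear trend a + b*x + c*y."""
    return coef[0] + coef[1] * point[0] + coef[2] * point[1]

def residual_kriging(points, values, coef, query):
    """Trend estimate at `query` plus simple kriging of the trend residuals."""
    residuals = [v - trend(p, coef) for p, v in zip(points, values)]
    cov = [[covariance(math.dist(p, q)) for q in points] for p in points]
    rhs = [covariance(math.dist(p, query)) for p in points]
    weights = solve(cov, rhs)                   # simple kriging weights
    kriged = sum(w * r for w, r in zip(weights, residuals))
    return trend(query, coef) + kriged

# Hypothetical measurements and trend coefficients
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
values = [1.0, 2.5, 1.8, 5.1]
coef = (1.0, 1.0, 1.0)
estimate = residual_kriging(points, values, coef, query=(1.0, 1.0))
print(estimate)
```

Without a nugget effect, kriging of the residuals honours the data exactly, so the final estimate reproduces the measurements at the sampled locations while the trend model carries the large-scale structure.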

Conclusions
• Machine Learning: universal data-driven
recently developed approach with many
successful applications. Nonlinear, robust.
Integration of different types of data and
information. Efficient in high dimensional
space.
• But: Depends on the quality and quantity of
data. Uncertainty characterization.
Diagnostic tools. Hyper-parameters tuning.


Topics for the research
• Multitask learning
• Automatic feature selection/ feature extraction
• Uncertainties characterisation
• Understanding and visualisation of high
dimensional data
• Modelling on geomanifold, semi-supervised
learning
• Active learning
• MLA and simulations?
• ……………………………………………………


Thank you for your attention!

www.geokernels.org
2004

2008

2009
www.unil.ch/igar

