2. Outline
● Introduction
Unsupervised learning
● The Self-Organizing Map
The biological inspiration
The algorithm
Characteristics
Examples
● Practical examples using MATLAB
3. Introduction
Unsupervised learning is a way to form “natural groupings”
or clusters of patterns.
Unsupervised learning seeks to determine how the data are
organized.
It is distinguished from supervised learning in that the
learner is given only unlabeled examples.
Among neural network models, the Self-Organizing Map
(SOM) is a commonly used unsupervised learning
algorithm.
The SOM is a topographic organization in which nearby
locations in the map represent inputs with similar properties.
4. The Self-Organizing Map
The biological inspiration
Sensory information is processed in the
neocortex by highly ordered neuronal
networks.
• Tangential to the cortical surface,
representations of the sensory periphery
are organized into well-ordered maps.
[Figure: W. Penfield]
• Taste maps in gustatory cortex (Accolla
et al., 2007)
• Somatotopic maps in primary
somatosensory cortex (Kaas, 1991).
5. The Self-Organizing Map
The biological inspiration
Another prominent cortical map
is the tonotopic organization
of auditory cortex (Kalatsky et
al., 2005).
The most intensely studied
example is the primary visual
cortex, which is arranged with
superimposed maps of
retinotopy, ocular dominance
and orientation (Bonhoeffer
and Grinvald, 1991).
7. The Self-Organizing Map
The biological inspiration
“Somatosensory cortex dominated by the representation
of teeth in the naked mole-rat brain,”
Kenneth C. Catania and Michael S. Remple.
8. The Self-Organizing Map
The biological inspiration
A remarkably high degree of organization is evident in the
primary somatosensory cortex, where a clear pattern of
cytoarchitectonic units termed ‘barrels’ exactly matches
the arrangement of the whiskers on the snout of the mouse
(Woolsey and Van der Loos, 1970).
9. The Self-Organizing Map
The biological inspiration
Mapping functionally related sensory
information onto nearby cortical regions is
thought to minimize axonal wiring length and
simplify the synaptic circuits underlying
correlation-based associational plasticity.
10. The Self-Organizing Map
In a topology-preserving map, units located physically
next to each other will respond to classes of input vectors
that are likewise next to each other.
Although it is easy to visualize units next to each other in a
two-dimensional array, it is not so easy to determine
which classes of vectors are next to each other in a
high-dimensional space.
High-dimensional input vectors are, in a sense, projected
down onto the two-dimensional map in a way that maintains
the natural order of the input vectors.
This dimensionality reduction lets us easily visualize
important relationships among the data that
might otherwise go unnoticed.
[Figure: Teuvo Kohonen]
11. The Self-Organizing Map
A SOM is formed of neurons located on a
regular, usually 1- or 2-dimensional grid.
The neurons are connected to adjacent
neurons by a neighborhood relation
dictating the structure of the map.
In the 2-dimensional case the neurons of
the map can be arranged on either a
rectangular or a hexagonal lattice.
[Figure: two lattice diagrams, each labeled "Input", with markings 0, 1 and 2]
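The two lattice types can be sketched in a few lines of Python. This is only an illustration, not the layout code of any toolbox: the function names `lattice_positions` and `neighbors` are hypothetical, and the hexagonal layout assumes that odd rows are shifted by half a unit and rows are spaced sqrt(3)/2 apart.

```python
import math

def lattice_positions(rows, cols, kind="rect"):
    """Return {(row, col): (x, y)} planar coordinates for each map unit."""
    pos = {}
    for r in range(rows):
        for c in range(cols):
            if kind == "hexa":
                # Assumed layout: odd rows shifted by half a unit, rows
                # sqrt(3)/2 apart, so adjacent units are at distance 1.
                pos[(r, c)] = (c + 0.5 * (r % 2), r * math.sqrt(3) / 2)
            else:
                pos[(r, c)] = (float(c), float(r))
    return pos

def neighbors(pos, unit, radius=1.0):
    """Units whose planar distance from `unit` is at most `radius`."""
    ux, uy = pos[unit]
    return sorted(
        u for u, (x, y) in pos.items()
        if u != unit and math.hypot(x - ux, y - uy) <= radius + 1e-9
    )

rect = lattice_positions(5, 5, "rect")
hexa = lattice_positions(5, 5, "hexa")
print(len(neighbors(rect, (2, 2))))  # interior unit, rectangular lattice
print(len(neighbors(hexa, (2, 2))))  # interior unit, hexagonal lattice
```

Under these assumptions an interior unit has four immediate neighbors on the rectangular lattice and six on the hexagonal one.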
14. The algorithm
The training uses
competitive learning.
The neuron with weight
vector most similar to the
input is called the best
matching unit (BMU).
The weights of the BMU
and neurons close to it in
the SOM lattice are
adjusted towards the
input vector.
The magnitude of the
change decreases with
time and with distance
from the BMU.
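The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the routine of any particular toolbox: the names `som_update` and `train_som` are hypothetical, the neighborhood is assumed Gaussian, the map is assumed rectangular, and the learning rate and neighborhood radius are assumed to decay exponentially with time.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def som_update(weights, x, lr, sigma):
    """One competitive-learning step; updates `weights` in place, returns the BMU."""
    # The unit whose weight vector is most similar to the input wins.
    bmu = min(weights, key=lambda u: sq_dist(weights[u], x))
    for unit, w in weights.items():
        # Assumed Gaussian neighborhood: the change shrinks with lattice
        # distance from the BMU.
        h = math.exp(-sq_dist(unit, bmu) / (2 * sigma ** 2))
        for i in range(len(w)):
            # Move the weight vector toward the input vector.
            w[i] += lr * h * (x[i] - w[i])
    return bmu

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols map; weights start at random values in [0, 1)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            # Assumed schedule: both rate and radius decrease with time.
            decay = math.exp(-3 * step / total)
            som_update(weights, x, lr0 * decay, sigma0 * decay)
            step += 1
    return weights

data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, rows=3, cols=3)
```

With a learning rate of 0.5, a single update moves the BMU halfway toward the input, while units farther away on the lattice move by a smaller fraction.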
25. Clusters of sites with similar characteristics
Which crops or varieties are likely to perform well, where, and when?
Soil
Climate
Genotype
Homologous places for Colombian coffee production:
Brazil, Ecuador, East Africa, and New Guinea.
26. Clusters of sites with similar characteristics
For commercial (mass-production) crops such as rice and corn, the
“when” and “where” are known.
For native crops (guanabana, lulo) or special types of crops (coffee
varieties), this is not the case.
When and what should I cultivate?
Market demand
DAPA (Diversification Agriculture Project Alliance)
The COCH project
27. The challenges
1. Large database
2. Multivariable problem
[Figure: grid of 1 Km x 1 Km cells, 1 point per cell; 1 336,025 points]
28. The challenges
1. Large datasets
2. Multivariate problem
Climate estimates, management, variety, soil, etc.
Example: BIOCLIM is a bioclimatic prediction system that uses surrogate
terms (bioclimatic parameters), derived from mean monthly climate
estimates, to approximate energy and water balances at a given location.
B1. Annual Mean Temperature
B2. Mean Diurnal Range (Mean(period max-min))
B3. Isothermality (B2/B7)
B4. Temperature Seasonality (Coefficient of Variation)
B5. Max Temperature of Warmest Period
B6. Min Temperature of Coldest Period
B7. Temperature Annual Range (B5-B6)
B8. Mean Temperature of Wettest Quarter
B9. Mean Temperature of Driest Quarter
B10. Mean Temperature of Warmest Quarter
B11. Mean Temperature of Coldest Quarter
B12. Annual Precipitation
B13. Precipitation of Wettest Period
B14. Precipitation of Driest Period
B15. Precipitation Seasonality (Coefficient of Variation)
B16. Precipitation of Wettest Quarter
B17. Precipitation of Driest Quarter
B18. Precipitation of Warmest Quarter
B19. Precipitation of Coldest Quarter
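To make the idea of "derived" parameters concrete, a few of the simpler ones can be computed from twelve monthly values. The sketch below is an illustration only, not the BIOCLIM software itself: the function name `bioclim_subset` is hypothetical, each "period" is assumed to be one month, the monthly mean temperature is assumed to be the midpoint of the monthly minimum and maximum, and isothermality is taken as the plain ratio given in the list (some implementations scale it by 100).

```python
def bioclim_subset(tmin, tmax, prec):
    """Derive a few bioclimatic parameters from 12 monthly values.

    tmin, tmax: monthly minimum and maximum temperature
    prec: monthly precipitation
    """
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values per variable")
    # Assumption: monthly mean temperature is the midpoint of min and max.
    tmean = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    b1 = sum(tmean) / 12                               # annual mean temperature
    b2 = sum(hi - lo for lo, hi in zip(tmin, tmax)) / 12  # mean diurnal range
    b5 = max(tmax)                                     # max temp., warmest period
    b6 = min(tmin)                                     # min temp., coldest period
    b7 = b5 - b6                                       # temperature annual range
    b3 = b2 / b7 if b7 else float("nan")               # isothermality as a ratio
    return {
        "B1": b1, "B2": b2, "B3": b3, "B5": b5, "B6": b6, "B7": b7,
        "B12": sum(prec),   # annual precipitation
        "B13": max(prec),   # precipitation of wettest period
        "B14": min(prec),   # precipitation of driest period
    }

# Hypothetical site, values invented for illustration only.
tmin = [10.0] * 11 + [8.0]
tmax = [20.0] * 11 + [24.0]
prec = [100.0] * 12
params = bioclim_subset(tmin, tmax, prec)
```

Each site then becomes one vector of such parameters, which is the kind of multivariate input a SOM can cluster.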
29. Clusters of sites with similar characteristics
How to work?
How can we obtain prototypes, clustering, and visualization at the same
time?
Approach
Unsupervised learning
Self-organizing maps
Two flavors of SOMs
Self-organizing map: static map, just one representation
Growing hierarchical map: different representations at different levels
30. Clusters of sites with similar characteristics
Self-Organizing Map (SOM)
The clusters found in the
feature space in many
cases are not the same as
those found in geographic
space.
Represent clusters of a
multidimensional space:
map multidimensional data
onto a two-dimensional
lattice of cells.
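Mapping sites onto the lattice can be sketched as a nearest-prototype lookup. Everything in this example is hypothetical and for illustration only: the function names, the 1 x 3 map, its prototype vectors, and the site coordinates and features (two scaled variables per site) are invented.

```python
from collections import defaultdict

def best_cell(prototypes, x):
    """Lattice cell whose prototype vector is closest to x."""
    return min(
        prototypes,
        key=lambda cell: sum((p - xi) ** 2 for p, xi in zip(prototypes[cell], x)),
    )

def group_sites_by_cell(prototypes, sites):
    """Return {cell: [site names]}, grouping sites by feature similarity."""
    groups = defaultdict(list)
    for name, site in sites.items():
        groups[best_cell(prototypes, site["features"])].append(name)
    return dict(groups)

# Hypothetical trained 1 x 3 map over two scaled variables.
prototypes = {(0, 0): [0.2, 0.8], (0, 1): [0.5, 0.5], (0, 2): [0.9, 0.1]}

# Hypothetical sites: A and B are geographic neighbors, C is far away.
sites = {
    "site_A": {"lonlat": (-76.5, 3.4), "features": [0.25, 0.75]},
    "site_B": {"lonlat": (-76.4, 3.5), "features": [0.85, 0.15]},
    "site_C": {"lonlat": (-35.0, -8.0), "features": [0.15, 0.85]},
}

clusters = group_sites_by_cell(prototypes, sites)
print(clusters)
```

In this invented example the distant sites A and C share a cell while the neighboring sites A and B do not, which is the sense in which clusters in feature space can differ from clusters in geographic space.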
Similarity of sugarcane
growing environmental
conditions (1999-2005)
using self-organizing
maps
32. GHSOM component planes
P1. Annual Mean Temperature
P2. Mean Diurnal Range (Mean(period max-min))
P3. Isothermality (P2/P7)
P4. Temperature Seasonality (Coefficient of Variation)
P5. Max Temperature of Warmest Period
P6. Min Temperature of Coldest Period
P7. Temperature Annual Range (P5-P6)
P8. Mean Temperature of Wettest Quarter
P9. Mean Temperature of Driest Quarter
P10. Mean Temperature of Warmest Quarter
P11. Mean Temperature of Coldest Quarter
P12. Annual Precipitation
P13. Precipitation of Wettest Period
P14. Precipitation of Driest Period
P15. Precipitation Seasonality (Coefficient of Variation)
P16. Precipitation of Wettest Quarter
P17. Precipitation of Driest Quarter
P18. Precipitation of Warmest Quarter
P19. Precipitation of Coldest Quarter