Talk at IEEE ICMLA 2015 Miami
In this presentation, we propose data perturbations that can help validate or reject a clustering methodology, while also yielding insights into the time series at hand. We show that Pearson correlation is poorly suited to clustering these time series because it yields unstable clusters; a more robust measure based on rank statistics, such as Spearman correlation, is preferable.
On the Stability of Clustering Financial Time Series – How to investigate?
IEEE ICMLA Miami, Florida, USA, December 9-11, 2015
Gautier Marti, Philippe Very, Philippe Donnat, Frank Nielsen
9 December 2015
Gautier Marti, Philippe Donnat On the Stability of Clustering Financial Time Series
1 Introduction to financial time series clustering
2 Empirical results from the clustering stability study
3 Conclusion
Financial time series (data from www.datagrapple.com)
Clustering?
Definition
Clustering is the task of grouping a set of objects in such a way
that objects in the same group (cluster) are more similar to each
other than those in different groups.
Figure: CDS of French banks (blue) and building-materials companies (red) over 2006-2015
Why clustering?
Mathematical finance: use of variance-covariance matrices (e.g., Markowitz portfolio optimization, Value-at-Risk)
Stylized fact: empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory; Noise Dressing of Financial Correlation Matrices, Laloux et al., 1999)
Figure: Marchenko-Pastur distribution vs. empirical eigenvalue distribution of the correlation matrix (x-axis: eigenvalue λ, from 0 to 4; y-axis: density ρ(λ))
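A minimal numerical sketch of this stylized fact, assuming pure-noise returns (i.i.d. Gaussian); the sizes N = 100, T = 500 and the seed are arbitrary illustration choices, not the data used in the talk, and the helper name `marchenko_pastur` is hypothetical. For such data, random matrix theory predicts that the eigenvalues of the empirical correlation matrix follow the Marchenko-Pastur law with q = N/T, supported on [(1 − √q)², (1 + √q)²].

```python
import numpy as np


def marchenko_pastur(lam, q):
    """Marchenko-Pastur density of the eigenvalues of a correlation matrix
    estimated from N independent unit-variance series of length T, q = N / T <= 1."""
    lam_min, lam_max = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
    lam = np.asarray(lam, dtype=float)
    density = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    density[inside] = np.sqrt((lam_max - lam[inside]) * (lam[inside] - lam_min)) / (
        2 * np.pi * q * lam[inside]
    )
    return density, lam_min, lam_max


# Hypothetical pure-noise universe: N = 100 assets, T = 500 daily returns
rng = np.random.default_rng(0)
N, T = 100, 500
returns = rng.standard_normal((T, N))
eigenvalues = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))

grid = np.linspace(0.0, 4.0, 4001)
density, lam_min, lam_max = marchenko_pastur(grid, N / T)
# Fraction of empirical eigenvalues inside the (slightly widened) noise band
in_band = np.mean((eigenvalues > 0.9 * lam_min) & (eigenvalues < 1.1 * lam_max))
print(f"noise band [{lam_min:.3f}, {lam_max:.3f}], fraction inside: {in_band:.2f}")
```

On real returns, eigenvalues well above the upper edge (typically a market mode and a few sector modes) are the part of the spectrum that noise does not explain.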
How to filter these variance-covariance matrices?
For filtering, clustering!
Work of Mantegna (1999) and followers:
Figure (three panels):
(left) empirical correlation matrix
(center) the same matrix seriated using a hierarchical clustering
(right) correlations filtered using the clustering structure
N.B. Other applications: statistical arbitrage, alternative risk measures
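The seriation and filtering of the figure can be sketched as follows, under stated assumptions: the correlation distance d = √(2(1 − ρ)), average linkage (Mantegna's original work relies on single linkage and the minimum spanning tree), a fixed number of clusters, and a filter that replaces each cluster-by-cluster block of correlations by its average. The function `seriate_and_filter` and the synthetic three-sector data are hypothetical, not the exact procedure behind the figure.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import squareform


def seriate_and_filter(corr, n_clusters, method="average"):
    """Seriate a correlation matrix by hierarchical clustering, then replace every
    cluster-by-cluster block of correlations by its average value."""
    # Correlation distance d = sqrt(2 (1 - rho)); clip tiny negatives from rounding
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    Z = linkage(squareform(dist, checks=False), method=method)
    order = leaves_list(Z)  # leaf order of the dendrogram = seriation
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    filtered = np.empty_like(corr)
    for a in np.unique(labels):
        for b in np.unique(labels):
            rows, cols = labels == a, labels == b
            block = corr[np.ix_(rows, cols)]
            if a == b and block.shape[0] > 1:
                # Within a cluster, average the off-diagonal correlations only
                m = block.shape[0]
                value = (block.sum() - np.trace(block)) / (m * (m - 1))
            else:
                value = block.mean()
            filtered[np.ix_(rows, cols)] = value
    np.fill_diagonal(filtered, 1.0)
    return corr[np.ix_(order, order)], filtered[np.ix_(order, order)], labels


# Hypothetical 3-sector factor model: within-sector correlation about 0.5
rng = np.random.default_rng(1)
T, sector = 1000, np.repeat(np.arange(3), [10, 15, 20])
returns = rng.standard_normal((T, 3))[:, sector] + rng.standard_normal((T, sector.size))
corr = np.corrcoef(returns, rowvar=False)
seriated, filtered, labels = seriate_and_filter(corr, n_clusters=3)
```

The filtered matrix keeps one number per pair of clusters, which is the noise-reduction effect the right panel illustrates.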
Why stability?
Statistical consistency of the clustering method requires assumptions that may not hold in practice, e.g. returns are i.i.d., the underlying copula is elliptical, enough data is available.
Stability is a weaker property: reproducibility of results across a wide range of slight data perturbations.
Clusters obtained at times t, t + 1, t + 2: is the difference between successive clusterings a "true" signal?
Is the clustering of financial time series stable?
According to [2], clusters are not stable with respect to the clustering algorithm, but only a squared Euclidean distance was considered, which is not relevant for clustering assets from their returns (cf. [4]).
Idea: a more relevant distance should increase stability.
We investigate the clustering stability resulting from using:
a Euclidean distance
a Pearson correlation distance [3]
a Spearman correlation distance
a distance for comparing two dependent random variables [4]
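Some common distances for clustering financial time series
Prices $(P^i_t)_{t \geq 0}$ and log-returns $(S^i_t)_{t \geq 1}$, with $S^i_{t+1} = \log P^i_{t+1} - \log P^i_t$.
Euclidean distance:
$$d(S^i, S^j) = \sqrt{\sum_{t=1}^{T} \left(S^i_t - S^j_t\right)^2}$$
Pearson correlation:
$$\rho(S^i, S^j) = \frac{\sum_{t=1}^{T} (S^i_t - \bar{S}^i)(S^j_t - \bar{S}^j)}{\sqrt{\sum_{t=1}^{T} (S^i_t - \bar{S}^i)^2}\,\sqrt{\sum_{t=1}^{T} (S^j_t - \bar{S}^j)^2}}$$
Spearman correlation (no ties), with $r^i_t$ the rank of $S^i_t$ among $S^i_1, \dots, S^i_T$:
$$\rho_S(S^i, S^j) = 1 - \frac{6}{T(T^2 - 1)} \sum_{t=1}^{T} \left(r^i_t - r^j_t\right)^2$$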
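A direct NumPy/SciPy transcription of these definitions, as a sketch: the function names are hypothetical, and the rank-difference formula for Spearman is only exact without ties. The helper `correlation_distance` uses d = √(2(1 − ρ)), a common way to turn a correlation into a distance in the econophysics literature; it is an assumption here, since the slide does not spell out the transform.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def log_returns(prices):
    """S_{t+1} = log P_{t+1} - log P_t, computed column-wise."""
    return np.diff(np.log(prices), axis=0)


def euclidean_distance(s_i, s_j):
    return np.sqrt(np.sum((s_i - s_j) ** 2))


def pearson(s_i, s_j):
    di, dj = s_i - s_i.mean(), s_j - s_j.mean()
    return np.sum(di * dj) / np.sqrt(np.sum(di ** 2) * np.sum(dj ** 2))


def spearman(s_i, s_j):
    """Rank-difference formula, exact when there are no ties."""
    T = len(s_i)
    d = rankdata(s_i) - rankdata(s_j)
    return 1.0 - 6.0 * np.sum(d ** 2) / (T * (T ** 2 - 1))


def correlation_distance(rho):
    """Assumed transform from a correlation to a distance: sqrt(2 (1 - rho))."""
    return np.sqrt(2.0 * (1.0 - rho))


# Hypothetical example: two correlated price paths built from simulated log-returns
rng = np.random.default_rng(2)
z = rng.standard_normal((250, 2))
rets = 0.01 * np.column_stack([z[:, 0], 0.6 * z[:, 0] + 0.8 * z[:, 1]])
prices = 100 * np.exp(np.cumsum(rets, axis=0))
S = log_returns(prices)
s_i, s_j = S[:, 0], S[:, 1]
print(euclidean_distance(s_i, s_j), pearson(s_i, s_j), spearman(s_i, s_j))
```

The Euclidean distance depends on the scale of the returns, whereas both correlations only depend on co-movements, and Spearman only on their ranks.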
Generic Non-Parametric Distance [4]
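$$d_\theta^2(X_i, X_j) = \theta\, 3\,\mathbb{E}\left|P_i(X_i) - P_j(X_j)\right|^2 + (1 - \theta)\, \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda,$$
where $P_i$ is the distribution of $X_i$ (used through its cumulative distribution function in $P_i(X_i)$) and $\lambda$ is the Lebesgue measure.
(i) $0 \leq d_\theta \leq 1$; (ii) for $0 < \theta < 1$, $d_\theta$ is a metric; (iii) $d_\theta$ is invariant under diffeomorphism.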
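An empirical plug-in estimate of d_θ, as a sketch only: the estimators are assumptions, not necessarily those of [4]. P_i(X_i) is replaced by normalized ranks (the empirical cumulative distribution function evaluated at the observations), and the Hellinger term is computed from histograms on `n_bins` common bins, so that d_0² becomes ½ Σ_k (√p_k − √q_k)². The function `gnpr_distance` and its default parameters are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def gnpr_distance(x_i, x_j, theta=0.5, n_bins=50):
    """Empirical sketch of d_theta: theta * dependence part + (1 - theta) * distribution part."""
    T = len(x_i)
    # Dependence part: 3 E|P_i(X_i) - P_j(X_j)|^2 with normalized ranks in {1/T, ..., 1}
    u_i, u_j = rankdata(x_i) / T, rankdata(x_j) / T
    d1_sq = 3.0 * np.mean((u_i - u_j) ** 2)
    # Distribution part: squared Hellinger distance between histograms on common bins
    bins = np.histogram_bin_edges(np.concatenate([x_i, x_j]), bins=n_bins)
    p, _ = np.histogram(x_i, bins=bins)
    q, _ = np.histogram(x_j, bins=bins)
    p, q = p / p.sum(), q / q.sum()
    d0_sq = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)
    return np.sqrt(theta * d1_sq + (1.0 - theta) * d0_sq)


rng = np.random.default_rng(3)
x = rng.standard_normal(2000)
print(gnpr_distance(x, x))                     # identical variables: 0
print(gnpr_distance(x, np.exp(x), theta=1.0))  # same ranks: dependence part is 0
print(gnpr_distance(x, np.exp(x), theta=0.0))  # but the marginal distributions differ
```

The toy calls illustrate the decomposition: an increasing transform such as exp leaves the dependence term at 0, while the distribution term detects the change of marginal.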
Generic Non-Parametric Distance [4]
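$$d_0^2 = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda = \mathrm{Hellinger}^2(P_i, P_j)$$
$$d_1^2 = 3\,\mathbb{E}\left|P_i(X_i) - P_j(X_j)\right|^2 = \frac{1 - \rho_S}{2} = 2 - 6 \int_0^1 \!\! \int_0^1 C(u, v)\, du\, dv,$$
where $C$ is the copula of $(X_i, X_j)$.
Remark: if
$$f(x; \theta) = c\left(F_1(x_1; \nu_1), \dots, F_N(x_N; \nu_N); \theta_c\right) \prod_{i=1}^{N} f_i(x_i; \nu_i),$$
then, under the CML hypothesis,
$$ds^2 = ds^2_{\mathrm{copula}} + \sum_{i=1}^{N} ds^2_{\mathrm{margin}\, i}$$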
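The identity for d_1² can be checked in three steps, writing U = P_i(X_i) and V = P_j(X_j), which are uniform on [0, 1] for continuous margins (so E[U²] = E[V²] = 1/3 and Var(U) = Var(V) = 1/12), and C for the copula of (X_i, X_j): expand the square; apply Hoeffding's covariance identity Cov(U, V) = ∫∫ (C(u, v) − uv) du dv, where ∫∫ uv du dv = 1/4; and use the fact that ρ_S is the Pearson correlation of (U, V).

```latex
\begin{aligned}
3\,\mathbb{E}|U - V|^2 &= 3\left(\mathbb{E}[U^2] + \mathbb{E}[V^2] - 2\,\mathbb{E}[UV]\right) = 2 - 6\,\mathbb{E}[UV], \\
\mathbb{E}[UV] &= \int_0^1 \!\! \int_0^1 C(u, v)\, du\, dv, \\
\rho_S &= \frac{\mathbb{E}[UV] - 1/4}{1/12} = 12\,\mathbb{E}[UV] - 3
\;\Longrightarrow\; \frac{1 - \rho_S}{2} = 2 - 6\,\mathbb{E}[UV] = 3\,\mathbb{E}|U - V|^2 .
\end{aligned}
```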
2 Empirical results from the clustering stability study
Sliding Window
Figure: stability of PCA (red curve) vs. stability of Euclidean clusters as a function of time, using results from [1] for a fair comparison: clusters are more stable.
The most basic perturbation: traders face it every day when monitoring their indicators.
We do not want to overfit our analysis to this particular stability goal.
Stability performance: dist. [4], Spearman, Pearson, Euclidean
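A sketch of a sliding-window stability experiment. The slides do not state the stability score, clustering algorithm, window length or step, so everything here is an assumption: average linkage on √(2(1 − ρ)), three clusters, 250-observation windows shifted by 50 observations, and the adjusted Rand index (ARI) between the clusterings of consecutive windows as the stability score. The data are synthetic, with heavy-tailed (Student-t) factors, and the function names are hypothetical; the script makes no claim about which correlation performs better on real data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.special import comb
from scipy.stats import rankdata


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings (1.0 = identical partitions).
    Undefined (0 / 0) if both labelings are a single cluster."""
    _, a = np.unique(a, return_inverse=True)
    _, b = np.unique(b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)  # contingency table
    sum_cells = comb(table, 2).sum()
    sum_rows = comb(table.sum(axis=1), 2).sum()
    sum_cols = comb(table.sum(axis=0), 2).sum()
    expected = sum_rows * sum_cols / comb(len(a), 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    return (sum_cells - expected) / (max_index - expected)


def cluster_assets(returns, n_clusters, rank_based=True):
    """Average-linkage clustering of the columns on d = sqrt(2 (1 - rho))."""
    data = rankdata(returns, axis=0) if rank_based else returns  # Spearman vs. Pearson
    rho = np.corrcoef(data, rowvar=False)
    dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def sliding_window_stability(returns, window, step, n_clusters, rank_based=True):
    """ARI between the clusterings of consecutive, overlapping windows."""
    starts = range(0, len(returns) - window + 1, step)
    labels = [cluster_assets(returns[s:s + window], n_clusters, rank_based) for s in starts]
    return np.array([adjusted_rand_index(a, b) for a, b in zip(labels[:-1], labels[1:])])


# Hypothetical data: 3 sectors driven by heavy-tailed (Student-t) factors
rng = np.random.default_rng(5)
T, sector = 1000, np.repeat(np.arange(3), [8, 12, 10])
factors = rng.standard_t(df=3, size=(T, 3))
returns = factors[:, sector] + rng.standard_t(df=3, size=(T, sector.size))

for rank_based in (True, False):
    scores = sliding_window_stability(returns, 250, 50, 3, rank_based=rank_based)
    print("Spearman" if rank_based else "Pearson", scores.mean().round(3))
```

The ARI is used instead of the raw Rand index because it is corrected for chance agreement, so a score near 0 means the consecutive clusterings agree no more than random ones would.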
Odd vs. Even
A clustering algorithm applied to two samples describing the same phenomenon should yield the same results. How can we obtain two such samples?
Figures: (un)stability of clusters with the L2 distance vs. stability of clusters with the proposed distance [4]
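One way to obtain the two samples, assumed here from the slide title: split the dates into odd and even indices, cluster each half separately, and measure agreement with the (unadjusted) Rand index, i.e. the fraction of asset pairs grouped consistently. The heterogeneous volatilities in the synthetic data are a hypothetical setup, chosen because an L2 distance on raw returns is sensitive to volatility levels and not only to co-movements; all function names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from scipy.stats import rankdata


def rand_index(a, b):
    """Fraction of asset pairs on which two clusterings agree (together / apart)."""
    a, b = np.asarray(a), np.asarray(b)
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return np.mean(same_a == same_b)


def cluster_assets(returns, n_clusters, metric):
    if metric == "euclidean":  # L2 distance between the return series
        condensed = pdist(returns.T, metric="euclidean")
    else:  # Spearman correlation distance sqrt(2 (1 - rho_S))
        rho = np.corrcoef(rankdata(returns, axis=0), rowvar=False)
        condensed = squareform(np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None)), checks=False)
    return fcluster(linkage(condensed, method="average"), t=n_clusters, criterion="maxclust")


# Hypothetical data: 3 sectors, each asset with its own volatility level
rng = np.random.default_rng(6)
T, sector = 1000, np.repeat(np.arange(3), 10)
vol = rng.uniform(0.5, 3.0, size=sector.size)
returns = vol * (rng.standard_normal((T, 3))[:, sector] + rng.standard_normal((T, sector.size)))

odd, even = returns[1::2], returns[0::2]  # two samples of the same phenomenon
for metric in ("euclidean", "spearman"):
    agreement = rand_index(cluster_assets(odd, 3, metric), cluster_assets(even, 3, metric))
    print(metric, round(agreement, 3))
```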
Economic Regimes
Figure: AXA 5-year CDS spread over 2006-2015
Average of the pairwise correlations: correlation skyrockets during crises.
Is the clustering structure persistent across economic regimes?
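The regime indicator (average of the pairwise correlations) can be sketched on trailing windows; the 60-observation window, the function name and the synthetic two-regime data (a common factor whose loading jumps from 0.3 to 1.5 halfway through) are illustrative assumptions.

```python
import numpy as np


def average_pairwise_correlation(returns, window):
    """Mean off-diagonal correlation on each trailing window (NaN until enough data)."""
    T, N = returns.shape
    iu = np.triu_indices(N, k=1)
    out = np.full(T, np.nan)
    for end in range(window, T + 1):
        corr = np.corrcoef(returns[end - window:end], rowvar=False)
        out[end - 1] = corr[iu].mean()
    return out


# Hypothetical regimes: a weak common factor in calm times, a strong one in "crisis"
rng = np.random.default_rng(7)
T, N = 600, 20
loading = np.where(np.arange(T) < T // 2, 0.3, 1.5)[:, None]
returns = loading * rng.standard_normal((T, 1)) + rng.standard_normal((T, N))
avg_corr = average_pairwise_correlation(returns, window=60)
print(np.nanmean(avg_corr[: T // 2]), np.nanmean(avg_corr[T // 2 + 60:]))
```

Clustering separately on the low- and high-correlation periods identified this way is one possible way to test persistence across regimes.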
Economic Regimes Clustering Stability
Figure: Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)
Heart vs. Tails Clustering Stability
≈ orange+red vs. green+yellow periods
Figure: Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)
Multiscale
Is the clustering structure persistent across different sampling frequencies?
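For log-returns, changing the sampling frequency amounts to summing consecutive returns, since log P_{t+k} − log P_t = S_{t+1} + ... + S_{t+k}. The sketch assumes daily data, so k = 5 and k = 20 stand for weekly and monthly sampling; the Spearman-based clustering, the synthetic data and the helper names are hypothetical. `same_partition` compares two clusterings up to relabeling.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def aggregate_log_returns(returns, k):
    """Log-returns at a k-times coarser frequency: sums of k consecutive log-returns."""
    T = (len(returns) // k) * k  # drop the incomplete last block
    return returns[:T].reshape(-1, k, returns.shape[1]).sum(axis=1)


def spearman_clusters(returns, n_clusters):
    rho = np.corrcoef(rankdata(returns, axis=0), rowvar=False)
    dist = squareform(np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None)), checks=False)
    return fcluster(linkage(dist, method="average"), t=n_clusters, criterion="maxclust")


def same_partition(a, b):
    """True if two labelings define the same partition (up to label names)."""
    a, b = np.asarray(a), np.asarray(b)
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])


# Hypothetical daily log-returns for 3 sectors over 2000 days
rng = np.random.default_rng(8)
T, sector = 2000, np.repeat(np.arange(3), 10)
daily = 0.01 * (rng.standard_normal((T, 3))[:, sector] + rng.standard_normal((T, sector.size)))

reference = spearman_clusters(daily, 3)
for k in (1, 5, 20):  # assumed daily, weekly and monthly sampling
    labels = spearman_clusters(aggregate_log_returns(daily, k), 3)
    print(k, same_partition(reference, labels))
```

Coarser sampling leaves fewer observations (here 100 at k = 20), so part of any instability across scales comes from estimation noise rather than from a change in structure.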
Multiscale Clustering Stability
Figure: Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)
Maturities & Term Structure
An asset is described by several time series with similar dynamics: here, Nokia Oyj is described by the cost of insurance against its default for maturities of {1, 3, 5, 7, 10} years.
Maturities & Term Structure Clustering Stability
Figure: Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right)
3 Conclusion
Discussion and questions?
A given clustering algorithm yields a particular clustering structure, which can be made more stable by using a relevant distance.
The perturbations presented can readily be extended (e.g. using different CDS datasets).
Disclosing stability results is valuable since complex models often perform poorly (their many parameters are somewhat overfitted) and cannot be used by practitioners.
The correlation+distribution distance (presented in [4]) may work for your applications (which ones?)
References
[1] C. Ding and X. He. K-means clustering via principal component analysis. In Proceedings of the Twenty-First International Conference on Machine Learning, page 29. ACM, 2004.
[2] V. Lemieux, P. S. Rahmdel, R. Walker, B. Wong, and M. Flood. Clustering techniques and their effect on portfolio formation and risk analysis. In Proceedings of the International Workshop on Data Science for Macro-Modeling, pages 1–6. ACM, 2014.
[3] R. N. Mantegna and H. E. Stanley. Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, 1999.
[4] G. Marti, P. Very, and P. Donnat. Toward a generic representation of random variables for machine learning. Pattern Recognition Letters, 2015.