Poster for NIPS Time Series Analysis 2016 in Barcelona, Spain.
We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset.
The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables.
Some of the cluster centers can be used to parameterize a novel dependence coefficient which can target or forget specific dependence patterns.
Finally, we illustrate the methodology with financial time series (credit default swaps, stocks, foreign exchange rates).
Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.
Exploring and measuring non-linear correlations
G. Marti†, S. Andler†‡, F. Nielsen, P. Donnat† (presented by M. Binkowski†∗)
† Hellebore Capital Ltd, Ecole Polytechnique, ‡ ENS de Lyon, ∗ Imperial College London
Motivations
• Interpretability of pairwise dependence
• Summary of associations between many variables
• Find abnormal dependence patterns
• Design robust and custom dependence coefficients
• Query the dataset for specific associations
• Realistic simulations of market variables
Copulas
Sklar’s Theorem
Let X = (Xi, Xj) be a random vector with joint cumulative distribution function F and continuous marginal cumulative distribution functions Fi, Fj respectively. Then there exists a unique copula C such that
F(xi, xj) = C(Fi(xi), Fj(xj)) for all (xi, xj).
C, the copula of X, is the bivariate distribution of the uniform marginals (Ui, Uj) := (Fi(Xi), Fj(Xj)).
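In practice Fi and Fj are unknown, so the uniform marginals are usually approximated by normalized ranks. The sketch below is one common way to do this, not necessarily the authors' code; the function name `pseudo_observations` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def pseudo_observations(x):
    """Map a 1-D sample to (0, 1) through its normalized ranks (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    return ranks / (len(x) + 1.0)          # keeps values strictly inside (0, 1)

rng = np.random.default_rng(0)
xi = rng.normal(size=1000)
xj = np.exp(xi) + 0.1 * rng.normal(size=1000)  # hypothetical dependent variable

# Samples (ui, uj) approximately follow the copula of (Xi, Xj).
ui, uj = pseudo_observations(xi), pseudo_observations(xj)
```

Because ranks are unchanged by strictly increasing transformations, `pseudo_observations(np.exp(xi))` returns the same values as `ui`: the copula does not depend on the marginals.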
Fréchet-Hoeffding copula bounds
[Figure 1 panels, each a heatmap over (ui, uj) ∈ [0, 1]²: w(ui,uj) and W(ui,uj); π(ui,uj) and Π(ui,uj); m(ui,uj) and M(ui,uj).]
Figure 1: Copula measure (left column) and cumulative distribution function (right column) heatmaps for negative dependence (first row), independence (second row), i.e. the uniform distribution over [0, 1]², and positive dependence (third row).
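The three cumulative distribution functions of Figure 1 have standard closed forms: W(u, v) = max(u + v − 1, 0), Π(u, v) = uv and M(u, v) = min(u, v). The short sketch below evaluates them on a grid and checks the Fréchet-Hoeffding ordering W ≤ Π ≤ M; the grid size is an arbitrary choice.

```python
import numpy as np

def W(u, v):
    """Lower Frechet-Hoeffding bound (perfect negative dependence)."""
    return np.maximum(u + v - 1.0, 0.0)

def Pi(u, v):
    """Independence copula."""
    return u * v

def M(u, v):
    """Upper Frechet-Hoeffding bound (perfect positive dependence)."""
    return np.minimum(u, v)

grid = np.linspace(0.0, 1.0, 51)
u, v = np.meshgrid(grid, grid, indexing="ij")

# Every copula C satisfies W <= C <= M pointwise; here we check it for Pi.
bounds_hold = bool(
    np.all(W(u, v) <= Pi(u, v) + 1e-12) and np.all(Pi(u, v) <= M(u, v) + 1e-12)
)
```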
The methodology - Clustering of copulas & custom dependence coefficients
The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a custom dependence coefficient.
Target/Forget Dependence Coefficient: Let $\{C_l^-\}_l$ be the set of forget-dependence copulas, and $\{C_k^+\}_k$ be the set of target-dependence copulas. Let $C$ be the copula of $(X_i, X_j)$.
$$\mathrm{TFDC}\left(X_i, X_j; \{C_k^+\}_k, \{C_l^-\}_l\right) := \frac{\min_l d_M(C_l^-, C)}{\min_l d_M(C_l^-, C) + \min_k d_M(C, C_k^+)} \in [0, 1].$$
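As a minimal sketch of the coefficient, the function below takes the distance as a parameter. The names `tfdc` and `euclidean`, the 2×2 histograms, and the Euclidean distance are all assumptions for illustration only; the poster uses an optimal transport distance between copulas in place of `euclidean`.

```python
import numpy as np

def tfdc(c, targets, forgets, dist):
    """Target/Forget Dependence Coefficient for a copula histogram `c`."""
    d_forget = min(dist(f, c) for f in forgets)   # min_l d(C_l^-, C)
    d_target = min(dist(c, t) for t in targets)   # min_k d(C, C_k^+)
    total = d_forget + d_target
    if total == 0.0:
        raise ValueError("c coincides with both a target and a forget copula")
    return d_forget / total

def euclidean(p, q):
    """Stand-in distance (assumption); not the distance used in the poster."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

# Toy 2x2 copula histograms (hypothetical data).
independence = np.full((2, 2), 0.25)
positive = np.array([[0.5, 0.0], [0.0, 0.5]])

score_target = tfdc(positive, [positive], [independence], euclidean)      # on a target
score_forget = tfdc(independence, [positive], [independence], euclidean)  # on a forget
```

The coefficient equals 1 when the copula coincides with a target copula and 0 when it coincides with a forget copula.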
Which geometry for copulas?
In [1], we detail the benefit of optimal transport over
information divergences for clustering copulas.
Figure 2: Copulas C1, C2, C3 encoding a correlation of 0.5, 0.99, 0.9999 respectively. Which pair of copulas is the nearest? For Fisher-Rao, Kullback-Leibler, Hellinger and related divergences: D(C1, C2) ≤ D(C2, C3); for the Wasserstein distance: W2(C2, C3) ≤ W2(C1, C2).
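The ordering in Figure 2 can be reproduced numerically under a simplifying assumption: the sketch below works with bivariate Gaussian distributions with unit variances and correlations 0.5, 0.99, 0.9999. The Kullback-Leibler divergence is invariant under the marginal transforms, so it matches the divergence between the Gaussian copulas; the closed-form W2 between Gaussians is only a proxy for the W2 between copulas used in the poster. Function names are hypothetical.

```python
import numpy as np

def corr(rho):
    return np.array([[1.0, rho], [rho, 1.0]])

def kl_gauss(rho0, rho1):
    """KL(N(0, S0) || N(0, S1)) for 2x2 correlation matrices S0, S1."""
    s0, s1 = corr(rho0), corr(rho1)
    inv1 = np.linalg.inv(s1)
    log_det_ratio = np.log(np.linalg.det(s1) / np.linalg.det(s0))
    return 0.5 * (np.trace(inv1 @ s0) - 2.0 + log_det_ratio)

def w2_gauss(rho0, rho1):
    """W2 between N(0, S0) and N(0, S1).

    Correlation matrices share eigenvectors, so the closed form reduces
    to a distance between square roots of eigenvalues (1 + rho, 1 - rho).
    """
    e0 = np.array([1.0 + rho0, 1.0 - rho0])
    e1 = np.array([1.0 + rho1, 1.0 - rho1])
    return float(np.sqrt(np.sum((np.sqrt(e0) - np.sqrt(e1)) ** 2)))

kl_12, kl_23 = kl_gauss(0.5, 0.99), kl_gauss(0.99, 0.9999)
w_12, w_23 = w2_gauss(0.5, 0.99), w2_gauss(0.99, 0.9999)
# KL judges C1 and C2 closer; W2 judges C2 and C3 closer.
```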
We use results from [2], [3] to speed up the computation of the distances and barycenters needed for the clustering.
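Reference [2] introduces entropy-regularized optimal transport solved by Sinkhorn iterations. The following is a minimal NumPy sketch of that idea on two 1-D histograms, not the authors' implementation; the regularization `reg`, the iteration count, and the toy histograms are assumptions. For copulas, the same code would apply to flattened 2-D histograms with a cost matrix between grid cells.

```python
import numpy as np

def sinkhorn_cost(a, b, cost, reg=0.05, n_iter=500):
    """Entropy-regularized transport cost between histograms a and b."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):          # alternate scaling of columns and rows
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost)), plan

x = np.linspace(0.0, 1.0, 20)
cost = (x[:, None] - x[None, :]) ** 2      # squared Euclidean ground cost

a = np.exp(-((x - 0.2) ** 2) / 0.01)       # bump near 0.2 (strictly positive)
a /= a.sum()
b = np.exp(-((x - 0.8) ** 2) / 0.01)       # bump near 0.8
b /= b.sum()

cost_ab, plan_ab = sinkhorn_cost(a, b, cost)
cost_aa, _ = sinkhorn_cost(a, a, cost)
```

Smaller values of `reg` approximate the unregularized transport cost more closely but need more iterations and can underflow in the kernel.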
[Figure 3 panels, each a heatmap over [0, 1]²: Bregman barycenter copula (left); Wasserstein barycenter copula (right).]
Figure 3: Barycenter for: (left) Bregman geometry (which includes, for example, squared Euclidean and Kullback-Leibler distances); (right) Wasserstein geometry.
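The contrast in Figure 3 can be illustrated in one dimension, where the Wasserstein-2 barycenter has a simple form: its quantile function is the average of the input quantile functions. This is a simplified 1-D analogue on simulated samples (an assumption), not the 2-D copula computation of the poster.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
left = np.sort(rng.normal(-2.0, 0.5, size=n))   # samples of the first distribution
right = np.sort(rng.normal(2.0, 0.5, size=n))   # samples of the second distribution

# Squared-Euclidean barycenter of the two densities: their average,
# i.e. a 50/50 mixture, which stays bimodal.
mixture = np.concatenate([left, right])

# 1-D Wasserstein-2 barycenter: average the sorted samples (quantiles),
# which yields a single bump located between the two inputs.
wasserstein = 0.5 * (left + right)

spread_mixture = mixture.std()
spread_wasserstein = wasserstein.std()
```

The mixture keeps both modes and a large spread, while the Wasserstein barycenter keeps the shape of the inputs and moves it to the middle.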
Copulas of financial time series
We apply clustering to the $\binom{N}{2}$ bivariate copulas of a financial time series dataset consisting of daily returns of stocks, credit default swaps and FX rates.
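A possible way to build the inputs of the clustering is sketched below: every pair of columns of a returns matrix is rank-transformed and binned into an empirical copula histogram. The function names, the bin count, and the simulated returns are assumptions; the clustering step itself is not shown.

```python
import itertools
import numpy as np

def rank_uniform(x):
    """Normalized ranks in (0, 1), assuming no ties."""
    return (np.argsort(np.argsort(x)) + 0.5) / len(x)

def pairwise_copula_histograms(returns, n_bins=10):
    """Return {(i, j): n_bins x n_bins empirical copula histogram} for i < j."""
    n_obs, n_vars = returns.shape
    u = np.column_stack([rank_uniform(returns[:, k]) for k in range(n_vars)])
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    hists = {}
    for i, j in itertools.combinations(range(n_vars), 2):
        h, _, _ = np.histogram2d(u[:, i], u[:, j], bins=[edges, edges])
        hists[(i, j)] = h / n_obs    # normalize to a probability histogram
    return hists

rng = np.random.default_rng(0)
returns = rng.normal(size=(500, 4))  # hypothetical daily returns, N = 4 assets
hists = pairwise_copula_histograms(returns)
```

With N = 4 variables this yields 4·3/2 = 6 histograms, each with uniform marginals by construction.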
Figure 4: Credit default swaps: more mass in the top-right corner, i.e. upper tail dependence. The cost of insurance against the default of companies tends to soar in distressed markets.
Queries about dependence
Figure 5: Target copulas (simulated or handcrafted), panels (A) to (D), and their respective nearest copulas, which answer questions A, B, C, D.
• (A) most Gaussian with ρ = 0.7?
• (B) both positively and negatively correlated?
• (C) extreme returns for one, small for the other?
• (D) uncorrelated but correlated for small returns?
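A query of this kind amounts to a nearest-neighbour search: handcraft a target copula, then return the pair whose copula is closest to it. The sketch below does this for question (B) on toy 4×4 histograms; the candidate names, the data, and the Euclidean stand-in distance are assumptions, whereas the poster relies on an optimal transport distance.

```python
import numpy as np

def nearest_copula(target, candidates, dist):
    """Return the key of the candidate histogram closest to `target`."""
    return min(candidates, key=lambda name: dist(target, candidates[name]))

def euclidean(p, q):
    """Stand-in distance (assumption); not the distance used in the poster."""
    return float(np.linalg.norm(p - q))

n = 4
positive = np.eye(n) / n                      # mass on the main diagonal
negative = np.fliplr(np.eye(n)) / n           # mass on the anti-diagonal
independent = np.full((n, n), 1.0 / n**2)
cross = 0.5 * (positive + negative)           # handcrafted target for question (B)

candidates = {                                # hypothetical pairs of variables
    "pair_1": positive,
    "pair_2": independent,
    "pair_3": 0.9 * cross + 0.1 * independent,
}
best = nearest_copula(cross, candidates, euclidean)
```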
References
[1] G. Marti, S. Andler, F. Nielsen, P. Donnat, IEEE Statistical Signal Processing Workshop (2016), 1-5.
[2] M. Cuturi, Advances in Neural Information Processing Systems (2013), 2292-2300.
[3] M. Cuturi, A. Doucet, Proceedings of the 31st International Conference on Machine Learning (2014), 685-693.