Exploring and measuring non-linear correlations

G. Marti†, S. Andler†‡, F. Nielsen, P. Donnat† (presented by M. Binkowski†∗)
†Hellebore Capital Ltd, Ecole Polytechnique, ‡ENS de Lyon, ∗Imperial College London
Motivations
• Interpretability of pairwise dependence
• Summary of associations between many variables
• Find abnormal dependence patterns
• Design robust and custom dependence coefficients
• Query the dataset for specific associations
• Realistic simulations of market variables
Copulas
Sklar’s Theorem
Let $X = (X_i, X_j)$ be a random vector with a joint cumulative distribution function $F$, and having continuous marginal cumulative distribution functions $F_i$, $F_j$ respectively. Then, there exists a unique distribution $C$ such that
$F(X_i, X_j) = C(F_i(X_i), F_j(X_j))$.
$C$, the copula of $X$, is the bivariate distribution of the uniform marginals $(U_i, U_j) := (F_i(X_i), F_j(X_j))$.
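In practice the marginals $F_i$, $F_j$ are unknown, and a common workaround is to replace them with normalized ranks. The sketch below is one possible way to estimate a discretized copula from a sample; the function names, the bin count and the simulated data are illustrative assumptions, not taken from the poster.

```python
import numpy as np

def pseudo_observations(x):
    """Map a 1-D sample to (0, 1) through its normalized ranks (empirical CDF)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n, ties broken by position
    return ranks / (len(x) + 1.0)

def empirical_copula_density(xi, xj, n_bins=10):
    """2-D histogram of the rank-transformed pair, normalized to sum to 1."""
    ui, uj = pseudo_observations(xi), pseudo_observations(xj)
    hist, _, _ = np.histogram2d(ui, uj, bins=n_bins, range=[[0, 1], [0, 1]])
    return hist / hist.sum()

# Simulated example (assumption): a monotone but non-linear relationship.
rng = np.random.default_rng(0)
xi = rng.standard_normal(1000)
xj = xi ** 3 + 0.1 * rng.standard_normal(1000)
copula = empirical_copula_density(xi, xj)
```

Because only ranks are used, the estimate is unchanged by any strictly increasing transformation of either variable, which is why the copula isolates the dependence from the marginals.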
Fréchet-Hoeffding copula bounds
Figure 1: Copula measures (left column: w, π, m) and cumulative distribution functions (right column: W, Π, M), shown as heatmaps over (u_i, u_j) ∈ [0, 1]^2, for negative dependence (first row), independence (second row), i.e. the uniform distribution over [0, 1]^2, and positive dependence (third row).
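The three cumulative distribution functions in Figure 1 have standard closed forms: $W(u_i, u_j) = \max(u_i + u_j - 1, 0)$, $\Pi(u_i, u_j) = u_i u_j$ and $M(u_i, u_j) = \min(u_i, u_j)$. A minimal sketch that evaluates them on a grid and checks the ordering $W \le \Pi \le M$ (function names and grid size are illustrative choices):

```python
import numpy as np

def lower_bound_W(u, v):
    """Frechet-Hoeffding lower bound (perfect negative dependence)."""
    return np.maximum(u + v - 1.0, 0.0)

def independence_Pi(u, v):
    """Independence copula."""
    return u * v

def upper_bound_M(u, v):
    """Frechet-Hoeffding upper bound (perfect positive dependence)."""
    return np.minimum(u, v)

grid = np.linspace(0.0, 1.0, 101)
u, v = np.meshgrid(grid, grid, indexing="ij")
W, Pi, M = lower_bound_W(u, v), independence_Pi(u, v), upper_bound_M(u, v)

# Every bivariate copula C satisfies W <= C <= M pointwise; Pi is one example.
bounds_hold = bool(np.all(W <= Pi + 1e-12) and np.all(Pi <= M + 1e-12))
```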
The methodology - Clustering of copulas & custom dependence coefficients
The methodology leverages copulas to encode the dependence between two variables, state-of-the-art optimal transport to provide a relevant geometry on the copulas, and clustering to summarize the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a custom dependence coefficient.
Target/Forget Dependence Coefficient: Let $\{C_l^-\}_l$ be the set of forget-dependence copulas, and $\{C_k^+\}_k$ be the set of target-dependence copulas. Let $C$ be the copula of $(X_i, X_j)$.
$$\mathrm{TFDC}\left(X_i, X_j; \{C_k^+\}_k, \{C_l^-\}_l\right) := \frac{\min_l d_M(C_l^-, C)}{\min_l d_M(C_l^-, C) + \min_k d_M(C, C_k^+)} \in [0, 1].$$
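Once the distances $d_M$ between $C$ and each reference copula are available, the coefficient itself is a simple ratio. The sketch below assumes those distances have already been computed by some other routine; the function name `tfdc` and the numbers in the usage lines are hypothetical.

```python
def tfdc(dist_to_forget, dist_to_target):
    """Target/Forget Dependence Coefficient from precomputed distances.

    dist_to_forget: distances d_M(C_l^-, C) to each forget-dependence copula.
    dist_to_target: distances d_M(C, C_k^+) to each target-dependence copula.
    """
    d_forget = min(dist_to_forget)
    d_target = min(dist_to_target)
    total = d_forget + d_target
    if total == 0:
        # C coincides with both a forget and a target copula: ratio undefined.
        raise ValueError("TFDC is undefined when both minimal distances are zero")
    return d_forget / total

# Hypothetical distances, for illustration only.
halfway = tfdc([0.2, 0.6], [0.2])    # equally far from both sets
on_target = tfdc([0.3], [0.0])       # C equals a target copula
on_forget = tfdc([0.0], [0.4])       # C equals a forget copula
```

The coefficient is 1 when $C$ matches a target copula, 0 when it matches a forget copula, and lies in between otherwise.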
Which geometry for copulas?
In [1], we detail the benefit of optimal transport over
information divergences for clustering copulas.
Figure 2: Copulas $C_1$, $C_2$, $C_3$ encoding a correlation of 0.5, 0.99, 0.9999 respectively. Which pair of copulas is the nearest? For Fisher-Rao, Kullback-Leibler, Hellinger and related divergences: $D(C_1, C_2) \le D(C_2, C_3)$; for the Wasserstein distance: $W_2(C_2, C_3) \le W_2(C_1, C_2)$.
We use results from [2], [3] to speed up the computation of the distances and barycenters needed for the clustering.
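To give a feel for entropy-regularized optimal transport in the spirit of [2], here is a minimal Sinkhorn iteration between discretized copulas. It is a sketch under stated assumptions, not the implementation used for the poster: the 5 x 5 grid, the squared Euclidean ground cost, the regularization strength `reg`, the iteration count and all names are illustrative choices.

```python
import numpy as np

def sinkhorn_cost(a, b, cost, reg=0.05, n_iter=500):
    """Transport cost of the entropy-regularized plan between histograms a and b."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):          # alternate scaling to match both marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))

# Bin centers of a 5 x 5 grid on [0, 1]^2 and squared Euclidean ground cost.
n = 5
centers = (np.arange(n) + 0.5) / n
gi, gj = np.meshgrid(centers, centers, indexing="ij")
points = np.column_stack([gi.ravel(), gj.ravel()])
cost = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)

# Discretized copulas: positive dependence, negative dependence, independence.
positive = (np.eye(n) / n).ravel()
negative = (np.fliplr(np.eye(n)) / n).ravel()
independent = np.full(n * n, 1.0 / (n * n))

d_pos_ind = sinkhorn_cost(positive, independent, cost)
d_pos_neg = sinkhorn_cost(positive, negative, cost)
```

With this ground cost, moving mass from the diagonal to the anti-diagonal costs more than spreading it uniformly, so the distance reflects how far mass has to travel on the unit square rather than how the two densities overlap. Note that the regularized cost is an approximation: it is not exactly zero between a copula and itself.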
Figure 3: Barycenter for: (left, panel "Bregman barycenter copula") Bregman geometry (which includes, for example, squared Euclidean and Kullback-Leibler distances); (right, panel "Wasserstein barycenter copula") Wasserstein geometry.
Copulas of financial time series
We apply clustering to the $\binom{N}{2}$ bivariate copulas of a financial time series dataset consisting of daily returns of stocks, credit default swaps and FX rates.
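The sketch below shows one way the $\binom{N}{2}$ pairwise copulas could be collected from a matrix of returns before clustering. The returns here are simulated Gaussian noise, not market data, and the helper names and bin count are assumptions made for the example.

```python
from itertools import combinations

import numpy as np

def rank_uniform(x):
    """Normalized ranks of a 1-D sample, in (0, 1)."""
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)

def pairwise_copulas(returns, n_bins=8):
    """Map each pair (i, j), i < j, of columns to its empirical copula histogram.

    returns: array of shape (T, N), one column per variable.
    """
    n_obs, n_vars = returns.shape
    uniforms = np.column_stack([rank_uniform(returns[:, k]) for k in range(n_vars)])
    copulas = {}
    for i, j in combinations(range(n_vars), 2):
        hist, _, _ = np.histogram2d(
            uniforms[:, i], uniforms[:, j], bins=n_bins, range=[[0, 1], [0, 1]]
        )
        copulas[(i, j)] = hist / n_obs
    return copulas

# Simulated stand-in for daily returns (assumption): T = 500 days, N = 6 variables.
rng = np.random.default_rng(1)
fake_returns = rng.standard_normal((500, 6))
copulas = pairwise_copulas(fake_returns)
n_pairs = len(copulas)  # N * (N - 1) / 2 pairs
```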
Figure 4: Credit default swaps: more mass in the top-right corner, i.e. upper tail dependence. The cost of insurance against the default of companies tends to soar in distressed markets.
Queries about dependence
Figure 5: Target copulas (simulated or handcrafted) and their respective nearest copulas, shown in panels (A), (B), (C), (D), which answer the following questions:
• (A) most Gaussian with ρ = 0.7?
• (B) both positively and negatively correlated?
• (C) extreme returns for one, small for the other?
• (D) uncorrelated but correlated for small returns?
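A query of this kind amounts to a nearest-neighbor search: build a target copula, then return the pair whose copula is closest to it. The sketch below is a hypothetical illustration of that lookup. The `distance` argument is a placeholder for the optimal-transport distance used in the poster; the demo plugs in a plain L1 distance only to keep the example short, and the candidate copulas are handcrafted.

```python
import numpy as np

def nearest_copula(target, candidates, distance):
    """Return the key of the candidate histogram closest to `target`."""
    return min(candidates, key=lambda key: distance(target, candidates[key]))

def l1_distance(p, q):
    """Stand-in distance (assumption); not the geometry advocated in the poster."""
    return float(np.abs(p - q).sum())

n = 4
uniform = np.full((n, n), 1.0 / n**2)
candidates = {
    "positive": np.eye(n) / n,              # mass on the diagonal
    "negative": np.fliplr(np.eye(n)) / n,   # mass on the anti-diagonal
    "independent": uniform,
}

# Handcrafted target: mostly positive dependence with a little noise.
target = 0.9 * np.eye(n) / n + 0.1 * uniform
best = nearest_copula(target, candidates, l1_distance)
```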
References
[1] G. Marti, S. Andler, F. Nielsen, P. Donnat, IEEE
Statistical Signal Processing Workshop (2016), 1-5.
[2] M. Cuturi, Advances in Neural Information Processing
Systems (2013), 2292-2300.
[3] M. Cuturi, A. Doucet, Proceedings of the 31st
International Conference on Machine Learning (2014),
685-693.