A presentation for the Hong Kong Machine Learning meetup summarizing my hobby research over the past year. My goal is to simulate realistic multivariate financial time series. With such a simulator, I could compare different statistical methods for portfolio construction, study complex networks, test algorithmic trading strategies, try some reinforcement learning, etc. This is still far from being achieved...
1. My recent attempts at using GANs for simulating
realistic stock returns
Hong Kong Machine Learning Meetup - Season 2 Episode 4 [online]
Gautier Marti
HKML
8 April 2020
Gautier Marti (HKML) GANs and financial stock returns 8 April 2020 1 / 28
2. Table of contents
1 Motivations
2 My attempts at building CorrGAN
Starting simple, always: The 3-dimensional case
From 3D to nD, many difficulties arise...
Exploring different architectures
Evaluation of CorrGAN
3 Next steps
Comparison of ML-based portfolio allocation methods
cCorrGAN for conditional sampling on the market state
4. Motivations
Most financial time series are too short!
We only observe one path of history out of the many possible.
As a consequence, most findings (e.g. trading algos, cross-sectional
alphas, portfolio construction methods) could be over-fitted to this one
particular observed path.
5. Monte Carlo Simulations: A set of techniques to alleviate
these problems
Ideally: We want to sample time series from the underlying true
(multivariate) distribution.
Some of the techniques available:
sampling from a parametric distribution (iid, parameters fit on a
single path, simplistic and unrealistic distribution) [1946]
bootstrapping (iid, only historical values) [1979]
stationary block-bootstrapping (only historical values) [1994]
GANs (less obvious assumptions, but dependent on many
hyper-parameters such as the network architecture) [2014]
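To make the difference between the two resampling schemes concrete, here is a minimal NumPy sketch of iid bootstrapping and of a stationary block bootstrap (blocks of geometrically distributed length, wrapping around the sample). The function names, the toy Gaussian returns, and the mean block length of 20 days are assumptions made for illustration only.

```python
import numpy as np


def iid_bootstrap(returns, n_samples, rng):
    """Resample rows (dates) independently, with replacement."""
    idx = rng.integers(0, len(returns), size=n_samples)
    return returns[idx]


def stationary_bootstrap(returns, n_samples, mean_block, rng):
    """Resample blocks of consecutive rows with geometric random lengths."""
    n = len(returns)
    p = 1.0 / mean_block  # probability of starting a new block at each step
    idx = np.empty(n_samples, dtype=int)
    idx[0] = rng.integers(0, n)
    for t in range(1, n_samples):
        if rng.random() < p:
            idx[t] = rng.integers(0, n)    # jump to a new random start
        else:
            idx[t] = (idx[t - 1] + 1) % n  # continue the block, wrapping around
    return returns[idx]


rng = np.random.default_rng(0)
# Toy data standing in for a (dates x stocks) matrix of historical returns.
returns = rng.standard_normal((250, 5)) * 0.01
path_iid = iid_bootstrap(returns, 250, rng)
path_blocks = stationary_bootstrap(returns, 250, mean_block=20, rng=rng)
```

Both functions resample whole rows, so the cross-sectional dependence of each date is preserved, and both can only replay historical values. Only the block version keeps some of the serial dependence.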
6. GANs
Already presented at the meetup by Alex Lau:
http://www.hkml.ai/2019/07/hong-kong-machine-learning-season-1-episode-12/
In finance (time series), not much yet but:
https://arxiv.org/abs/1901.01751, univariate time series;
https://arxiv.org/abs/1907.06673, univariate time series;
For multivariate time series, i.e. capturing the joint behaviour of a large
number of stocks, nothing really.
CorrGAN, https://arxiv.org/abs/1910.09504, is a first step.
7. CorrGAN scope
Simulating the full multivariate distribution of stock returns, that is
both their joint behaviour (think correlations between the stocks) and
their marginal behaviour (think typical volatility and occasional
jumps), is hard.
With CorrGAN, I will only focus on their joint behaviour as captured
by correlation matrices (already a major simplification of the full
dependence distribution - cf. copula theory).
Goal: Sampling realistic correlation matrices which could have been
estimated from real stock returns.
8. Section 2
My attempts at building CorrGAN
9. Subsection 1
Starting simple, always: The 3-dimensional case
11. Subsection 2
From 3D to nD, many difficulties arise...
12. How to evaluate in nD?
Challenge: It is no longer possible to visualize the space of empirical and
simulated correlations. How should we evaluate the model?
Several stylized facts are known about these matrices:
Distribution of pairwise correlations is significantly shifted to the
positive,
Eigenvalues follow the Marchenko–Pastur distribution, except for
1. a very large first eigenvalue,
2. a couple of other large eigenvalues,
Perron-Frobenius property (first eigenvector has positive entries),
Hierarchical structure of clusters,
Scale-free property of the corresponding MST.
http://marti.ai/ml/2019/07/15/financial-correlations-stylized-facts.html
Alternative: Compare empirical (real) and generated (fake) distributions
using Topological Data Analysis https://arxiv.org/abs/1802.02664
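Some of the stylized facts listed above are easy to check numerically. The sketch below is a rough illustration, not the evaluation code used for CorrGAN: the function name, the toy one-factor returns, and the use of the Marchenko–Pastur upper edge $(1 + \sqrt{n/T})^2$ as the threshold for "large" eigenvalues are all assumptions.

```python
import numpy as np


def stylized_facts(corr, n_obs):
    """Summarise a few stylized facts of an (n x n) correlation matrix
    estimated from n_obs observations."""
    n = corr.shape[0]
    pairwise = corr[np.triu_indices(n, k=1)]
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    # Upper edge of the Marchenko-Pastur bulk for unit-variance noise.
    mp_upper = (1 + np.sqrt(n / n_obs)) ** 2
    first_vec = eigvecs[:, -1]
    first_vec = first_vec * np.sign(first_vec.sum())  # fix the arbitrary sign
    return {
        "mean_corr": pairwise.mean(),
        "largest_eigenvalue": eigvals[-1],
        "n_eigenvalues_above_mp": int((eigvals > mp_upper).sum()),
        "perron_frobenius": bool((first_vec > 0).all()),
    }


rng = np.random.default_rng(0)
n_obs, n_stocks = 500, 20
# Toy one-factor model standing in for real stock returns.
market = rng.standard_normal((n_obs, 1))
returns = 0.7 * market + rng.standard_normal((n_obs, n_stocks))
corr = np.corrcoef(returns, rowvar=False)
facts = stylized_facts(corr, n_obs)
```

The hierarchical structure and the scale-free property of the MST need more machinery (clustering, graph libraries) and are left out of this sketch.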
13. Permutation invariance in neural networks?
GANs rely on deep nets. Those are in general not permutation invariant.
14. Why do we care about permutation invariance?
Regression task: Given a set of coefficients (the upper triangle of a
correlation matrix), output the sum of its values.
Remark: There are $\left(\frac{n(n-1)}{2}\right)!$ equivalent input vectors,
one per ordering of the coefficients. If we don’t leverage
permutation invariance, the number of examples is not sufficient for the
model to “learn”.
http://marti.ai/ml/2019/09/01/correl-invariance-permutations-nn.html
15. Idea 1: Build invariance directly into the NN architecture
A simple neural network module, based on the permutation invariance
of the sum operator, that one can plug into the main deep net to
make it permutation invariant:
Deep Sets https://arxiv.org/abs/1703.06114
In my experience, this is not yet a working technology. Other
research supports this claim: https://arxiv.org/abs/1901.09006.
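The sum-pooling idea can be sketched in a few lines of NumPy: apply the same map $\phi$ to every element, sum, then apply $\rho$. The weights below are random and untrained, and the layer sizes are arbitrary assumptions; the point is only to show that the output does not depend on the order of the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Randomly initialised (untrained) weights; the sizes are arbitrary.
W_phi = rng.standard_normal((1, 16))
b_phi = rng.standard_normal(16)
W_rho = rng.standard_normal(16)


def deep_set(x):
    """f(x) = rho(sum_i phi(x_i)) for a 1-D array of coefficients x."""
    h = np.tanh(x[:, None] @ W_phi + b_phi)  # phi, applied to each element
    pooled = h.sum(axis=0)                   # permutation-invariant pooling
    return float(pooled @ W_rho)             # rho, applied to the pooled vector


x = rng.uniform(-1, 1, size=10)   # e.g. flattened correlation coefficients
perm = rng.permutation(len(x))
out_original = deep_set(x)
out_permuted = deep_set(x[perm])  # same value up to floating-point rounding
```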
16. Idea 2: Find a canonical representation
Find a canonical representation, e.g. map each of the n! equivalent
correlation matrices to the same one, the representer.
Panels, left to right: arbitrary $C$; $R_{ij} = C_{\pi_S(i)\pi_S(j)}$; $R_{ij} = C_{\pi_H(i)\pi_H(j)}$
Figure 1: Three equivalent correlation matrices. The leftmost one was
estimated on the returns of arbitrarily ordered stocks; the one displayed
in the middle has been reordered by applying the same permutation $\pi_S$ to the
rows and columns (obtained by sorting the rows according to their sum); the
rightmost one by applying the same permutation $\pi_H$ to the rows and columns
(induced by a hierarchical clustering algorithm).
Question: Are some representations better than others?
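The two reorderings of Figure 1 can be sketched with NumPy and SciPy. This is an illustration under assumptions, not the exact code behind the figure: the function names are made up, the distance $\sqrt{2(1 - \rho)}$ and the average linkage are common choices that I picked here, and the leaf order returned by SciPy is only one of the valid orderings of a dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def sort_by_row_sum(corr):
    """Reorder rows and columns by decreasing row sum (pi_S)."""
    perm = np.argsort(-corr.sum(axis=1))
    return corr[np.ix_(perm, perm)]


def sort_by_hierarchy(corr, method="average"):
    """Reorder rows and columns by the leaf order of a dendrogram (pi_H)."""
    dist = np.sqrt(2 * np.clip(1 - corr, 0, None))  # assumed distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method=method)
    perm = leaves_list(Z)
    return corr[np.ix_(perm, perm)]


rng = np.random.default_rng(1)
# Toy correlation matrix standing in for an empirical one.
corr = np.corrcoef(rng.standard_normal((200, 6)), rowvar=False)
shuffle = rng.permutation(6)
corr_shuffled = corr[np.ix_(shuffle, shuffle)]

R_S = sort_by_row_sum(corr)
R_S_shuffled = sort_by_row_sum(corr_shuffled)  # same representer as R_S
R_H = sort_by_hierarchy(corr)
```

Sorting by row sum maps every reshuffled copy of the matrix to the same representer as long as no two row sums are tied.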
18. MLP GAN
Did not manage to make it work: The GAN converges toward generating
the mean of the dataset.
Panels, left to right: Empirical; Generated; Mean of empirical
Figure 2: (Left) Flattened upper triangle of an empirical correlation matrix
re-ordered by $\pi_S$ and displayed in Figure 1; (Center) An example of a vector
generated by the MLP GAN trained on 10,000 flattened upper triangles of
empirical correlation matrices re-ordered by $\pi_S$. It seems that the model has
learnt to generate an average of the empirical correlations (Right).
http://marti.ai/ml/2019/09/22/tf-mlp-gan-repr-correlation-matrices.html
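For reference, the vector representation fed to the MLP GAN can be obtained as below. This is a small sketch with hypothetical helper names: it flattens the strict upper triangle of a correlation matrix and rebuilds the full matrix from such a vector.

```python
import numpy as np


def flatten_upper(corr):
    """Return the n(n-1)/2 coefficients above the diagonal as a vector."""
    iu = np.triu_indices(corr.shape[0], k=1)
    return corr[iu]


def unflatten_upper(vec, n):
    """Rebuild a symmetric matrix with unit diagonal from such a vector."""
    corr = np.eye(n)
    iu = np.triu_indices(n, k=1)
    corr[iu] = vec
    corr[iu[1], iu[0]] = vec  # mirror into the lower triangle
    return corr


rng = np.random.default_rng(0)
# Toy correlation matrix standing in for an empirical one.
corr = np.corrcoef(rng.standard_normal((100, 4)), rowvar=False)
vec = flatten_upper(corr)
rebuilt = unflatten_upper(vec, 4)
```

Note that a vector produced by a generator is not guaranteed to map back to a valid (positive semi-definite) correlation matrix.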
19. DCGAN + Hierarchical sorting ≈ CorrGAN
Figure 3: Three correlation matrices; Can you guess which one is
DCGAN-generated?
http://marti.ai/ml/2019/10/13/tf-dcgan-financial-correlation-matrices.html
20. Subsection 4
Evaluation of CorrGAN
21. Evaluation of CorrGAN
As a first evaluation, we can check that the generated matrices exhibit the
known stylized facts:
Figure 4: (Left) Distribution of correlations; (Center) Distribution of eigenvalues;
(Right) First eigenvector entries
Results are summarized in the paper:
https://arxiv.org/abs/1910.09504
http://marti.ai/ml/2019/10/13/tf-dcgan-financial-correlation-matrices.html
22. CorrGAN.io
One can look at outputs of the model (fake) vs real empirical correlations,
and try to guess which is which.
Figure 5: http://www.corrgan.io/, a simple web app using Flask.
24. Subsection 1
Comparison of ML-based portfolio allocation methods
25. Lopez de Prado HRP vs. Papenbrock-Raffinot HERC
http://marti.ai/qfin/2019/12/04/hierarchical-risk-parity-part-3.html
http://marti.ai/qfin/2020/03/22/herc-part-i-implementation.html
26. Subsection 2
cCorrGAN for conditional sampling on the market state
27. cCorrGAN - {normal, stressed, rally} market correlations
We may want to sample conditional on the market state. For example,
3-modal: normal, rally, and stressed.
Figure 6: Correlation matrices estimated when the market was in a normal, rally,
and stressed state respectively.
Preparing the training set:
http://marti.ai/qfin/2020/02/03/sp500-sharpe-vs-corrmats.html
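One possible way to build such a labelled training set is sketched below: cut the history into windows, label each window from the annualised Sharpe ratio of an index, and estimate one correlation matrix per window. Everything here is an assumption for illustration (function names, non-overlapping 252-day windows, and the thresholds -0.5 and 2.0); it is not necessarily the procedure described at the link above.

```python
import numpy as np


def label_state(index_returns, low=-0.5, high=2.0, periods_per_year=252):
    """Label a window from the annualised Sharpe ratio of the index.
    The thresholds `low` and `high` are hypothetical."""
    sharpe = (np.sqrt(periods_per_year)
              * index_returns.mean() / index_returns.std())
    if sharpe < low:
        return "stressed"
    if sharpe > high:
        return "rally"
    return "normal"


def labelled_corrmats(stock_returns, index_returns, window=252):
    """Return (label, correlation matrix) pairs on non-overlapping windows."""
    out = []
    for start in range(0, len(stock_returns) - window + 1, window):
        sl = slice(start, start + window)
        corr = np.corrcoef(stock_returns[sl], rowvar=False)
        out.append((label_state(index_returns[sl]), corr))
    return out


rng = np.random.default_rng(0)
# Toy data standing in for (dates x stocks) returns and an index.
stock_returns = rng.standard_normal((504, 4)) * 0.01
index_returns = stock_returns.mean(axis=1)
dataset = labelled_corrmats(stock_returns, index_returns)
```

The resulting pairs could then serve as (condition, sample) examples for a conditional GAN.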