3. Properties of Normal Distribution
Ref. https://en.wikipedia.org/wiki/Normal_distribution
- Every normal distribution is a version of the N(0, 1) whose domain has been stretched by a factor σ (the
standard deviation) and then translated by µ (the mean value).
- Any linear combination of a fixed collection of normal deviates is a normal deviate.
- Of all probability distributions over the reals with a specified mean µ and variance σ², the normal distribution N(µ, σ²) is the one with maximum entropy.
- The independence between the sample mean µ̂ and the sample standard deviation s can be used to construct the so-called t-statistic; its standard form is given below.
- Inverting the distribution of this t-statistic allows us to construct a confidence interval for µ.
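As a reminder, the standard t-statistic for an i.i.d. normal sample of size n, with sample mean µ̂ and sample standard deviation s, is
\[ t = \frac{\hat{\mu} - \mu}{s / \sqrt{n}} \;\sim\; t_{n-1}, \]
and inverting this distribution gives the familiar \((1 - \alpha)\) confidence interval \(\hat{\mu} \pm t_{n-1,\,1-\alpha/2}\, s/\sqrt{n}\).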
4. Central Limit Theorem (CLT)
Ref. https://en.wikipedia.org/wiki/Central_limit_theorem
{X_1, …, X_n}: a random sample of size n, i.e. a sequence of independent and identically distributed (i.i.d.) random variables drawn from a distribution with expected value µ and finite variance σ².
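As a reminder, the standard (Lindeberg–Lévy) form of the theorem: writing \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i\) for the sample mean,
\[ \sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \;\xrightarrow{d}\; \mathcal{N}\bigl(0, \sigma^2\bigr) \quad \text{as } n \to \infty, \]
i.e. the standardized sample mean converges in distribution to a standard normal, regardless of the shape of the underlying distribution (as long as its variance is finite).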
5. Central Limit Theorem (CLT)
Ref. https://en.wikipedia.org/wiki/Illustration_of_the_central_limit_theorem
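A minimal simulation in the spirit of the referenced illustration (the Uniform(0, 1) summands and the sample sizes are assumed example choices, not taken from the slides): standardized sample means of i.i.d. draws look increasingly Gaussian as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_means(n, reps=100_000):
    """Draw `reps` samples of size n from Uniform(0, 1) and return the
    standardized sample means; Uniform(0, 1) has mean 1/2 and variance 1/12."""
    x = rng.uniform(0.0, 1.0, size=(reps, n))
    means = x.mean(axis=1)
    return (means - 0.5) / np.sqrt((1.0 / 12.0) / n)

for n in (1, 2, 10, 50):
    z = standardized_means(n)
    excess_kurtosis = ((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0
    # Uniform(0, 1) itself has excess kurtosis -1.2; the value shrinks toward 0 as n grows.
    print(n, round(float(excess_kurtosis), 3))
```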
9. Gaussian Process Motivation: Non-linear Regression
Ref. https://thegradient.pub/gaussian-process-not-quite-for-dummies/
Traditional non-linear regression typically gives you a single function that it considers the best fit to the observations.
But what about the other functions that also fit the data reasonably well?
10. 2D Gaussian as 2 Samples
Ref. https://thegradient.pub/gaussian-process-not-quite-for-dummies/
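A minimal sketch of the idea behind the referenced figure (the correlation value of 0.9 is an assumed example): a single draw from a 2D Gaussian can be read as a tiny "function" evaluated at two inputs x1 and x2, and a strong correlation between the coordinates makes the two values move together.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2x2 covariance with strong positive correlation between the two coordinates.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
draws = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5)

# Each row is one sample from the 2D Gaussian; read column 0 as f(x1) and column 1 as f(x2).
for f_x1, f_x2 in draws:
    print(f"f(x1) = {f_x1:+.2f}   f(x2) = {f_x2:+.2f}")
```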
16. Impact of Kernels on Prior Distributions
Ref. https://distill.pub/2019/visual-exploration-gaussian-processes/#Prior
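A small sketch of how the kernel choice shapes GP prior samples (the RBF and periodic kernels below are example choices of mine, mirroring the kinds of kernels explored in the referenced article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 100)

def rbf(a, b, length=1.0):
    """Squared-exponential (RBF) kernel: smooth, slowly varying prior samples."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def periodic(a, b, period=1.0, length=1.0):
    """Periodic kernel: prior samples repeat with the given period."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

for name, kernel in [("RBF", rbf), ("periodic", periodic)]:
    K = kernel(x, x) + 1e-8 * np.eye(len(x))   # small jitter for numerical stability
    prior_samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
    print(name, prior_samples.shape)           # three prior functions evaluated on the grid
```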
17. Combination of Kernels
Ref. https://distill.pub/2019/visual-exploration-gaussian-processes/#KernelCombinations
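Sums and products of valid (positive semi-definite) kernels are again valid kernels; a short sketch of the two combinations highlighted in the referenced article, using the same example kernels as above:

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 100)

def rbf(a, b, length=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def periodic(a, b, period=1.0, length=1.0):
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)

# Addition: e.g. a smooth trend plus an exactly repeating pattern.
K_sum = rbf(x, x) + periodic(x, x)
# Multiplication: a locally periodic kernel, where the repetition decays with distance.
K_prod = rbf(x, x) * periodic(x, x)
print(K_sum.shape, K_prod.shape)
```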
18. Gaussian Process in Continuous Case
Ref. https://thegradient.pub/gaussian-process-not-quite-for-dummies/
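Handling the continuous case amounts to evaluating the GP on an arbitrarily fine grid of inputs and conditioning on the observed points; a minimal sketch of the standard posterior computation (the RBF kernel, the toy observations, and the noise level are assumptions for illustration):

```python
import numpy as np

def rbf(a, b, length=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

# A handful of noisy observations (made-up toy data).
x_train = np.array([-2.0, -0.5, 1.0, 2.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 200)     # the "continuous" input range, finely discretized

noise = 1e-4
K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
K_s = rbf(x_train, x_test)
K_ss = rbf(x_test, x_test)

# Standard GP posterior: a full distribution over functions, not a single best fit.
alpha = np.linalg.solve(K, y_train)
mean = K_s.T @ alpha
cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
print(mean.shape, std.shape)             # posterior mean and pointwise uncertainty on the grid
```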
19. Gaussian Processes as Single Layer Neural Networks
- If weight and bias parameters are taken to be i.i.d., the post-activations x_j^1 and x_{j'}^1 are independent for j ≠ j'.
- As z_i^1(x) is a sum of i.i.d. terms, by the CLT it will be Gaussian distributed when the network is infinitely wide.
- Therefore, any finite collection {z_i^1(x^{α=1}), …, z_i^1(x^{α=k})} has a joint multivariate Gaussian distribution, which is exactly the definition of a Gaussian process (a numerical sketch follows the reference below).
Ref. Radford M. Neal, Priors for Infinite Networks, University of Toronto, 1994
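A small numerical check of the argument above (the ReLU nonlinearity, the widths, and the specific 1/√width read-out scaling are illustrative choices; Neal's construction likewise scales the read-out weights with the inverse square root of the hidden-layer size): across random initializations of a single-hidden-layer network, the output z^1(x) for a fixed input looks increasingly Gaussian as the width grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def output_samples(width, n_inits=20_000, d_in=3):
    """Sample the scalar output z^1(x) for one fixed input x across many
    independently initialized single-hidden-layer networks with i.i.d. N(0, 1)
    weights and biases, ReLU hidden units, and read-out weights scaled by
    1/sqrt(width) so the sum over hidden units keeps a finite variance."""
    x = np.ones(d_in)
    W0 = rng.normal(size=(n_inits, width, d_in))
    b0 = rng.normal(size=(n_inits, width))
    h = np.maximum(0.0, W0 @ x + b0)                 # post-activations x_j^1 (i.i.d. over j)
    W1 = rng.normal(size=(n_inits, width)) / np.sqrt(width)
    b1 = rng.normal(size=n_inits)
    return (W1 * h).sum(axis=1) + b1                 # z^1(x): a sum of i.i.d. terms plus a bias

for width in (1, 5, 100):
    z = output_samples(width)
    excess_kurtosis = ((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0
    print(width, round(float(excess_kurtosis), 2))   # shrinks toward 0 (Gaussian) as width grows
```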
20. Gaussian Processes as Deep Neural Networks
- Constructing kernels equivalent to infinitely wide neural networks with two hidden
layers and nonlinearities
- Tamir Hazan et al., Steps Toward Deep Kernel Methods from Infinite Neural Networks, arXiv 2015
- Dropout training in neural networks as approximate Bayesian inference in deep
Gaussian processes
- Yarin Gal et al., Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep
Learning, ICML 2016
- Exact equivalence of infinitely wide deep networks and Gaussian Processes
- Jaehoon Lee et al., Deep Neural Networks as Gaussian Processes, ICLR 2018
- Convergence of infinitely wide Bayesian deep neural networks towards Gaussian processes
- Alexander G. de G. Matthews et al., Gaussian Process Behaviour in Wide Deep Neural Networks, ICLR
2018
- … and much more!
21. Next Steps
- Overparameterized networks still achieve good test accuracy
- Chiyuan Zhang et al., Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
- Empirical properties of overfitted classifiers
- Mikhail Belkin et al., To Understand Deep Learning We Need to Understand Kernel Learning, ICML
2018
- Evolution of an ANN during training can be described by a kernel
- Arthur Jacot et al., Neural Tangent Kernel: Convergence and Generalization in Neural Networks,
NeurIPS 2018
- Efficient exact algorithm for computing the extension of NTK to CNN
- Sanjeev Arora et al., On Exact Computation with an Infinitely Wide Neural Net, NeurIPS 2019