What are probabilistic programming and Bayesian statistics? What are their strengths and limitations? In his talk, Marco located Bayesian networks in the current AI landscape, gently introduced Bayesian reasoning and computation, and explained how to implement generative models in R.
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
1. Gentle Introduction:
Bayesian Modelling and Probabilistic Programming in R
Geneva R Users Group
Speaker: Marco Wirthlin
@marcowirthlin
Image Source: https://i.stack.imgur.com/GONoV.jpg
2. This talk was made for Geneva R Users:
4. Your Boss: “Can you give us a hand? Look at this complex machine. Sometimes it malfunctions and produces items with faults that are difficult to spot. Can you predict when and why this happens?”
6. How would you solve this? (Discriminative Edition)
Pipeline: Raw Data → Tidy Data → ML-Ready Data → Trained Classifier → Prediction / Classification
Steps along the way:
● Cleaning, munging, exploratory analysis
● KNN, PCA/ICA, random forests
● Feature engineering, regularization
● Model tuning, training
● Validation
7. How would you solve this? (Generative Edition)
Pipeline: Raw Data → Tidy Data → Candidate Model(s) + Domain Knowledge → Phenomenon Simulations → Gain Understanding → Fix Problem (?)
Steps along the way:
● Cleaning, munging, exploratory analysis
● KNN, PCA/ICA, random forests, feature engineering, regularization
● (Re)parametrization, refinement
● Prior/posterior simulations, model selection, scientific communication
● Apply knowledge, know uncertainty
8. What is a generative model?
A hypothesis about the underlying mechanisms, AKA “learning the class”. No shortcuts! =D
Data [2, 8, ..., 9] → Model with parameters θ, μ, ξ, ... → Bayesian Inference
● Ng, A. Y. and Jordan, M. I. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems, pages 841–848.
● Rasmus Bååth, Video Introduction to Bayesian Data Analysis, Part 1: What is Bayes?: https://www.youtube.com/watch?time_continue=366&v=3OJEae7Qb_o
9. Recap: When to use which approach*

                    Statistical Models                  Machine Learning
Data                Little / expensive / inaccessible   Abundant
Uncertainty         Is relevant                         Not relevant
Num. of param.      Isolate effects of few              Many
Interpretability    Transparent                         Black box
Assumptions         Many, explicit                      Some, implicit
Goal                Understanding predictors            Overall prediction

* Very general guidelines!
● E.g. Bayesian models scale well with many parameters and with large data sets, thanks to inter- and intra-chain GPU parallelization.
● Example hybrid methods: deep (hierarchical) Bayesian neural networks, Bayesian optimization, Gaussian mixture models.
● http://www.fharrell.com/post/stat-ml/
● http://www.fharrell.com/post/stat-ml2/
11. Likelihoods
Normal distribution: x ~ N(μ, σ²), i.e. “X is normally distributed”
Likelihood: L = p(D | θ); for the normal distribution, L = p(D | μ, σ²):
“The probability that D belongs to a distribution with mean μ and SD σ”
PDF: fix the parameters, vary the data
L: fix the data, vary the parameters
● https://www.youtube.com/watch?v=ScduwntrMzc
● Applet: https://seneketh.shinyapps.io/Likelihood_Intuition
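The PDF-vs-likelihood flip can be tried directly in R. A minimal sketch with made-up data: the same dnorm() density acts as a likelihood once the data are held fixed and the parameters vary.

```r
# Likelihood intuition: dnorm() is a PDF when parameters are fixed and the
# data vary, and a likelihood when the data are fixed and parameters vary.
set.seed(42)
D <- rnorm(20, mean = 5, sd = 2)  # hypothetical observed data

# Log-likelihood of (mu, sigma) given the fixed data D
loglik <- function(mu, sigma) sum(dnorm(D, mean = mu, sd = sigma, log = TRUE))

# Varying the parameters: values of mu near the data score higher
loglik(5, 2) > loglik(0, 2)
```

Plotting loglik over a grid of mu values reproduces the likelihood curve shown in the applet above.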
12. Interlude: Frequentist Inference
X = [2, ..., 9], Y = [7, ..., 2]
Y = a * X + b
Y ~ N(a * X + b, σ²), so L = p(D | a, b, σ²)
MLE: argmax over (a, b, σ²) of Σ ln p(D | a, b, σ²)
Assumes a “true” population with “true”, unique parameter values.
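The MLE on this slide can be sketched in a few lines of base R. The data here are simulated (hypothetical), with true values a = 2 and b = 4 chosen purely for illustration.

```r
# Maximum likelihood for Y ~ N(a*X + b, sigma^2): maximize the summed
# log-likelihood over (a, b, sigma) with a general-purpose optimizer.
set.seed(1)
X <- runif(100, 0, 10)
Y <- 2 * X + 4 + rnorm(100, sd = 1)  # simulated data, true a = 2, b = 4

negloglik <- function(par) {
  a <- par[1]; b <- par[2]; sigma <- exp(par[3])  # log-scale keeps sigma > 0
  -sum(dnorm(Y, mean = a * X + b, sd = sigma, log = TRUE))
}

fit <- optim(c(0, 0, 0), negloglik)
fit$par[1:2]  # should land close to the true a = 2, b = 4
```

For this particular model the closed-form answer is of course lm(Y ~ X); the optim() route just makes the argmax on the slide explicit.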
13. Interlude: Frequentist Inference
“True” population → sample of N = 3, e.g. D = [7, 3, 2]
Repeating the sampling infinitely often yields a sampling distribution for a test statistic, e.g. the F distribution for inter-group variance / intra-group variance under H0; the Central Limit Theorem describes the sampling distribution of the mean.
Probability is a “long-run” property of this repeated-sampling process.
● Sampling distribution applet: http://onlinestatbook.com/stat_sim/sampling_dist/index.html
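The applet's idea can also be reproduced in a few lines of R: draw many small samples from a skewed population and look at the distribution of their means.

```r
# Simulated sampling distribution of the mean: many samples of N = 3
# from a skewed "true" population (exponential, population mean = 1).
set.seed(7)
population <- rexp(1e5, rate = 1)
sample_means <- replicate(10000, mean(sample(population, 3)))

mean(sample_means)  # centered near the population mean
# hist(sample_means): skewed at N = 3; increasing N makes it
# increasingly normal, as the Central Limit Theorem predicts.
```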
14. Interlude: Frequentist Inference
“When a frequentist says that the probability of "heads" in a coin toss is 0.5 (50%), she means that in infinitely many such coin tosses, 50% of the coins will show "heads".”
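This long-run reading is easy to simulate; a toy sketch (not part of the original slides):

```r
# Frequentist probability as long-run frequency: the proportion of heads
# in a long run of fair coin tosses converges to 0.5.
set.seed(123)
tosses <- sample(c("heads", "tails"), 1e5, replace = TRUE)
mean(tosses == "heads")  # ~0.5
```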
15. Bayesian Inference
● John Kruschke: Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Chapter 5
● Rasmus Bååth: http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-two/
● Richard McElreath: Statistical Rethinking book and lectures (https://www.youtube.com/watch?v=4WVelCswXo4)
16. Bayesian Inference
17. Bayesian Inference
Discrete values: just sum it up! :)
Continuous values: integration over the complete parameter space... :(
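For discrete (or discretized) parameter values, "just sum it up" can be shown with a small grid example in R, here a hypothetical coin-bias problem with a flat prior:

```r
# Bayes' rule over a discrete grid: posterior over the coin bias theta
# after observing 7 heads in 10 tosses, with a flat prior.
theta <- seq(0, 1, by = 0.01)               # discrete grid of biases
prior <- rep(1, length(theta))              # flat prior
likelihood <- dbinom(7, size = 10, prob = theta)
posterior <- prior * likelihood / sum(prior * likelihood)  # normalize by summing

theta[which.max(posterior)]  # posterior mode at 0.7
```

The denominator is just a sum over the grid; with continuous parameters it becomes the integral the slide complains about.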
18. Bayesian Inference:
Averaging over the complete parameter space via integration is impractical!
Solution: we sample from the posterior distribution with smart MCMC algorithms!
(Subject of another talk)
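As a taste of that other talk, here is a deliberately minimal random-walk Metropolis sampler in base R, a sketch rather than a production MCMC (real work would use Stan's HMC/NUTS). The target is the same hypothetical coin-bias posterior as above: 7 heads in 10 tosses with a flat prior.

```r
# Minimal random-walk Metropolis: sample from a posterior instead of
# integrating over it. Flat prior, so the target is proportional to
# the binomial likelihood.
set.seed(99)
target <- function(theta) {
  if (theta < 0 || theta > 1) return(0)
  dbinom(7, size = 10, prob = theta)
}

n_iter <- 20000
chain <- numeric(n_iter)
chain[1] <- 0.5
for (i in 2:n_iter) {
  proposal <- chain[i - 1] + rnorm(1, sd = 0.1)        # random-walk proposal
  accept_prob <- target(proposal) / target(chain[i - 1])
  chain[i] <- if (runif(1) < accept_prob) proposal else chain[i - 1]
}

mean(chain)  # near the analytic posterior mean, 8/12 = 0.667, for a flat prior
```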
19. Bayesian Inference
Let's compute this and sample from it!
20. Quantify all model parts with uncertainty
Y = a * X + b
Y ~ N(a * X + b, σ²)
Likelihood: p(D | θ) = p(D | a, b, σ²)
Priors: p(a), p(b), p(σ²), e.g. a ~ N(1, 0.1), b ~ N(4, 0.5), σ² ~ G(1, 0.1)
Joint model: p(D, a, b, σ²) = p(D | a, b, σ²) * p(a) * p(b) * p(σ²)
21. From model to code
Y = a * X + b
Y ~ N(a * X + b, σ²)
a ~ N(1, 0.1)
b ~ N(4, 0.5)
σ² ~ G(1, 0.1)
● More examples: https://mc-stan.org/users/documentation/case-studies
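One way to go "from model to code" without leaving base R is a prior predictive simulation of exactly this model: draw the parameters from their priors, then simulate Y. The X values below are made up for illustration; in Stan the same model is a few lines in the model block (see the case studies linked in the slide).

```r
# Prior predictive simulation of the slide's generative model:
#   a ~ N(1, 0.1), b ~ N(4, 0.5), sigma^2 ~ Gamma(1, 0.1),
#   Y ~ N(a*X + b, sigma^2)
set.seed(2024)
X <- runif(50, 0, 10)  # hypothetical predictor values

a      <- rnorm(1, mean = 1, sd = 0.1)
b      <- rnorm(1, mean = 4, sd = 0.5)
sigma2 <- rgamma(1, shape = 1, rate = 0.1)

Y <- rnorm(50, mean = a * X + b, sd = sqrt(sigma2))
length(Y)  # 50 observations simulated from the generative model
```

Repeating the draw many times and plotting the simulated Y against X is exactly the "prior simulations" step of the generative workflow from slide 7.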
23. Example: Deep Bayesian Neural Nets
● https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
● https://twiecki.io/blog/2018/08/13/hierarchical_bayesian_neural_network/
24. Example: Bayesian Inference and Volatility Modeling Using Stan
https://luisdamiano.github.io/personal/volatility_stan2018.pdf
Credit: Michael Weylandt, Luis Damiano
25. Example: Bayesian Inference and Volatility Modeling Using Stan
29. All sources in one place!
About Generative vs. Discriminative models:
Ng, A. Y. and Jordan, M. I. (2002). On discriminative vs. generative classifiers: A comparison
of logistic regression and naive bayes. In Advances in neural information processing systems,
pages 841–848.
Rasmus Bååth:
Video Introduction to Bayesian Data Analysis, Part 1: What is Bayes?:
https://www.youtube.com/watch?time_continue=366&v=3OJEae7Qb_o
When to use ML vs. Statistical Modelling:
Frank Harrell's Blog:
http://www.fharrell.com/post/stat-ml/
http://www.fharrell.com/post/stat-ml2/
Frequentist approach: How do sampling distributions work (applet):
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Bayesian inference and computation:
John Kruschke:
Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Chapter 5
Rasmus Bååth:
http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-two/
Richard McElreath:
Statistical Rethinking book and lectures
(https://www.youtube.com/watch?v=4WVelCswXo4)
Many model examples in Stan:
https://mc-stan.org/users/documentation/case-studies
About Bayesian Neural Networks:
https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
https://twiecki.io/blog/2018/08/13/hierarchical_bayesian_neural_network/
Volatility Examples:
Hidden Markov Models:
https://github.com/luisdamiano/rfinance17
Volatility GARCH Model and Bayesian Workflow:
https://luisdamiano.github.io/personal/volatility_stan2018.pdf
Dictionary: Stats ↔ ML
https://ubc-mds.github.io/resources_pages/terminology/
The Bayesian Workflow:
https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
Algorithm explanation applet for MCMC exploration of the parameter space:
http://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/
Probabilistic Programming Conference Talks:
https://www.youtube.com/watch?v=crvNIGyqGSU
30. Who to follow on Twitter?
● Chris Fonnesbeck @fonnesbeck (pyMC3)
● Thomas Wiecki @twiecki (pyMC3)
Blog: https://twiecki.io/ (nice intros)
● Bayes Dose @BayesDose (general info and papers)
● Richard McElreath @rlmcelreath (ecology, Bayesian statistics expert)
All his lectures: https://www.youtube.com/channel/UCNJK6_DZvcMqNSzQdEkzvzA
● Michael Betancourt @betanalpha (Stan)
Blog: https://betanalpha.github.io/writing/
Specifically: https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html
● Rasmus Bååth @rabaath
Great video series: http://www.sumsar.net/blog/2017/02/introduction-to-bayesian-data-analysis-part-one/
● Frank Harrell @f2harrell (statistics sage)
Great Blog: http://www.fharrell.com/
● Andrew Gelman @StatModeling (statistics sage)
https://statmodeling.stat.columbia.edu/
● Judea Pearl @yudapearl
Book of Why: http://bayes.cs.ucla.edu/WHY/ (more about causality, BN and DAG)
● AND MANY MORE!
31. Dictionary: Stats ↔ ML
Check: https://ubc-mds.github.io/resources_pages/terminology/ for more terminology

Statistics           Machine learning / AI
Estimation/Fitting   ~ Learning
Hypothesis           ~ Classification rule
Data point           ~ Example / instance
Regression           ~ Supervised learning
Classification       ~ Supervised learning
Covariates           ~ Features
Parameters           ~ Features
Response             ~ Label
Factor               ~ Factor (categorical variables)
Likelihood           ~ Cost function (sometimes)
32. Data Science + AI + ML + Stats
Credit: Zoubin Ghahramani, CTO UBER. Talk: "Probabilistic Machine Learning: From theory to industrial impact"