Bayesian Core: A Practical Approach to Computational Bayesian Statistics

Chapter 3: Generalized linear models

Outline:
       Generalisation of linear models
       Metropolis–Hastings algorithms
       The Probit Model
       The logit model
       Loglinear models
Generalisation of Linear Models

    Linear models describe the connection between a response variable y and
    a set x of explanatory variables through a linear dependence relation
    with [approximately] normal perturbations.

    There are many instances where either of these assumptions is not
    appropriate, e.g. when the support of y is restricted to R+ or to N.
bank

    Four measurements on 100 genuine Swiss banknotes and 100 counterfeit
    ones:
        x1  length of the bill (in mm),
        x2  width of the left edge (in mm),
        x3  width of the right edge (in mm),
        x4  bottom margin width (in mm).
    Response variable y: status of the banknote [0 for genuine and 1 for
    counterfeit].

    Goal: a probabilistic model that predicts counterfeiting based on the
    four measurements.
The impossible linear model

    Example of the influence of x4 on y. Since y is binary,

        y | x4 ∼ B(p(x4)) ,

    ⇒ a normal model is impossible.

    Linear dependence in p(x) = E[y|x],

        p(x_{i4}) = β0 + β1 x_{i4} ,

    estimated [by MLE] as

        p̂_i = −2.02 + 0.268 x_{i4} ,

    which gives p̂_i = 0.12 for x_{i4} = 8 and ... p̂_i = 1.19 for
    x_{i4} = 12!!!

    ⇒ a linear dependence is impossible.
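    A quick illustration (ours) of this failure in R: an ordinary
    least-squares fit of the binary response on x4, assuming a data frame
    bank with columns y and x4 as described above.

        # OLS fit of a binary response: fitted values are not probabilities
        fit <- lm(y ~ x4, data = bank)
        predict(fit, newdata = data.frame(x4 = c(8, 12)))
        # fitted "probabilities" near 0.12 and 1.19 -- the second exceeds 1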
Generalisation of the linear dependence

    Broader class of models to cover various dependence structures.

    Class of generalised linear models (GLM) where

        y | x, β ∼ f(y | x^T β) ,

    i.e., the dependence of y on x is partly linear.
Notations

    Same as in the linear regression chapter, with an n-sample

        y = (y_1, . . . , y_n)

    and the corresponding explanatory variables/covariates

            ( x_11   x_12   . . .   x_1k )
            ( x_21   x_22   . . .   x_2k )
        X = ( x_31   x_32   . . .   x_3k )
            (  ...    ...            ... )
            ( x_n1   x_n2   . . .   x_nk )
Specifications of GLMs

    Definition (GLM)
    A GLM is a conditional model specified by two functions:
       1. the density f of y given x, parameterised by its expectation
          parameter µ = µ(x) [and possibly its dispersion parameter
          ϕ = ϕ(x)];
       2. the link g between the mean µ and the explanatory variables,
          written customarily as g(µ) = x^T β or, equivalently,
          E[y|x, β] = g^{−1}(x^T β).

    For identifiability reasons, g needs to be bijective.
Likelihood

    Obvious representation of the likelihood

        ℓ(β, ϕ | y, X) = ∏_{i=1}^n f( y_i | x_i^T β, ϕ )

    with parameters β ∈ R^k and ϕ > 0.
Examples

    Ordinary linear regression
    Case of a GLM where

        g(x) = x ,   ϕ = σ² ,   and   y | X, β, σ² ∼ N_n(Xβ, σ² I_n) .
Examples (2)

    Case of binary and binomial data, when

        y_i | x_i ∼ B(n_i, p(x_i))

    with known n_i.

    Logit [or logistic regression] model
    The link is the logit transform on the probability of success,

        g(p_i) = log( p_i / (1 − p_i) ) ,

    with likelihood

        ∏_{i=1}^n (n_i choose y_i) [ exp(x_i^T β) / (1 + exp(x_i^T β)) ]^{y_i}
                   [ 1 / (1 + exp(x_i^T β)) ]^{n_i − y_i}
            ∝ exp{ ∑_{i=1}^n y_i x_i^T β } / ∏_{i=1}^n (1 + exp(x_i^T β))^{n_i}
Canonical link

    Special link function g⋆ that appears in the natural exponential family
    representation of the density:

        g⋆(µ) = θ   if   f(y|µ) ∝ exp{ T(y) · θ − Ψ(θ) }

    Example
    The logit link is canonical for the binomial model, since

        f(y_i | p_i) = (n_i choose y_i) exp{ y_i log( p_i / (1 − p_i) )
                                             + n_i log(1 − p_i) } ,

    and thus

        θ_i = log( p_i / (1 − p_i) ) .
Examples (3)

    Customary to use the canonical link, but only customary ...

    Probit model
    The probit link function is given by

        g(µ_i) = Φ^{−1}(µ_i)

    where Φ is the standard normal cdf, with likelihood

        ℓ(β | y, X) ∝ ∏_{i=1}^n Φ(x_i^T β)^{y_i} ( 1 − Φ(x_i^T β) )^{n_i − y_i} .

    [Full processing in The Probit Model section below.]
Log-linear models

    Standard approach to describe associations between several categorical
    variables, i.e., variables with finite support.

    Sufficient statistic: the contingency table, made of the
    cross-classified counts for the different categorical variables.

    Example (Titanic survivors)

                                   Child           Adult
        Survivor   Class       Male  Female    Male  Female
                   1st           0      0       118     4
                   2nd           0      0       154    13
          No       3rd          35     17       387    89
                   Crew          0      0       670     3
                   1st           5      1        57   140
                   2nd          11     13        14    80
          Yes      3rd          13     14        75    76
                   Crew          0      0       192    20
Poisson regression model

       1. Each count y_i is Poisson with mean µ_i = µ(x_i).
       2. The link function connects R+ with R, e.g. the logarithm
          g(µ_i) = log(µ_i).

    Corresponding likelihood

        ℓ(β | y, X) = ∏_{i=1}^n (1 / y_i!) exp{ y_i x_i^T β − exp(x_i^T β) } .
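    A minimal sketch of fitting such a Poisson log-linear model with R's
    glm(); the Titanic contingency table above ships with base R as the
    Titanic dataset, and the interaction structure chosen here is purely
    illustrative.

        # counts cross-classified by Class, Sex, Age, Survived
        tab <- as.data.frame(Titanic)
        fit <- glm(Freq ~ Class * Age * Sex + Survived * (Class + Age + Sex),
                   family = poisson, data = tab)
        summary(fit)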
Metropolis–Hastings algorithms

    Posterior inference in GLMs is harder than for linear models.

    ⇒ Working with a GLM requires specific numerical or simulation tools
    [e.g., GLIM in classical analyses].

    Opportunity to introduce a universal MCMC method: the
    Metropolis–Hastings algorithm.
Generic MCMC sampler

    Metropolis–Hastings algorithms are generic/off-the-shelf MCMC
    algorithms:
        they only require the likelihood up to a constant [difference with
        the Gibbs sampler];
        they can be tuned with a wide range of possibilities [difference
        with the Gibbs sampler & blocking];
        they are natural extensions of standard simulation algorithms,
        based on the choice of a proposal distribution [difference in the
        Markov proposal q(x, y) and acceptance].
Why Metropolis?

    Originally introduced by Metropolis, Rosenbluth, Rosenbluth, Teller and
    Teller in a setup of optimization on a discrete state-space. All
    authors were involved in Los Alamos during and after WWII:
        A physicist and mathematician, Nicholas Metropolis is considered
        (with Stanislaw Ulam) to be the father of Monte Carlo methods.
        Also a physicist, Marshall Rosenbluth worked on the development of
        the hydrogen (H) bomb.
        Edward Teller was one of the first scientists to work on the
        Manhattan Project that led to the production of the A bomb. He also
        managed to design, with Ulam, the H bomb.
Generic Metropolis–Hastings sampler

    For target π and proposal kernel q(x, y):

        Initialization: Choose an arbitrary x(0).
        Iteration t:
           1. Given x(t−1), generate x̃ ∼ q(x(t−1), x).
           2. Calculate

                  ρ(x(t−1), x̃) = min{ [ π(x̃) / q(x(t−1), x̃) ] /
                                       [ π(x(t−1)) / q(x̃, x(t−1)) ] , 1 }

           3. With probability ρ(x(t−1), x̃), accept x̃ and set x(t) = x̃;
              otherwise reject x̃ and set x(t) = x(t−1).
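    A minimal sketch of this generic sampler in R, written on the log scale
    for numerical stability; the standard normal target and the uniform
    random-walk proposal in the example are illustrative choices, not part
    of the slides.

        mh_sample <- function(n_iter, x0, log_target, propose, log_q) {
          x <- numeric(n_iter + 1)
          x[1] <- x0
          for (t in 2:(n_iter + 1)) {
            xt <- propose(x[t - 1])               # draw x~ from q(x^(t-1), .)
            log_rho <- (log_target(xt) - log_q(x[t - 1], xt)) -
                       (log_target(x[t - 1]) - log_q(xt, x[t - 1]))
            x[t] <- if (log(runif(1)) < log_rho) xt else x[t - 1]
          }
          x
        }

        # Example: standard normal target, uniform random-walk proposal
        chain <- mh_sample(10000, x0 = 0,
                           log_target = function(x) -x^2 / 2,
                           propose    = function(x) x + runif(1, -1, 1),
                           log_q      = function(x, y) 0)  # symmetric: cancels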
Universality

    The algorithm only needs to simulate from q, which can be chosen
    [almost!] arbitrarily, i.e. unrelated to π [q is also called the
    instrumental distribution].

    Note: having π and q known only up to proportionality terms is ok,
    since the proportionality constants cancel in ρ.
Validation

    Markov chain theory
    The target π is the stationary distribution of the Markov chain
    (x(t))_t because the acceptance probability ρ(x, y) satisfies the
    detailed balance equation

        π(x) q(x, y) ρ(x, y) = π(y) q(y, x) ρ(y, x) .

    [Integrate out x to see that π is stationary.]

    For convergence/ergodicity, the Markov chain must be irreducible: q
    must have a positive probability of reaching all areas with positive π
    probability in a finite number of steps.
Choice of proposal

    Theoretical guarantees of convergence are very high, but the choice of
    q is crucial in practice. A poor choice of q may result in
        very high rejection rates, so that the Markov chain (x(t))_t hardly
        moves, or in
        a myopic exploration of the support of π, that is, a strong
        dependence on the starting value x(0), with the chain stuck in a
        neighbourhood of a mode close to x(0).

    Note: hybrid MCMC
    The simultaneous use of different kernels is valid and recommended.
The independence sampler

    Pick a proposal q that is independent of its first argument,

        q(x, y) = q(y) .

    Then ρ simplifies into

        ρ(x, y) = min{ 1, [ π(y)/q(y) ] / [ π(x)/q(x) ] } .

    Special case: q ∝ π reduces to ρ(x, y) = 1 and iid sampling.

    Analogy with the Accept–Reject algorithm, where max π/q is replaced
    with the current value π(x(t−1))/q(x(t−1)); however, the sequence of
    accepted x(t)'s is not i.i.d.
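    As a sketch, the independence sampler fits directly into mh_sample()
    above: a heavy-tailed Cauchy proposal, independent of the current
    state, for the same normal target (our illustrative choice).

        chain_ind <- mh_sample(10000, x0 = 0,
                               log_target = function(x) -x^2 / 2,
                               propose    = function(x) rcauchy(1),
                               log_q      = function(x, y) dcauchy(y, log = TRUE))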
Choice of q

    Convergence properties are highly dependent on q:
        q needs to be positive everywhere on the support of π;
        for a good exploration of this support, π/q needs to be bounded.

    Otherwise, the chain takes too long to reach regions with low q/π
    values.
The random walk sampler

    The independence sampler requires too much global information about π:
    opt instead for a local gathering of information.

    This means exploring the neighbourhood of the current value x(t) in
    search of other points of interest.

    The simplest exploration device is based on random walk dynamics.
Random walks

    The proposal is a symmetric transition density,

        q(x, y) = qRW(y − x) = qRW(x − y) .

    The acceptance probability ρ(x, y) then reduces to the simpler form

        ρ(x, y) = min{ 1, π(y)/π(x) } .

    It only depends on the target π [and accepts all proposed values that
    increase π].
Choice of qRW

    Considerable flexibility in the choice of qRW:
        tails: normal versus Student's t;
        scale: size of the neighbourhood.

    Can also be used for restricted-support targets [with a waste of
    simulations near the boundary].

    Can be tuned towards an acceptance probability of 0.234 at the burn-in
    stage [magic number!], as sketched below.
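    A rough illustration (ours) of burn-in scale tuning with mh_sample():
    try a few random-walk scales and monitor the empirical acceptance rate.

        accept_rate <- function(chain) mean(diff(chain) != 0)
        for (tau in c(0.1, 1, 10)) {
          ch <- mh_sample(5000, x0 = 0,
                          log_target = function(x) -x^2 / 2,
                          propose    = function(x) x + tau * rnorm(1),
                          log_q      = function(x, y) 0)
          cat("tau =", tau, "acceptance rate =", round(accept_rate(ch), 3), "\n")
        }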
Convergence assessment

    Capital question: How many iterations do we need to run???

        Rule #1: There is no absolute number of simulations, i.e. 1,000 is
        neither large nor small.
        Rule #2: It takes [much] longer to check for convergence than for
        the chain itself to converge.
        Rule #3: MCMC is a "what-you-get-is-what-you-see" algorithm: it
        fails to tell about unexplored parts of the space.
        Rule #4: When in doubt, run MCMC chains in parallel and check for
        consistency.

    Many "quick-&-dirty" solutions in the literature, but they are not
    necessarily trustworthy.
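    Rule #4 in practice (our sketch): run a few chains from dispersed
    starting values and compare their summaries; coda::gelman.diag()
    offers a formal diagnostic.

        chains <- lapply(c(-10, 0, 10), function(x0)
          mh_sample(10000, x0,
                    log_target = function(x) -x^2 / 2,
                    propose    = function(x) x + rnorm(1),
                    log_q      = function(x, y) 0))
        sapply(chains, mean)   # should agree across chains after burn-in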
Prohibited dynamic updating

    Tuning the proposal in terms of its past performances can only be
    implemented at burn-in, because otherwise it cancels the Markovian
    convergence properties.

    Using several MCMC proposals together within a single algorithm, with a
    circular or random design, is ok. It almost always brings an
    improvement compared with its individual components (at the cost of
    increased simulation time).
Effective sample size

    How many iid simulations from π are equivalent to N simulations from
    the MCMC algorithm?

    Based on the estimated k-th order auto-correlation,

        ρ_k = corr( x(t), x(t+k) ) ,

    the effective sample size is

        N^ess = n ( 1 + 2 ∑_{k=1}^{T0} ρ̂_k² )^{−1/2} .

    This is only a partial indicator: it fails to signal chains stuck in
    one mode of the target.
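    A direct transcription (ours) of the formula above, truncating the
    autocorrelation sum at lag T0; coda::effectiveSize() provides a
    standard alternative.

        ess <- function(chain, T0 = 50) {
          rho <- acf(chain, lag.max = T0, plot = FALSE)$acf[-1]  # rho_1..rho_T0
          length(chain) * (1 + 2 * sum(rho^2))^(-1/2)
        }
        ess(chain)   # 'chain' from the Metropolis-Hastings sketch above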
The Probit Model

    Likelihood

        ℓ(β | y, X) ∝ ∏_{i=1}^n Φ(x_i^T β)^{y_i} ( 1 − Φ(x_i^T β) )^{n_i − y_i} .

    If no prior information is available, resort to the flat prior
    π(β) ∝ 1 and then obtain the posterior distribution

        π(β | y, X) ∝ ∏_{i=1}^n Φ(x_i^T β)^{y_i} ( 1 − Φ(x_i^T β) )^{n_i − y_i} ,

    which is nonstandard and is simulated using MCMC techniques.
MCMC resolution

    A Metropolis–Hastings random walk sampler works well for binary
    regression problems with a small number of predictors.

    It uses the maximum likelihood estimate β̂ as starting value and the
    asymptotic (Fisher) covariance matrix of the MLE, Σ̂, as scale.
MLE proposal

    The R function glm is very useful to get the maximum likelihood
    estimate of β and its asymptotic covariance matrix Σ̂.

    Terminology used in the R program:

        mod=summary(glm(y~X-1,family=binomial(link="probit")))

    with mod$coeff[,1] denoting β̂ and mod$cov.unscaled Σ̂.
MCMC algorithm

    Probit random-walk Metropolis–Hastings
        Initialization: Set β(0) = β̂ and compute Σ̂.
        Iteration t:
           1. Generate β̃ ∼ N_{k+1}(β(t−1), τ Σ̂).
           2. Compute

                  ρ(β(t−1), β̃) = min{ 1, π(β̃|y) / π(β(t−1)|y) }

           3. With probability ρ(β(t−1), β̃), set β(t) = β̃;
              otherwise set β(t) = β(t−1).
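    A minimal sketch of this sampler in R; y and X are assumed to hold the
    binary responses and the covariate matrix (as for the bank data), and
    tau is the scale factor discussed on the next slide.

        probit_mh <- function(y, X, n_iter = 10000, tau = 1) {
          mod <- summary(glm(y ~ X - 1, family = binomial(link = "probit")))
          beta_hat  <- mod$coeff[, 1]        # MLE, used as starting value
          Sigma_hat <- mod$cov.unscaled      # asymptotic covariance of the MLE
          L <- chol(tau * Sigma_hat)
          logpost <- function(b) {           # flat prior: posterior = likelihood
            p <- pnorm(X %*% b)
            sum(y * log(p) + (1 - y) * log(1 - p))
          }
          k <- length(beta_hat)
          beta <- matrix(beta_hat, n_iter + 1, k, byrow = TRUE)
          for (t in 2:(n_iter + 1)) {
            prop <- beta[t - 1, ] + drop(rnorm(k) %*% L)
            if (log(runif(1)) < logpost(prop) - logpost(beta[t - 1, ]))
              beta[t, ] <- prop
            else beta[t, ] <- beta[t - 1, ]
          }
          beta
        }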
bank

    Probit modelling with no intercept over the four measurements, with
    three different scales τ = 1, 0.1, 10: the best mixing behavior is
    associated with τ = 1.

    [Figure: for each parameter, raw sequences, histograms and
    autocorrelation plots of the simulated chains.]

    The average of the parameters over 9,000 iterations gives the plug-in
    estimate

        p̂_i = Φ( −1.2193 x_{i1} + 0.9540 x_{i2} + 0.9795 x_{i3} + 1.1481 x_{i4} ) .
G-priors for probit models

    A flat prior on β is inappropriate for comparison purposes and Bayes
    factors.

    Replace the flat prior with a hierarchical prior,

        β | σ², X ∼ N_k( 0_k, σ² (X^T X)^{−1} )   and   π(σ² | X) ∝ σ^{−3/2} ,

    (almost) as in normal linear regression.

    Warning
    The matrix X^T X is not the Fisher information matrix.
G-priors for testing

    Same argument as before: while π is improper, the use of the same
    variance factor σ² in both models means the normalising constant
    cancels in the Bayes factor.

    Posterior distribution of β

        π(β | y, X) ∝ |X^T X|^{1/2} Γ((2k − 1)/4) ( β^T (X^T X) β )^{−(2k−1)/4} π^{−k/2}
                      × ∏_{i=1}^n Φ(x_i^T β)^{y_i} [ 1 − Φ(x_i^T β) ]^{1−y_i}

    [where k matters!]
Marginal approximation

    Marginal

        f(y | X) ∝ |X^T X|^{1/2} π^{−k/2} Γ{(2k − 1)/4}
                   × ∫ ( β^T (X^T X) β )^{−(2k−1)/4}
                       ∏_{i=1}^n Φ(x_i^T β)^{y_i} [ 1 − Φ(x_i^T β) ]^{1−y_i} dβ ,

    approximated by

        |X^T X|^{1/2} / (π^{k/2} M) × Γ{(2k − 1)/4} |V̂|^{1/2} (4π)^{k/2}
        × ∑_{m=1}^M ||X β^(m)||^{−(2k−1)/2}
          ∏_{i=1}^n Φ(x_i^T β^(m))^{y_i} [ 1 − Φ(x_i^T β^(m)) ]^{1−y_i}
          e^{ +(β^(m) − β̂)^T V̂^{−1} (β^(m) − β̂) / 4 } ,

    where

        β^(m) ∼ N_k( β̂, 2 V̂ ) ,

    with β̂ the MCMC approximation of E^π[β | y, X] and V̂ the MCMC
    approximation of V(β | y, X).
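    A sketch (ours) of this importance-sampling approximation, computed on
    the log scale for numerical stability; beta_mcmc is a matrix of
    posterior draws, e.g. the output of probit_mh() above.

        log_marginal <- function(y, X, beta_mcmc, M = 10000) {
          k <- ncol(X)
          beta_hat <- colMeans(beta_mcmc)   # MCMC approximation of E[beta|y,X]
          V <- var(beta_mcmc)               # MCMC approximation of V(beta|y,X)
          L <- chol(2 * V)
          logw <- numeric(M)
          for (m in 1:M) {
            b <- beta_hat + drop(rnorm(k) %*% L)   # beta^(m) ~ N_k(beta_hat, 2V)
            p <- pnorm(X %*% b)
            logw[m] <- -((2 * k - 1) / 4) * log(sum((X %*% b)^2)) +
              sum(y * log(p) + (1 - y) * log(1 - p)) +
              sum((b - beta_hat) * solve(V, b - beta_hat)) / 4
          }
          0.5 * c(determinant(crossprod(X))$modulus) - (k / 2) * log(pi) +
            lgamma((2 * k - 1) / 4) + 0.5 * c(determinant(V)$modulus) +
            (k / 2) * log(4 * pi) - log(M) +
            max(logw) + log(sum(exp(logw - max(logw))))
        }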
Linear hypothesis

    Linear restriction on β,

        H0 : Rβ = r

    (r ∈ R^q, R a q × k matrix), where β^0 is (k − q)-dimensional and X_0
    and x_0 are linear transforms of X and of x of dimensions (n, k − q)
    and (k − q).

    Likelihood

        ℓ(β^0 | y, X_0) ∝ ∏_{i=1}^n Φ(x_{0i}^T β^0)^{y_i} [ 1 − Φ(x_{0i}^T β^0) ]^{1−y_i} .
Linear test

    Associated [projected] G-prior

        β^0 | σ², X_0 ∼ N_{k−q}( 0_{k−q}, σ² (X_0^T X_0)^{−1} )   and
        π(σ² | X_0) ∝ σ^{−3/2} .

    Marginal distribution of y of the same type:

        f(y | X_0) ∝ |X_0^T X_0|^{1/2} π^{−(k−q)/2} Γ{ (2(k − q) − 1)/4 }
                     × ∫ ||X_0 β^0||^{−(2(k−q)−1)/2}
                         ∏_{i=1}^n Φ(x_{0i}^T β^0)^{y_i} [ 1 − Φ(x_{0i}^T β^0) ]^{1−y_i} dβ^0 .
banknote

    For H0 : β1 = β2 = 0, B^π_10 = 157.73 [against H0].

    Generic regression-like output:

                Estimate    Post. var.    log10(BF)
        X1       -1.1552      0.0631        4.5844 (****)
        X2        0.9200      0.3299       -0.2875
        X3        0.9121      0.2595       -0.0972
        X4        1.0820      0.0287       15.6765 (****)

        evidence against H0: (****) decisive, (***) strong,
        (**) substantial, (*) poor
Informative settings

    If prior information is available on p(x), transform it into a prior
    distribution on β by the technique of imaginary observations:

    Start with k different values of the covariate vector, x̃_1, . . . , x̃_k.
    For each of these values, the practitioner specifies
      (i) a prior guess g_i at the probability p_i associated with x̃_i;
     (ii) an assessment of (un)certainty about that guess, given by a
          number K_i of equivalent "prior observations" [on how many
          imaginary observations did you build this guess?].
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model



Informative prior
                     π(p1 , . . . , pk ) ∝ ∏_{i=1}^k pi^{Ki gi −1} (1 − pi )^{Ki (1−gi )−1}

       translates into [Jacobian rule]

            π(β) ∝ ∏_{i=1}^k Φ(x̃iT β)^{Ki gi −1} [1 − Φ(x̃iT β)]^{Ki (1−gi )−1} φ(x̃iT β)
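
       A minimal R sketch of this induced log-prior (the names xtilde, g
       and K, holding the prior covariate values x̃i , guesses gi and
       weights Ki , are assumptions, not from the slides):

             # log pi(beta): Beta-like terms at each x~i plus the Jacobian phi
             logprior <- function(beta, xtilde, g, K) {
               eta <- xtilde %*% beta                    # the x~iT beta's
               sum((K * g - 1) * pnorm(eta, log.p = TRUE) +
                   (K * (1 - g) - 1) * pnorm(eta, lower.tail = FALSE, log.p = TRUE) +
                   dnorm(eta, log = TRUE))
             }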




                                                                                                      391 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The Probit Model



Informative prior
                     π(p1 , . . . , pk ) ∝ ∏_{i=1}^k pi^{Ki gi −1} (1 − pi )^{Ki (1−gi )−1}

       translates into [Jacobian rule]

            π(β) ∝ ∏_{i=1}^k Φ(x̃iT β)^{Ki gi −1} [1 − Φ(x̃iT β)]^{Ki (1−gi )−1} φ(x̃iT β)



       [Almost] equivalent to using the G-prior
                          β ∼ Nk( 0k , [ ∑_{j=1}^k x̃j x̃jT ]^{−1} )
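
       A minimal R sketch for simulating from this G-prior (xtilde, the
       k × k matrix with rows x̃jT , is an assumption):

             library(mvtnorm)                          # for rmvnorm
             Sigma0 <- solve(t(xtilde) %*% xtilde)     # [sum_j x~j x~jT]^{-1}
             beta0  <- rmvnorm(1, sigma = Sigma0)      # one draw from Nk(0k , Sigma0)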



                                                                                                          392 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model



The logit model

       Recall that [for ni = 1]

              yi |µi ∼ B(1, µi ) ,   ϕ = 1   and   g(µi ) = exp(µi )/{1 + exp(µi )} .

       Thus
                     P(yi = 1|β) = exp(xiT β)/{1 + exp(xiT β)}

       with likelihood

           ℓ(β|y, X) = ∏_{i=1}^n [exp(xiT β)/{1 + exp(xiT β)}]^{yi} [1 − exp(xiT β)/{1 + exp(xiT β)}]^{1−yi}
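
       A minimal R sketch of this log-likelihood, using the equivalent
       form ∑i yi xiT β − log{1 + exp(xiT β)}:

             loglik <- function(beta, y, X) {
               eta <- X %*% beta                  # the xiT beta's
               sum(y * eta - log(1 + exp(eta)))
             }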




                                                                                                       393 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model



Links with probit


               usual vague prior for β, π(β) ∝ 1




                                                                          394 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model



Links with probit


               usual vague prior for β, π(β) ∝ 1
               Posterior given by
               π(β|y, X) ∝ ∏_{i=1}^n [exp(xiT β)/{1 + exp(xiT β)}]^{yi} [1 − exp(xiT β)/{1 + exp(xiT β)}]^{1−yi}

               [intractable]




                                                                                                    395 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model



Links with probit


               usual vague prior for β, π(β) ∝ 1
               Posterior given by
               π(β|y, X) ∝ ∏_{i=1}^n [exp(xiT β)/{1 + exp(xiT β)}]^{yi} [1 − exp(xiT β)/{1 + exp(xiT β)}]^{1−yi}

               [intractable]
               Same Metropolis–Hastings sampler
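
              A minimal sketch of that sampler in R, mirroring the probit
              implementation (the proposal scale τ and the MLE-based
              covariance are assumptions):

                    mhlogit <- function(y, X, niter = 10^4, tau = 1) {
                      mod <- summary(glm(y ~ -1 + X, family = binomial(link = "logit")))
                      Sigma <- tau^2 * mod$cov.unscaled   # random-walk proposal covariance
                      L <- t(chol(Sigma))
                      loglik <- function(beta)
                        sum(y * (X %*% beta) - log(1 + exp(X %*% beta)))
                      k <- ncol(X)
                      beta <- matrix(0, niter, k)
                      beta[1, ] <- mod$coefficients[, 1]  # start from the MLE
                      for (t in 2:niter) {
                        prop <- beta[t - 1, ] + as.vector(L %*% rnorm(k))
                        # accept with probability min{1, pi(prop)/pi(current)}
                        if (log(runif(1)) < loglik(prop) - loglik(beta[t - 1, ]))
                          beta[t, ] <- prop
                        else
                          beta[t, ] <- beta[t - 1, ]
                      }
                      beta
                    }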




                                                                                                    396 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model



bank




       [Figure: MCMC sequences, posterior histograms, and autocorrelation
       plots for the four components βi of the logit coefficient]

       Same scale factor equal to τ = 1: slight increase in the skewness
       of the histograms of the βi ’s.

       Plug-in estimate of predictive probability of a counterfeit

            p̂i = exp(−2.5888xi1 + 1.9967xi2 + 2.1260xi3 + 2.1879xi4) /
                 [1 + exp(−2.5888xi1 + 1.9967xi2 + 2.1260xi3 + 2.1879xi4)] .


                                                                                                                                                       397 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     The logit model



G-priors for logit models
       Same story: Flat prior on β inappropriate for Bayes factors, to be
       replaced with hierarchical prior,

                  β|σ^2 , X ∼ Nk( 0k , σ^2 (X^T X)^{−1} )    and    π(σ^2 |X) ∝ σ^{−3/2}


       Example (bank)
                               Estimate        Post. var.            log10(BF)

       X1                      -2.3970          0.3286                     4.8084 (****)
       X2                       1.6978          1.2220                    -0.2453
       X3                       2.1197          1.0094                    -0.1529
       X4                       2.0230          0.1132                    15.9530 (****)

       evidence against H0: (****) decisive, (***) strong,
       (**) substantial, (*) poor

                                                                                                   398 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Loglinear models
          Introduction to loglinear models


       Example (airquality)
       Benchmark in R
             > data(airquality)
       Repeated measurements over 111 consecutive days of ozone u (in
       parts per billion) and maximum daily temperature v discretized
       into dichotomous variables
                    month      5    6    7    8    9
         ozone      temp
         [1,31]     [57,79]   17    4    2    5   18
                    (79,97]    0    2    3    3    2
         (31,168]   [57,79]    6    1    0    3    1
                    (79,97]    1    2   21   12    8

       Contingency table with 5 × 2 × 2 = 20 entries
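
       A minimal R sketch of how this table can be rebuilt (the cut
       points are read off the row labels; restriction to complete cases
       gives the 111 days):

             data(airquality)
             cc    <- airquality[complete.cases(airquality), ]
             ozone <- cut(cc$Ozone, c(1, 31, 168), include.lowest = TRUE)
             temp  <- cut(cc$Temp, c(57, 79, 97), include.lowest = TRUE)
             table(ozone, temp, month = cc$Month)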
                                                                          399 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Poisson regression

       Observations/counts y = (y1 , . . . , yn ) are integers, so we can
       choose
                                 yi ∼ P(µi )
       Saturated likelihood
                          ℓ(µ|y) = ∏_{i=1}^n (1/yi !) µi^{yi} exp(−µi )




                                                                              400 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Poisson regression

       Observations/counts y = (y1 , . . . , yn ) are integers, so we can
       choose
                                 yi ∼ P(µi )
       Saturated likelihood
                          ℓ(µ|y) = ∏_{i=1}^n (1/yi !) µi^{yi} exp(−µi )

       GLM constraint via log-linear link
                          log(µi ) = xiT β ,        yi |xi ∼ P( exp(xiT β) )
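
       A minimal R sketch of the resulting log-likelihood, dropping the
       constant −∑i log yi !:

             loglik.pois <- function(beta, y, X) {
               eta <- X %*% beta        # log mu_i = xiT beta
               sum(y * eta - exp(eta))
             }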



                                                                                     401 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Categorical variables
       Special feature
       Incidence matrix X = (xi ) such that its elements are all zeros or
       ones, i.e. covariates are all indicators/dummy variables!




                                                                            402 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Categorical variables
       Special feature
       Incidence matrix X = (xi ) such that its elements are all zeros or
       ones, i.e. covariates are all indicators/dummy variables!


       Several types of (sub)models are possible depending on relations
       between categorical variables.




                                                                            403 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Categorical variables
       Special feature
       Incidence matrix X = (xi ) such that its elements are all zeros or
       ones, i.e. covariates are all indicators/dummy variables!


       Several types of (sub)models are possible depending on relations
       between categorical variables.


       Re-special feature
       Variable selection problem of a specific kind, in the sense that all
       indicators related with the same association must either remain or
       vanish at once. Thus far fewer submodels than in a regular
       variable selection problem.
                                                                             404 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Parameterisations
       Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ J and 1 ≤ w ≤ K.




                                                                          405 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Parameterisations
       Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ J and 1 ≤ w ≤ K.

       Simplest non-constant model is
              log(µτ ) = ∑_{b=1}^I βb^u Ib (uτ ) + ∑_{b=1}^J βb^v Ib (vτ ) + ∑_{b=1}^K βb^w Ib (wτ ) ,

       that is,
                            log(µl(i,j,k) ) = βi^u + βj^v + βk^w ,
       where index l(i, j, k) corresponds to u = i, v = j and w = k.




                                                                                                          406 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Parameterisations
       Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ J and 1 ≤ w ≤ K.

       Simplest non-constant model is
              log(µτ ) = ∑_{b=1}^I βb^u Ib (uτ ) + ∑_{b=1}^J βb^v Ib (vτ ) + ∑_{b=1}^K βb^w Ib (wτ ) ,

       that is,
                            log(µl(i,j,k) ) = βi^u + βj^v + βk^w ,
       where index l(i, j, k) corresponds to u = i, v = j and w = k.
       Saturated model is
                            log(µl(i,j,k) ) = βijk^{uvw}

                                                                                                          407 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Log-linear model (over-)parameterisation

       Representation

           log(µl(i,j,k) ) = λ + λi^u + λj^v + λk^w + λij^{uv} + λik^{uw} + λjk^{vw} + λijk^{uvw} ,

       as in Anova models.




                                                                           408 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Log-linear model (over-)parameterisation

       Representation

           log(µl(i,j,k) ) = λ + λi^u + λj^v + λk^w + λij^{uv} + λik^{uw} + λjk^{vw} + λijk^{uvw} ,

       as in Anova models.
              λ appears as the overall or reference average effect
              λi^u appears as the marginal discrepancy (against the reference
              effect λ) when u = i,
              λij^{uv} as the interaction discrepancy (against the added effects
              λ + λi^u + λj^v ) when (u, v) = (i, j)

       and so on...


                                                                              409 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Example of submodels
            1. if both v and w are irrelevant, then

                                 log(µl(i,j,k) ) = λ + λi^u ,

            2. if all three categorical variables are mutually independent, then

                                 log(µl(i,j,k) ) = λ + λi^u + λj^v + λk^w ,

            3. if u and v are associated but are both independent of w, then

                                 log(µl(i,j,k) ) = λ + λi^u + λj^v + λk^w + λij^{uv} ,

            4. if u and v are conditionally independent given w, then

                                 log(µl(i,j,k) ) = λ + λi^u + λj^v + λk^w + λik^{uw} + λjk^{vw} ,

            5. if there is no three-factor interaction, then

                                 log(µl(i,j,k) ) = λ + λi^u + λj^v + λk^w + λij^{uv} + λik^{uw} + λjk^{vw}

               [the most complete submodel]
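
       These five submodels translate directly into R's glm formula
       syntax, as a minimal sketch (u, v, w factors and n the vector of
       counts are assumptions):

             glm(n ~ u, family = poisson())              # 1: v, w irrelevant
             glm(n ~ u + v + w, family = poisson())      # 2: mutual independence
             glm(n ~ u * v + w, family = poisson())      # 3: u, v associated, indep. of w
             glm(n ~ u * w + v * w, family = poisson())  # 4: u, v cond. indep. given w
             glm(n ~ (u + v + w)^2, family = poisson())  # 5: no three-factor interaction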
                                                                                   410 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Identifiability
       Representation

           log(µl(i,j,k) ) = λ + λi^u + λj^v + λk^w + λij^{uv} + λik^{uw} + λjk^{vw} + λijk^{uvw} ,

       is not identifiable, but the Bayesian approach handles
       non-identifiable settings and still estimates identifiable
       quantities properly.




                                                                           411 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Identifiability
       Representation

           log(µl(i,j,k) ) = λ + λi^u + λj^v + λk^w + λij^{uv} + λik^{uw} + λjk^{vw} + λijk^{uvw} ,

       is not identifiable, but the Bayesian approach handles
       non-identifiable settings and still estimates identifiable
       quantities properly.
       Customary to impose identifiability constraints on the parameters:
       set to 0 parameters corresponding to the first category of each
       variable, i.e. remove the indicator of the first category.

       E.g., if u ∈ {1, 2} and v ∈ {1, 2}, constraint could be

                          λ1^u = λ1^v = λ11^{uv} = λ12^{uv} = λ21^{uv} = 0 .
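
       A minimal R sketch: the default treatment contrasts impose exactly
       this constraint by dropping the first category of each factor (the
       toy factors are illustrative only):

             u <- factor(c(1, 1, 2, 2))
             v <- factor(c(1, 2, 1, 2))
             model.matrix(~ u * v)   # columns: (Intercept), u2, v2, u2:v2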



                                                                           412 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Inference under a flat prior
       Noninformative prior π(β) ∝ 1 gives posterior distribution
              π(β|y, X) ∝ ∏_{i=1}^n exp(xiT β)^{yi} exp{− exp(xiT β)}

                        = exp{ ∑_{i=1}^n yi xiT β − ∑_{i=1}^n exp(xiT β) }




                                                                                                   413 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



Inference under a flat prior
       Noninformative prior π(β) ∝ 1 gives posterior distribution
              π(β|y, X) ∝ ∏_{i=1}^n exp(xiT β)^{yi} exp{− exp(xiT β)}

                        = exp{ ∑_{i=1}^n yi xiT β − ∑_{i=1}^n exp(xiT β) }




       Use of same random walk M-H algorithm as in probit and logit cases,
       starting with MLE evaluation

           > mod=summary(glm(y~-1+X,family=poisson()))
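
       A minimal sketch of one such random-walk step, reusing loglik.pois
       above (y, X as in the glm call; the scale τ^2 = 0.5 matches the
       next slide):

             Sigma <- 0.5 * mod$cov.unscaled   # tau^2 times MLE covariance
             beta  <- mod$coefficients[, 1]    # current value, started at the MLE
             prop  <- beta + as.vector(t(chol(Sigma)) %*% rnorm(length(beta)))
             if (log(runif(1)) < loglik.pois(prop, y, X) - loglik.pois(beta, y, X))
               beta <- prop                    # Metropolis-Hastings acceptance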



                                                                                                   414 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
     Loglinear models



airquality
       Identifiable non-saturated model involves 16 parameters.
       Obtained with 10,000 MCMC iterations with scale factor τ^2 = 0.5.

                     Effect         Post. mean    Post. var.
                     λ                  2.8041        0.0612
                     λ2^u              −1.0684        0.2176
                     λ2^v              −5.8652        1.7141
                     λ2^w              −1.4401        0.2735
                     λ3^w              −2.7178        0.7915
                     λ4^w              −1.1031        0.2295
                     λ5^w              −0.0036        0.1127
                     λ22^{uv}           3.3559        0.4490
                     λ22^{uw}          −1.6242        1.2869
                     λ23^{uw}          −0.3456        0.8432
                     λ24^{uw}          −0.2473        0.6658
                     λ25^{uw}          −1.3335        0.7115
                     λ22^{vw}           4.5493        2.1997
                     λ23^{vw}           6.8479        2.5881
                     λ24^{vw}           4.6557        1.7201
                     λ25^{vw}           3.9558        1.7128


                                                                                                       415 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
          Loglinear models



airquality: MCMC output




       [Figure: raw MCMC sequences of the 16 parameters of the loglinear
       model, over 10,000 iterations]
                                                                                                   416 / 785
Bayesian Core:A Practical Approach to Computational Bayesian Statistics
   Generalized linear models
          Loglinear models



airquality: MCMC output




       [Figure: raw MCMC sequences of the 16 parameters together with the
       corresponding posterior histograms]
                                                                                                                                                                                                                   417 / 785
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 

Bayesian Core: Chapter 4

  • 1. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models. Outline of part 3, Generalized linear models: Generalisation of linear models; Metropolis–Hastings algorithms; The Probit Model; The logit model; Loglinear models. 313 / 785
  • 2. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of Linear Models Linear models model connection between a response variable y and a set x of explanatory variables by a linear dependence relation with [approximately] normal perturbations. 314 / 785
  • 3. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of Linear Models Linear models model connection between a response variable y and a set x of explanatory variables by a linear dependence relation with [approximately] normal perturbations. Many instances where either of these assumptions not appropriate, e.g. when the support of y restricted to R+ or to N. 315 / 785
  • 4. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models bank Four measurements on 100 genuine Swiss banknotes and 100 counterfeit ones: x1 length of the bill (in mm), x2 width of the left edge (in mm), x3 width of the right edge (in mm), x4 bottom margin width (in mm). Response variable y: status of the banknote [0 for genuine and 1 for counterfeit] 316 / 785
  • 5. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models bank Four measurements on 100 genuine Swiss banknotes and 100 counterfeit ones: x1 length of the bill (in mm), x2 width of the left edge (in mm), x3 width of the right edge (in mm), x4 bottom margin width (in mm). Response variable y: status of the banknote [0 for genuine and 1 for counterfeit] Probabilistic model that predicts counterfeiting based on the four measurements 317 / 785
  • 6. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models The impossible linear model Example of the influence of x4 on y Since y is binary, y|x4 ∼ B(p(x4 )) , c Normal model is impossible 318 / 785
  • 7. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models The impossible linear model Example of the influence of x4 on y Since y is binary, y|x4 ∼ B(p(x4 )) , c Normal model is impossible Linear dependence in p(x) = E[y|x]’s p(x4i ) = β0 + β1 x4i , 319 / 785
  • 8. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models The impossible linear model Example of the influence of x4 on y Since y is binary, y|x4 ∼ B(p(x4 )) , c Normal model is impossible Linear dependence in p(x) = E[y|x]’s p(x4i ) = β0 + β1 x4i , estimated [by MLE] as p̂i = −2.02 + 0.268 xi4 , which gives p̂i = 0.12 for xi4 = 8 and ... 320 / 785
  • 9. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models The impossible linear model Example of the influence of x4 on y Since y is binary, y|x4 ∼ B(p(x4 )) , c Normal model is impossible Linear dependence in p(x) = E[y|x]’s p(x4i ) = β0 + β1 x4i , estimated [by MLE] as p̂i = −2.02 + 0.268 xi4 , which gives p̂i = 0.12 for xi4 = 8 and ... p̂i = 1.19 for xi4 = 12!!! c Linear dependence is impossible 321 / 785
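  A minimal R sketch of this failure, assuming the bank data sits in a data frame bank with a binary column y and a covariate column x4 (both names are hypothetical): the linear probability model happily predicts outside [0, 1].
    fit <- lm(y ~ x4, data = bank)                     # ordinary linear fit of a 0/1 response
    predict(fit, newdata = data.frame(x4 = c(8, 12)))  # with the slide's MLE: about 0.12 and 1.19
    # 1.19 cannot be a probability, hence the need for a link function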
  • 10. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of the linear dependence Broader class of models to cover various dependence structures. 322 / 785
  • 11. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Generalisation of the linear dependence Broader class of models to cover various dependence structures. Class of generalised linear models (GLM) where y|x, β ∼ f (y|xT β) . i.e., dependence of y on x partly linear 323 / 785
  • 12. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Notations Same as in linear regression chapter, with n–sample y = (y1 , . . . , yn ) and corresponding explanatory variables/covariates gathered in the n × k matrix X = (xij ), with rows (xi1 , xi2 , . . . , xik ) for i = 1, . . . , n. 324 / 785
  • 13. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Specifications of GLM’s Definition (GLM) A GLM is a conditional model specified by two functions: (1) the density f of y given x, parameterised by its expectation parameter µ = µ(x) [and possibly its dispersion parameter ϕ = ϕ(x)] 325 / 785
  • 14. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Specifications of GLM’s Definition (GLM) A GLM is a conditional model specified by two functions: (1) the density f of y given x, parameterised by its expectation parameter µ = µ(x) [and possibly its dispersion parameter ϕ = ϕ(x)]; (2) the link g between the mean µ and the explanatory variables, written customarily as g(µ) = xT β or, equivalently, E[y|x, β] = g^{-1}(xT β). 326 / 785
  • 15. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Specifications of GLM’s Definition (GLM) A GLM is a conditional model specified by two functions: (1) the density f of y given x, parameterised by its expectation parameter µ = µ(x) [and possibly its dispersion parameter ϕ = ϕ(x)]; (2) the link g between the mean µ and the explanatory variables, written customarily as g(µ) = xT β or, equivalently, E[y|x, β] = g^{-1}(xT β). For identifiability reasons, g needs to be bijective. 327 / 785
  • 16. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Likelihood Obvious representation of the likelihood ℓ(β, ϕ|y, X) = ∏_{i=1}^{n} f( yi | xiT β, ϕ ) with parameters β ∈ R^k and ϕ > 0. 328 / 785
  • 17. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Examples Ordinary linear regression Case of GLM where g(x) = x, ϕ = σ^2 , and y|X, β, σ^2 ∼ Nn (Xβ, σ^2 In ). 329 / 785
  • 18. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Examples (2) Case of binary and binomial data, when yi |xi ∼ B(ni , p(xi )) with known ni Logit [or logistic regression] model Link is logit transform on probability of success g(pi ) = log(pi /(1 − pi )) , with likelihood ∏_{i=1}^{n} (ni choose yi) ( exp(xiT β)/(1 + exp(xiT β)) )^{yi} ( 1/(1 + exp(xiT β)) )^{ni−yi} ∝ exp{ ∑_{i=1}^{n} yi xiT β } / ∏_{i=1}^{n} (1 + exp(xiT β))^{ni} 330 / 785
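  A hedged R sketch of this log-likelihood for binary data (ni = 1), with X an n × k design matrix and y a 0/1 vector (the argument names are assumptions):
    loglik.logit <- function(beta, y, X) {
      eta <- X %*% beta                  # linear predictor x_i^T beta
      sum(y * eta - log(1 + exp(eta)))   # sum_i y_i x_i^T beta - log(1 + exp(x_i^T beta))
    }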
  • 19. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Canonical link Special link function g that appears in the natural exponential family representation of the density g⋆(µ) = θ if f (y|µ) ∝ exp{T (y) · θ − Ψ(θ)} 331 / 785
  • 20. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Canonical link Special link function g that appears in the natural exponential family representation of the density g⋆(µ) = θ if f (y|µ) ∝ exp{T (y) · θ − Ψ(θ)} Example Logit link is canonical for the binomial model, since f (yi |pi ) = (ni choose yi) exp{ yi log( pi /(1 − pi ) ) + ni log(1 − pi ) } , and thus θi = log pi /(1 − pi ) 332 / 785
  • 21. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Examples (3) Customary to use the canonical link, but only customary ... Probit model Probit link function given by g(µi ) = Φ^{-1}(µi ) where Φ is the standard normal cdf Likelihood ℓ(β|y, X) ∝ ∏_{i=1}^{n} Φ(xiT β)^{yi} (1 − Φ(xiT β))^{ni−yi} . Full processing 333 / 785
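  The same sketch for the probit likelihood above (ni = 1), using 1 − Φ(x) = Φ(−x) and pnorm's log.p option for numerical stability (names assumed as before):
    loglik.probit <- function(beta, y, X) {
      eta <- X %*% beta
      sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(-eta, log.p = TRUE))
    }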
  • 22. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Log-linear models Standard approach to describe associations between several categorical variables, i.e., variables with finite support Sufficient statistic: contingency table, made of the cross-classified counts for the different categorical variables. Full entry to loglinear models 334 / 785
  • 23. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Log-linear models Standard approach to describe associations between several categorical variables, i.e., variables with finite support Sufficient statistic: contingency table, made of the cross-classified counts for the different categorical variables. Full entry to loglinear models Example (Titanic survivors)
    Survivor  Class   Child Male   Child Female   Adult Male   Adult Female
    No        1st          0            0             118            4
    No        2nd          0            0             154           13
    No        3rd         35           17             387           89
    No        Crew         0            0             670            3
    Yes       1st          5            1              57          140
    Yes       2nd         11           13              14           80
    Yes       3rd         13           14              75           76
    Yes       Crew         0            0             192           20
    335 / 785
  • 24. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Poisson regression model (1) Each count yi is Poisson with mean µi = µ(xi ) (2) Link function connecting R+ with R, e.g. the logarithm g(µi ) = log(µi ). 336 / 785
  • 25. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Generalisation of linear models Poisson regression model (1) Each count yi is Poisson with mean µi = µ(xi ) (2) Link function connecting R+ with R, e.g. the logarithm g(µi ) = log(µi ). Corresponding likelihood ℓ(β|y, X) = ∏_{i=1}^{n} (1/yi !) exp{ yi xiT β − exp(xiT β) } . 337 / 785
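  And its Poisson counterpart, a sketch with the same assumed names; lfactorial(y) supplies the log yi! normalising constant:
    loglik.pois <- function(beta, y, X) {
      eta <- X %*% beta
      sum(y * eta - exp(eta) - lfactorial(y))   # sum_i y_i x_i^T beta - exp(x_i^T beta) - log y_i!
    }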
  • 26. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Metropolis–Hastings algorithms Convergence assessment Posterior inference in GLMs harder than for linear models 338 / 785
  • 27. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Metropolis–Hastings algorithms Convergence assessment Posterior inference in GLMs harder than for linear models c Working with a GLM requires specific numerical or simulation tools [E.g., GLIM in classical analyses] 339 / 785
  • 28. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Metropolis–Hastings algorithms Convergence assessment Posterior inference in GLMs harder than for linear models c Working with a GLM requires specific numerical or simulation tools [E.g., GLIM in classical analyses] Opportunity to introduce universal MCMC method: Metropolis–Hastings algorithm 340 / 785
  • 29. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Generic MCMC sampler Metropolis–Hastings algorithms are generic/off-the-shelf MCMC algorithms: they only require the likelihood up to a constant [difference with Gibbs sampler]; they can be tuned with a wide range of possibilities [difference with Gibbs sampler & blocking]; they are natural extensions of standard simulation algorithms: based on the choice of a proposal distribution [difference in Markov proposal q(x, y) and acceptance] 341 / 785
  • 30. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Why Metropolis? Originally introduced by Metropolis, Rosenbluth, Rosenbluth, Teller and Teller in a setup of optimization on a discrete state-space. All authors involved in Los Alamos during and after WWII: 342 / 785
  • 31. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Why Metropolis? Originally introduced by Metropolis, Rosenbluth, Rosenbluth, Teller and Teller in a setup of optimization on a discrete state-space. All authors involved in Los Alamos during and after WWII: Physicist and mathematician, Nicholas Metropolis is considered (with Stanislaw Ulam) to be the father of Monte Carlo methods. Also a physicist, Marshall Rosenbluth worked on the development of the hydrogen (H) bomb Edward Teller was one of the first scientists to work on the Manhattan Project that led to the production of the A bomb. Also managed to design with Ulam the H bomb. 343 / 785
  • 32. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Generic Metropolis–Hastings sampler For target π and proposal kernel q(x, y) Initialization: Choose an arbitrary x(0) 344 / 785
  • 33. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Generic Metropolis–Hastings sampler For target π and proposal kernel q(x, y) Initialization: Choose an arbitrary x(0) Iteration t: (1) Given x(t−1) , generate x̃ ∼ q(x(t−1) , x); (2) calculate ρ(x(t−1) , x̃) = min{ [π(x̃)/q(x(t−1) , x̃)] / [π(x(t−1) )/q(x̃, x(t−1) )] , 1 }; (3) with probability ρ(x(t−1) , x̃) accept x̃ and set x(t) = x̃; otherwise reject x̃ and set x(t) = x(t−1) . 345 / 785
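  A hedged one-dimensional R sketch of this sampler; target, rprop and dprop (a density for π known up to a constant, a proposal simulator and the proposal density) are assumed names:
    mh <- function(x0, n.iter, target, rprop, dprop) {
      x <- numeric(n.iter)
      x[1] <- x0
      for (t in 2:n.iter) {
        prop <- rprop(x[t - 1])                             # step 1: draw from q(x^(t-1), .)
        rho <- min(1, (target(prop) / dprop(x[t - 1], prop)) /
                      (target(x[t - 1]) / dprop(prop, x[t - 1])))   # step 2
        x[t] <- if (runif(1) < rho) prop else x[t - 1]      # step 3: accept or stay put
      }
      x
    }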
  • 34. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Universality Algorithm only needs to simulate from q which can be chosen [almost!] arbitrarily, i.e. unrelated with π [q also called instrumental distribution] Note: π and q known up to proportionality terms ok since proportionality constants cancel in ρ. 346 / 785
  • 35. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Validation Markov chain theory Target π is stationary distribution of Markov chain (x(t) )t because probability ρ(x, y) satisfies detailed balance equation π(x)q(x, y)ρ(x, y) = π(y)q(y, x)ρ(y, x) [Integrate out x to see that π is stationary] 347 / 785
  • 36. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Validation Markov chain theory Target π is stationary distribution of Markov chain (x(t) )t because probability ρ(x, y) satisfies detailed balance equation π(x)q(x, y)ρ(x, y) = π(y)q(y, x)ρ(y, x) [Integrate out x to see that π is stationary] For convergence/ergodicity, Markov chain must be irreducible: q has positive probability of reaching all areas with positive π probability in a finite number of steps. 348 / 785
  • 37. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of proposal Theoretical guarantees of convergence very high, but choice of q is crucial in practice. 349 / 785
  • 38. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of proposal Theoretical guarantees of convergence very high, but choice of q is crucial in practice. Poor choice of q may result in very high rejection rates, so that the Markov chain (x(t) )t hardly moves, or in 350 / 785
  • 39. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of proposal Theoretical guarantees of convergence very high, but choice of q is crucial in practice. Poor choice of q may result in very high rejection rates, so that the Markov chain (x(t) )t hardly moves, or in a myopic exploration of the support of π, that is, a dependence on the starting value x(0) , with the chain stuck near a mode in the neighbourhood of x(0) . 351 / 785
  • 40. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of proposal Theoretical guarantees of convergence very high, but choice of q is crucial in practice. Poor choice of q may result in very high rejection rates, so that the Markov chain (x(t) )t hardly moves, or in a myopic exploration of the support of π, that is, a dependence on the starting value x(0) , with the chain stuck near a mode in the neighbourhood of x(0) . Note: hybrid MCMC Simultaneous use of different kernels valid and recommended 352 / 785
  • 41. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The independence sampler Pick proposal q that is independent of its first argument, q(x, y) = q(y) . ρ simplifies into ρ(x, y) = min{ 1 , [π(y)/q(y)] / [π(x)/q(x)] } . Special case: q ∝ π reduces to ρ(x, y) = 1 and iid sampling 353 / 785
  • 42. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The independence sampler Pick proposal q that is independent of its first argument, q(x, y) = q(y) . ρ simplifies into ρ(x, y) = min{ 1 , [π(y)/q(y)] / [π(x)/q(x)] } . Special case: q ∝ π reduces to ρ(x, y) = 1 and iid sampling Analogy with Accept-Reject algorithm where max π/q is replaced with the current value π(x(t−1) )/q(x(t−1) ), but the sequence of accepted x(t) ’s is not i.i.d. 354 / 785
  • 43. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of q Convergence properties highly dependent on q: q needs to be positive everywhere on the support of π; for a good exploration of this support, π/q needs to be bounded. 355 / 785
  • 44. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of q Convergence properties highly dependent on q: q needs to be positive everywhere on the support of π; for a good exploration of this support, π/q needs to be bounded. Otherwise, the chain takes too long to reach regions with low q/π values. 356 / 785
  • 45. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The random walk sampler Independence sampler requires too much global information about π: opt for a local gathering of information 357 / 785
  • 46. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The random walk sampler Independence sampler requires too much global information about π: opt for a local gathering of information Means exploration of the neighbourhood of the current value x(t) in search of other points of interest. 358 / 785
  • 47. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms The random walk sampler Independence sampler requires too much global information about π: opt for a local gathering of information Means exploration of the neighbourhood of the current value x(t) in search of other points of interest. Simplest exploration device is based on random walk dynamics. 359 / 785
  • 48. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Random walks Proposal is a symmetric transition density q(x, y) = qRW (y − x) = qRW (x − y) 360 / 785
  • 49. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Random walks Proposal is a symmetric transition density q(x, y) = qRW (y − x) = qRW (x − y) Acceptance probability ρ(x, y) reduces to the simpler form ρ(x, y) = min{ 1 , π(y)/π(x) } . Only depends on the target π [accepts all proposed values that increase π] 361 / 785
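  With a symmetric kernel the q terms cancel, so the generic sketch above simplifies; a random-walk version with Gaussian steps of scale tau (an assumed name), run on the log scale to avoid underflow:
    rwmh <- function(x0, n.iter, ltarget, tau) {   # ltarget = log pi, up to a constant
      x <- numeric(n.iter)
      x[1] <- x0
      for (t in 2:n.iter) {
        prop <- x[t - 1] + tau * rnorm(1)
        x[t] <- if (log(runif(1)) < ltarget(prop) - ltarget(x[t - 1])) prop else x[t - 1]
      }
      x
    }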
  • 50. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of qRW Considerable flexibility in the choice of qRW , tails: Normal versus Student’s t scale: size of the neighbourhood 362 / 785
  • 51. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of qRW Considerable flexibility in the choice of qRW , tails: Normal versus Student’s t scale: size of the neighbourhood Can also be used for restricted support targets [with a waste of simulations near the boundary] 363 / 785
  • 52. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Choice of qRW Considerable flexibility in the choice of qRW , tails: Normal versus Student’s t scale: size of the neighbourhood Can also be used for restricted support targets [with a waste of simulations near the boundary] Can be tuned towards an acceptance probability of 0.234 at the burnin stage [Magic number!] 364 / 785
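  One crude burn-in recipe consistent with this target rate (an illustration, not the slides' prescription): monitor the empirical acceptance rate over short batches and rescale the proposal, freezing tau once burn-in ends.
    # assumed names: acc = number of acceptances in the last batch of 100 burn-in iterations
    rate <- acc / 100
    tau <- tau * exp(rate - 0.234)   # widen the steps when accepting too often, shrink otherwise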
  • 53. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Convergence assessment Capital question: How many iterations do we need to run??? MCMC debuts 365 / 785
  • 54. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Convergence assessment Capital question: How many iterations do we need to run??? MCMC debuts Rule # 1 There is no absolute number of simulations, i.e. 1,000 is neither large, nor small. Rule # 2 It takes [much] longer to check for convergence than for the chain itself to converge. Rule # 3 MCMC is a “what-you-get-is-what-you-see” algorithm: it fails to tell about unexplored parts of the space. Rule # 4 When in doubt, run MCMC chains in parallel and check for consistency. 366 / 785
  • 55. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Convergence assessment Capital question: How many iterations do we need to run??? MCMC debuts Rule # 1 There is no absolute number of simulations, i.e. 1,000 is neither large, nor small. Rule # 2 It takes [much] longer to check for convergence than for the chain itself to converge. Rule # 3 MCMC is a “what-you-get-is-what-you-see” algorithm: it fails to tell about unexplored parts of the space. Rule # 4 When in doubt, run MCMC chains in parallel and check for consistency. Many “quick-&-dirty” solutions in the literature, but not necessarily trustworthy. 367 / 785
  • 56. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Prohibited dynamic updating Tuning the proposal in terms of its past performances can only be implemented at burnin, because otherwise this cancels Markovian convergence properties. 368 / 785
  • 57. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Prohibited dynamic updating Tuning the proposal in terms of its past performances can only be implemented at burnin, because otherwise this cancels Markovian convergence properties. Use of several MCMC proposals together within a single algorithm using circular or random design is ok. It almost always brings an improvement compared with its individual components (at the cost of increased simulation time) 369 / 785
  • 58. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Effective sample size How many iid simulations from π are equivalent to N simulations from the MCMC algorithm? 370 / 785
  • 59. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Metropolis–Hastings algorithms Effective sample size How many iid simulations from π are equivalent to N simulations from the MCMC algorithm? Based on estimated k-th order auto-correlation, ρk = corr( x(t) , x(t+k) ) , effective sample size N^{ess} = n ( 1 + 2 ∑_{k=1}^{T0} ρ̂k^2 )^{-1/2} . Only partial indicator that fails to signal chains stuck in one mode of the target 371 / 785
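  A sketch of this estimate using R's acf, truncating the autocorrelation sum at lag T0 (the default truncation point is an assumption):
    ess <- function(x, T0 = 50) {
      rho <- acf(x, lag.max = T0, plot = FALSE)$acf[-1]   # hat rho_1, ..., hat rho_T0
      length(x) * (1 + 2 * sum(rho^2))^(-1/2)
    }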
  • 60. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model The Probit Model Likelihood Recall Probit ℓ(β|y, X) ∝ ∏_{i=1}^{n} Φ(xiT β)^{yi} (1 − Φ(xiT β))^{ni−yi} . 372 / 785
  • 61. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model The Probit Model Likelihood Recall Probit ℓ(β|y, X) ∝ ∏_{i=1}^{n} Φ(xiT β)^{yi} (1 − Φ(xiT β))^{ni−yi} . If no prior information available, resort to the flat prior π(β) ∝ 1 and then obtain the posterior distribution π(β|y, X) ∝ ∏_{i=1}^{n} Φ(xiT β)^{yi} (1 − Φ(xiT β))^{ni−yi} , nonstandard and simulated using MCMC techniques. 373 / 785
  • 62. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model MCMC resolution Metropolis–Hastings random walk sampler works well for binary regression problems with small number of predictors Uses the maximum likelihood estimate β̂ as starting value and the asymptotic (Fisher) covariance matrix of the MLE, Σ̂, as scale 374 / 785
  • 63. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model MLE proposal R function glm very useful to get the maximum likelihood estimate of β and its asymptotic covariance matrix Σ̂. Terminology used in R program mod=summary(glm(y~X-1,family=binomial(link="probit"))) with mod$coeff[,1] denoting β̂ and mod$cov.unscaled Σ̂. 375 / 785
  • 64. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model MCMC algorithm Probit random-walk Metropolis-Hastings Initialization: Set β(0) = β̂ and compute Σ̂ Iteration t: (1) Generate β̃ ∼ Nk+1 (β(t−1) , τ Σ̂); (2) compute ρ(β(t−1) , β̃) = min{ 1 , π(β̃|y)/π(β(t−1) |y) }; (3) with probability ρ(β(t−1) , β̃) set β(t) = β̃; otherwise set β(t) = β(t−1) . 376 / 785
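  A hedged end-to-end sketch of this sampler under the flat prior, for a design matrix X without intercept and binary y; rmvnorm comes from the mvtnorm package, an assumed dependency:
    library(mvtnorm)
    mod <- summary(glm(y ~ X - 1, family = binomial(link = "probit")))
    beta.mle <- mod$coeff[, 1]                 # hat beta, the starting value
    Sigma <- mod$cov.unscaled                  # hat Sigma, the proposal scale
    lpost <- function(beta) {                  # log posterior = log likelihood under flat prior
      eta <- X %*% beta
      sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(-eta, log.p = TRUE))
    }
    n.iter <- 10^4; tau <- 1
    beta <- matrix(beta.mle, n.iter, length(beta.mle), byrow = TRUE)
    for (t in 2:n.iter) {
      prop <- rmvnorm(1, beta[t - 1, ], tau * Sigma)[1, ]
      if (log(runif(1)) < lpost(prop) - lpost(beta[t - 1, ])) {
        beta[t, ] <- prop
      } else {
        beta[t, ] <- beta[t - 1, ]
      }
    }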
  • 65. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model bank Probit modelling with no intercept over the four measurements. Three different scales τ = 1, 0.1, 10: best mixing behavior is associated with τ = 1. Average of the parameters over 9,000 iterations gives the plug-in estimate p̂i = Φ (−1.2193xi1 + 0.9540xi2 + 0.9795xi3 + 1.1481xi4 ) . [Figure: traces, histograms and autocorrelations of the four parameter chains omitted] 377 / 785
  • 66. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for probit models Flat prior on β inappropriate for comparison purposes and Bayes factors. 378 / 785
  • 67. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for probit models Flat prior on β inappropriate for comparison purposes and Bayes factors. Replace the flat prior with a hierarchical prior, β|σ^2 , X ∼ Nk ( 0k , σ^2 (X^T X)^{-1} ) and π(σ^2 |X) ∝ σ^{-3/2} , (almost) as in normal linear regression 379 / 785
  • 68. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for probit models Flat prior on β inappropriate for comparison purposes and Bayes factors. Replace the flat prior with a hierarchical prior, β|σ^2 , X ∼ Nk ( 0k , σ^2 (X^T X)^{-1} ) and π(σ^2 |X) ∝ σ^{-3/2} , (almost) as in normal linear regression Warning The matrix X^T X is not the Fisher information matrix 380 / 785
  • 69. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for testing Same argument as before: while π is improper, use of the same variance factor σ 2 in both models means the normalising constant cancels in the Bayes factor. 381 / 785
  • 70. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model G-priors for testing Same argument as before: while π is improper, use of the same variance factor σ^2 in both models means the normalising constant cancels in the Bayes factor. Posterior distribution of β π(β|y, X) ∝ |X^T X|^{1/2} π^{-k/2} Γ((2k − 1)/4) ( β^T (X^T X) β )^{-(2k−1)/4} × ∏_{i=1}^{n} Φ(xiT β)^{yi} [1 − Φ(xiT β)]^{1−yi} [where k matters!] 382 / 785
  • 71. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Marginal approximation Marginal f (y|X) ∝ |X^T X|^{1/2} π^{-k/2} Γ{(2k − 1)/4} ∫ ( β^T (X^T X) β )^{-(2k−1)/4} ∏_{i=1}^{n} Φ(xiT β)^{yi} [1 − Φ(xiT β)]^{1−yi} dβ , approximated by ( |X^T X|^{1/2} / (π^{k/2} M) ) ∑_{m=1}^{M} ||Xβ^{(m)}||^{-(2k−1)/2} ∏_{i=1}^{n} Φ(xiT β^{(m)})^{yi} [1 − Φ(xiT β^{(m)})]^{1−yi} × Γ{(2k − 1)/4} |V̂|^{1/2} (4π)^{k/2} exp{ +(β^{(m)} − β̂)^T V̂^{-1} (β^{(m)} − β̂)/4 } , where β^{(m)} ∼ Nk (β̂, 2 V̂) with β̂ the MCMC approximation of Eπ [β|y, X] and V̂ the MCMC approximation of V(β|y, X). 383 / 785
  • 72. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Linear hypothesis Linear restriction on β H0 : Rβ = r (r ∈ R^q , R a q × k matrix) where β^0 is (k − q)-dimensional and X0 and x0 are linear transforms of X and of x, of dimensions (n, k − q) and (k − q). Likelihood ℓ(β^0 |y, X0 ) ∝ ∏_{i=1}^{n} Φ(x0iT β^0 )^{yi} [1 − Φ(x0iT β^0 )]^{1−yi} , 384 / 785
  • 73. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Linear test Associated [projected] G-prior β^0 |σ^2 , X0 ∼ Nk−q ( 0k−q , σ^2 (X0^T X0 )^{-1} ) and π(σ^2 |X0 ) ∝ σ^{-3/2} , 385 / 785
  • 74. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Linear test Associated [projected] G-prior β^0 |σ^2 , X0 ∼ Nk−q ( 0k−q , σ^2 (X0^T X0 )^{-1} ) and π(σ^2 |X0 ) ∝ σ^{-3/2} , Marginal distribution of y of the same type f (y|X0 ) ∝ |X0^T X0 |^{1/2} π^{-(k−q)/2} Γ{ (2(k − q) − 1)/4 } ∫ ||X0 β^0 ||^{-(2(k−q)−1)/2} ∏_{i=1}^{n} Φ(x0iT β^0 )^{yi} [1 − Φ(x0iT β^0 )]^{1−yi} dβ^0 . 386 / 785
  • 75. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model banknote For H0 : β1 = β2 = 0, B^π_{10} = 157.73 [against H0 ] Generic regression-like output:
         Estimate   Post. var.   log10(BF)
    X1   -1.1552    0.0631        4.5844 (****)
    X2    0.9200    0.3299       -0.2875
    X3    0.9121    0.2595       -0.0972
    X4    1.0820    0.0287       15.6765 (****)
    evidence against H0: (****) decisive, (***) strong, (**) substantial, (*) poor
    387 / 785
  • 76. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative settings If prior information available on p(x), transform into prior distribution on β by technique of imaginary observations: 388 / 785
  • 77. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative settings If prior information available on p(x), transform into prior distribution on β by technique of imaginary observations: Start with k different values of the covariate vector, x̃1 , . . . , x̃k For each of these values, the practitioner specifies (i) a prior guess gi at the probability pi associated with x̃i ; (ii) an assessment of (un)certainty about that guess given by a number Ki of equivalent “prior observations”. 389 / 785
  • 78. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative settings If prior information available on p(x), transform into prior distribution on β by technique of imaginary observations: Start with k different values of the covariate vector, x̃1 , . . . , x̃k For each of these values, the practitioner specifies (i) a prior guess gi at the probability pi associated with x̃i ; (ii) an assessment of (un)certainty about that guess given by a number Ki of equivalent “prior observations”. On how many imaginary observations did you build this guess? 390 / 785
  • 79. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative prior π(p1 , . . . , pk ) ∝ ∏_{i=1}^{k} pi^{Ki gi − 1} (1 − pi )^{Ki (1−gi ) − 1} translates into [Jacobian rule] π(β) ∝ ∏_{i=1}^{k} Φ(x̃iT β)^{Ki gi − 1} [1 − Φ(x̃iT β)]^{Ki (1−gi ) − 1} φ(x̃iT β) 391 / 785
  • 80. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The Probit Model Informative prior π(p1 , . . . , pk ) ∝ ∏_{i=1}^{k} pi^{Ki gi − 1} (1 − pi )^{Ki (1−gi ) − 1} translates into [Jacobian rule] π(β) ∝ ∏_{i=1}^{k} Φ(x̃iT β)^{Ki gi − 1} [1 − Φ(x̃iT β)]^{Ki (1−gi ) − 1} φ(x̃iT β) [Almost] equivalent to using the G-prior β ∼ Nk ( 0k , [ ∑_{j=1}^{k} x̃j x̃j^T ]^{-1} ) 392 / 785
  • 81. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model The logit model Recall that [for ni = 1] yi |µi ∼ B(1, µi ), ϕ = 1 and g(µi ) = log( µi /(1 − µi ) ). Thus P(yi = 1|β) = exp(xiT β) / (1 + exp(xiT β)) with likelihood ℓ(β|y, X) = ∏_{i=1}^{n} ( exp(xiT β)/(1 + exp(xiT β)) )^{yi} ( 1 − exp(xiT β)/(1 + exp(xiT β)) )^{1−yi} 393 / 785
  • 82. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model Links with probit usual vague prior for β, π(β) ∝ 1 394 / 785
  • 83. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model Links with probit usual vague prior for β, π(β) ∝ 1 Posterior given by π(β|y, X) ∝ ∏_{i=1}^{n} ( exp(xiT β)/(1 + exp(xiT β)) )^{yi} ( 1 − exp(xiT β)/(1 + exp(xiT β)) )^{1−yi} [intractable] 395 / 785
  • 84. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model Links with probit usual vague prior for β, π(β) ∝ 1 Posterior given by π(β|y, X) ∝ ∏_{i=1}^{n} ( exp(xiT β)/(1 + exp(xiT β)) )^{yi} ( 1 − exp(xiT β)/(1 + exp(xiT β)) )^{1−yi} [intractable] Same Metropolis–Hastings sampler 396 / 785
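  Under the flat prior only the log-posterior changes relative to the probit sketch given earlier; the same random-walk proposal built from glm output can be reused as is:
    lpost <- function(beta) {               # logit log posterior, flat prior
      eta <- X %*% beta
      sum(y * eta - log(1 + exp(eta)))
    }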
  • 85. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model bank Same scale factor τ = 1: slight increase in the skewness of the histograms of the βi ’s. Plug-in estimate of the predictive probability of a counterfeit p̂i = exp (−2.5888xi1 + 1.9967xi2 + 2.1260xi3 + 2.1879xi4 ) / [ 1 + exp (−2.5888xi1 + 1.9967xi2 + 2.1260xi3 + 2.1879xi4 ) ] . [Figure: traces, histograms and autocorrelations of the four parameter chains omitted] 397 / 785
  • 86. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models The logit model G-priors for logit models Same story: flat prior on β inappropriate for Bayes factors, to be replaced with the hierarchical prior β|σ^2 , X ∼ Nk ( 0k , σ^2 (X^T X)^{-1} ) and π(σ^2 |X) ∝ σ^{-3/2} Example (bank)
         Estimate   Post. var.   log10(BF)
    X1   -2.3970    0.3286        4.8084 (****)
    X2    1.6978    1.2220       -0.2453
    X3    2.1197    1.0094       -0.1529
    X4    2.0230    0.1132       15.9530 (****)
    evidence against H0: (****) decisive, (***) strong, (**) substantial, (*) poor
    398 / 785
  • 87. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Loglinear models Introduction to loglinear models Example (airquality) Benchmark in R > data(airquality) Repeated measurements over 111 consecutive days of ozone u (in parts per billion) and maximum daily temperature v discretized into dichotomous variables
                               month
    ozone       temp           5    6    7    8    9
    [1,31]      [57,79]       17    4    2    5   18
                (79,97]        0    2    3    3    2
    (31,168]    [57,79]        6    1    0    3    1
                (79,97]        1    2   21   12    8
    Contingency table with 5 × 2 × 2 = 20 entries 399 / 785
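  The table can be reproduced along these lines (a sketch; the cut points follow the slide, and table silently drops the days with missing ozone or out-of-range temperature, leaving the 111 used here):
    data(airquality)
    ozone <- cut(airquality$Ozone, breaks = c(1, 31, 168), include.lowest = TRUE)
    temp  <- cut(airquality$Temp,  breaks = c(57, 79, 97), include.lowest = TRUE)
    table(ozone, temp, month = airquality$Month)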
  • 88. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Poisson regression Observations/counts y = (y1 , . . . , yn ) are integers, so we can choose yi ∼ P(µi ) Saturated likelihood ℓ(µ|y) = ∏_{i=1}^{n} (1/yi !) µi^{yi} exp(−µi ) 400 / 785
  • 89. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Poisson regression Observations/counts y = (y1 , . . . , yn ) are integers, so we can choose yi ∼ P(µi ) Saturated likelihood ℓ(µ|y) = ∏_{i=1}^{n} (1/yi !) µi^{yi} exp(−µi ) GLM constraint via the log-linear link log(µi ) = xiT β , i.e. yi |xi ∼ P( exp(xiT β) ) 401 / 785
  • 90. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Categorical variables Special feature Incidence matrix X = (xi ) such that its elements are all zeros or ones, i.e. covariates are all indicators/dummy variables! 402 / 785
  • 91. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Categorical variables Special feature Incidence matrix X = (xi ) such that its elements are all zeros or ones, i.e. covariates are all indicators/dummy variables! Several types of (sub)models are possible depending on relations between categorical variables. 403 / 785
  • 92. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Categorical variables Special feature Incidence matrix X = (xi ) such that its elements are all zeros or ones, i.e. covariates are all indicators/dummy variables! Several types of (sub)models are possible depending on relations between categorical variables. Re-special feature Variable selection problem of a specific kind, in the sense that all indicators related with the same association must either remain or vanish at once. Thus much fewer submodels than in a regular variable selection problem. 404 / 785
  • 93. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Parameterisations Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ J and 1 ≤ w ≤ K. 405 / 785
  • 94. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Parameterisations Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ J and 1 ≤ w ≤ K. Simplest non-constant model is log(µτ ) = ∑_{b=1}^{I} β^u_b Ib (uτ ) + ∑_{b=1}^{J} β^v_b Ib (vτ ) + ∑_{b=1}^{K} β^w_b Ib (wτ ) , that is, log(µl(i,j,k) ) = β^u_i + β^v_j + β^w_k , where index l(i, j, k) corresponds to u = i, v = j and w = k. 406 / 785
  • 95. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Parameterisations Example of three variables 1 ≤ u ≤ I, 1 ≤ v ≤ J and 1 ≤ w ≤ K. Simplest non-constant model is log(µτ ) = ∑_{b=1}^{I} β^u_b Ib (uτ ) + ∑_{b=1}^{J} β^v_b Ib (vτ ) + ∑_{b=1}^{K} β^w_b Ib (wτ ) , that is, log(µl(i,j,k) ) = β^u_i + β^v_j + β^w_k , where index l(i, j, k) corresponds to u = i, v = j and w = k. Saturated model is log(µl(i,j,k) ) = β^{uvw}_{ijk} 407 / 785
  • 96. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Log-linear model (over-)parameterisation Representation log(µl(i,j,k) ) = λ + λ^u_i + λ^v_j + λ^w_k + λ^{uv}_{ij} + λ^{uw}_{ik} + λ^{vw}_{jk} + λ^{uvw}_{ijk} , as in Anova models. 408 / 785
  • 97. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Log-linear model (over-)parameterisation Representation log(µl(i,j,k) ) = λ + λ^u_i + λ^v_j + λ^w_k + λ^{uv}_{ij} + λ^{uw}_{ik} + λ^{vw}_{jk} + λ^{uvw}_{ijk} , as in Anova models. λ appears as the overall or reference average effect; λ^u_i appears as the marginal discrepancy (against the reference effect λ) when u = i; λ^{uv}_{ij} as the interaction discrepancy (against the added effects λ + λ^u_i + λ^v_j ) when (u, v) = (i, j); and so on... 409 / 785
  • 98. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Example of submodels (1) if both v and w are irrelevant, then log(µl(i,j,k) ) = λ + λ^u_i ; (2) if all three categorical variables are mutually independent, then log(µl(i,j,k) ) = λ + λ^u_i + λ^v_j + λ^w_k ; (3) if u and v are associated but are both independent of w, then log(µl(i,j,k) ) = λ + λ^u_i + λ^v_j + λ^w_k + λ^{uv}_{ij} ; (4) if u and v are conditionally independent given w, then log(µl(i,j,k) ) = λ + λ^u_i + λ^v_j + λ^w_k + λ^{uw}_{ik} + λ^{vw}_{jk} ; (5) if there is no three-factor interaction, then log(µl(i,j,k) ) = λ + λ^u_i + λ^v_j + λ^w_k + λ^{uv}_{ij} + λ^{uw}_{ik} + λ^{vw}_{jk} [the most complete submodel] 410 / 785
  • 99. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Identifiability Representation log(µl(i,j,k) ) = λ + λ^u_i + λ^v_j + λ^w_k + λ^{uv}_{ij} + λ^{uw}_{ik} + λ^{vw}_{jk} + λ^{uvw}_{ijk} , not identifiable, but the Bayesian approach handles non-identifiable settings and still estimates properly identifiable quantities. 411 / 785
  • 100. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Identifiability Representation log(µl(i,j,k) ) = λ + λ^u_i + λ^v_j + λ^w_k + λ^{uv}_{ij} + λ^{uw}_{ik} + λ^{vw}_{jk} + λ^{uvw}_{ijk} , not identifiable, but the Bayesian approach handles non-identifiable settings and still estimates properly identifiable quantities. Customary to impose identifiability constraints on the parameters: set to 0 parameters corresponding to the first category of each variable, i.e. remove the indicator of the first category. E.g., if u ∈ {1, 2} and v ∈ {1, 2}, constraint could be λ^u_1 = λ^v_1 = λ^{uv}_{11} = λ^{uv}_{12} = λ^{uv}_{21} = 0 . 412 / 785
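  R's model.matrix imposes exactly this kind of corner constraint, dropping the first level of each factor; a sketch for a three-way table with factor vectors u, v, w (all names assumed):
    dat <- data.frame(u = factor(u), v = factor(v), w = factor(w))
    X <- model.matrix(~ u * v * w, data = dat)   # saturated log-linear design, first levels removed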
  • 101. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Inference under a flat prior Noninformative prior π(β) ∝ 1 gives posterior distribution π(β|y, X) ∝ ∏_{i=1}^{n} exp(xiT β)^{yi} exp{− exp(xiT β)} = exp{ ∑_{i=1}^{n} yi xiT β − ∑_{i=1}^{n} exp(xiT β) } 413 / 785
  • 102. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models Inference under a flat prior Noninformative prior π(β) ∝ 1 gives posterior distribution π(β|y, X) ∝ ∏_{i=1}^{n} exp(xiT β)^{yi} exp{− exp(xiT β)} = exp{ ∑_{i=1}^{n} yi xiT β − ∑_{i=1}^{n} exp(xiT β) } Use of same random walk M-H algorithm as in probit and logit cases, starting with MLE evaluation > mod=summary(glm(y~-1+X,family=poisson())) 414 / 785
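  From there the random-walk sampler sketched for the probit model carries over unchanged, with the Poisson log-posterior below and proposal covariance built from mod$cov.unscaled (the τ^2 = 0.5 scale is reported on the next slide):
    lpost <- function(beta) {               # Poisson log posterior, flat prior
      eta <- X %*% beta
      sum(y * eta - exp(eta))
    }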
  • 103. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models airquality Identifiable non-saturated model involves 16 parameters. Obtained with 10,000 MCMC iterations with scale factor τ^2 = 0.5.
    Effect        Post. mean   Post. var.
    λ               2.8041      0.0612
    λ^u_2          -1.0684      0.2176
    λ^v_2          -5.8652      1.7141
    λ^w_2          -1.4401      0.2735
    λ^w_3          -2.7178      0.7915
    λ^w_4          -1.1031      0.2295
    λ^w_5          -0.0036      0.1127
    λ^{uv}_{22}     3.3559      0.4490
    λ^{uw}_{22}    -1.6242      1.2869
    λ^{uw}_{23}    -0.3456      0.8432
    λ^{uw}_{24}    -0.2473      0.6658
    λ^{uw}_{25}    -1.3335      0.7115
    λ^{vw}_{22}     4.5493      2.1997
    λ^{vw}_{23}     6.8479      2.5881
    λ^{vw}_{24}     4.6557      1.7201
    λ^{vw}_{25}     3.9558      1.7128
    415 / 785
  • 104. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models airquality: MCMC output [Figure: raw traces of the 16 parameter chains over 10,000 iterations; plot data omitted] 416 / 785
  • 105. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Generalized linear models Loglinear models airquality: MCMC output [Figure: traces and histograms of the 16 parameter chains; plot data omitted] 417 / 785