DR.GEEK

DBM Mean Field Inference

(4th-Dec-2020)


• The conditional distribution over one DBM layer given the neighboring layers is factorial. In the example of the DBM with two hidden layers, these distributions are P(v | h^(1)), P(h^(1) | v, h^(2)), and P(h^(2) | h^(1)). The distribution over all hidden layers generally does not factorize because of interactions between layers. In the example with two hidden layers, P(h^(1), h^(2) | v) does not factorize due to the interaction weights W^(2) between h^(1) and h^(2), which render these variables mutually dependent. As was the case with the DBN, we are left to seek out methods to approximate the DBM posterior distribution. However, unlike the DBN, the DBM posterior distribution over its hidden units, while complicated, is easy to approximate with a variational approximation (as discussed in section 19.4), specifically a mean field approximation. The mean field approximation is a simple form of variational inference, where we restrict the approximating distribution to fully factorial distributions. In the context of DBMs, the mean field equations capture the bidirectional interactions between layers. In this section we derive the iterative approximate inference procedure originally introduced in Salakhutdinov and Hinton (2009).
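To make the fixed-point iteration concrete, here is a minimal NumPy sketch of mean field inference for a DBM with two hidden layers. The names (dbm_mean_field, W1, W2, n_iters) are illustrative assumptions, bias terms are omitted for brevity, and the fixed iteration count is a simple choice rather than a prescription from the original paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbm_mean_field(v, W1, W2, n_iters=10):
    """Mean field inference for a two-hidden-layer DBM (sketch, no biases).

    v  : (n_v,)        observed visible vector
    W1 : (n_v, n_h1)   visible-to-first-hidden-layer weights
    W2 : (n_h1, n_h2)  first-to-second-hidden-layer weights

    Returns the mean field parameters h1_hat, h2_hat of the fully
    factorial approximation Q(h1, h2) = prod_j Q(h1_j) prod_k Q(h2_k).
    """
    # Initialize the factorial approximation, e.g. at 0.5 for every unit.
    h1_hat = np.full(W1.shape[1], 0.5)
    h2_hat = np.full(W2.shape[1], 0.5)

    for _ in range(n_iters):
        # Each update conditions a layer on the current estimate of its
        # neighboring layers, capturing the bidirectional interactions.
        h1_hat = sigmoid(v @ W1 + h2_hat @ W2.T)
        h2_hat = sigmoid(h1_hat @ W2)

    return h1_hat, h2_hat
```

Because h^(1) depends on both v and the current estimate of h^(2), the two updates must be repeated until the values stop changing appreciably; a fixed small number of sweeps is a common practical stopping rule.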

Learning in the DBM must confront both the challenge of an intractable partition function, using the techniques from chapter 18, and the challenge of an intractable posterior distribution, using the techniques from chapter 19. As described in section 20.4.2, variational inference allows the construction of a distribution Q(h | v) that approximates the intractable P(h | v). Learning then proceeds by maximizing L(v, θ, Q), the variational lower bound on the intractable log-likelihood,
L(v, θ, Q) = Σ_i Σ_{j'} v_i W^(1)_{i,j'} ĥ^(1)_{j'} + Σ_{j'} Σ_{k'} ĥ^(1)_{j'} W^(2)_{j',k'} ĥ^(2)_{k'} − log Z(θ) + H(Q),

where ĥ^(1) and ĥ^(2) are the mean field parameters of Q and H(Q) is the entropy of Q.
This expression still contains the log partition function, log Z(θ). Because a deep Boltzmann machine contains restricted Boltzmann machines as components, the hardness results for computing the partition function and sampling that apply to restricted Boltzmann machines also apply to deep Boltzmann machines. This means that evaluating the probability mass function of a Boltzmann machine requires approximate methods such as annealed importance sampling. Likewise, training the model requires approximations to the gradient of the log partition function. See chapter 18 for a general description of these methods. DBMs are typically trained using stochastic maximum likelihood. Many of the other techniques described in chapter 18 are not applicable. Techniques such as pseudolikelihood require the ability to evaluate the unnormalized probabilities, rather than merely obtain a variational lower bound on them. Contrastive divergence is slow for deep Boltzmann machines because they do not allow efficient sampling of the hidden units given the visible units; instead, contrastive divergence would require burning in a Markov chain every time a new negative phase sample is needed.
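The following sketch shows what one variational stochastic maximum likelihood update could look like for the same two-layer model, reusing sigmoid and dbm_mean_field from the sketch above. The positive phase uses the mean field statistics, while the negative phase advances a persistent block-Gibbs chain to approximate the gradient of log Z(θ). All names (gibbs_step, sml_gradient_step, chain, lr, k) and the single-example, bias-free update are assumptions made for illustration, not the authors' exact procedure.

```python
import numpy as np

def gibbs_step(v, h1, h2, W1, W2, rng):
    """One block-Gibbs sweep over the persistent negative-phase chain."""
    # Sample the odd layer h1 given its neighbors v and h2 ...
    h1 = rng.binomial(1, sigmoid(v @ W1 + h2 @ W2.T))
    # ... then sample the even layers v and h2, which are conditionally
    # independent given h1.
    v  = rng.binomial(1, sigmoid(h1 @ W1.T))
    h2 = rng.binomial(1, sigmoid(h1 @ W2))
    return v, h1, h2

def sml_gradient_step(v_data, W1, W2, chain, lr=1e-3, k=5, rng=None):
    """One stochastic maximum likelihood update for a 2-layer DBM (sketch).

    Positive phase: mean field expectations given the observed data.
    Negative phase: k Gibbs sweeps on a persistent chain, standing in for
    the intractable gradient of log Z(theta).
    """
    rng = rng or np.random.default_rng()

    # Positive phase statistics from mean field inference.
    h1_pos, h2_pos = dbm_mean_field(v_data, W1, W2)

    # Negative phase statistics from the persistent Markov chain.
    v_neg, h1_neg, h2_neg = chain
    for _ in range(k):
        v_neg, h1_neg, h2_neg = gibbs_step(v_neg, h1_neg, h2_neg, W1, W2, rng)

    # Gradient ascent on the variational lower bound (in-place updates).
    W1 += lr * (np.outer(v_data, h1_pos) - np.outer(v_neg, h1_neg))
    W2 += lr * (np.outer(h1_pos, h2_pos) - np.outer(h1_neg, h2_neg))

    return (v_neg, h1_neg, h2_neg)
```

A training loop would call sml_gradient_step once per example or minibatch and carry the returned chain state forward between calls, so the negative-phase Markov chain is never restarted; this is exactly the property that makes stochastic maximum likelihood practical here, whereas contrastive divergence would have to burn in a fresh chain for every negative sample.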
