DR.GEEK

Layer-Wise Pretraining

(5th-Dec-2020)


• Unfortunately, training a DBM using stochastic maximum likelihood (as described above) from a random initialization usually results in failure. In some cases, the model fails to learn to represent the distribution adequately. In other cases, the DBM may represent the distribution well, but with no higher likelihood than could be obtained with just an RBM. A DBM with very small weights in all but the first layer represents approximately the same distribution as an RBM. Various techniques that permit joint training have been developed and are described in section 20.4.5. However, the original and most popular method for overcoming the joint training problem of DBMs is greedy layer-wise pretraining. In this method, each layer of the DBM is trained in isolation as an RBM. The first layer is trained to model the input data. Each subsequent RBM is trained to model samples from the previous RBM’s posterior distribution. A minimal sketch of this procedure appears below.
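Since the section describes greedy layer-wise pretraining only in prose, here is a minimal sketch of the idea: each RBM is trained in isolation, the first on the raw data and each subsequent one on samples of the previous RBM's hidden units. The `RBM` class, the layer sizes, the learning rate, and the use of CD-1 updates (rather than full stochastic maximum likelihood) are illustrative assumptions, not details taken from the original text.

```python
# Greedy layer-wise pretraining sketch: a stack of binary RBMs trained with CD-1.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        # One step of contrastive divergence (CD-1).
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=10, batch=64):
    """Greedy layer-wise pretraining: train each RBM in isolation,
    then feed samples of its hidden units to the next RBM as data."""
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(inputs.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(inputs), batch):
                rbm.cd1_step(inputs[i:i + batch])
        rbms.append(rbm)
        # Samples from this RBM's posterior become the next layer's training data.
        probs = rbm.hidden_probs(inputs)
        inputs = (rng.random(probs.shape) < probs).astype(float)
    return rbms

# Toy usage: random binary "data" standing in for a real training set.
X = (rng.random((512, 100)) < 0.3).astype(float)
stack = pretrain_stack(X, layer_sizes=[64, 32])
```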

Algorithm 20.1 The variational stochastic maximum likelihood algorithm for training a DBM with two hidden layers.
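The algorithm listing itself did not survive the page export, so the following is only a rough sketch of what variational stochastic maximum likelihood for a two-hidden-layer DBM looks like: mean-field fixed-point inference in the positive phase and persistent block Gibbs sampling in the negative phase. The function name, the omission of bias terms, and all hyperparameters are assumptions for illustration, not a reproduction of the book's Algorithm 20.1.

```python
# Sketch of variational stochastic maximum likelihood for a two-hidden-layer DBM.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def vsml_train(data, n_h1, n_h2, lr=0.01, n_chains=20,
               mf_steps=10, gibbs_steps=1, epochs=5, batch=64):
    n_v = data.shape[1]
    W1 = 0.01 * rng.standard_normal((n_v, n_h1))   # visible-to-h1 weights
    W2 = 0.01 * rng.standard_normal((n_h1, n_h2))  # h1-to-h2 weights
    # Persistent Gibbs chains used for the negative phase.
    v_t = sample(0.5 * np.ones((n_chains, n_v)))
    h1_t = sample(0.5 * np.ones((n_chains, n_h1)))
    h2_t = sample(0.5 * np.ones((n_chains, n_h2)))

    for _ in range(epochs):
        for i in range(0, len(data), batch):
            v = data[i:i + batch]
            m = len(v)
            # Positive phase: mean-field fixed-point updates for q(h1, h2 | v).
            q2 = 0.5 * np.ones((m, n_h2))
            for _ in range(mf_steps):
                q1 = sigmoid(v @ W1 + q2 @ W2.T)
                q2 = sigmoid(q1 @ W2)
            # Negative phase: block Gibbs updates on the persistent chains
            # (h1 given v and h2, then v and h2 given h1).
            for _ in range(gibbs_steps):
                h1_t = sample(sigmoid(v_t @ W1 + h2_t @ W2.T))
                v_t = sample(sigmoid(h1_t @ W1.T))
                h2_t = sample(sigmoid(h1_t @ W2))
            # Stochastic gradient ascent on the variational lower bound.
            W1 += lr * (v.T @ q1 / m - v_t.T @ h1_t / n_chains)
            W2 += lr * (q1.T @ q2 / m - h1_t.T @ h2_t / n_chains)
    return W1, W2
```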


