(19th-December-2020)
• Neural auto-regressive networks (Bengio and Bengio, 2000a,b) have the same left-to-right graphical model as logistic auto-regressive networks (figure 20.8), but employ a different parametrization of the conditional distributions within that graphical model structure. The new parametrization is more powerful in the sense that its capacity can be increased as much as needed, allowing approximation of any joint distribution. The new parametrization can also improve generalization by introducing a parameter sharing and feature sharing principle common to deep learning in general. The models were motivated by the objective of avoiding the curse of dimensionality arising out of traditional tabular graphical models sharing the same structure as figure 20.8. In tabular discrete probabilistic models, each conditional distribution is represented by a table of probabilities, with one entry and one parameter for each possible configuration of the variables involved. By using a neural network instead, two advantages are obtained:
1. The parametrization of each P(x_i | x_{i-1}, ..., x_1) by a neural network with (i − 1)×k inputs and k outputs (if the variables are discrete and take k values, encoded one-hot) allows one to estimate the conditional probability without requiring an exponential number of parameters (and examples), yet still is able to capture high-order dependencies between the random variables (a minimal sketch of a single such conditional follows this list).
2. Instead of having a different neural network for the prediction of each x_i, a left-to-right connectivity illustrated in figure 20.9 allows one to merge all the neural networks into one. Equivalently, it means that the hidden-layer features computed for predicting x_i can be reused for predicting x_{i+k} (k > 0). The hidden units are thus organized in groups that have the particularity that all the units in the i-th group only depend on the input values x_1, ..., x_i. The parameters used to compute these hidden units are jointly optimized to improve the prediction of all the variables in the sequence. This is an instance of the reuse principle that recurs throughout deep learning, in scenarios ranging from recurrent and convolutional network architectures to multi-task and transfer learning (see the second sketch after this list).
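To make point 1 concrete, here is a minimal sketch of a single conditional P(x_i | x_{i-1}, ..., x_1) parameterized by a small network with (i − 1)×k one-hot inputs and k softmax outputs. All names, sizes, and the use of NumPy with untrained random weights are my own illustrative assumptions, not the original authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(value, k):
    """Encode an integer in {0, ..., k-1} as a length-k one-hot vector."""
    v = np.zeros(k)
    v[value] = 1.0
    return v

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical setup: k discrete values per variable, predicting x_i from x_1 .. x_{i-1}.
k, i, n_hidden = 3, 5, 16

# (i-1)*k inputs -> n_hidden -> k outputs.
W_in = rng.normal(scale=0.1, size=(n_hidden, (i - 1) * k))
b_in = np.zeros(n_hidden)
W_out = rng.normal(scale=0.1, size=(k, n_hidden))
b_out = np.zeros(k)

def conditional(prev_values):
    """P(x_i | x_{i-1}, ..., x_1) for one setting of the i-1 previous variables."""
    x = np.concatenate([one_hot(v, k) for v in prev_values])  # (i-1)*k inputs
    h = np.tanh(W_in @ x + b_in)
    return softmax(W_out @ h + b_out)                          # k output probabilities

print(conditional([0, 2, 1, 2]))   # distribution over x_5 given x_1 .. x_4
```

The parameter count here grows roughly as (i − 1)·k·n_hidden + n_hidden·k, i.e. linearly in i, whereas a tabular conditional would need on the order of k^(i−1) entries.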
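For point 2, a sketch of merging all the conditionals into one network with left-to-right connectivity, in the spirit of the figure 20.9 architecture: the running hidden pre-activation for step i sees only x_1, ..., x_{i-1}, and the same input-to-hidden weights feed every later prediction, so features are shared across conditionals. Again, this is a hedged NADE-style reading with hypothetical names and untrained weights, not the exact wiring of Bengio and Bengio (2000a,b):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical setup: d variables, each taking one of k values, n_hidden shared hidden units.
d, k, n_hidden = 4, 3, 16

# Shared input-to-hidden weights (one slice per position/value pair) and one output
# head per position; every shared parameter helps predict all later variables.
W = rng.normal(scale=0.1, size=(d, k, n_hidden))   # input -> hidden (shared features)
c = np.zeros(n_hidden)                             # hidden bias
V = rng.normal(scale=0.1, size=(d, k, n_hidden))   # hidden -> output, one head per x_i
b = np.zeros((d, k))                               # output biases

def joint_log_prob(x):
    """log P(x_1, ..., x_d) = sum_i log P(x_i | x_{<i}), reusing hidden features."""
    a = c.copy()                       # running pre-activation; depends only on x_{<i}
    logp = 0.0
    for i in range(d):
        h = np.tanh(a)                 # hidden group for step i sees only x_1 .. x_{i-1}
        p = softmax(V[i] @ h + b[i])   # P(x_i | x_{<i})
        logp += np.log(p[x[i]])
        a = a + W[i, x[i]]             # fold x_i in; the same features serve x_{i+1}, ...
    return logp

print(joint_log_prob([0, 2, 1, 1]))
```

Note how the single accumulator a is updated once per variable and reused for every later conditional, which is the feature-sharing idea described in item 2.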