Differentiable Generator Nets

(14th-December-2020)


• Many generative models are based on the idea of using a differentiable generator network. The model transforms samples of latent variables z into samples x, or into distributions over samples x, using a differentiable function g(z; θ^(g)) that is typically represented by a neural network. This model class includes variational autoencoders, which pair the generator network with an inference network; generative adversarial networks, which pair the generator network with a discriminator network; and techniques that train generator networks in isolation. Generator networks are essentially just parametrized computational procedures for generating samples, where the architecture provides the family of possible distributions to sample from and the parameters select a distribution from within that family. As an example, the standard procedure for drawing samples from a normal distribution with mean µ and covariance Σ is to feed samples z from a normal distribution with zero mean and identity covariance into a very simple generator network. This generator network contains just one affine layer:

• x = g(z) = µ + Lz    (20.71)

• where L is given by the Cholesky decomposition of Σ.
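
• To make equation 20.71 concrete, here is a minimal Python/NumPy sketch (my own illustration, not from the post; the function name and sample count are arbitrary): z is drawn from a standard normal, and the single affine layer x = µ + Lz yields samples with the desired mean and covariance.

```python
import numpy as np

# Sketch of the one-affine-layer generator network for a Gaussian (eq. 20.71):
# draw z ~ N(0, I) and return x = mu + L z, where L is the Cholesky factor of Sigma.
def gaussian_generator(mu, Sigma, n_samples=100000, seed=0):
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)                       # Sigma = L @ L.T
    z = rng.standard_normal((n_samples, mu.shape[0]))   # z ~ N(0, I)
    return mu + z @ L.T                                 # x = g(z) = mu + L z

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = gaussian_generator(mu, Sigma)
print(x.mean(axis=0))                # close to mu
print(np.cov(x, rowvar=False))       # close to Sigma
```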


• The variational autoencoder or VAE (Kingma, 2013; Rezende et al., 2014) is a directed model that uses learned approximate inference and can be trained purely with gradient-based methods. To generate a sample from the model, the VAE first draws a sample z from the code distribution p_model(z). The sample is then run through a differentiable generator network g(z). Finally, x is sampled from a distribution p_model(x; g(z)) = p_model(x | z). During training, however, the approximate inference network (or encoder) q(z | x) is used to obtain z, and p_model(x | z) is then viewed as a decoder network. The key insight behind variational autoencoders is that they may be trained by maximizing the variational lower bound L(q) associated with data point x:

• L(q) = E_{z∼q(z|x)} log p_model(z, x) + H(q(z | x))    (20.76)

• = E_{z∼q(z|x)} log p_model(x | z) − D_KL(q(z | x) ‖ p_model(z))    (20.77)

• ≤ log p_model(x)


• In equation 20.76, we recognize the first term as the joint log-likelihood of the visible and hidden variables under the approximate posterior over the latent variables (just like with EM, except that we use an approximate rather than the exact posterior). We also recognize a second term, the entropy of the approximate posterior. When q is chosen to be a Gaussian distribution, with noise added to a predicted mean value, maximizing this entropy term encourages increasing the standard deviation of this noise. More generally, this entropy term encourages the variational posterior to place high probability mass on many z values that could have generated x, rather than collapsing to a single point estimate of the most likely value. In equation 20.77, we recognize the first term as the reconstruction log-likelihood found in other autoencoders. The second term tries to make the approximate posterior distribution q(z | x) and the model prior p_model(z) approach each other.
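
• To show how these two terms translate into code, below is a hedged, minimal PyTorch sketch of a VAE (my own illustration, not from the post; the layer sizes, the Bernoulli decoder, and all names are assumptions): a Gaussian encoder for q(z | x), a reparameterized sample of z, the bound of equation 20.77 computed as reconstruction log-likelihood minus KL divergence, and the ancestral sampling procedure described above.

```python
import torch
from torch import nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, data_dim=784, latent_dim=8, hidden=256):
        super().__init__()
        # Encoder q(z | x): outputs mean and log-variance of a diagonal Gaussian.
        self.enc = nn.Sequential(nn.Linear(data_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * latent_dim))
        # Generator / decoder g(z): outputs logits of a Bernoulli p_model(x | z).
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, data_dim))
        self.latent_dim = latent_dim

    def elbo(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        std = torch.exp(0.5 * log_var)
        z = mu + std * torch.randn_like(std)            # reparameterized z ~ q(z | x)
        logits = self.dec(z)
        # First term of eq. 20.77: E_q log p_model(x | z).
        recon = -F.binary_cross_entropy_with_logits(logits, x,
                                                    reduction="none").sum(-1)
        # Second term: D_KL(q(z | x) || N(0, I)), closed form for a diagonal Gaussian.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(-1)
        return (recon - kl).mean()

    @torch.no_grad()
    def sample(self, n):
        z = torch.randn(n, self.latent_dim)             # z ~ p_model(z) = N(0, I)
        return torch.bernoulli(torch.sigmoid(self.dec(z)))  # x ~ p_model(x | z)

vae = TinyVAE()
x = torch.bernoulli(torch.rand(32, 784))                # fake binary batch, for illustration
loss = -vae.elbo(x)                                     # training maximizes the ELBO
loss.backward()
print(loss.item(), vae.sample(4).shape)
```

Training would simply repeat the last few lines over minibatches with an optimizer step; the point here is only to show where the reconstruction and KL terms of the bound appear.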
