Description Length

DR.GEEK
Jun 14, 2020

(15th-June-2020)

The negative of the base-2 logarithm of Formula (7.5.1) is

$$(-\log_2 P(\text{data} \mid \text{model})) + (-\log_2 P(\text{model})).$$

This can be interpreted in terms of information theory. The first term of this expression is the number of bits it takes to describe the data given the model. The second term is the number of bits it takes to describe the model. A model that minimizes this sum is a minimum description length (MDL) model. The MDL principle is to choose the model that minimizes the number of bits it takes to describe both the model and the data given the model.

One way to think about the MDL principle is that the aim is to communicate the data as succinctly as possible. The role of the model is to make that communication shorter. To communicate the data, first communicate the model, then communicate the data in terms of the model. The total number of bits is therefore the number of bits it takes to communicate the model plus the number of bits it takes to communicate the data in terms of the model. The MDL principle chooses the model that lets us communicate the data in as few bits as possible.
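As a rough illustration of the sum above, the following sketch scores a few candidate models of a coin by their total description length. The data, the candidate models, and their prior probabilities are all made up for this example, and the helper name description_length is hypothetical.

```python
import math

def description_length(data, p_heads, prior):
    """Bits for the data given the model, plus bits for the model itself."""
    heads = sum(data)
    tails = len(data) - heads
    # -log2 P(data | model) for independent flips with P(heads) = p_heads
    data_bits = -(heads * math.log2(p_heads) + tails * math.log2(1 - p_heads))
    # -log2 P(model)
    model_bits = -math.log2(prior)
    return data_bits + model_bits

# Assumed data: 1 = heads, 0 = tails (6 heads in 8 flips).
data = [1, 1, 0, 1, 1, 1, 0, 1]

# Assumed candidate models: name -> (P(heads), prior P(model)).
models = {
    "fair": (0.5, 0.5),
    "biased_0.75": (0.75, 0.25),
    "biased_0.9": (0.9, 0.25),
}

lengths = {
    name: description_length(data, p_heads, prior)
    for name, (p_heads, prior) in models.items()
}

# The MDL model is the one with the smallest total number of bits.
mdl_model = min(lengths, key=lengths.get)

for name, bits in lengths.items():
    print(f"{name}: {bits:.2f} bits")
print("MDL model:", mdl_model)
```

With these assumed numbers, the fair coin is the cheapest model to describe (1 bit) but needs 8 bits for the data, while the 0.75-biased coin costs 2 bits to describe and pays that back with a shorter encoding of the data, so it has the smallest total.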

Cross Validation

• The problem with the previous methods is that they require a notion of simplicity to be known before the agent has seen any data. It would seem as though an agent should be able to determine, from the data, how complicated a model needs to be. Such a method could be used when the learning agent has no prior information about the world.

• The idea of cross validation is to split the training set into two: a set of examples to train with, and a validation set. The agent trains using only the new, smaller training set. The prediction error on the validation set, which the agent did not train on, is then used to determine which model to use.
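The split described above can be sketched as follows. This is only a minimal illustration with a single held-out validation set: the examples, the three candidate models (a constant, a least-squares line, and a nearest-neighbor lookup), and all function names are assumptions made for this example.

```python
def fit_constant(xs, ys):
    """Simplest model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Least-squares line through the training examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """Most complex model: memorize the data, predict the nearest example's target."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mean_squared_error(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Assumed examples: roughly y = 2x + 1 with small noise.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.3, 2.6, 5.2, 6.9, 9.4, 10.7, 13.1, 14.8, 17.3]

# Hold out every third example for validation; train on the rest.
valid_idx = [i for i in range(len(xs)) if i % 3 == 2]
train_idx = [i for i in range(len(xs)) if i % 3 != 2]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
valid_x = [xs[i] for i in valid_idx]
valid_y = [ys[i] for i in valid_idx]

candidates = {"constant": fit_constant, "linear": fit_linear, "nearest": fit_nearest}

# Train each candidate on the training part only, then score it on the validation part.
validation_error = {
    name: mean_squared_error(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}

# Pick the model that predicts the held-out examples best.
best_model = min(validation_error, key=validation_error.get)
print(validation_error)
print("chosen model:", best_model)
```

No notion of simplicity is supplied in advance here: the nearest-neighbor model fits its training examples perfectly, but on this assumed data the held-out examples favor the line.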
