(14th-June-2020)
• To understand MAP learning, consider how it can be used to learn decision trees. If there are no examples with the same values for the input features and different values for the target features, there are always decision trees that fit the data perfectly. If the training examples do not cover all of the assignments to the input variables, multiple trees will fit the data perfectly. However, with noise, none of these may be the best model. Not only do we want to compare the models that fit the data perfectly; we also want to compare those models with the models that do not necessarily fit the data perfectly. MAP learning provides a way to compare these models.
• Suppose there are multiple decision trees that fit the data perfectly. If model denotes one of those decision trees, P(data|model)=1. The preference for one decision tree over another then depends only on the prior probabilities of the decision trees; the prior probability encodes the learning bias. The preference for simpler decision trees over more complicated decision trees occurs because simpler decision trees have a higher prior probability.
• Bayes' rule gives a way to trade off simplicity and ability to handle noise. Decision trees can handle noisy data by having probabilities at the leaves. When there is noise, larger decision trees fit the training data better, because the tree can account for random regularities (noise) in the training data. In decision-tree learning, the likelihood favors bigger decision trees; the more complicated the tree, the better it can fit the data. The prior distribution can favor smaller decision trees. When there is a prior distribution over decision trees, Bayes' rule specifies how to trade off model complexity and accuracy: The posterior probability of the model given the data is proportional to the product of the likelihood and the prior.
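The trade-off described above, P(model|data) ∝ P(data|model) × P(model), can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not something taken from the text: the two candidate trees, their per-leaf example counts, the function names, and in particular the prior, which is taken to be proportional to 2^(-bits_per_leaf × number of leaves). The prior is left unnormalized, which does not affect which model has the highest posterior.

```python
import math

def log_likelihood(leaf_counts):
    """Log P(data | model) for a tree with probabilities at its leaves.

    leaf_counts holds one (positives, negatives) pair per leaf. Each leaf
    is assumed to predict the empirical fraction of positives among the
    training examples that reach it.
    """
    total = 0.0
    for pos, neg in leaf_counts:
        n = pos + neg
        for count in (pos, neg):
            if count > 0:  # a zero count contributes nothing
                total += count * math.log(count / n)
    return total

def log_prior(num_leaves, bits_per_leaf):
    """Hypothetical prior: P(model) proportional to 2 ** -(bits_per_leaf * num_leaves)."""
    return -bits_per_leaf * num_leaves * math.log(2)

def log_posterior_score(leaf_counts, bits_per_leaf):
    """Log of likelihood times prior, i.e. the log posterior up to a constant."""
    return log_likelihood(leaf_counts) + log_prior(len(leaf_counts), bits_per_leaf)

def map_model(candidates, bits_per_leaf):
    """Return the name of the candidate with the highest posterior score."""
    return max(
        candidates,
        key=lambda name: log_posterior_score(candidates[name], bits_per_leaf),
    )

# Two hypothetical trees for the same 20 training examples.
candidates = {
    # 2 leaves; one example in each leaf disagrees with the majority.
    "small": [(9, 1), (1, 9)],
    # 4 leaves; the two odd examples get their own leaves, so the fit is perfect.
    "large": [(9, 0), (0, 1), (1, 0), (0, 9)],
}

print(map_model(candidates, bits_per_leaf=2.0))  # weak preference for small trees
print(map_model(candidates, bits_per_leaf=5.0))  # strong preference for small trees
```

Under these assumed numbers, the larger tree has likelihood 1 and wins when the prior penalizes size only mildly (bits_per_leaf=2.0), while the smaller tree becomes the MAP model once the prior penalizes each leaf heavily enough (bits_per_leaf=5.0). Which tree is preferred therefore depends on how strongly the prior favors simplicity.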