(16th-June-2020)
• For this method to work, a distance metric is required that measures the closeness of two examples. First define a metric for the domain of each feature, in which the values of the features are converted to a numerical scale that can be used to compare values. Suppose val(e,Xi) is a numerical representation of the value of feature Xi for the example e. Then (val(e1,Xi) - val(e2,Xi)) is the difference between examples e1 and e2 on the dimension defined by feature Xi. The Euclidean distance, the square root of the sum of the squares of the dimension differences, can be used as the distance between two examples. One important issue is the relative scales of different dimensions; increasing the scale of one dimension increases the importance of that feature. Let wi be a non-negative real-valued parameter that specifies the weight of feature Xi. The distance between examples e1 and e2 is then
d(e1,e2) = sqrt(∑i wi × (val(e1,Xi) - val(e2,Xi))²)
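As a minimal sketch (not code from the source), this weighted Euclidean distance could be computed in Python as follows, assuming each example is represented as a dict mapping feature names to the numerical values val(e,Xi), and weights maps each feature to its wi:

import math

def distance(e1, e2, weights):
    # e1, e2: dicts mapping feature names to numeric values, i.e. val(e, Xi).
    # weights: dict mapping each feature Xi to its non-negative weight wi.
    return math.sqrt(sum(w * (e1[X] - e2[X]) ** 2
                         for X, w in weights.items()))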
The feature weights can be provided as input. It is also possible to learn these weights. The learning agent can try to find a parameter setting that minimizes the error in predicting the value of each element of the training set, based on every other instance in the training set. This is called the leave-one-out cross-validation error measure.
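A sketch of the leave-one-out error for one candidate weight setting, reusing the distance function above. The use of a single-nearest-neighbour prediction and the helper names are assumptions made only for illustration; a learning agent would then search over weight settings to minimise this error.

def nearest_neighbour_prediction(query, others, weights, target):
    # Predict the target value of `query` as the target value of the
    # closest example under the weighted distance (1-nearest neighbour).
    closest = min(others, key=lambda e: distance(query, e, weights))
    return closest[target]

def leave_one_out_error(training_set, weights, target):
    # Sum of squared errors when each example is predicted from all the others.
    # `weights` should cover only the input features, not the target.
    error = 0.0
    for i, example in enumerate(training_set):
        others = training_set[:i] + training_set[i + 1:]
        prediction = nearest_neighbour_prediction(example, others, weights, target)
        error += (example[target] - prediction) ** 2
    return error

For example, leave_one_out_error(training_set, {"X1": 1.0, "X2": 0.5}, "Y") scores one weighting; the weighting with the smallest score would be preferred.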
• So far, learning has been either choosing the best representation - for example, the best decision tree or the best values for the parameters of a neural network - or predicting the value of the target features of a new case from a database of previous cases. This section considers a different notion of learning, namely learning as delineating those hypotheses that are consistent with the examples. Rather than choosing a single hypothesis, the aim is to find all hypotheses that are consistent with the examples. This investigation sheds light on the role of bias and provides a mechanism for a theoretical analysis of the learning problem (see the sketch after the assumptions below).
• We make three assumptions:
• There is a single target feature, Y, that is Boolean. This is not really a restriction for classification, because any discrete feature can be made into Boolean features using indicator variables.
• The hypotheses make definitive predictions, predicting true or false for each example, rather than probabilistic predictions.
• There is no noise in the data.
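Under these assumptions, "finding all hypotheses consistent with the examples" can be illustrated with a small sketch. The hypothesis space used here (all Boolean functions of two Boolean input features, represented as truth tables) is an assumption chosen only to keep the example tiny; it is not the book's algorithm.

from itertools import product

# Training examples: input features X1, X2 and the Boolean target Y.
examples = [
    {"X1": True,  "X2": False, "Y": True},
    {"X1": True,  "X2": True,  "Y": True},
    {"X1": False, "X2": True,  "Y": False},
]

# Hypothetical hypothesis space: every function from (X1, X2) to {True, False},
# represented as a truth table (16 hypotheses for two Boolean inputs).
inputs = list(product([False, True], repeat=2))
hypothesis_space = [dict(zip(inputs, outputs))
                    for outputs in product([False, True], repeat=len(inputs))]

def consistent(h, examples):
    # A hypothesis is consistent if it predicts the observed Y for every example.
    return all(h[(e["X1"], e["X2"])] == e["Y"] for e in examples)

consistent_hypotheses = [h for h in hypothesis_space if consistent(h, examples)]
print(len(consistent_hypotheses))  # -> 2: only the value at (X1=False, X2=False) is undetermined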