(18th-April-2021)
• There have been three waves of development: deep learning known as cybernetics in the 1940s–1960s, deep learning known as connectionism in the 1980s–1990s, and the current resurgence under the name deep learning beginning in 2006. This is quantitatively illustrated in figure 1.7, which shows two of the three historical waves of artificial neural nets research.
The earliest predecessors of modern deep learning were simple linear models.
The McCulloch-Pitts Neuron (McCulloch and Pitts, 1943) was an early model of brain function. This linear model could recognize two different categories of inputs by testing whether f(x, w) is positive or negative.
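The linear function here is of the form f(x, w) = x1*w1 + ... + xn*wn, with the weights set by hand rather than learned. A minimal NumPy sketch of such a linear threshold unit (the particular weights and inputs below are illustrative assumptions, not values from the text):

```python
import numpy as np

def f(x, w):
    """Linear score f(x, w) = x1*w1 + ... + xn*wn."""
    return np.dot(x, w)

def classify(x, w):
    """Assign one of two categories by the sign of f(x, w)."""
    return 1 if f(x, w) > 0 else 0

# Hand-chosen weights: in the McCulloch-Pitts model the weights were
# set by a human operator, not learned from data.
w = np.array([1.0, -2.0])
print(classify(np.array([3.0, 1.0]), w))  # f = 1.0  -> category 1
print(classify(np.array([1.0, 2.0]), w))  # f = -3.0 -> category 0
```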
The training algorithm used to adapt the weights of the ADALINE was a special case of an algorithm called stochastic gradient descent.
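As a rough sketch of that idea, the following applies a stochastic gradient descent update to a linear model on synthetic data; the data, learning rate, and epoch count are illustrative assumptions, not details from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data generated by a hypothetical "true" linear rule plus noise.
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(2)   # learned weights, initialized to zero
lr = 0.01         # learning rate (step size)

# Stochastic gradient descent: update on one example at a time,
# following the negative gradient of the squared error 0.5 * (x·w - y)^2.
for epoch in range(20):
    for x_i, y_i in zip(X, y):
        grad = (x_i @ w - y_i) * x_i
        w -= lr * grad

print(w)  # should end up close to true_w
```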
Models based on the f(x, w) used by the perceptron and ADALINE are called linear models. These models remain some of the most widely used machine learning models, though in many cases they are trained in different ways than the original models were.
Linear models have many limitations. Most famously, they cannot learn the XOR function, where f([0, 1], w) = 1 and f([1, 0], w) = 1 but f([1, 1], w) = 0 and f([0, 0], w) = 0.
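One way to see this is to fit a linear model with a bias term to the XOR truth table by least squares; the best it can do is output 0.5 for every input. A small sketch (the least-squares fitting procedure here is my own illustration, not a method from the text):

```python
import numpy as np

# XOR truth table: inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Fit f(x, w) = w0 + w1*x1 + w2*x2 by least squares.
A = np.hstack([np.ones((4, 1)), X])   # prepend a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print(A @ w)  # approximately [0.5, 0.5, 0.5, 0.5]: no linear model separates XOR
```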
Today, neuroscience is regarded as an important source of inspiration for deep learning researchers, but it is no longer the predominant guide for the field.
The main reason for the diminished role of neuroscience in deep learning research today is that we simply do not have enough information about the brain to use it as a guide.
To obtain a deep understanding of the actual algorithms used by the brain, we would need to be able to monitor the activity of (at the very least) thousands of interconnected neurons simultaneously.
In the 1980s, the second wave of neural network research emerged in great part via a movement called connectionism or parallel distributed processing (Rumelhart et al., 1986c; McClelland et al., 1995).
Several key concepts arose during the connectionism movement of the 1980s that remain central to today’s deep learning.
One of these concepts is that of distributed representation (Hinton et al., 1986). This is the idea that each input to a system should be represented by many features, and each feature should be involved in the representation of many possible inputs.
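As an illustrative sketch (the colored-object encoding below is my own example, not one from this excerpt), compare a one-feature-per-input code with a distributed code in which each feature is shared across many inputs:

```python
import numpy as np

colors = ["red", "green", "blue"]
objects = ["car", "truck", "bird"]

# Local (one-hot) representation: one feature per (color, object) pair -> 9 features,
# and each feature is used by exactly one input.
def local_code(color, obj):
    v = np.zeros(len(colors) * len(objects))
    v[colors.index(color) * len(objects) + objects.index(obj)] = 1.0
    return v

# Distributed representation: one feature per color plus one per object -> 6 features,
# and each feature (e.g. "red") takes part in representing many different inputs.
def distributed_code(color, obj):
    c = np.zeros(len(colors)); c[colors.index(color)] = 1.0
    o = np.zeros(len(objects)); o[objects.index(obj)] = 1.0
    return np.concatenate([c, o])

print(local_code("red", "car"))        # 9-dim vector with a single 1
print(distributed_code("red", "car"))  # 6-dim vector whose features are reused
```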