Historical Perspective

DR.GEEK
Nov 25, 2020

(25th-Nov-2020)

• The idea of distributed representations for symbols was introduced by Rumelhart et al. (1986a) in one of the first explorations of back-propagation. In that work, symbols corresponded to the identities of family members, the neural network captured the relationships between family members, and training examples were triplets such as (Colin, Mother, Victoria). The first layer of the neural network learned a representation of each family member. For example, the features for Colin might represent which family tree Colin was in, what branch of that tree he was in, what generation he was from, etc. One can think of the neural network as computing learned rules relating these attributes together in order to obtain the desired predictions. The model can then make predictions such as inferring who is the mother of Colin. The idea of forming an embedding for a symbol was extended to the idea of an embedding for a word by Deerwester et al. (1990). These embeddings were learned using the singular value decomposition (SVD). Later, embeddings would be learned by neural networks. The history of natural language processing is marked by transitions in the popularity of different ways of representing the input to the model. Following this early work on symbols or words, some of the earliest applications of neural networks to NLP (Miikkulainen and Dyer, 1991; Schmidhuber, 1996) represented the input as a sequence of characters.

• One of the major families of applications of machine learning in the information technology sector is the ability to make recommendations of items to potential users or customers. Two major types of applications can be distinguished: online advertising and item recommendations (often these recommendations are still for the purpose of selling a product). Both rely on predicting the association between a user and an item, either to predict the probability of some action (the user buying the product, or some proxy for this action) or the expected gain (which may depend on the value of the product) if an ad is shown or a recommendation is made regarding that product to that user. The internet is currently financed in great part by various forms of online advertising. There are major parts of the economy that rely on online shopping. Companies including Amazon and eBay use machine learning, including deep learning, for their product recommendations. Sometimes, the items are not products that are actually for sale. Examples include selecting posts to display on social network news feeds, recommending movies to watch, recommending jokes, recommending advice from experts, matching players for video games, or matching people in dating services. Often, this association problem is handled like a supervised learning problem: given some information about the item and about the user, predict the proxy of interest (user clicks on ad, user enters a rating, user clicks on a "like" button, user buys product, user spends some amount of money on the product, user spends time visiting a page for the product, etc.). This often ends up being either a regression problem (predicting some conditional expected value) or a probabilistic classification problem (predicting the conditional probability of some discrete event).
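To make the supervised framing concrete, here is a minimal Python sketch of the probabilistic-classification case: estimating the probability that a user clicks on an ad from information about the user and about the item. Everything specific here is an illustrative assumption rather than a method from the text: the toy features ("likes sports", "is a sports ad"), the hypothetical helpers `features`, `click_probability` and `train`, and the choice of plain logistic regression fitted by stochastic gradient descent on the log loss.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(user: List[float], item: List[float]) -> List[float]:
    """Hypothetical featurization: concatenate user features and item features."""
    return user + item


def click_probability(w: List[float], bias: float, x: List[float]) -> float:
    """Logistic model of P(click | user, item)."""
    return sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x)))


def train(examples: List[Tuple[List[float], int]], lr: float = 0.5,
          epochs: int = 500, seed: int = 0) -> Tuple[List[float], float]:
    """Fit weights with stochastic gradient descent on the log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    bias = 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # For the log loss, the derivative with respect to the logit is p - y.
            err = click_probability(w, bias, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias


if __name__ == "__main__":
    # Invented toy data: user feature = "likes sports", item feature = "is a sports ad".
    # In this toy set, only the sports fan who is shown the sports ad clicks.
    data = [
        (features([1.0], [1.0]), 1),
        (features([1.0], [0.0]), 0),
        (features([0.0], [1.0]), 0),
        (features([0.0], [0.0]), 0),
    ]
    w, bias = train(data)
    for x, y in data:
        print(x, "clicked" if y else "no click", round(click_probability(w, bias, x), 3))
```

The output of `click_probability` is a conditional probability of a discrete event, which is exactly the probabilistic-classification reading; swapping the sigmoid and log loss for an identity output and squared error would turn the same sketch into the regression variant (for example, predicting a rating).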

• Let A be a matrix with user embeddings in its rows and B a matrix with item embeddings in its columns. Let b and c be vectors that contain, respectively, a kind of bias for each user (representing how grumpy or positive that user is in general) and for each item (representing its general popularity). The bilinear prediction of the rating that user u gives to item i is thus obtained as follows:

$$\hat{R}_{u,i} = b_u + c_i + \sum_j A_{u,j} B_{j,i}$$
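The bilinear prediction can be sketched in plain Python as follows. This is a minimal illustration assuming dense list-of-lists matrices, with `A` of shape (number of users × k) and `B` of shape (k × number of items); the function names `predict_rating` and `predict_all` and the toy numbers are hypothetical, and in practice A, B, b and c would be learned from observed ratings rather than set by hand.

```python
from typing import List

Matrix = List[List[float]]


def predict_rating(A: Matrix, B: Matrix, b: List[float], c: List[float],
                   u: int, i: int) -> float:
    """R_hat[u][i] = b[u] + c[i] + sum_j A[u][j] * B[j][i].

    A: one row of embedding coordinates per user (n_users x k).
    B: one column of embedding coordinates per item (k x n_items).
    b: per-user bias; c: per-item bias.
    """
    k = len(A[u])
    interaction = sum(A[u][j] * B[j][i] for j in range(k))
    return b[u] + c[i] + interaction


def predict_all(A: Matrix, B: Matrix, b: List[float], c: List[float]) -> Matrix:
    """Predicted rating for every (user, item) pair."""
    n_users, n_items = len(A), len(B[0])
    return [[predict_rating(A, B, b, c, u, i) for i in range(n_items)]
            for u in range(n_users)]


if __name__ == "__main__":
    # Hypothetical toy values with k = 2 embedding dimensions.
    A = [[1.0, 0.0],        # user 0
         [0.5, 0.5]]        # user 1
    B = [[2.0, 0.0, 1.0],   # dimension 0 for items 0, 1, 2
         [0.0, 3.0, 1.0]]   # dimension 1 for items 0, 1, 2
    b = [0.1, -0.2]         # user 1 is the "grumpier" user
    c = [0.5, 0.0, 0.25]    # item 0 is the most popular item
    for row in predict_all(A, B, b, c):
        print([round(r, 2) for r in row])
```

Computing every entry this way amounts to forming the matrix product AB, then adding b_u to each entry of row u and c_i to each entry of column i; the loop form is kept here only so that each term of the formula is visible.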
