(24th-Nov-2020)
• We can think of an attention-based system as having three components:
• 1. A process that “reads” raw data (such as source words in a source sentence), and converts them into distributed representations, with one feature vector associated with each word position.
• 2. A list of feature vectors storing the output of the reader. This can be understood as a “memory” containing a sequence of facts, which can be retrieved later, not necessarily in the same order, without having to visit all of them.
• 3. A process that “exploits” the content of the memory to sequentially perform a task, at each time step having the ability to put attention on the content of one memory element (or a few, with a different weight). (A minimal sketch of these three components is given below.)
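To make the three components concrete, here is a minimal NumPy sketch using simple dot-product attention with a softmax over the memory; the function names (`read`, `attend`) and the toy unit-norm embeddings are illustrative assumptions, not from any particular library or paper.

```python
import numpy as np

def read(tokens, embeddings):
    """Component 1: convert raw tokens into distributed
    representations, one feature vector per position."""
    return np.stack([embeddings[t] for t in tokens])

def attend(memory, query):
    """Component 3 (one step): place a soft attention weight on each
    memory element and return their weighted combination."""
    scores = memory @ query                  # similarity to each stored fact
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory, weights

# Toy embeddings: unit-norm random vectors, so a vector's similarity
# with itself is always the highest.
rng = np.random.default_rng(0)
embeddings = {}
for w in ["the", "cat", "sat"]:
    v = rng.normal(size=4)
    embeddings[w] = v / np.linalg.norm(v)

memory = read(["the", "cat", "sat"], embeddings)  # component 2: the "memory"
context, weights = attend(memory, query=embeddings["cat"])
print(weights)  # the largest weight falls on the "cat" position
```

At each time step the exploiting process supplies a new query, so it can visit the stored facts in any order and weight a few of them at once, rather than scanning the whole sequence.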