(7th-January-2021)
• A current approach views AI as an agent interacting with an environment (Russell and Norvig, 2010; Sutton and Barto, 1998). The term "agent" applies to AI systems as well as to humans, animals, and even plants. We assume that the agent interacts with its environment in a discrete series of time steps t ∈ {0, 1, 2, ...}.
• This series of time steps may be finite, ending at a last time step T, or infinite, with no last time step. At time step t, the agent sends an action aₜ ∈ A to the environment and receives an observation oₜ ∈ O from the environment, where A and O are finite sets. We use h = (a₁, o₁, ..., aₜ, oₜ) to denote an interaction history during which the environment produces observation oᵢ in response to action aᵢ for 1 ≤ i ≤ t.
• Let H be the set of all finite histories, so that h ∈ H, and define |h| = t as the length of the history h.
• Figure: An agent interacting with its environment.
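The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the action set `A`, the observation set `O`, and the functions `random_policy` and `echo_environment` are hypothetical stand-ins, not part of the original text. A history is stored as a tuple of (action, observation) pairs, so its length equals |h|.

```python
import random
from typing import Callable, Tuple

Action = str
Observation = str
# A history h = (a_1, o_1, ..., a_t, o_t), stored as t (action, observation) pairs.
History = Tuple[Tuple[Action, Observation], ...]

# Hypothetical finite sets A and O, chosen only for illustration.
A = ("left", "right")
O = ("dark", "light")


def interact(
    policy: Callable[[History], Action],
    environment: Callable[[History, Action], Observation],
    T: int,
) -> History:
    """Run T time steps and return the resulting interaction history."""
    h: History = ()
    for _ in range(T):
        a = policy(h)            # the agent sends an action a_t in A
        o = environment(h, a)    # the environment answers with o_t in O
        if a not in A or o not in O:
            raise ValueError("action or observation outside the finite sets")
        h = h + ((a, o),)        # extend the history by one step
    return h


def random_policy(h: History) -> Action:
    """Hypothetical agent: ignores the history and picks a random action."""
    return random.choice(A)


def echo_environment(h: History, a: Action) -> Observation:
    """Hypothetical deterministic environment: 'right' yields 'light'."""
    return "light" if a == "right" else "dark"


h = interact(random_policy, echo_environment, T=5)
print(len(h))  # the history length |h|
```

Storing the history as an immutable tuple makes it easy to use histories as dictionary keys or to extend them without side effects, which is convenient when a model assigns a probability to each history.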
• The agent's predictions of possible observations are uncertain. Thus the agent's environment model takes the form of a probability distribution over interaction histories:
•
• ρ : H → [0, 1].
•
• Here [0, 1] is the closed interval of real numbers between 0 and 1. The probability of history h is denoted ρ(h). Given h = (a₁, o₁, ..., aₜ, oₜ), let ha denote (a₁, o₁, ..., aₜ, oₜ, a) and hao denote (a₁, o₁, ..., aₜ, oₜ, a, o). Then we can define a conditional probability:
•
• (2.1)  ρ(o | ha) = ρ(hao) / ρ(ha) = ρ(hao) / ∑_{o' ∈ O} ρ(hao').
•
• This equation gives the agent's prediction of the probability of observation o in response to its action a, following history h. The histories hao' are mutually exclusive for distinct o' ∈ O and together cover every way ha can continue, so ρ(ha) is the sum of ρ(hao') over all o' ∈ O. The probability of one particular observation o following ha is therefore ρ(hao) divided by that sum.
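As a worked illustration of equation (2.1), the sketch below computes ρ(o | ha) from a toy environment model. The model itself is an assumption made purely for this example: `P_LIGHT`, the observation set `O`, and the rule that each observation depends only on the latest action are all hypothetical. The function `conditional` follows the equation directly: it divides ρ(hao) by the sum of ρ(hao') over all o' ∈ O.

```python
from typing import Callable, Tuple

# A history is a tuple of (action, observation) pairs.
History = Tuple[Tuple[str, str], ...]

# Hypothetical finite observation set.
O = ("dark", "light")

# Hypothetical model parameter: probability of observing "light" after each action.
P_LIGHT = {"left": 0.3, "right": 0.8}


def rho(h: History) -> float:
    """Toy environment model: probability of history h.

    Assumes each observation depends only on the action that preceded it,
    so rho(h) is a product of per-step probabilities.
    """
    p = 1.0
    for a, o in h:
        p_light = P_LIGHT[a]
        p *= p_light if o == "light" else 1.0 - p_light
    return p


def conditional(model: Callable[[History], float], h: History, a: str, o: str) -> float:
    """Equation (2.1): rho(o | ha) = rho(hao) / sum over o' in O of rho(hao')."""
    numerator = model(h + ((a, o),))
    denominator = sum(model(h + ((a, o_prime),)) for o_prime in O)
    if denominator == 0.0:
        # rho(ha) = 0 means the model rules out ha, so the ratio is undefined.
        raise ValueError("rho(ha) is zero; conditional probability undefined")
    return numerator / denominator


h = (("left", "dark"), ("right", "light"))
p = conditional(rho, h, "right", "light")
print(p)  # approximately 0.8 under this toy model
```

In this toy model the history-dependent factors cancel in the ratio, so the conditional reduces to the per-step probability. A model in which observations depend on the whole history would use the same `conditional` function unchanged.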