Universal AI

DR.GEEK
Jan 10, 2021


Let U be a reference universal Turing machine (UTM) and let Q be the infinite set of all programs for U. These programs are finite bit strings drawn from a prefix-free set. Hutter assumed that U is deterministic, which means that for any given state and tape contents under the read/write heads, there is exactly one successor state and one action for each tape. Let h = (a_1, o_1, ..., a_t, o_t) ∈ H be an interaction history. Given a program q ∈ Q, we write o(h) = U(q, a(h)), where o(h) = (o_1, ..., o_t) and a(h) = (a_1, ..., a_t), to mean that q produces the observations o_i as output on a tape in response to the actions a_i as input on a tape, for 1 ≤ i ≤ t. We assign the prior probability ξ(q) = 2^{-|q|} to program q, where |q| is the length of q in bits. Then we define the prior probability of history h as the total prior weight of all programs that reproduce its observations:

ρ(h) = ∑_{q : o(h) = U(q, a(h))} ξ(q)    (3.1)
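The sum in (3.1) can be made concrete with a toy sketch. A real UTM has infinitely many programs and the sum is not computable in general, so the code below swaps in a hypothetical finite table of four hand-written "programs" keyed by prefix-free bit strings. The names PROGRAMS, run, xi and rho are assumptions made for illustration only, and run is merely a stand-in for U(q, a(h)).

```python
from fractions import Fraction

# Hypothetical toy programs: prefix-free bit strings mapped to deterministic
# rules that turn the actions seen so far into the next observation.
PROGRAMS = {
    "0": lambda actions: 0,                  # always outputs 0
    "10": lambda actions: actions[-1],       # echoes the latest action
    "110": lambda actions: 1 - actions[-1],  # negates the latest action
    "111": lambda actions: sum(actions) % 2, # parity of all actions so far
}


def run(q, actions):
    """Toy stand-in for U(q, a(h)): observations q emits for the given actions."""
    rule = PROGRAMS[q]
    return tuple(rule(actions[: i + 1]) for i in range(len(actions)))


def xi(q):
    """Prior weight 2^{-|q|} of program q, kept exact with Fraction."""
    return Fraction(1, 2 ** len(q))


def rho(history):
    """Prior of a history given as (action, observation) pairs, as in (3.1)."""
    actions = tuple(a for a, _ in history)
    observations = tuple(o for _, o in history)
    consistent = (q for q in PROGRAMS if run(q, actions) == observations)
    return sum((xi(q) for q in consistent), Fraction(0))
```

In this toy table every program is consistent with the empty history, so rho([]) sums all four weights. Each further (action, observation) pair can only remove programs from the sum, so the prior never increases as the history grows.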

If we use this ρ(h) in equations (2.3)−(2.5), and define u(h) as the reward from the environment at time step |h|, then the agent π is Hutter's universal AI. Concretely, each observation o_i is factored into an ordinary observation o'_i and a reward r_i, so that o_i = (o'_i, r_i) and u(h) = r_{|h|}. We assume 0 ≤ r_i ≤ 1 to ensure that the values in equations (2.3)−(2.5) converge.
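As a rough sketch of the reward factoring, the code below reads u(h) off the last observation of a history. Equations (2.3)−(2.5) are not reproduced in this post, so the geometric discount factor gamma used in discounted_return is purely an assumption, included only to show why bounded rewards keep a discounted sum finite. The function names and the history layout are hypothetical.

```python
def u(history):
    """Reward at time step |h|: the reward part of the last observation.

    history is a list of (action, (plain_observation, reward)) pairs.
    """
    _action, (_plain_observation, reward) = history[-1]
    if not 0.0 <= reward <= 1.0:
        raise ValueError("rewards are assumed to lie in [0, 1]")
    return reward


def discounted_return(history, gamma=0.9):
    """Sum of gamma**i times the reward at step i + 1.

    gamma is an assumed discount factor, not something this post specifies.
    """
    return sum(gamma ** i * u(history[: i + 1]) for i in range(len(history)))
```

With every reward at most 1 and gamma below 1, the discounted sum cannot exceed 1 / (1 − gamma), however long the history becomes.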
