DR.GEEK

Universal AI

(10th-January-2021)



• Let $U$ be a reference universal Turing machine (UTM) and let $Q$ be the infinite set of all programs for $U$. These programs are finite bit strings drawn from a prefix-free set. Hutter assumed that $U$ is deterministic: for any given state and tape contents under the read/write heads, there is exactly one successor state and one action for each tape. Let $h = (a_1, o_1, \ldots, a_t, o_t) \in H$ be an interaction history. Given a program $q \in Q$, we write $o(h) = U(q, a(h))$, where $o(h) = (o_1, \ldots, o_t)$ and $a(h) = (a_1, \ldots, a_t)$, to mean that $q$ produces the observations $o_i$ as output on a tape in response to the actions $a_i$ as input on a tape, for $1 \le i \le t$. We assign the prior probability $\xi(q) = 2^{-|q|}$ to program $q$, where $|q|$ is the length of $q$ in bits. Then we define the prior probability of history $h$ as:

• $\rho(h) = \sum_{q \,:\, o(h) = U(q, a(h))} \xi(q)$ (3.1)
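
The sum in (3.1) ranges over infinitely many programs and cannot be computed in general. To show only its shape, here is a minimal Python sketch over a tiny, hypothetical stand-in for $Q$. The names `TOY_PROGRAMS`, `is_prefix_free`, `prior`, and `rho` are illustrative assumptions and not part of Hutter's construction. Each toy "program" is a prefix-free bit string paired with a deterministic function from action sequences to observation sequences, which plays the role of $U(q, a(h))$.

```python
from fractions import Fraction

# Hypothetical stand-in for the program set Q: each key is a bit string q,
# each value is a deterministic map from actions to observations.
# A real UTM has infinitely many programs; this table is for illustration only.
TOY_PROGRAMS = {
    "0":   lambda actions: tuple(0 for _ in actions),      # always observe 0
    "10":  lambda actions: tuple(actions),                  # echo each action
    "110": lambda actions: tuple(1 - a for a in actions),   # flip each action
    "111": lambda actions: tuple(1 for _ in actions),       # always observe 1
}


def is_prefix_free(codes):
    """Return True if no bit string in `codes` is a prefix of another."""
    ordered = sorted(codes)
    # In sorted order, a prefix is always immediately followed by a string
    # that extends it, so checking neighbours is enough.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def prior(q):
    """xi(q) = 2^(-|q|), where |q| is the length of q in bits."""
    return Fraction(1, 2 ** len(q))


def rho(history, programs=TOY_PROGRAMS):
    """Sum xi(q) over the programs q whose output on a(h) equals o(h)."""
    actions = tuple(a for a, _ in history)
    observations = tuple(o for _, o in history)
    return sum(
        (prior(q) for q, run in programs.items() if run(actions) == observations),
        Fraction(0),
    )
```

With this toy table, `rho([(0, 0)])` evaluates to `Fraction(3, 4)`, because the programs `"0"` and `"10"` both map the action sequence `(0,)` to the observation sequence `(0,)` and contribute 1/2 and 1/4 respectively.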

If we use this $\rho(h)$ in equations (2.3)-(2.5), and define $u(h)$ as the reward from the environment at time step $|h|$, then the agent $\pi$ is Hutter's universal AI. That is, each observation $o_i$ is factored into an ordinary observation $o'_i$ and a reward $r_i$ as $o_i = (o'_i, r_i)$, with $u(h) = r_{|h|}$. We assume $0 \le r_i \le 1$ to ensure that values converge in equations (2.3)-(2.5).
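
The factoring of observations into $(o'_i, r_i)$ can be sketched in a few lines of Python. The representation below is an assumption made for illustration: a history is a list of `(action, (plain_observation, reward))` pairs, and `utility` is a hypothetical helper name.

```python
def utility(history):
    """Return u(h) = r_{|h|}, the reward carried by the last observation of h.

    Assumed layout: history is a list of (action, (plain_observation, reward)).
    """
    if not history:
        raise ValueError("u(h) is undefined for the empty history")
    _action, (_plain_observation, reward) = history[-1]
    # The text assumes every reward lies in the interval [0, 1].
    if not 0 <= reward <= 1:
        raise ValueError("reward must lie in [0, 1]")
    return reward
```

For example, `utility([(0, ("x", 0.5)), (1, ("y", 1.0))])` returns `1.0`, the reward attached to the final time step.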
