(15th-January-2021)
• Figure: A simplified example of a Markov decision process (MDP), not showing outputs. There are three states, S1, S2, and S3, and two inputs, a0 and a1. Each arc is labeled with the probability of transitioning to its successor state given the input.
• Rather than using TMs, agents can model their environments with finite stochastic loop programs. These programs have a finite memory limit: no matter how long the program runs, it can never use more memory than its limit. Furthermore, they are stochastic, meaning they sometimes make random decisions about what to do next. Such programs can be expressed in an ordinary procedural programming language restricted to only static array declarations, no recursive functions, only loops whose number of iterations is set before iteration begins (so no while loops), and a generator for truly random numbers. Prohibiting recursive functions prevents stack memory from growing without limit. As a result, the total memory use of these programs is known at compile time, as in the sketch below.
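A minimal sketch in C of what such a program might look like. The specific array size, step count, and use of rand() are illustrative assumptions; in particular, rand() is only a pseudorandom stand-in for the truly random generator the definition calls for. Note that every memory allocation and loop bound is fixed at compile time:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STEPS 100   /* iteration count fixed before the loop begins (no while loops) */
#define N 10        /* static array size, known at compile time */

int main(void) {
    int counts[N] = {0};          /* static array: no dynamic allocation, no recursion */
    srand((unsigned) time(NULL)); /* rand() is a pseudorandom stand-in for a
                                     truly random number generator */

    /* Bounded loop: exactly STEPS iterations, set before iteration begins. */
    for (int t = 0; t < STEPS; t++) {
        int cell = rand() % N;    /* stochastic choice of which cell to increment */
        counts[cell]++;
    }

    for (int i = 0; i < N; i++)
        printf("cell %d: %d\n", i, counts[i]);
    return 0;
}
```

Because there is no recursion, no dynamic allocation, and every loop bound is fixed before the loop starts, the program's total memory use can be read off from its declarations.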
• Markov decision processes (MDPs) are a formal model for finite stochastic loop programs (Puterman 1994; Sutton and Barto 1998; Russell and Norvig 2010). An MDP has a finite set of states including a start state, a finite set of inputs, a finite set of outputs, and a table of probabilities for the next state and output as a function of the current state and input. An MDP's only memory is storage for the value of the current state. Thus an MDP with 2^n states needs only n bits of memory.
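A small sketch in C of an MDP as such a table of transition probabilities, here simulating the three-state, two-input example from the figure above (outputs omitted, as in the figure). The specific probability values are assumptions chosen for illustration, not taken from the figure:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NSTATES 3
#define NINPUTS 2

/* P[s][a][s'] = probability of moving to state s' given current state s
   and input a. Each row sums to 1.0; the numbers are illustrative only. */
static const double P[NSTATES][NINPUTS][NSTATES] = {
    { {0.5, 0.5, 0.0}, {0.0, 0.2, 0.8} },  /* from S1, under a0 and a1 */
    { {1.0, 0.0, 0.0}, {0.1, 0.0, 0.9} },  /* from S2 */
    { {0.0, 0.6, 0.4}, {0.3, 0.7, 0.0} },  /* from S3 */
};

/* Sample the next state given current state s and input a. */
int step(int s, int a) {
    double u = (double) rand() / ((double) RAND_MAX + 1.0);
    double cum = 0.0;
    for (int next = 0; next < NSTATES; next++) {
        cum += P[s][a][next];
        if (u < cum) return next;
    }
    return NSTATES - 1; /* guard against floating-point rounding */
}

int main(void) {
    srand((unsigned) time(NULL));
    int s = 0; /* start state S1 */
    for (int t = 0; t < 10; t++) {
        int a = rand() % NINPUTS;   /* an arbitrary input sequence */
        int next = step(s, a);
        printf("t=%d: S%d --a%d--> S%d\n", t, s + 1, a, next + 1);
        s = next;   /* the only memory carried forward is the current state */
    }
    return 0;
}
```

The single variable s is the MDP's entire memory, which is why an MDP over 2^n states fits in n bits: the state index is all there is to store.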