Python Reinforcement Learning

Markov Decision Process

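A Markov Decision Process (MDP) is an extension of the Markov chain: in addition to states and transition probabilities, it includes the actions an agent can take and the rewards it receives. It provides a mathematical framework for modeling decision-making situations. Almost all Reinforcement Learning problems can be modeled as an MDP.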

An MDP is represented by five important elements:

  • A set of states (S) the agent can actually be in.
  • A set of actions (A) that can be performed by an agent, for moving from one state to another.
  • A transition probability (P(s' | s, a)), which is the probability of moving from one state s to another state s' by performing some action a.
  • A reward probability (R(s, a, s')), which gives the reward the agent can expect to acquire for moving from one state s to another state s' by performing some action a.
  • A discount factor (γ), which controls the relative importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.
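
To make the five elements concrete, here is a minimal sketch of how they might be held in a plain Python data structure. The `MDP` class, the two-state "cool"/"hot" problem, and all of its probabilities and rewards are hypothetical, chosen only for illustration; they do not come from the text. The discount factor is applied in the usual way, weighting a reward received t steps in the future by γ^t.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """Hypothetical container for the five elements of an MDP."""
    states: list        # S: the states the agent can be in
    actions: list       # A: the actions the agent can perform
    transitions: dict   # (s, a) -> {s_next: probability of reaching s_next}
    rewards: dict       # (s, a, s_next) -> reward for that transition
    gamma: float        # discount factor

    def expected_reward(self, s, a):
        """Expected immediate reward for performing action a in state s."""
        return sum(
            p * self.rewards.get((s, a, s_next), 0.0)
            for s_next, p in self.transitions[(s, a)].items()
        )


def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Illustrative toy problem (all numbers are made up).
mdp = MDP(
    states=["cool", "hot"],
    actions=["slow", "fast"],
    transitions={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
        ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
        ("hot", "fast"): {"hot": 1.0},
    },
    rewards={
        ("cool", "slow", "cool"): 1.0,
        ("cool", "fast", "cool"): 2.0,
        ("cool", "fast", "hot"): 2.0,
        ("hot", "slow", "cool"): 1.0,
        ("hot", "slow", "hot"): 1.0,
        ("hot", "fast", "hot"): -10.0,
    },
    gamma=0.9,
)

if __name__ == "__main__":
    print(mdp.expected_reward("cool", "fast"))
    print(discounted_return([1.0, 1.0, 1.0], mdp.gamma))
```

Storing the transitions as a dictionary keyed by (state, action) keeps the sketch close to the notation in the list above: for each key, the probabilities over next states should sum to 1. A smaller γ makes the agent value immediate rewards more, while a γ close to 1 gives future rewards nearly the same weight.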