Markov Decision Process

A Markov Decision Process (MDP) extends the Markov chain by adding actions and rewards. It provides a mathematical framework for modeling decision-making situations, and almost all reinforcement learning problems can be modeled as an MDP.

An MDP is represented by five elements:

A set of states (S) the agent can be in.
A set of actions (A) that the agent can perform to move from one state to another.
A transition probability (P(s' | s, a)), which is the probability of moving from state s to state s' by performing action a.
A reward probability (R(s, a, s')), which gives the reward the agent can expect for moving from state s to state s' by performing action a.
A discount factor (γ), which controls the relative importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.
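
To make the five elements concrete, here is a minimal Python sketch of how they might be stored together. Everything in it is an assumption made for illustration, not something taken from the text: the MDP class, its field and method names, the two-state "cool"/"hot" toy problem, and all of its numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """Hypothetical container for the five elements of an MDP."""
    states: list       # S: states the agent can be in
    actions: list      # A: actions the agent can perform
    transitions: dict  # P: (s, a) -> {s': probability of reaching s'}
    rewards: dict      # R: (s, a, s') -> reward for that transition
    gamma: float       # discount factor

    def validate(self):
        """Check that every (s, a) entry is a distribution summing to 1."""
        for (s, a), dist in self.transitions.items():
            total = sum(dist.values())
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"P(. | {s}, {a}) sums to {total}, not 1")
        return True

    def expected_reward(self, s, a):
        """Average the reward over the possible next states s'."""
        return sum(
            p * self.rewards.get((s, a, s_next), 0.0)
            for s_next, p in self.transitions[(s, a)].items()
        )


# Toy problem with made-up numbers (an assumption for illustration only).
toy_mdp = MDP(
    states=["cool", "hot"],
    actions=["slow", "fast"],
    transitions={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
        ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
        ("hot", "fast"): {"hot": 1.0},
    },
    rewards={
        ("cool", "slow", "cool"): 1.0,
        ("cool", "fast", "cool"): 2.0,
        ("cool", "fast", "hot"): 2.0,
        ("hot", "slow", "cool"): 1.0,
        ("hot", "slow", "hot"): 1.0,
        ("hot", "fast", "hot"): -10.0,
    },
    gamma=0.9,
)

toy_mdp.validate()
print(toy_mdp.expected_reward("cool", "fast"))
```

Keying the transition table by (state, action) keeps each row a probability distribution over next states, which is what validate checks. Other layouts, such as one matrix per action, would serve equally well.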