From the course: Reinforcement Learning Foundations

Markov decision process

- [Instructor] One very important topic left to discuss when describing a reinforcement learning problem is the Markov decision process. You might have been wondering how everything discussed in the previous lesson is even possible mathematically, or in code. A Markov decision process, MDP for short, is how reinforcement learning problems are represented mathematically. Its components are the states, the actions, the rewards, the one-step dynamics of the environment (that is, the state transition probabilities), and the discount factor. I know, I didn't mention the discount factor before, so let me do it justice now. Let's go back to the race example, but this time we are running forever, which is definitely not possible. We are assuming this is a continuing task, as opposed to the 100-meter sprint, which is an episodic task. Now, because we're running forever, we can't be sure what the future holds, or whether our future rewards will be any better than the present ones. Due to this, we favor and…
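To make the components listed above concrete, here is a minimal Python sketch of an MDP as a plain data structure, together with the standard discounted return, in which a reward received k steps in the future is weighted by the discount factor raised to the power k. Everything specific here is an assumption for illustration and does not come from the course: the `MDP` class, the `is_valid` and `discounted_return` helpers, the two-state "race" with its states and actions, and every probability, reward, and discount value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """Hypothetical container for the five MDP components."""
    states: tuple
    actions: tuple
    # One-step dynamics: transitions[(state, action)] -> {next_state: probability}
    transitions: dict
    # rewards[(state, action)] -> immediate reward (a simplifying assumption)
    rewards: dict
    # Discount factor, between 0 and 1
    gamma: float


def is_valid(mdp, tol=1e-9):
    """Check that every transition distribution sums to 1."""
    return all(
        abs(sum(dist.values()) - 1.0) <= tol
        for dist in mdp.transitions.values()
    )


def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th future reward is scaled by gamma**k."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Made-up numbers for an illustrative "race" MDP.
race = MDP(
    states=("running", "resting"),
    actions=("sprint", "jog"),
    transitions={
        ("running", "sprint"): {"running": 0.7, "resting": 0.3},
        ("running", "jog"): {"running": 0.9, "resting": 0.1},
        ("resting", "sprint"): {"running": 0.6, "resting": 0.4},
        ("resting", "jog"): {"running": 0.8, "resting": 0.2},
    },
    rewards={
        ("running", "sprint"): 2.0,
        ("running", "jog"): 1.0,
        ("resting", "sprint"): 0.5,
        ("resting", "jog"): 0.0,
    },
    gamma=0.9,
)
```

With a discount factor below 1, the same reward counts for less the later it arrives: `discounted_return([1, 1, 1], 0.5)` evaluates to 1 + 0.5 + 0.25 = 1.75, whereas a discount factor of 1 would give 3. This is also what keeps the sum finite in a continuing task where the rewards are bounded.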
