Python Reinforcement Learning

What this book covers

Chapter 1, Introduction to Reinforcement Learning, helps us understand what reinforcement learning is and how it works. We will learn about various elements of reinforcement learning, such as agents, environments, policies, and models, and we will see different types of environments, platforms, and libraries used for reinforcement learning. Later in the chapter, we will see some of the applications of reinforcement learning.

Chapter 2, Getting Started with OpenAI and TensorFlow, helps us set up our machine for various reinforcement learning tasks. We will learn how to set up our machine by installing Anaconda, Docker, OpenAI Gym, Universe, and TensorFlow. Then we will learn how to simulate agents in OpenAI Gym, and we will see how to build a video game bot. We will also learn the fundamentals of TensorFlow and see how to use TensorBoard for visualizations.
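
To give a flavor of the Gym workflow covered there, here is a minimal sketch of the standard interaction loop, assuming the classic Gym API the book targets; the CartPole-v0 environment and the random agent are illustrative choices, not the chapter's exact code:

```python
import gym

# Create a simulated environment; CartPole-v0 is one of Gym's classic tasks.
env = gym.make('CartPole-v0')

for episode in range(3):
    state = env.reset()                     # start a new episode
    done = False
    total_reward = 0.0
    while not done:
        action = env.action_space.sample()  # a random agent, for illustration
        state, reward, done, info = env.step(action)
        total_reward += reward
    print('Episode {}: reward = {}'.format(episode, total_reward))

env.close()
```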

Chapter 3, The Markov Decision Process and Dynamic Programming, starts by explaining what Markov chains and Markov processes are, and then we will see how reinforcement learning problems can be modeled as Markov Decision Processes. We will also learn about several fundamental concepts, such as value functions, Q functions, and the Bellman equation. Then we will see what dynamic programming is and how to solve the frozen lake problem using value iteration and policy iteration.
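
As a preview of the dynamic programming material, here is a minimal value iteration sketch; the two-state transition model is made up purely for illustration (the chapter itself works on the frozen lake environment):

```python
import numpy as np

# P[s][a] is a list of (probability, next_state, reward) transitions
# for a hypothetical two-state, two-action MDP.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
gamma = 0.9           # discount factor
V = np.zeros(len(P))  # state-value estimates

for _ in range(100):  # sweep until approximately converged
    for s in P:
        # Bellman optimality backup:
        # V(s) = max_a sum_s' p(s'|s,a) * (r + gamma * V(s'))
        V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                   for a in P[s])

print(V)
```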

Chapter 4, Gaming with Monte Carlo Methods, explains Monte Carlo methods and the different types of Monte Carlo prediction, such as first-visit MC and every-visit MC. We will also learn how to use Monte Carlo methods to play blackjack. Then we will explore different on-policy and off-policy Monte Carlo control methods.
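
To make the first-visit idea concrete, here is a minimal sketch of first-visit MC prediction; the trajectory format (a list of (state, reward) pairs per episode) is a simplifying assumption:

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the *first*
    occurrence of s in each episode."""
    returns = defaultdict(list)
    for episode in episodes:
        # Record the time step of the first visit to each state.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        # Accumulate returns backwards: G_t = r_t + gamma * G_{t+1}.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = r + gamma * G
            if first_visit[s] == t:
                returns[s].append(G)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Two toy episodes over states 'A' and 'B'.
print(first_visit_mc([[('A', 1.0), ('B', 0.0)], [('A', 0.0), ('A', 2.0)]]))
```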

Chapter 5, Temporal Difference Learning, covers temporal-difference (TD) learning, TD prediction, and off-policy and on-policy TD control methods, namely Q learning and SARSA. We will also learn how to solve the taxi problem using Q learning and SARSA.
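
The heart of both algorithms is a one-line TD update. The sketch below contrasts them side by side; the table sizes match Gym's taxi environment (500 states, 6 actions), while the hyperparameters are illustrative:

```python
import numpy as np

n_states, n_actions = 500, 6
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.4, 0.99  # learning rate and discount, chosen for illustration

def q_learning_update(s, a, r, s_next):
    # Off-policy: the TD target uses the greedy (max) action in the next state.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the TD target uses the action the behavior policy actually took.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
```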

Chapter 6, Multi-Armed Bandit Problem, deals with one of the classic problems of reinforcement learning, the multi-armed bandit (MAB) or k-armed bandit problem. We will learn how to solve this problem using various exploration strategies, such as epsilon-greedy, softmax exploration, UCB, and Thompson sampling. Later in the chapter, we will see how to use MAB to decide which ad banner to show a user.
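
As a taste of the simplest of these strategies, here is a minimal epsilon-greedy sketch on a made-up three-armed bandit:

```python
import random

true_probs = [0.3, 0.5, 0.8]      # hidden payout rate of each arm (made up)
counts = [0] * len(true_probs)    # pulls per arm
values = [0.0] * len(true_probs)  # running mean reward per arm
epsilon = 0.1

for _ in range(1000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_probs))  # explore a random arm
    else:
        arm = values.index(max(values))          # exploit the best estimate
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(values)  # the estimates should approach the true payout rates
```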

Chapter 7, Playing Atari Games, has us create our first deep RL algorithm to play Atari games.

Chapter 8, Atari Games with Deep Q Network, covers one of the most widely used deep reinforcement learning algorithms, the deep Q network (DQN). We will learn about DQN by exploring its various components, and then we will see how to build an agent to play Atari games using DQN. Then we will look at some of the upgrades to the DQN architecture, such as double DQN and dueling DQN.
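
For reference, the regression target at the core of DQN, with θ⁻ denoting the parameters of the periodically frozen target network, is:

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-})
```

Double DQN changes how the maximizing action in this target is chosen, while dueling DQN changes how the Q function itself is computed inside the network.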

Chapter 9, Playing Doom with a Deep Recurrent Q Network, explains the deep recurrent Q network (DRQN) and how it differs from a DQN. We will see how to build an agent to play Doom using a DRQN. Later in the chapter, we will learn about the deep attention recurrent Q network, which adds the attention mechanism to the DRQN architecture.

Chapter 10, The Asynchronous Advantage Actor Critic Network, explains how the Asynchronous Advantage Actor Critic (A3C) network works. We will explore the A3C architecture in detail, and then we will learn how to build an agent that drives up a mountain using A3C.

Chapter 11, Policy Gradients and Optimization, covers how policy gradients help us find the right policy without needing the Q function. We will also explore the deep deterministic policy gradient method. Later in the chapter, we will see state-of-the-art policy optimization methods such as trust region policy optimization (TRPO) and proximal policy optimization (PPO).
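
The identity underpinning these methods is the policy gradient theorem, which expresses the gradient of the expected return J(θ) without differentiating through the environment's dynamics:

```latex
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{\pi_{\theta}}\!\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\, Q^{\pi_{\theta}}(s, a) \right]
```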

Chapter 12, Balancing CartPole, has us implement RL algorithms in Python and TensorFlow to solve the cart pole balancing problem.

Chapter 13, Simulating Control Tasks, provides a brief introduction to actor-critic algorithms for continuous control problems. We will learn how to simulate classic control tasks, look at how to implement basic actor-critic algorithms, and understand the state-of-the-art algorithms for control.

Chapter 14, Building Virtual Worlds in Minecraft, takes the advanced concepts covered in previous chapters and applies them to Minecraft, a game more complex than the Atari titles we tackled earlier.

Chapter 15, Learning to Play Go, has us build a model that can play Go, the popular Asian board game considered one of the world's most complicated games.

Chapter 16, Creating a Chatbot, will teach us how to apply deep RL in natural language processing. Our reward function will be a future-looking function, and we will learn how to think in terms of probability when creating this function.

Chapter 17, Generating a Deep Learning Image Classifier, introduces one of the latest and most exciting advancements in RL: generating deep learning models using RL. We will explore the cutting-edge research produced by Google Brain and implement the algorithms it introduces.

Chapter 18, Predicting Future Stock Prices, discusses building an agent that can predict stock prices.

Chapter 19, Capstone Project – Car Racing Using DQN, provides a step-by-step approach for building an agent to win a car racing game using dueling DQN.
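
For reference, the dueling architecture used there decomposes the Q function into a state value and a mean-centered advantage stream:

```latex
Q(s, a; \theta) = V(s; \theta)
  + \left( A(s, a; \theta) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta) \right)
```

Subtracting the mean advantage keeps the two streams identifiable, since adding a constant to V and subtracting it from A would otherwise leave Q unchanged.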

Chapter 20, Looking Ahead, concludes the book by discussing some of the real-world applications of reinforcement learning and introducing potential areas of future academic work.