Reinforcement Learning:
An Introduction
Richard S. Sutton and Andrew G. Barto
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
In memory of A. Harry Klopf
- Contents
- I. The Problem
- 1. Introduction
- 2. Evaluative Feedback
- 2.1 An n-Armed Bandit Problem
- 2.2 Action-Value Methods
- 2.3 Softmax Action Selection
- 2.4 Evaluation Versus Instruction
- 2.5 Incremental Implementation
- 2.6 Tracking a Nonstationary Problem
- 2.7 Optimistic Initial Values
- 2.8 Reinforcement Comparison
- 2.9 Pursuit Methods
- 2.10 Associative Search
- 2.11 Conclusions
- 2.12 Bibliographical and Historical Remarks
- 3. The Reinforcement Learning Problem
- 3.1 The Agent-Environment Interface
- 3.2 Goals and Rewards
- 3.3 Returns
- 3.4 Unified Notation for Episodic and Continuing Tasks
- 3.5 The Markov Property
- 3.6 Markov Decision Processes
- 3.7 Value Functions
- 3.8 Optimal Value Functions
- 3.9 Optimality and Approximation
- 3.10 Summary
- 3.11 Bibliographical and Historical Remarks
- II. Elementary Solution Methods
- 4. Dynamic Programming
- 5. Monte Carlo Methods
- 5.1 Monte Carlo Policy Evaluation
- 5.2 Monte Carlo Estimation of Action Values
- 5.3 Monte Carlo Control
- 5.4 On-Policy Monte Carlo Control
- 5.5 Evaluating One Policy While Following Another
- 5.6 Off-Policy Monte Carlo Control
- 5.7 Incremental Implementation
- 5.8 Summary
- 5.9 Bibliographical and Historical Remarks
- 6. Temporal-Difference Learning
- 6.1 TD Prediction
- 6.2 Advantages of TD Prediction Methods
- 6.3 Optimality of TD(0)
- 6.4 Sarsa: On-Policy TD Control
- 6.5 Q-Learning: Off-Policy TD Control
- 6.6 Actor-Critic Methods
- 6.7 R-Learning for Undiscounted Continuing Tasks
- 6.8 Games, Afterstates, and Other Special Cases
- 6.9 Summary
- 6.10 Bibliographical and Historical Remarks
- III. A Unified View
- 7. Eligibility Traces
- 7.1 n-Step TD Prediction
- 7.2 The Forward View of TD(λ)
- 7.3 The Backward View of TD(λ)
- 7.4 Equivalence of Forward and Backward Views
- 7.5 Sarsa(λ)
- 7.6 Q(λ)
- 7.7 Eligibility Traces for Actor-Critic Methods
- 7.8 Replacing Traces
- 7.9 Implementation Issues
- 7.10 Variable λ
- 7.11 Conclusions
- 7.12 Bibliographical and Historical Remarks
- 8. Generalization and Function Approximation
- 9. Planning and Learning
- 10. Dimensions of Reinforcement Learning
- 11. Case Studies
- Bibliography