298x Filetype PDF File size 1.00 MB Source: editorialexpress.com
Numerical Dynamic Programmingin Economics
John Rust
Yale University
Contents 1
1. Introduction
2. Markov Decision Processes (MDP’s) and the Theory of Dynamic Programming
2.1Definitions of MDP’s, DDP’s, and CDP’s
2.2Bellman’s Equation, Contraction Mappings, and Blackwell’s Theorem
2.3Error Bounds for Approximate Fixed Points of Approximate Bellman Operators
2.4AGeometric Series Representation for MDP’s
2.5Examples of Analytic Solutions to Bellman’s Equation for Specific “Test Problems”
2.6Euler Equations and Euler Operators
3. Computational Complexity and Optimal Algorithms
3.1Discrete Computational Complexity
3.2Continuous Computational Complexity
3.3Computational Complexity of the Approximation Problem
4. Numerical Methods for Contraction Fixed Points
5. Numerical Methods for MDP’s
5.1Discrete Finite Horizon MDP’s
5.2Discrete Infinite Horizon MDP’s
5.2.1Successive Approximation, Policy Iteration, and Related Methods
5.2.2Numerical Comparison of Methods in Auto Replacement Problem
5.2.3New Approaches to Solving Large Scale Problems
5.3Continuous Finite Horizon MDP’s
5.3.1Discrete Approximation Methods
5.3.2Smooth Approximation Methods
5.4Continuous Infinite Horizon MDP’s
5.4.1Discrete Approximation Methods
5.4.2Smooth Approximation Methods
6. Conclusions
1 RevisedNovember1994draftforHandbookofComputationalEconomics,H.Amman,D.KendrickandJ.Rust,(eds.).
I am grateful for helpful comments by Hans Amman, Ken Judd, David Kendrick, Martin Puterman, and two not very
anonymousreferees, Charles Tapiero and John Tsitsiklis.
7. References
2
1. Introduction
This chapter surveys numerical methods for solving dynamic programming (DP) problems.
The DP framework has been extensively used in economic modeling because it is sufficiently
rich to model almost any problem involving sequential decision making over time and under
uncertainty.2 By a simple re-definition of variables virtually any DP problem can be formulated as
aMarkovdecisionprocess(MDP)inwhichadecisionmakerwhoisinstatest attimet = 1,...,T
takes an action at that determines current utility u(st,at) and affects the distribution of next
period’s state st+1 via a Markov transition probability p(st+1|st,at). The problem is to determine
an optimal decision rule α that solves V (s) ≡ maxα EαnPT βtu(st,at)|s0 = so where Eα
t=0
denotes expectation with respect to the controlled stochastic process {st,at} induced by the
decision rule α ≡ {α1,...,αT}, and β ∈ (0,1) denotes the discount factor. What makes these
problemsespecially difficult is that instead of optimizing over an ex ante fixed sequence of actions
{a0,...,aT} one needs to optimize over sequences of functions {α0,...,αT} that allow the ex
post decision at to vary as a best response to the current state of the process, i.e. at = αt(st). The
method of dynamic programming provides a constructive, recursive procedure for computing α
usingthevaluefunctionV asa“shadowprice”todecentralizeacomplicatedstochastic/multiperiod
optimization problem into a sequence of simpler deterministic/static optimization problems. 3 In
infinite horizon problems V is the unique solution to a Bellman’s equation, V = Γ(V ), where the
Bellman operator Γ is defined by:
Γ(V)(s) = max [u(s,a)+βZ V(s′)p(ds′|s,a)], (1.1)
a∈A(s)
and the optimal decision rule can be recovered from V by finding a value α(s) ∈ A(s) that attains
themaximumin(1.1)foreachs ∈ S. WereviewthemaintheoreticalresultsaboutMDP’sinsection
2, providing an intuitive derivation Bellman’s equation in the infinite horizon case and showing
that the value function Vα corresponding to a policy α is simply a multidimensional generalization
of a geometric series. This implies that Vα can be computed as the solution to a system of linear
equations, the key step in the Bellman 1955, 1957 and Howard 1960 policy iteration algorithm.
The Bellman operator has a particularly nice mathematical property: Γ is a contraction mapping.
2 See Stokey and Lucas 1987 for examples of DP models in economic theory. See Rust 1994a, 1994b for examples of
of DP modelsin econometrics.
3 In finite horizon problems V actually denotes an entire sequence of value functions, V ≡ {VT,...,V T}, just as α
0 T
denotes a sequence of decision rules. In the infinite-horizon case, the solution (V,α) reduces to a pair of functions of
the current state s.
3
Alargenumberofnumericalsolutionmethodsexploitthecontractionpropertytoyieldconvergent,
numerically stable methodsfor computingapproximate solutionsto Bellman’sequation, including
the classic method of successive approximations. Since V can be recovered from α and vice versa,
the rest of this chapter focuses on methods for computing the value function V , with separate
discussions of numerical problems involved in approximating α where appropriate.4
Fromthestandpointofcomputation,thereisanimportantdistinctionbetweendiscreteMDP’s
whosestateandcontrolvariablescanassumeonlyafinitenumberofpossiblevalues,andcontinuous
MDP’s whose state and control variables can assume a continuum of possible values. The value
|S|
functions for discrete MDP’s live in a subset B of the finite-dimensional Euclidean space R
(where |S| is the number of elements in S), whereas the value functions of continuous MDP’s live
in a subset B of the infinite-dimensional Banach space B(S) of bounded, measurable real-valued
functions on S. Thus, discrete MDP problems can be solved exactly (modulo rounding error
in arithmetic operations), whereas the the solutions to continuous MDP’s can generally only be
approximated. Approximate solution methods may also be attractive for solving discrete MDP’s
with a large number of possible states |S| or actions |A|.
TherearetwobasicstrategiesforapproximatingthesolutiontoacontinuousMDP:1)discrete
approximation and 2) smooth approximation. Discrete approximation methods solve a finite state
MDPproblemthatapproximatestheoriginalcontinuousMDPproblemoverafinitegridofpointsin
the state space S and action space A. Since the methodsfor solvingdiscrete MDP’shave been well
developed and exposited in the operations research literature (e.g. Bertsekas, 1987, Porteus, 1980
andPuterman,1990,1994),thischapterwillonlybrieflyreviewthestandardapproaches,focusing
instead on recent research on discretization methods for solving continuous MDP’s. Although
smooth approximation methods also have a long history (dating back to Bellman, 1957 Bellman
and Dreyfus, 1962 and Bellman, Kalaba and Kotkin 1963), there has been a recent resurgence
of interest in these methods as a potentially more efficient alternative to discretization methods
(Christiano and Fisher, 1994, Johnson et. al. 1993, Judd and Solnick 1994, Marcet and Marshall,
1994, and Smith 1991). Smooth approximation methods treat the value function V and/or the
decision rule α as smooth, flexible functions of s and a finite-dimensional parameter vector θ:
examples include linear interpolation and a variety of more sophisticated approximation methods
4 Special care must be taken in problems with a continuum of actions, since discontinuities or kinks in numerical
estimates of the conditional expectation of V can create problems for numerical optimization algorithms used to
recover α. Also, certain solution methods focus on computing α directly rather than indirectly by first computing V .
Wewillprovidespecial discussions of these methods in section 5.
no reviews yet
Please Login to review.