Awi Federgruen

The asymptotic behavior of undiscounted value iteration in Markov decision problems

Coauthor(s): Paul Schweitzer.

Abstract:
This paper considers undiscounted Markov decision problems. For the general multichain case, we obtain necessary and sufficient conditions that guarantee that the maximal total expected reward for a planning horizon of n epochs, minus n times the long-run average expected reward, has a finite limit as n approaches infinity for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
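
As a rough illustration of the quantity the abstract studies, the sketch below runs undiscounted value iteration on two toy problems and tracks v_n minus n times the gain. Both two-state examples, the names `value_iteration` and `deviations`, and the hard-coded gain of 1 are assumptions made for this illustration and do not come from the paper. The first example is aperiodic, and the deviation settles immediately. The second is a deterministic two-cycle, and the deviation oscillates, which suggests why conditions on the periodicity structure are needed.

```python
def value_iteration(mdp, v0, n_steps):
    """Return [v_0, ..., v_n] for the undiscounted recursion
    v_{k+1}(i) = max_a [ r(i, a) + sum_j P(j | i, a) * v_k(j) ].

    mdp[i] is a list of (reward, transition_probabilities) pairs,
    one pair per action available in state i (hypothetical layout).
    """
    history = [list(v0)]
    for _ in range(n_steps):
        v = history[-1]
        history.append([
            max(r + sum(p * vj for p, vj in zip(probs, v))
                for r, probs in actions)
            for actions in mdp
        ])
    return history


def deviations(history, gain):
    """Return v_n - n * gain for each n (gain assumed state-independent)."""
    return [[vi - n * gain for vi in v] for n, v in enumerate(history)]


# Assumed toy data: an aperiodic problem (state 0 has a dominated action)
# and a periodic two-cycle. In both, the long-run average reward is 1.
aperiodic = [
    [(2.0, [0.5, 0.5]), (0.0, [0.5, 0.5])],
    [(0.0, [0.5, 0.5])],
]
periodic = [
    [(2.0, [0.0, 1.0])],
    [(0.0, [1.0, 0.0])],
]
GAIN = 1.0

dev_a = deviations(value_iteration(aperiodic, [0.0, 0.0], 6), GAIN)
dev_p = deviations(value_iteration(periodic, [0.0, 0.0], 6), GAIN)

for n in range(1, 7):
    print(n, dev_a[n], dev_p[n])
```

In the aperiodic example the deviation equals (1, -1) for every n from 1 on, while in the periodic example it alternates between (1, -1) and (0, 0), so no limit exists for that final reward vector.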

Source: Mathematics of Operations Research
Exact Citation:
Schweitzer, Paul, and Awi Federgruen. "The asymptotic behavior of undiscounted value iteration in Markov decision problems." Mathematics of Operations Research 2, no. 4 (November 1977): 360-381.
Volume: 2
Number: 4
Pages: 360-381
Date: November 1977