The asymptotic behavior of undiscounted value iteration in Markov decision problems
Coauthor(s): Paul Schweitzer.
This paper considers undiscounted Markov decision problems. For the general multichain case, we obtain necessary and sufficient conditions that guarantee that the maximal total expected reward for a planning horizon of n epochs, minus n times the long-run average expected reward, has a finite limit as n approaches infinity for each initial state and each final reward vector. In addition, we obtain a characterization of the chain and periodicity structure of the set of one-step and J-step maximal-gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
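As an informal illustration of the quantity discussed in the abstract, the sketch below runs undiscounted value iteration on two toy problems and tracks v_n minus n times the gain. It is not taken from the paper: the function names, both two-state examples, and the hard-coded gains (1.5 and 1.0, worked out by hand for these examples) are assumptions made for illustration only. The first example is aperiodic and the difference settles; the second is a deterministic two-cycle, where the difference oscillates with period 2 and so has no limit.

```python
def value_iteration(P, r, n_steps, v0=None):
    """Undiscounted value iteration.

    P[i][a] is the list of transition probabilities from state i under
    action a, and r[i][a] is the one-step reward. Returns [v_0, ..., v_n].
    """
    n_states = len(P)
    v = list(v0) if v0 is not None else [0.0] * n_states
    history = [v]
    for _ in range(n_steps):
        v = [
            max(
                r[i][a] + sum(p * vj for p, vj in zip(P[i][a], v))
                for a in range(len(P[i]))
            )
            for i in range(n_states)
        ]
        history.append(v)
    return history


def deviation(history, gain):
    """Return v_n - n * gain for every n in the history."""
    return [[x - n * gain for x in v] for n, v in enumerate(history)]


# Hypothetical aperiodic example: state 0 has two actions, state 1 has one.
# The maximal gain is 1.5 (assumed here, computed by hand for this toy case).
P_aperiodic = [[[0.5, 0.5], [0.0, 1.0]], [[0.5, 0.5]]]
r_aperiodic = [[1.0, 0.0], [2.0]]
dev_aperiodic = deviation(value_iteration(P_aperiodic, r_aperiodic, 10), 1.5)

# Hypothetical periodic example: the two states swap deterministically.
# The gain is 1.0 (average of the rewards 0 and 2 around the cycle).
P_periodic = [[[0.0, 1.0]], [[1.0, 0.0]]]
r_periodic = [[0.0], [2.0]]
dev_periodic = deviation(value_iteration(P_periodic, r_periodic, 10), 1.0)

print(dev_aperiodic[9], dev_aperiodic[10])  # both [-0.5, 0.5]
print(dev_periodic[9], dev_periodic[10])    # [-1.0, 1.0] then [0.0, 0.0]
```

Both examples start from a zero final reward vector; passing a different `v0` changes the limit in the aperiodic case but not the fact that it exists.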
Source: Mathematics of Operations Research
Schweitzer, Paul, and Awi Federgruen. "The asymptotic behavior of undiscounted value iteration in Markov decision problems." Mathematics of Operations Research 2, no. 4 (November 1977): 360-381.