Article

Original scientific text

Title

Nonparametric adaptive control for discrete-time Markov processes with unbounded costs under average criterion

Authors 1

Affiliations

  1. Departamento de Matemáticas, Universidad de Sonora, Rosales s/n Col. Centro, C.P. 83000, Hermosillo, Son., México

Abstract

We introduce average cost optimal adaptive policies in a class of discrete-time Markov control processes with Borel state and action spaces, allowing unbounded costs. The processes evolve according to the system equations $x_{t+1} = F(x_t, a_t, \xi_t)$, $t = 1, 2, \ldots$, with i.i.d. $\mathbb{R}^k$-valued random vectors $\xi_t$, which are observable but whose density $\varrho$ is unknown.

Keywords

Markov control process, discounted and average cost criterion, adaptive policy

Bibliography

  1. D. Blackwell, Discrete dynamic programming, Ann. Math. Statist. 33 (1962), 719-726.
  2. E. B. Dynkin and A. A. Yushkevich, Controlled Markov Processes, Springer, New York, 1979.
  3. E. I. Gordienko, Adaptive strategies for certain classes of controlled Markov processes, Theory Probab. Appl. 29 (1985), 504-518.
  4. E. I. Gordienko and O. Hernández-Lerma, Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw) 23 (1995), 199-218.
  5. E. I. Gordienko and J. A. Minjárez-Sosa, Adaptive control for discrete-time Markov processes with unbounded costs: discounted criterion, Kybernetika 34 (1998), no. 2, 217-234.
  6. E. I. Gordienko and J. A. Minjárez-Sosa, Adaptive control for discrete-time Markov processes with unbounded costs: average criterion, Math. Methods Oper. Res. 48 (1998), 37-55.
  7. R. Hasminskii and I. Ibragimov, On density estimation in the view of Kolmogorov's ideas in approximation theory, Ann. Statist. 18 (1990), 999-1010.
  8. O. Hernández-Lerma, Adaptive Markov Control Processes, Springer, New York, 1989.
  9. O. Hernández-Lerma, Infinite-horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality, Reporte Interno 165, Departamento de Matemáticas, CINVESTAV-IPN, México, 1994.
  10. O. Hernández-Lerma and R. Cavazos-Cadena, Density estimation and adaptive control of Markov processes: average and discounted criteria, Acta Appl. Math. 20 (1990), 285-307.
  11. S. A. Lippman, On dynamic programming with unbounded rewards, Manag. Sci. 21 (1975), 1225-1233.
  12. P. Mandl, Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60.
  13. U. Rieder, Measurable selection theorems for optimization problems, Manuscripta Math. 24 (1978), 115-131.
  14. J. A. E. E. Van Nunen and J. Wessels, A note on dynamic programming with unbounded rewards, Manag. Sci. 24 (1978), 576-580.

Pages

267-280

Main language of publication

English

Received

1998-08-04

Published

1999

Subject area

Exact and natural sciences