Article: Original scientific text

Title

Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost

Authors 1, 2

Affiliations

  1. Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India
  2. OA Division (SW Team), Samsung Electronics Co. Ltd., Suwon, P.O.B. 105, Kyungki-Do, South Korea 440600

Abstract

This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. The asymptotic behaviour of the posterior law of the parameter, given the observed trajectory, is analyzed. This analysis suggests a "cost-biased" estimation scheme and an associated self-tuning adaptive control scheme, which is shown to be asymptotically optimal in the almost sure sense.

Keywords

time-averaged cost, adaptive control, asymptotic optimality, cost-biased estimate, Bayesian estimation

Bibliography

  1. R. Agrawal, D. Teneketzis and V. Anantharam, Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space, IEEE Trans. Automatic Control AC-34 (1989), 1249-1259.
  2. A. Barron, Are Bayes rules consistent in information?, in: Problems in Communication and Computation, T. M. Cover and B. Gopinath (eds.), Springer, New York, 1987, 85-91.
  3. R. N. Bhattacharya, Asymptotic behaviour of several dimensional diffusions, in: Stochastic Nonlinear Systems, L. Arnold and R. Lefever (eds.), Springer, New York, 1981, 86-91.
  4. D. Blackwell and L. Dubins, Merging of opinions with increasing information, Ann. Math. Statist. 33 (1962), 882-887.
  5. V. S. Borkar, Control of Markov chains with long run average cost criterion, in: Stochastic Differential Systems, Stochastic Control Theory and Applications, W. H. Fleming and P. L. Lions (eds.), Springer, New York, 1987, 57-77.
  6. V. S. Borkar, The Kumar-Becker-Lin scheme revisited, J. Optim. Theory Appl. 66 (1990), 289-309.
  7. V. S. Borkar, Self-tuning control of diffusions without the identifiability condition, ibid. 68 (1991), 117-137.
  8. V. S. Borkar, On the Milito-Cruz adaptive control scheme for Markov chains, ibid. 77 (1993), 387-397.
  9. V. S. Borkar, A modified self-tuner for controlled diffusions with an unknown parameter, in: Mathematical Theory of Control (Bombay, 1990), A. V. Balakrishnan and M. C. Joshi (eds.), Marcel Dekker, 1992, 57-67.
  10. V. S. Borkar and M. K. Ghosh, Ergodic and adaptive control of nearest neighbour motions, Math. Control Signals and Systems 4 (1991), 81-98.
  11. V. S. Borkar and M. K. Ghosh, Ergodic control of multidimensional diffusions II: adaptive control, Appl. Math. Optim. 21 (1990), 191-220.
  12. V. S. Borkar and P. P. Varaiya, Identification and adaptive control of Markov chains I: finite parameter case, IEEE Trans. Automatic Control 24 (1979), 953-957.
  13. V. S. Borkar and P. P. Varaiya, Identification and adaptive control of Markov chains, SIAM J. Control Optim. 20 (1982), 470-488.
  14. E. K. P. Chong and P. J. Ramadge, Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, IEEE Trans. Automatic Control 39 (1994), 1400-1410.
  15. Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer, New York, 1979.
  16. G. B. Di Masi and Ł. Stettner, Bayesian ergodic adaptive control of discrete time Markov processes, Stochastics Stochastic Reports 54 (1995), 301-316.
  17. B. Doshi and S. E. Shreve, Randomized self-tuning control of Markov chains, J. Appl. Probab. 17 (1980), 726-734.
  18. B. Hajek, Hitting-time and occupation-time bounds implied by drift analysis with applications, Adv. Appl. Probab. 14 (1982), 502-525.
  19. P. R. Kumar and A. Becker, A new family of optimal adaptive controllers for Markov chains, IEEE Trans. Automatic Control 27 (1982), 137-142.
  20. P. R. Kumar and W. Lin, Optimal adaptive controllers for Markov chains, ibid. 27 (1982), 756-774.
  21. P. R. Kumar and P. P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control, Prentice-Hall, 1986.
  22. P. Mandl, Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60.
  23. R. Milito and J. B. Cruz, Jr., An optimization oriented approach to adaptive control of Markov chains, IEEE Trans. Automatic Control 32 (1987), 754-762.
  24. J. N. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning 16 (1994), 195-202.
  25. K. Van Hee, Bayesian Control of Markov Chains, Math. Center Tracts, 95, Math. Center, Amsterdam, 1978.

Pages

339-358

Main language of publication

English

Received

1997-08-20

Accepted

1998-01-06

Published

1998

Exact and natural sciences