Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost

Borkar, V. S.; Mundra Associate, S. M.

Article details

Journal

Applicationes Mathematicae

1998-1999 | 25 | 3 | 339-358

Article title

Bayesian parameter estimation and adaptive control of Markov processes with time-averaged cost

Authors

V. S. Borkar, S. M. Mundra Associate

Content

Full texts:

http://matwbn.icm.edu.pl/ksiazki/zm/zm25/zm2538.pdf [remote]

Title variants

Publication languages

EN

Abstracts

EN

This paper considers Bayesian parameter estimation and an associated adaptive control scheme for controlled Markov chains and diffusions with time-averaged cost. The asymptotic behaviour of the posterior law of the parameter, given the observed trajectory, is analyzed. This analysis suggests a "cost-biased" estimation scheme and an associated self-tuning adaptive control, which is shown to be asymptotically optimal in the almost sure sense.

Keywords

EN

time-averaged cost, adaptive control, asymptotic optimality, cost-biased estimate, Bayesian estimation

Publisher

Institute of Mathematics Polish Academy of Sciences

Journal

Applicationes Mathematicae

Year

1998-1999

Volume

25

Issue

3

Pages

339-358

Physical description

Dates

published

1998

received

1997-08-20

revised

1998-01-06

Contributors

author

V. S. Borkar

Department of Computer Science and Automation Indian Institute of Science Bangalore 560012, India

author

S. M. Mundra Associate

OA Division (SW Team) Samsung Electronics Co. Ltd. Suwon, P.O.B. 105, Kyungki-Do South Korea 440600

Bibliography

[1] R. Agrawal, D. Teneketzis and V. Anantharam, Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space, IEEE Trans. Automatic Control AC-34 (1989), 1249-1259.
[2] A. Barron, Are Bayes rules consistent in information?, in: Problems in Communication and Computation, T. M. Cover and B. Gopinath (eds.), Springer, New York, 1987, 85-91.
[3] R. N. Bhattacharya, Asymptotic behaviour of several dimensional diffusions, in: Stochastic Nonlinear Systems, L. Arnold and R. Lefever (eds.), Springer, New York, 1981, 86-91.
[4] D. Blackwell and L. Dubins, Merging of opinions with increasing information, Ann. Math. Statist. 33 (1962), 882-887.
[5] V. S. Borkar, Control of Markov chains with long run average cost criterion, in: Stochastic Differential Systems, Stochastic Control Theory and Applications, W. H. Fleming and P. L. Lions (eds.), Springer, New York, 1987, 57-77.
[6] V. S. Borkar, The Kumar-Becker-Lin scheme revisited, J. Optim. Theory Appl. 66 (1990), 289-309.
[7] V. S. Borkar, Self-tuning control of diffusions without the identifiability condition, ibid. 68 (1991), 117-137.
[8] V. S. Borkar, On the Milito-Cruz adaptive control scheme for Markov chains, ibid. 77 (1993), 387-397.
[9] V. S. Borkar, A modified self-tuner for controlled diffusions with an unknown parameter, in: Mathematical Theory of Control (Bombay, 1990), A. V. Balakrishnan and M. C. Joshi (eds.), Marcel Dekker, 1992, 57-67.
[10] V. S. Borkar and M. K. Ghosh, Ergodic and adaptive control of nearest neighbour motions, Math. Control Signals and Systems 4 (1991), 81-98.
[11] V. S. Borkar and M. K. Ghosh, Ergodic control of multidimensional diffusions II: adaptive control, Appl. Math. Optim. 21 (1990), 191-220.
[12] V. S. Borkar and P. P. Varaiya, Identification and adaptive control of Markov chains I: finite parameter case, IEEE Trans. Automatic Control 24 (1979), 953-957.
[13] V. S. Borkar and P. P. Varaiya, Identification and adaptive control of Markov chains, SIAM J. Control Optim. 20 (1982), 470-488.
[14] E. K. P. Chong and P. J. Ramadge, Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, IEEE Trans. Automatic Control 39 (1994), 1400-1410.
[15] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer, New York, 1979.
[16] G. B. Di Masi and Ł. Stettner, Bayesian ergodic adaptive control of discrete time Markov processes, Stochastics Stochastic Reports 54 (1995), 301-316.
[17] B. Doshi and S. E. Shreve, Randomized self-tuning control of Markov chains, J. Appl. Probab. 17 (1980), 726-734.
[18] B. Hajek, Hitting-time and occupation-time bounds implied by drift analysis with applications, Adv. Appl. Probab. 14 (1982), 502-525.
[19] P. R. Kumar and A. Becker, A new family of optimal adaptive controllers for Markov chains, IEEE Trans. Automatic Control 27 (1982), 137-142.
[20] P. R. Kumar and W. Lin, Optimal adaptive controllers for Markov chains, ibid. 27 (1982), 756-774.
[21] P. R. Kumar and P. P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control, Prentice-Hall, 1986.
[22] P. Mandl, Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40-60.
[23] R. Milito and J. B. Cruz, Jr., An optimization oriented approach to adaptive control of Markov chains, IEEE Trans. Automatic Control 32 (1987), 754-762.
[24] J. N. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning 16 (1994), 195-202.
[25] K. Van Hee, Bayesian Control of Markov Chains, Math. Center Tracts, 95, Math. Center, Amsterdam, 1978.
