Two kinds of strategies for a multi-armed Markov bandit problem with controlled arms are considered: a strategy with forcing and a strategy with randomization. In both cases, the choice of arm and control function is based on the current value of the average cost per unit time functional. Some simulation results are also presented.
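To make the two strategy types concrete, the following is a minimal Python sketch, not the paper's construction: it replaces the controlled Markov dynamics with i.i.d. cost samples per arm and ignores the control-function choice. The names run_bandit and forcing_schedule, and the perfect-square forcing times, are illustrative assumptions; the only idea taken from the abstract is that plays are guided by the current empirical average cost per play, with exploration enforced either by a deterministic forcing schedule or by randomization.

    import random

    def forcing_schedule(t):
        """Illustrative forcing schedule: force an exploration play at
        perfect-square times. Any deterministic sequence of density zero
        would do, so the fraction of time spent forcing vanishes and the
        long-run average cost is driven by the exploiting plays."""
        r = int(t ** 0.5)
        return r * r == t

    def run_bandit(arms, horizon, eps=None, seed=0):
        """Play `horizon` rounds over callables `arms`, each returning a cost.

        eps is None  -> forcing: at scheduled times cycle through all arms;
                        otherwise play the arm with the smallest empirical
                        average cost per play.
        eps given    -> randomization: with probability eps play a uniformly
                        random arm, otherwise the empirically best one.
        """
        rng = random.Random(seed)
        n = len(arms)
        pulls = [0] * n
        totals = [0.0] * n
        forced = 0

        def avg(k):
            # Unpulled arms get -inf so min() tries each arm at least once.
            return totals[k] / pulls[k] if pulls[k] else float("-inf")

        for t in range(1, horizon + 1):
            if eps is None and forcing_schedule(t):
                i = forced % n              # forcing play: cycle through arms
                forced += 1
            elif eps is not None and rng.random() < eps:
                i = rng.randrange(n)        # randomized exploration play
            else:
                i = min(range(n), key=avg)  # exploit the average-cost leader
            cost = arms[i]()
            pulls[i] += 1
            totals[i] += cost
        return [totals[k] / max(pulls[k], 1) for k in range(n)]

For instance, run_bandit([lambda: random.gauss(2.0, 1.0), lambda: random.gauss(1.0, 1.0)], 10_000) runs the forcing variant, and passing eps=0.05 switches to the randomized one; in both cases the play counts should concentrate on the cheaper arm while the sparse exploration keeps the cost estimates of all arms consistent.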
Institute of Computer Science, Białystok Technical University, Wiejska 45a, 15-351 Białystok, Poland