1995-1996 | 23 | 4 | 449-473
Article title

On nearly selfoptimizing strategies for multiarmed bandit problems with controlled arms

Publication languages
EN
Abstracts
EN
Two kinds of strategies for a multiarmed Markov bandit problem with controlled arms are considered: a strategy with forcing and a strategy with randomization. The choice of arm and control function in both cases is based on the current value of the average cost per unit time functional. Some simulation results are also presented.
Year
1995-1996
Volume
23
Issue
4
Pages
449-473
Physical description
Dates
published
1996
received
1995-03-21
revised
1995-11-23
Contributors
author
  • Institute of Computer Science, Białystok Technical University, Wiejska 45a, 15-351 Białystok, Poland
Bibliography
  • [1] R. Agrawal, Minimizing the learning loss in adaptive control of Markov chains under the weak accessibility condition, J. Appl. Probab. 28 (1991), 779-790.
  • [2] R. Agrawal and D. Teneketzis, Certainty equivalence control with forcing: revisited, Systems Control Lett. 13 (1989), 405-412.
  • [3] V. Anantharam, P. Varaiya and J. Walrand, Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - Part I: i.i.d. rewards, IEEE Trans. Automat. Control AC-32 (11) (1987), 969-977.
  • [4] V. Anantharam, P. Varaiya and J. Walrand, Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - Part II: Markovian rewards, ibid., 977-983.
  • [5] W. Feller, An Introduction to Probability Theory and its Applications, Vol. II, Wiley, New York, 1966.
  • [6] J. C. Gittins, Multi-armed Bandit Allocation Indices, Wiley, 1989.
  • [7] K. D. Glazebrook, On a sufficient condition for superprocesses due to Whittle, J. Appl. Probab. 19 (1982), 99-110.
  • [8] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer, 1989.
  • [9] Ł. Stettner, On nearly self-optimizing strategies for a discrete-time uniformly ergodic adaptive model, Appl. Math. Optim. 27 (1993), 161-177.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.bwnjournal-article-zmv23i4p449bwm