2013 | 23 | 3 | 623-635
Article title

Epoch-incremental reinforcement learning algorithms

Publication languages
EN
Abstracts
EN
In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, the distances from past-active states to the terminal state are computed on the basis of the environment model. These distances and the reinforcement signal of the terminal state are then used to improve the agent's policy.
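The two modes described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm from the article: the incremental step uses tabular Q-learning as a stand-in for TD(0), the environment model is assumed deterministic and stored as a dictionary, and the epoch-mode rule (setting each value to the terminal reinforcement discounted by the distance) is a hypothetical choice. All names (`td0_step`, `backward_distances`, `epoch_update`, `alpha`, `gamma`) are illustrative.

```python
from collections import defaultdict, deque


def td0_step(q, model, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Incremental mode (assumed form): one Q-learning update plus model recording."""
    # Remember the observed transition; a deterministic model is assumed here.
    model[(state, action)] = next_state
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def backward_distances(model, terminal):
    """Breadth-first search over recorded transitions, backwards from the terminal state.

    Returns {state: steps to terminal} for every recorded state that can reach it.
    """
    predecessors = defaultdict(set)
    for (state, _action), next_state in model.items():
        predecessors[next_state].add(state)

    dist = {terminal: 0}
    queue = deque([terminal])
    while queue:
        current = queue.popleft()
        for prev in predecessors[current]:
            if prev not in dist:
                dist[prev] = dist[current] + 1
                queue.append(prev)
    return dist


def epoch_update(q, model, terminal, terminal_reward, gamma=0.9):
    """Epoch mode (hypothetical rule): overwrite Q(s, a) with the terminal
    reinforcement discounted by the distance of the successor state."""
    dist = backward_distances(model, terminal)
    for (state, action), next_state in model.items():
        if next_state in dist:
            q[(state, action)] = terminal_reward * gamma ** dist[next_state]
    return dist


# Usage on a four-state chain 0-1-2-3, where state 3 is terminal.
q = defaultdict(float)
model = {}
actions = ("L", "R")
episode = [(0, "R", 0.0, 1), (1, "R", 0.0, 2), (2, "L", 0.0, 1),
           (1, "R", 0.0, 2), (2, "R", 1.0, 3)]
for s, a, r, s_next in episode:
    td0_step(q, model, s, a, r, s_next, actions)
distances = epoch_update(q, model, terminal=3, terminal_reward=1.0)
```

In this sketch the epoch pass propagates the terminal reinforcement to every recorded state-action pair in a single sweep, whereas the incremental updates alone would need several episodes to carry it back to state 0.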
Year
Volume
23
Issue
3
Pages
623-635
Physical description
Dates
published
2013
received
2012-07-10
revised
2013-03-13
revised
2013-06-20
Contributors
author
  • Faculty of Electrical and Computer Engineering, Rzeszów University of Technology, Al. Powstańców Warszawy 12, 35-959 Rzeszów, Poland
References
  • Atiya, A.F., Parlos, A.G. and Ingber, L. (2003). A reinforcement learning method based on adaptive simulated annealing, Proceedings of the 46th International Midwest Symposium on Circuits and Systems, Cairo, Egypt, pp. 121-124.
  • Barto, A., Sutton, R. and Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13(5): 834-847.
  • Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning, Journal of Artificial Intelligence Research 2: 287-318.
  • Crook, P. and Hayes, G. (2003). Learning in a state of confusion: Perceptual aliasing in grid world navigation, Technical Report EDI-INF-RR-0176, University of Edinburgh, Edinburgh.
  • Ernst, D., Geurts, P. and Wehenkel, L. (2005). Tree-based batch mode reinforcement learning, Journal of Machine Learning Research 6: 503-556.
  • Forbes, J.R.N. (2002). Reinforcement Learning for Autonomous Vehicles, Ph.D. thesis, University of California, Berkeley, CA.
  • Gelly, S. and Silver, D. (2007). Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, pp. 273-280.
  • Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996). Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4: 237-285.
  • Krawiec, K., Jaśkowski, W.G. and Szubert, M.G. (2011). Evolving small-board Go players using coevolutionary temporal difference learning with archives, International Journal of Applied Mathematics and Computer Science 21(4): 717-731, DOI: 10.2478/v10006-011-0057-3.
  • Lagoudakis, M. and Parr, R. (2003). Least-squares policy iteration, Journal of Machine Learning Research 4: 1107-1149.
  • Lanzi, P. (2000). Adaptive agents with reinforcement learning and internal memory, From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, USA, pp. 333-342.
  • Lin, L.-J. (1993). Reinforcement Learning for Robots Using Neural Networks, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
  • Markowska-Kaczmar, U. and Kwaśnicka, H. (2005). Neural Networks Applications, Wrocław University of Technology Press, Wrocław, (in Polish).
  • Moore, A. and Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning 13(1): 103-130, DOI: 10.1007/BF00993104.
  • Moriarty, D., Schultz, A. and Grefenstette, J. (1999). Evolutionary algorithms for reinforcement learning, Journal of Artificial Intelligence Research 11: 241-276.
  • Peng, J. and Williams, R. (1993). Efficient learning and planning within the Dyna framework, Adaptive Behavior 1(4): 437-454.
  • Reynolds, S. (2002). Experience stack reinforcement learning for off-policy control, Technical Report CSRP-02-1, University of Birmingham, Birmingham, ftp://ftp.cs.bham.ac.uk/pub/tech-reports/2002/CSRP-02-01.ps.gz.
  • Riedmiller, M. (2005). Neural reinforcement learning to swing-up and balance a real pole, Proceedings of the IEEE 2005 International Conference on Systems, Man and Cybernetics, Big Island, HI, USA, pp. 3191-3196.
  • Rummery, G. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge.
  • Smart, W. and Kaelbling, L. (2002). Effective reinforcement learning for mobile robots, Proceedings of the International Conference on Robotics and Automation, Washington, DC, USA, pp. 3404-3410.
  • Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, USA, pp. 216-224.
  • Sutton, R. (1991). Planning by incremental dynamic programming, Proceedings of the 8th International Workshop on Machine Learning, Evanston, IL, USA, pp. 353-357.
  • Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
  • Vanhulsel, M., Janssens, D. and Vanhoof, K. (2009). Simulation of sequential data: An enhanced reinforcement learning approach, Expert Systems with Applications 36(4): 8032-8039.
  • Watkins, C. (1989). Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge.
  • Whiteson, S. (2012). Evolutionary computation for reinforcement learning, in M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer, Berlin, pp. 325-358.
  • Whiteson, S. and Stone, P. (2006). Evolutionary function approximation for reinforcement learning, Journal of Machine Learning Research 7: 877-917.
  • Ye, C., Young, N.H.C. and Wang, D. (2003). A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 33(1): 17-27.
  • Zajdel, R. (2012). Fuzzy epoch-incremental reinforcement learning algorithm, in L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh and J.M. Zurada (Eds.), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7267, Springer-Verlag, Berlin/Heidelberg, pp. 359-366.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.bwnjournal-article-amcv23z3p623bwm