Epoch-incremental reinforcement learning algorithms

Zajdel, Roman

doi:10.2478/amcs-2013-0047

Article details

Journal

International Journal of Applied Mathematics and Computer Science

2013 | 23 | 3 | 623-635

Article title

Epoch-incremental reinforcement learning algorithms

Authors

Roman Zajdel

Content

Full texts:

http://matwbn.icm.edu.pl/ksiazki/amc/amc23/amc23311.pdf [remote]

Title variants

Publication languages

EN

Abstracts

EN

In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, the distances of past-active states to the terminal state are computed on the basis of the environment model. These distances and the terminal-state reinforcement signal are used to improve the agent policy.
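
The two modes named in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm: the abstract does not give the update rules, so the Q-learning form of TD(0), the deterministic model stored as a dictionary, the breadth-first search used for the distances, and all names (td0_update, distances_to_terminal, the four-state corridor) are assumptions made for illustration only.

```python
from collections import defaultdict, deque


def td0_update(q, model, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Incremental mode (assumed form): one Q-learning TD(0) step plus model recording."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    # Assumption: the environment is deterministic, so one successor per (state, action).
    model[(s, a)] = s_next


def distances_to_terminal(model, terminal):
    """Epoch mode (assumed form): breadth-first search backwards from the terminal state."""
    predecessors = defaultdict(set)
    for (s, _a), s_next in model.items():
        predecessors[s_next].add(s)
    dist = {terminal: 0}
    queue = deque([terminal])
    while queue:
        cur = queue.popleft()
        for prev in predecessors[cur]:
            if prev not in dist:
                dist[prev] = dist[cur] + 1
                queue.append(prev)
    return dist


# Hypothetical corridor 0-1-2-3 with terminal state 3 and reward 1.0 on arrival.
actions = ["left", "right"]
q = defaultdict(float)
model = {}
episode = [
    (0, "right", 0.0, 1),
    (1, "right", 0.0, 2),
    (2, "left", 0.0, 1),
    (1, "right", 0.0, 2),
    (2, "right", 1.0, 3),
]
for s, a, r, s_next in episode:
    td0_update(q, model, s, a, r, s_next, actions)

dist = distances_to_terminal(model, terminal=3)
print(dist)
```

In this sketch only states visited during the episode appear in the model, so the distances cover exactly the past-active states. How the article combines these distances with the terminal reinforcement signal to improve the policy is described in the full text and is not reproduced here.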

Keywords

EN

reinforcement learning, epoch-incremental algorithm, grid world

Publisher

University of Zielona Gora Press

Journal

International Journal of Applied Mathematics and Computer Science

Year

2013

Volume

23

Issue

3

Pages

623-635

Physical description

Dates

published

2013

received

2012-07-10

revised

2013-03-13

revised

2013-06-20

Contributors

author

Roman Zajdel

Faculty of Electrical and Computer Engineering, Rzeszów University of Technology, Al. Powstańców Warszawy 12, 35-959 Rzeszów, Poland

References

Atiya, A.F., Parlos, A.G. and Ingber, L. (2003). A reinforcement learning method based on adaptive simulated annealing, Proceedings of the 46th International Midwest Symposium on Circuits and Systems, Cairo, Egypt, pp. 121-124.
Barto, A., Sutton, R. and Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13(5): 834-847.
Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning, Journal of Artificial Intelligence Research 2: 287-318.
Crook, P. and Hayes, G. (2003). Learning in a state of confusion: Perceptual aliasing in grid world navigation, Technical Report EDI-INF-RR-0176, University of Edinburgh, Edinburgh.
Ernst, D., Geurts, P. and Wehenkel, L. (2005). Tree-based batch mode reinforcement learning, Journal of Machine Learning Research 6: 503-556.
Forbes, J. R. N. (2002). Reinforcement Learning for Autonomous Vehicles, Ph.D. thesis, University of California, Berkeley, CA.
Gelly, S. and Silver, D. (2007). Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, pp. 273-280.
Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996). Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4(1): 237-285.
Krawiec, K., Jaśkowski, W.G. and Szubert, M.G. (2011). Evolving small-board Go players using coevolutionary temporal difference learning with archives, International Journal of Applied Mathematics and Computer Science 21(4): 717-731, DOI: 10.2478/v10006-011-0057-3.
Lagoudakis, M. and Parr, R. (2003). Least-squares policy iteration, Journal of Machine Learning Research 4: 1107-1149.
Lanzi, P. (2000). Adaptive agents with reinforcement learning and internal memory, From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, USA, pp. 333-342.
Lin, L.-J. (1993). Reinforcement Learning for Robots Using Neural Networks, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
Markowska-Kaczmar, U. and Kwaśnicka, H. (2005). Neural Networks Applications, Wrocław University of Technology Press, Wrocław, (in Polish).
Moore, A. and Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning 13(1): 103-130, DOI: 10.1007/BF00993104.
Moriarty, D., Schultz, A. and Grefenstette, J. (1999). Evolutionary algorithms for reinforcement learning, Journal of Artificial Intelligence Research 11: 241-276.
Peng, J. and Williams, R. (1993). Efficient learning and planning within the Dyna framework, Adaptive Behavior 1(4): 437-454.
Reynolds, S. (2002). Experience stack reinforcement learning for off-policy control, Technical Report CSRP-02-1, University of Birmingham, Birmingham, ftp://ftp.cs.bham.ac.uk/pub/tech-reports/2002/CSRP-02-01.ps.gz.
Riedmiller, M. (2005). Neural reinforcement learning to swing-up and balance a real pole, Proceedings of the IEEE 2005 International Conference on Systems, Man and Cybernetics, Big Island, HI, USA, pp. 3191-3196.
Rummery, G. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge.
Smart, W. and Kaelbling, L. (2002). Effective reinforcement learning for mobile robots, Proceedings of the International Conference on Robotics and Automation, Washington, DC, USA, pp. 3404-3410.
Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, USA, pp. 216-224.
Sutton, R. (1991). Planning by incremental dynamic programming, Proceedings of the 8th International Workshop on Machine Learning, Evanston, IL, USA, pp. 353-357.
Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
Vanhulsel, M., Janssens, D. and Vanhoof, K. (2009). Simulation of sequential data: An enhanced reinforcement learning approach, Expert Systems with Applications 36(4): 8032-8039.
Watkins, C. (1989). Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge.
Whiteson, S. (2012). Evolutionary computation for reinforcement learning, in M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer, Berlin, pp. 325-358.
Whiteson, S. and Stone, P. (2006). Evolutionary function approximation for reinforcement learning, Journal of Machine Learning Research 7: 877-917.
Ye, C., Young, N.H.C. and Wang, D. (2003). A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 33(1): 17-27.
Zajdel, R. (2012). Fuzzy epoch-incremental reinforcement learning algorithm, in L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh and J.M. Zurada (Eds.), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7267, Springer-Verlag, Berlin/Heidelberg, pp. 359-366.

Document type

Bibliography

Identifiers

DOI

10.2478/amcs-2013-0047

YADDA identifier

bwmeta1.element.bwnjournal-article-amcv23z3p623bwm
