Journal of South China University of Technology(Natural Science Edition) ›› 2012, Vol. 40 ›› Issue (6): 70-75.

• Electronics, Communication & Automation Technology •

Efficient Reinforcement-Learning Control Algorithm Using Experience Reuse

Hao Chuan-chuan, Fang Zhou, Li Ping²

  1. Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China; 2. School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, Zhejiang, China
  • Received: 2011-09-30 Revised: 2012-03-09 Online: 2012-06-25 Published: 2012-05-03
  • Contact: Fang Zhou (方舟, b. 1980), male, Ph.D., associate professor; his research focuses on UAV navigation, guidance and control, and advanced learning control methods. E-mail: zfang@zju.edu.cn; cchao@iipc.zju.edu.cn
  • About author: Hao Chuan-chuan (郝钏钏, b. 1984), male, Ph.D. candidate; his research focuses on UAV modeling and control and reinforcement-learning control.
  • Supported by:

    Young Scientists Fund of the National Natural Science Foundation of China (61004066); Science and Technology Program of Zhejiang Province (2011C23106)

Abstract:

Although the eNAC (episodic Natural Actor-Critic) algorithm, an episode-based reinforcement-learning control algorithm, offers excellent learning performance in theory, it is inefficient in practice because many episodes are required to obtain a good policy. To solve this problem, a new algorithm named ER-eNAC, which introduces an episode-reuse mechanism into eNAC, is proposed. In ER-eNAC, some past episodes are reused when estimating the current natural policy gradient, so that experience is exploited more efficiently, and the reused episodes are weighted with an exponential decay according to the number of policy updates they have undergone, which reflects how well they fit the current policy. The proposed algorithm is then applied to inverted pendulum control. Simulation results show that, compared with eNAC, ER-eNAC significantly reduces the number of episodes required for learning and thus remarkably improves learning efficiency.
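The exponential-decay weighting described in the abstract can be illustrated with a minimal sketch. The function names, the decay factor, and the flat-list gradient representation below are illustrative assumptions, not the paper's actual implementation; the sketch only shows the idea of down-weighting each reused episode by the number of policy updates it has undergone.

```python
def episode_weights(ages, decay=0.8):
    """Exponential-decay weight for each reused episode.

    ages[i] is the number of policy updates episode i has undergone
    since it was collected; fresher episodes get weights closer to 1.
    The decay factor (here 0.8) is a hypothetical choice in (0, 1).
    """
    return [decay ** k for k in ages]

def weighted_gradient(grads, ages, decay=0.8):
    """Combine per-episode gradient estimates (lists of equal length)
    into one weighted average, down-weighting stale episodes."""
    w = episode_weights(ages, decay)
    total = sum(w)
    dim = len(grads[0])
    return [sum(wi * g[j] for wi, g in zip(w, grads)) / total
            for j in range(dim)]

# Example: three episodes collected 0, 1 and 3 policy updates ago,
# each contributing a 2-dimensional gradient estimate.
g = weighted_gradient([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                      ages=[0, 1, 3])
```

Under this scheme an episode that has survived three policy updates contributes with weight 0.8³ = 0.512, so stale experience still informs the gradient estimate but can no longer dominate it.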

Key words: reinforcement learning, natural policy gradient, experience reuse, inverted pendulum control

CLC Number: