Journal of South China University of Technology(Natural Science Edition) ›› 2012, Vol. 40 ›› Issue (6): 70-75.

• Electronics, Communication & Automation Technology •

Efficient Reinforcement-Learning Control Algorithm Using Experience Reuse

Hao Chuan-chuan, Fang Zhou, Li Ping²

  1. Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, Zhejiang, China; 2. School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, Zhejiang, China
  • Received: 2011-09-30 Revised: 2012-03-09 Online: 2012-06-25 Published: 2012-05-03
  • Contact: Fang Zhou (方舟, b. 1980), male, Ph.D., associate professor; his research focuses on UAV navigation, guidance and control, and advanced learning control methods. E-mail: zfang@zju.edu.cn; cchao@iipc.zju.edu.cn
  • About author: Hao Chuan-chuan (郝钏钏, b. 1984), male, Ph.D. candidate; his research focuses on UAV modeling and control and reinforcement-learning control.
  • Supported by:

    Young Scientists Fund of the National Natural Science Foundation of China (61004066); Science and Technology Program of Zhejiang Province (2011C23106)

Abstract:

Although the eNAC (episodic Natural Actor-Critic) algorithm, an episode-based reinforcement-learning control algorithm, offers excellent learning performance in theory, it is inefficient in practice because many episodes are required to obtain a good policy. To solve this problem, a new algorithm named ER-eNAC, which introduces an episode-reuse mechanism into eNAC, is proposed. In ER-eNAC, some past episodes are reused when estimating the current natural policy gradient, so that experience is exploited more efficiently, and the reused episodes are weighted with an exponential decay according to the number of policy updates they have undergone, which reflects how well they fit the current policy. The proposed algorithm is then applied to inverted pendulum control. Simulation results show that, compared with eNAC, ER-eNAC significantly reduces the number of episodes required for learning and thus remarkably improves learning efficiency.
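The exponential-decay weighting described in the abstract can be illustrated with a minimal sketch. The function names, the decay factor, and the flat-list gradient representation below are illustrative assumptions, not the paper's actual implementation; the sketch only shows the idea of down-weighting each reused episode by the number of policy updates it has undergone.

```python
def episode_weights(ages, decay=0.8):
    """Exponential-decay weight for each reused episode.

    ages[i] is the number of policy updates episode i has undergone
    since it was collected; fresher episodes get weights closer to 1.
    The decay factor (here 0.8) is a hypothetical choice in (0, 1).
    """
    return [decay ** k for k in ages]

def weighted_gradient(grads, ages, decay=0.8):
    """Combine per-episode gradient estimates (lists of equal length)
    into one weighted average, down-weighting stale episodes."""
    w = episode_weights(ages, decay)
    total = sum(w)
    dim = len(grads[0])
    return [sum(wi * g[j] for wi, g in zip(w, grads)) / total
            for j in range(dim)]

# Example: three episodes collected 0, 1 and 3 policy updates ago,
# each contributing a 2-dimensional gradient estimate.
g = weighted_gradient([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                      ages=[0, 1, 3])
```

Under this scheme an episode that has survived three policy updates contributes with weight 0.8³ = 0.512, so stale experience still informs the gradient estimate but can no longer dominate it.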

Key words: reinforcement learning, natural policy gradient, experience reuse, inverted pendulum control

CLC Number: