Journal of South China University of Technology (Natural Science Edition) ›› 2015, Vol. 43 ›› Issue (12): 9-17. doi: 10.3969/j.issn.1000-565X.2015.12.002

• Power & Electrical Engineering •

Reinforcement Learning Method Applied to Multiobjective Emergency Control of Transient Voltage Security

Deng Zhuo-ming, Liu Ming-bo

  1. School of Electric Power, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2015-01-22 Revised: 2015-04-27 Online: 2015-12-25 Published: 2015-11-01
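  • Contact: Deng Zhuo-ming (born 1992), male, Ph.D. candidate, mainly engaged in research on power system optimization and control. E-mail: 1274503618@qq.com
  • About author: Deng Zhuo-ming (born 1992), male, Ph.D. candidate, mainly engaged in research on power system optimization and control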
  • Supported by:
    Supported by the National Natural Science Foundation of China (51277078)

Abstract: Transient voltage collapse poses a serious threat to power grid security, creating an urgent need for emergency control. In this paper, a multiobjective emergency control model for ensuring transient voltage security is first constructed using trajectory sensitivity, with the reference-value increments of generator terminal voltages and the reactive power outputs of capacitors/reactors as the control variables. In the model, the deviation of voltages at key load nodes, the control cost, and the variance of the reactive power output ratio of generators are minimized in two stages. Next, the model is solved by a reduced reinforcement learning method that resets the state functions of the solution space and adjusts the magnitude of actions, and state sensitivity is introduced to resolve the conflict between exploration and exploitation. Then, the feasible region is divided into small zones, so that the probability that each zone contains optimal solutions can be judged separately and the search range is thus narrowed. Moreover, the quality of the Pareto frontier (PF) is further improved by optimizing the search strategies, and the weights of the objective functions are determined according to the actual operating status to give a compromise optimal solution. Finally, time-domain simulation is performed on a provincial power grid. The results show that the proposed method can restore transient voltage security and is superior to the normal boundary intersection method in terms of solution efficiency and PF quality.

Key words: transient voltage security, emergency control, trajectory sensitivity, multiobjective optimization, reinforcement learning
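
Illustrative note: the abstract refers to a Pareto frontier and to a compromise optimal solution chosen by weighting the objective functions. The Python sketch below shows one generic way these two ideas can be expressed for minimization objectives. It is not the paper's reinforcement learning method; the function names, the min-max normalization, and the sample objective values (voltage deviation, control cost) are all hypothetical assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def compromise(front: List[Point], weights: Sequence[float]) -> Point:
    """Pick the front point with the smallest weighted sum of
    min-max normalized objectives (an assumed selection rule)."""
    n = len(weights)
    lows = [min(p[i] for p in front) for i in range(n)]
    highs = [max(p[i] for p in front) for i in range(n)]

    def score(p: Point) -> float:
        total = 0.0
        for i in range(n):
            span = highs[i] - lows[i]
            # An objective that is constant across the front adds nothing.
            total += weights[i] * ((p[i] - lows[i]) / span if span > 0 else 0.0)
        return total

    return min(front, key=score)


# Hypothetical candidates: (voltage deviation, control cost), both minimized.
candidates = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
front = pareto_front(candidates)
balanced = compromise(front, (0.5, 0.5))
deviation_first = compromise(front, (1.0, 0.0))
```

With equal weights this sketch selects the middle point of the front, while putting all weight on the first objective selects the point with the smallest voltage deviation. Changing the weights therefore moves the chosen solution along the front, which is the role the abstract assigns to weights set from the actual operating status.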