Journal of South China University of Technology (Natural Science Edition) ›› 2015, Vol. 43 ›› Issue (12): 9-17. doi: 10.3969/j.issn.1000-565X.2015.12.002

• Power and Electrical Engineering •

Reinforcement Learning Method Applied to Multiobjective Emergency Control of Transient Voltage Security

Deng Zhuo-ming Liu Ming-bo

  1. School of Electric Power, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Received: 2015-01-22 Revised: 2015-04-27 Online: 2015-12-25 Published: 2015-11-01
  • Contact: Deng Zhuo-ming, E-mail: 1274503618@qq.com
  • About author: Deng Zhuo-ming (born 1992), male, Ph.D. candidate, whose research focuses on power system optimization and control
  • Supported by:
    National Natural Science Foundation of China (51277078)

Abstract: Transient voltage collapse seriously threatens power grid security, so emergency control measures are urgently needed. In this paper, the adjustments to generator terminal voltage reference values and the switched reactive power of capacitors/reactors are taken as control variables, and a multiobjective emergency control model for transient voltage security is built using trajectory sensitivity. The model minimizes, in two stages, the voltage deviation at key load nodes, the control cost, and the variance of the generators' reactive power output ratios. The model is solved with a simplified reinforcement learning method that redefines the state function of the solution space and adjusts the action magnitude; a state sensitivity is introduced to resolve the conflict between exploration and exploitation. The feasible region is divided into small zones, and the likelihood that each zone contains an optimal solution is evaluated separately, which narrows the search range. The quality of the Pareto front is further improved by optimizing the search strategy, and the objective function weights are set according to actual operating conditions to determine a compromise solution. Time-domain simulation on a provincial power grid shows that the proposed method restores transient voltage to a secure state and outperforms the normal boundary intersection method in both solution efficiency and Pareto front quality.

Key words: transient voltage security, emergency control, trajectory sensitivity, multiobjective optimization, reinforcement learning
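The abstract mentions building a Pareto front and then choosing a compromise solution from objective weights. The following is a minimal Python sketch of that general idea only; it is not the authors' algorithm. The function names, the sample objective values (voltage deviation, control cost, reactive power ratio variance), and the normalized weighted-sum selection rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def compromise(front: Sequence[Point], weights: Sequence[float]) -> Point:
    """Pick the front point with the lowest weighted sum of min-max normalized objectives.

    The weighted-sum rule is an assumption of this sketch, not the paper's rule.
    """
    n = len(weights)
    lows = [min(p[k] for p in front) for k in range(n)]
    highs = [max(p[k] for p in front) for k in range(n)]

    def score(p: Point) -> float:
        total = 0.0
        for k, w in enumerate(weights):
            span = highs[k] - lows[k]
            # An objective that is constant over the front contributes nothing.
            total += w * ((p[k] - lows[k]) / span if span > 0 else 0.0)
        return total

    return min(front, key=score)


# Hypothetical candidates: (voltage deviation, control cost, ratio variance).
candidates = [
    (0.10, 5.0, 0.30),
    (0.20, 3.0, 0.20),
    (0.15, 6.0, 0.35),
    (0.30, 2.0, 0.25),
    (0.25, 3.5, 0.28),
]

front = pareto_front(candidates)
best = compromise(front, weights=(0.5, 0.3, 0.2))
print(front)
print(best)
```

Normalizing each objective over the front before weighting keeps objectives with large numeric ranges, such as control cost, from swamping the others; in practice the weights would come from the operating conditions, as the abstract describes.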