Prediction of Target Inhibitor Activity by Integrating Machine Learning and Metaheuristic Algorithms

Journal of South China University of Technology (Natural Science Edition), 2026, Vol. 54, Issue (2): 91-101. doi: 10.12141/j.issn.1000-565X.250020

LING Fei, GU Xuerong

School of Biology and Biological Engineering / Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China

Received: 2025-01-17 Publication date: 2026-02-25 Release date: 2025-09-19
Contact: GU Xuerong (born 1997), male, master's student, working mainly on activity prediction for targeted drugs. E-mail: 202120124398@mail.scut.edu.cn
About author: LING Fei (born 1972), female, Ph.D., professor, working mainly on single-cell transcriptomics and drug design. E-mail: fling@scut.edu.cn
Supported by:
the National Natural Science Foundation of China (12322119); the National Natural Science Foundation of China (12401630)

Abstract:

Traditional machine learning (ML) and deep learning (DL) play a key role in predicting the activity of target inhibitors. Many models built on existing datasets can predict compound bioactivity, but whether ML or DL performs better on such prediction tasks remains debated. In this study, datasets were constructed from different molecular representations, and ten metaheuristic algorithms were applied to optimize the hyperparameters of eleven ML and DL models in order to compare their predictive performance systematically and identify the best-performing models. The results show that ML and DL models whose hyperparameters were optimized by metaheuristic algorithms significantly outperformed the same models tuned with traditional grid search. Furthermore, in low-dimensional feature spaces, graph-based DL models such as SSA-GAT and SSA-Attentive FP can automatically extract informative features from the data through end-to-end learning, and they outperform ML models. In contrast, in high-dimensional feature spaces (e.g., the feature space formed by combining RDKit descriptors with ECFP, AtomPairs, and MACCS fingerprints), ML methods draw on the complementary information in the molecular features and on the strong search capability of metaheuristic algorithms to capture complex feature interactions effectively, and they generally show higher accuracy and robustness. These findings provide useful guidance for choosing between ML and DL approaches for target inhibitor activity prediction.

Key words: metaheuristic optimization algorithm, machine learning, deep learning, target inhibitor activity, molecular fingerprints, molecular graph

CLC number: TP18

LING Fei, GU Xuerong. Prediction of Target Inhibitor Activity by Integrating Machine Learning and Metaheuristic Algorithms[J]. Journal of South China University of Technology (Natural Science Edition), 2026, 54(2): 91-101.

Figures/Tables: 10 (Tables 1-4, Figures 1-6)


Type	Name	Dimension/bits	Module
Descriptor	RDKit	210	rdkit.Chem.Descriptors
Molecular fingerprint	ECFP	1024	deepchem.feat.CircularFingerprint
Molecular fingerprint	MACCS	167	rdkit.Chem.MACCSkeys
Molecular fingerprint	AtomPairs	1024	rdkit.Chem.AtomPairs
Molecular fingerprint	FP2	1024	rdkit.Chem.RDKFingerprint
Molecular fingerprint	PharmacoPFP	38	rdkit.Chem.Pharm2D
Molecular graph			MolGraphConvMolFeaturizer
Molecular graph			ConvMolFeaturizer
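
The widths in the table above determine how large each combined feature space becomes. The following is a minimal sketch of that arithmetic, assuming that "combining" features means concatenating the per-molecule blocks end to end (the page does not state the exact procedure). The names `FEATURE_DIMS`, `combined_dim`, and `concatenate` are illustrative, and the all-zero blocks are placeholders rather than real fingerprints.

```python
# Feature widths copied from the table above (descriptor counts or bits).
FEATURE_DIMS = {
    "RDKit": 210,
    "ECFP": 1024,
    "MACCS": 167,
    "AtomPairs": 1024,
    "FP2": 1024,
    "PharmacoPFP": 38,
}


def combined_dim(names):
    """Width of the vector obtained by concatenating the named feature blocks."""
    return sum(FEATURE_DIMS[name] for name in names)


def concatenate(blocks):
    """Concatenate per-molecule feature blocks (lists of numbers) in order."""
    vector = []
    for block in blocks:
        vector.extend(block)
    return vector


# Assumption: the largest combination is a plain concatenation of four blocks.
combo = ["RDKit", "ECFP", "AtomPairs", "MACCS"]
# Placeholder all-zero blocks stand in for real descriptors and fingerprints.
placeholder = concatenate([[0] * FEATURE_DIMS[name] for name in combo])
print(len(placeholder), combined_dim(combo))
```

Under this assumption the four-block combination has 2425 columns, while RDKit descriptors plus MACCS keys alone give 377.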

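Name	Basic idea
Genetic algorithm (GA)	Simulates natural selection and genetic mechanisms, approaching the optimal solution step by step through selection, crossover, and mutation
Differential evolution (DE)	Explores the solution space and seeks the global optimum through differential mutation and recombination among individuals
Naive Bayes (NB)	Based on Bayes' theorem and the conditional independence assumption, makes classification decisions by computing posterior probabilities
Particle swarm optimization (PSO)	Simulates the foraging behavior of a swarm, searching the solution space for the optimum by updating particle velocities and positions
Simulated annealing (SA)	Mimics the physical annealing process, gradually reducing the system energy by controlling the temperature to find the global optimum
Ant colony optimization (ACO)	Mimics ant foraging, approaching the optimal solution step by step through the spread and updating of pheromones
Sparrow search algorithm (SSA)	Simulates the foraging behavior of sparrow flocks, combining exploration and exploitation strategies to achieve global optimization
Seagull optimization algorithm (SOA)	Simulates the flight and foraging behavior of seagull flocks, optimizing over the solution space by combining local and global search
Whale optimization algorithm (WOA)	Mimics the way whales encircle prey, finding the optimal solution through encircling and spiral update strategies
Moth-flame optimization (MFO)	Simulates the light-seeking behavior of moths, guiding the search toward the global optimum through the attraction of light sources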
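
To make one row of the table above concrete, here is a minimal sketch of particle swarm optimization applied to a hyperparameter search. It is not the authors' implementation: the objective `toy_validation_loss` is a hypothetical stand-in for a real validation loss (such as 1 - AUC of a trained model), the two hyperparameters and their bounds are invented for illustration, and the inertia and acceleration constants are common textbook defaults.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimal particle swarm optimizer over a box-constrained search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Position update, clamped to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_validation_loss(params):
    """Hypothetical loss surface; its minimum 0.1 lies at log10(lr) = -3, dropout = 0.2."""
    log_lr, dropout = params
    return 0.1 + (log_lr + 3.0) ** 2 + (dropout - 0.2) ** 2


# Search log10(learning rate) in [-5, -1] and dropout in [0, 0.5].
best_params, best_loss = pso_minimize(toy_validation_loss,
                                      bounds=[(-5.0, -1.0), (0.0, 0.5)])
print(best_params, best_loss)
```

In a real study the objective would train and validate a model for each candidate, so every call is expensive; this is why the number of particles and iterations directly drives the tuning time.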

Model	AUC (Grid)	AUC (SSA)	F₁ (Grid)	F₁ (SSA)	BA (Grid)	BA (SSA)	Time/h (Grid)	Time/h (SSA)
GAT	0.78	0.87	0.78	0.91	0.67	0.89	1.58	7.47
GCN	0.74	0.83	0.73	0.77	0.66	0.73	5.11	10.85
MPNN	0.80	0.82	0.65	0.83	0.69	0.75	2.90	25.53
Attentive FP	0.70	0.89	0.67	0.89	0.65	0.79	3.02	24.75
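
In the table above, SSA tuning raises every reported metric for all four graph models, and it also takes longer than grid search in every case. A short sketch of that trade-off, using only the AUC and time columns copied from the table; the names `RESULTS` and `tradeoff` are illustrative.

```python
# (grid, ssa) pairs copied from the table above.
RESULTS = {
    "GAT":          {"auc": (0.78, 0.87), "hours": (1.58, 7.47)},
    "GCN":          {"auc": (0.74, 0.83), "hours": (5.11, 10.85)},
    "MPNN":         {"auc": (0.80, 0.82), "hours": (2.90, 25.53)},
    "Attentive FP": {"auc": (0.70, 0.89), "hours": (3.02, 24.75)},
}


def tradeoff(entry):
    """Return (AUC gain, runtime ratio) of SSA tuning relative to grid search."""
    auc_grid, auc_ssa = entry["auc"]
    hours_grid, hours_ssa = entry["hours"]
    return auc_ssa - auc_grid, hours_ssa / hours_grid


for model, entry in RESULTS.items():
    gain, ratio = tradeoff(entry)
    print(f"{model:<13} AUC gain {gain:+.2f}  runtime x{ratio:.1f}")
```

Read this way, Attentive FP shows the largest AUC gain, while MPNN pays the largest relative runtime cost for the smallest gain.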

Method (features)	AUC (external dataset)	AUC (original dataset)	F₁ (external dataset)	F₁ (original dataset)	BA (external dataset)	BA (original dataset)
RDKit+MACCS	0.83	0.74	0.86	0.84	0.73	0.81
RDKit+ECFP	0.81	0.78	0.86	0.83	0.76	0.82
RDKit+ECFP+MACCS	0.85	0.80	0.84	0.88	0.92	0.86
RDKit+MACCS+AtomPairs	0.85	0.77	0.85	0.83	0.84	0.88
RDKit+ECFP+AtomPairs+MACCS	0.88	0.79	0.87	0.86	0.77	0.86
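
The tables above report AUC, F₁, and BA. The page does not define them, so the sketch below assumes the standard binary-classification definitions, with BA read as balanced accuracy (the mean of sensitivity and specificity) and AUC computed as the fraction of active/inactive pairs ranked correctly. The labels, scores, and the 0.5 decision threshold are made-up illustration data, not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; expects both classes to be present."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = active inhibitor, 0 = inactive.
y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

print(roc_auc(y_true, scores), f1_score(y_true, y_pred),
      balanced_accuracy(y_true, y_pred))
```

AUC depends only on how the scores rank the compounds, whereas F₁ and BA depend on the chosen threshold, which is one reason the three columns need not move together across feature combinations.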
