Prediction of target inhibitor activity by integrating machine learning and metaheuristic algorithms
School of Biological Science and Engineering / Guangdong Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
Published online: 2025-11-27
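LING Fei, GU Xuerong. Prediction of target inhibitor activity by integrating machine learning and metaheuristic algorithms[J]. Journal of South China University of Technology (Natural Science Edition), 0: 1. DOI: 10.12141/j.issn.1000-565X.250020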
Conventional machine learning (ML) and deep learning (DL) play a key role in predicting the selectivity of small-molecule inhibitors. Many models trained on existing datasets are available for predicting the bioactivity of compounds, but whether ML or DL performs better on such tasks remains contested. In this study, ten metaheuristic algorithms are applied to optimize the hyperparameters of eleven ML and DL models built on different molecular representations, in order to systematically compare their predictive performance and identify the optimal models. The results show that ML and DL models tuned with metaheuristic hyperparameter optimization significantly outperform models tuned by traditional grid search. In low-dimensional feature spaces, molecular graph-based DL models such as SSA-GAT and SSA-Attentive FP learn end to end, extracting informative features directly from the data, and outperform the ML models. In high-dimensional feature spaces (e.g., the RDKit+ECFP+AtomPairs+Maccs-XGBoost model), by contrast, ML methods exploit the complementary information carried by multiple molecular feature sets, together with the high-order search capability of metaheuristic optimization, to capture complex interactions between features, and they generally achieve higher accuracy and robustness. These findings offer practical guidance for choosing between ML and DL methods when predicting the activity of target inhibitors.
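The abstract contrasts metaheuristic hyperparameter optimization with grid search without spelling out either procedure. The following is a minimal sketch under stated assumptions, not the authors' method: it uses a basic global-best particle swarm optimizer (PSO) as a generic stand-in for a metaheuristic (it is not claimed to be SSA or any of the ten algorithms used in the study), and a hypothetical quadratic `objective` in place of a model's cross-validated error. The search bounds, `objective`, and the PSO coefficients `w`, `c1`, `c2` are illustrative assumptions.

```python
import itertools
import random

# Hypothetical search space: x = log10(learning rate), y = tree depth (treated as continuous).
BOUNDS = [(-4.0, -1.0), (2.0, 10.0)]


def objective(params):
    """Hypothetical stand-in for a model's cross-validated error (lower is better).

    In a real study each call would train and validate a model; here it is a
    quadratic bowl whose minimum (-2.3, 6.7) lies between grid points.
    """
    x, y = params
    return (x + 2.3) ** 2 + 0.5 * (y - 6.7) ** 2


def grid_search(f, bounds, points_per_axis):
    """Evaluate every point of an evenly spaced grid; return (best_params, best_value, n_evals)."""
    axes = []
    for lo, hi in bounds:
        step = (hi - lo) / (points_per_axis - 1)
        axes.append([lo + i * step for i in range(points_per_axis)])
    best_params, best_value, n_evals = None, float("inf"), 0
    for point in itertools.product(*axes):
        value = f(point)
        n_evals += 1
        if value < best_value:
            best_params, best_value = point, value
    return best_params, best_value, n_evals


def particle_swarm(f, bounds, n_particles, n_iters, seed=0, w=0.7, c1=1.5, c2=1.5):
    """Basic global-best particle swarm optimization; return (best_params, best_value, n_evals)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    n_evals = n_particles
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so every candidate stays inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            value = f(pos[i])
            n_evals += 1
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return tuple(gbest), gbest_val, n_evals


if __name__ == "__main__":
    # Same budget for both: a 5 x 5 grid vs. 5 particles x (1 initial + 4 update rounds).
    print("grid:", grid_search(objective, BOUNDS, 5))
    print("PSO: ", particle_swarm(objective, BOUNDS, n_particles=5, n_iters=4))
```

With an equal budget of 25 evaluations, grid search can only return points on its fixed lattice (here the best grid point is (-2.5, 6.0) with value 0.285, while the true minimum of the toy surface is at (-2.3, 6.7)), whereas the swarm samples continuous positions and concentrates later evaluations near promising regions. Whether it beats the grid on a given run depends on the random seed and budget.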