A Method for Software Vulnerability Detection via Path Representations and Pretrained Model

doi:10.12141/j.issn.1000-565X.240131

Journal of South China University of Technology(Natural Science Edition), 2025, Vol. 53, Issue (5): 56-65.

Computer Science & Technology

LU Lu¹,², WAN Tong¹

1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China;
2. Pengcheng Laboratory, Shenzhen 518000, Guangdong, China

Online: 2025-05-25    Published: 2024-06-21

Abstract

Software vulnerabilities are critical weaknesses that can compromise system security and that attackers can exploit to gain unauthorized control. Most current deep learning-based vulnerability detection approaches rely on a single code representation and therefore fail to capture the complementary nature of code semantics and structural information. To address this issue, this research introduces a software vulnerability detection method termed VDPPM (Software Vulnerability Detection via Path Representations and Pretrained Model). The proposed framework integrates path representations extracted from the Abstract Syntax Tree (AST), Control Flow Graph (CFG), and Program Dependency Graph (PDG), offering a more comprehensive view of code characteristics. VDPPM also employs SimCodeBERT, a CodeBERT model refined with the contrastive learning framework SimCSE to improve its ability to interpret code semantics. In the experimental phase, a corpus is first constructed from the path representations and used to train a Doc2vec model, which serves as a general-purpose embedding model that converts sequences of paths into vector representations. A pretrained CodeBERT model is then trained under the contrastive learning framework so that it captures deep semantic features of the code more precisely. Finally, the vector representations generated by Doc2vec and by the enhanced SimCodeBERT are fused and used for vulnerability detection. Experiments on multiple publicly available benchmark datasets for vulnerability detection show that VDPPM outperforms mainstream methods, with significant improvements in several performance metrics. These results support the effectiveness of the proposed method.
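The three stages described above (path extraction, contrastive training, and vector fusion) can be illustrated with a minimal sketch. It is not the authors' implementation: Python's standard `ast` module stands in for the paper's AST extraction, root-to-leaf paths are only one possible definition of a path representation, the loss is the in-batch contrastive objective used by SimCSE with its commonly used temperature of 0.05, and concatenation is an assumed fusion strategy. The function names `ast_paths`, `simcse_loss`, and `fuse` are hypothetical.

```python
import ast
import math


def ast_paths(source):
    """Return root-to-leaf paths of node type names (assumed path definition)."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [type(node).__name__]
        children = list(ast.iter_child_nodes(node))
        if not children:
            paths.append(prefix)  # a leaf closes one path
        for child in children:
            walk(child, prefix)

    walk(ast.parse(source), [])
    return paths


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(views_a, views_b, temperature=0.05):
    """In-batch contrastive loss: views_a[i] and views_b[i] form the positive
    pair; every other vector in views_b acts as a negative for views_a[i]."""
    total = 0.0
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]
    return total / len(views_a)


def fuse(path_vector, semantic_vector):
    """Fuse two embeddings by concatenation (an assumption, not from the paper)."""
    return list(path_vector) + list(semantic_vector)


if __name__ == "__main__":
    for path in ast_paths("x = y + 1"):
        print(" -> ".join(path))
    batch = [[1.0, 0.0], [0.0, 1.0]]
    print(simcse_loss(batch, batch))
    print(fuse([0.1, 0.2], [0.3, 0.4, 0.5]))
```

In a setup closer to the one described, the path sequences would feed a Doc2vec model and the two views in each positive pair would come from the CodeBERT encoder; the sketch only shows the shape of the computation.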

Key words:

software vulnerability, path representation, pre-training, contrastive learning

LU Lu, WAN Tong. A Method for Software Vulnerability Detection via Path Representations and Pretrained Model[J]. Journal of South China University of Technology(Natural Science Edition), 2025, 53(5): 56-65.
