Received: 2013-03-10
Published online: 2013-06-01
Funding
National Natural Science Foundation of China (61070092)
A Plagiarism Detection Method Based on Semantic Matching
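Zou Du, Chen Yuqing, Zhang Ling. A Plagiarism Detection Method Based on Semantic Matching[J]. Journal of South China University of Technology (Natural Science Edition), 2013, 41(7): 131-136. DOI: 10.3969/j.issn.1000-565X.2013.07.022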
Most existing plagiarism detection methods use document similarity to decide whether one document plagiarizes another. Unlike common duplicate detection, plagiarism detection may have to flag even a short segment of duplicated text that carries no reference. However, because of the effects of document size, duplicate text length, and interference, existing plagiarism detection methods perform relatively poorly. To solve this problem, the relationship between text semantics and fingerprint order is analyzed, and a semantic matching method is proposed that projects the fingerprint vector into a binary sequence, reducing the dimension while retaining the position information of the fingerprints. The method is then compared with the Jaccard distance method and the Hamming distance method in tests on the public PAN corpus. The results show that the proposed method achieves the highest recall and precision.
Key words: semantic matching; plagiarism detection; fingerprint; text semantics
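For readers unfamiliar with the two baselines named in the abstract, the following is a minimal Python sketch of Jaccard distance over fingerprint sets and Hamming distance over equal-length binary sequences. The function names and the toy `project_to_bits` helper are assumptions made for illustration only; in particular, `project_to_bits` is a simple modulo-based encoding and is not the projection proposed in the paper, whose details the abstract does not give.

```python
def jaccard_distance(fingerprints_a, fingerprints_b):
    """Jaccard distance between two collections of fingerprints (treated as sets)."""
    a, b = set(fingerprints_a), set(fingerprints_b)
    union = a | b
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(a & b) / len(union)


def hamming_distance(bits_x, bits_y):
    """Number of positions at which two equal-length binary sequences differ."""
    if len(bits_x) != len(bits_y):
        raise ValueError("sequences must have the same length")
    return sum(p != q for p, q in zip(bits_x, bits_y))


def project_to_bits(fingerprints, n_bits=8):
    """Hypothetical toy encoding: set bit (fp mod n_bits) for each fingerprint.

    This only produces input for hamming_distance; it is NOT the
    paper's semantic-matching projection.
    """
    bits = [0] * n_bits
    for fp in fingerprints:
        bits[fp % n_bits] = 1
    return bits


doc_a = [1, 2, 3]  # made-up integer fingerprints
doc_b = [2, 3, 4]
print(jaccard_distance(doc_a, doc_b))
print(hamming_distance(project_to_bits(doc_a), project_to_bits(doc_b)))
```

Both baselines reduce a pair of documents to a single score, which hints at why a short copied passage inside a long document can be hard to detect with similarity alone.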