Journal of South China University of Technology (Natural Science Edition) ›› 2013, Vol. 41 ›› Issue (7): 131-136. doi: 10.3969/j.issn.1000-565X.2013.07.022

• Computer Science & Technology •

A Plagiarism Detection Method Based on Semantic Matching

Zou Du1, Chen Yu-qing2†, Zhang Ling2

  1. Information Network Engineering and Research Center, South China University of Technology, Guangzhou 510640, Guangdong, China; 2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2013-03-10 Online: 2013-07-25 Published: 2013-06-01
  • Contact: Chen Yu-qing (born in 1973), male, engineer; research focuses on computer applications. E-mail: yqchen@scut.edu.cn
  • About author: Zou Du (born in 1973), male, senior engineer; research focuses on computer applications and information retrieval. E-mail: duzou@scut.edu.cn
  • Supported by: the National Natural Science Foundation of China (61070092)

Abstract:

Existing plagiarism detection methods mostly rely on a similarity measure to determine whether plagiarism exists between two documents. Unlike in common duplicate detection, in plagiarism detection even a small segment of duplicated text without a reference may be identified as plagiarism. However, because of the effects of document size, duplicate text length, and interference, the existing plagiarism detection methods all perform relatively poorly. To solve this problem, the relationship between text semantics and fingerprint order is analyzed, and a semantic matching method is proposed that projects the fingerprint vector into a binary sequence, reducing the dimension while retaining the position information of the fingerprints. The method is then compared with the Jaccard distance method and the Hamming distance method in tests on the PAN public corpus. The results show that the proposed method achieves the highest recall and precision.

Key words: semantic matching, plagiarism detection, fingerprint, text semantics
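
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal Python sketch of the Jaccard distance over fingerprint sets and the Hamming distance over equal-length binary sequences. It is not the authors' implementation. The word k-gram fingerprinting scheme, the function names, and in particular `to_bits` (which keeps one bit per fingerprint in document order) are assumptions made for illustration only; the abstract does not specify how the proposed method performs its projection or its matching.

```python
import hashlib


def fingerprints(text, k=5):
    """Return an ordered list of hashed word k-grams (an assumed scheme)."""
    words = text.lower().split()
    count = max(len(words) - k + 1, 0)
    grams = [" ".join(words[i:i + k]) for i in range(count)]
    # Keep the first 32 bits of an MD5 digest as the fingerprint value.
    return [int(hashlib.md5(g.encode("utf-8")).hexdigest()[:8], 16)
            for g in grams]


def jaccard_distance(fp_a, fp_b):
    """1 - |A ∩ B| / |A ∪ B| over fingerprint sets; order is discarded."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def to_bits(fps):
    """Hypothetical projection: the lowest bit of each fingerprint, in order.

    This only illustrates the idea of shrinking each fingerprint to one bit
    while keeping its position; the paper's actual projection is not given
    in the abstract.
    """
    return [fp & 1 for fp in fps]


def hamming_distance(bits_a, bits_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(bits_a, bits_b))


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog near the river"
    suspect = "a quick brown fox jumps over the lazy dog near the bridge"
    fp_src, fp_sus = fingerprints(source), fingerprints(suspect)
    print("Jaccard distance:", jaccard_distance(fp_src, fp_sus))

    # Truncate to a common length so the Hamming distance is defined.
    n = min(len(fp_src), len(fp_sus))
    print("Hamming distance:",
          hamming_distance(to_bits(fp_src)[:n], to_bits(fp_sus)[:n]))
```

The Jaccard distance treats a document as an unordered set of fingerprints, so it discards the position information that the abstract identifies as relevant to text semantics. The Hamming distance compares sequences position by position, so a single inserted or deleted word shifts every later fingerprint and inflates the distance.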