Advances in Computer and Communication

Detection of Summary Obfuscation Plagiarism Using an Aggregation Approach

Mohsen Safari, Elham Ghanbari*

Department of Computer Engineering, Yadegar-e-Imam Khomeini (RAH) Shahr-e-Rey Branch, Islamic Azad University, Tehran, Iran.

*Corresponding author: Elham Ghanbari

Published: June 28, 2022


Plagiarism detection is a text analysis task; plagiarism refers to copying text from an original source without citing it. It occurs in scientific and academic writing, computer programming, and other fields, and it is not only a fraudulent act but also undermines the creativity and ingenuity that might otherwise develop. Summary obfuscation is a type of plagiarism in which the perpetrator summarizes and compresses the original text and replaces words in its sentences with synonyms. Detecting this type of plagiarism is a complex task, made more challenging because the plagiarized passage is shorter than the original. The proposed system comprises three main steps: preprocessing, phrase selection, and filtering. By customizing the Okapi Best Matching (BM25) technique and detecting semantic similarities with WordNet, the approach largely equalizes the levels of two sentences, one from the suspicious document and one from the source document. The scores of several similarity measures are then combined using the proposed aggregation approach, and the outcome is used to decide whether the examined text is plagiarized. When tested on the PAN data set, the proposed model correctly detected 78% of the documents plagiarized through summarization.
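To make the idea of scoring candidate sentences and combining several similarity measures concrete, the following is a minimal sketch in Python. It uses the standard Okapi BM25 formula (with a non-negative IDF variant), not the customized version proposed in this work, and it omits the WordNet step entirely. The function names (`bm25_scores`, `jaccard`, `aggregate`), the parameter values `k1=1.5` and `b=0.75`, the use of Jaccard overlap as a second measure, and the weighted mean used for aggregation are all illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def bm25_scores(query, sentences, k1=1.5, b=0.75):
    """Score each source sentence against a suspicious sentence with standard BM25."""
    docs = [tokenize(s) for s in sentences]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of sentences in which each term occurs at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization penalizes sentences longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def jaccard(a, b):
    """Word-set overlap between two sentences (an assumed second similarity measure)."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def aggregate(measures, weights=None):
    """Combine similarity scores with a weighted mean (a hypothetical aggregation rule)."""
    if not measures:
        raise ValueError("at least one similarity score is required")
    weights = weights or [1.0] * len(measures)
    return sum(w * m for w, m in zip(weights, measures)) / sum(weights)


if __name__ == "__main__":
    source = [
        "the cat sat on the mat",
        "stock markets fell sharply today",
        "a dog barked at the mailman",
    ]
    suspicious = "markets fell today"
    scores = bm25_scores(suspicious, source)
    best = scores.index(max(scores))
    print("best matching source sentence:", source[best])
    print("aggregated score:", aggregate([jaccard(suspicious, source[best]), 1.0]))
```

In a fuller pipeline, BM25 scores would need rescaling to a common range before being averaged with bounded measures such as Jaccard overlap, and a decision threshold on the aggregated score would have to be tuned on labeled data.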


