Advances in Computer and Communication

Frequency: bimonthly ISSN Online: 2767-2875 CODEN: ACCDC3
Email: acc@hillpublisher.com
Article http://dx.doi.org/10.26855/acc.2022.06.004

Detection of Summary Obfuscation Plagiarism Using an Aggregation Approach

Mohsen Safari, Elham Ghanbari*

Department of Computer Engineering, Yadegar-e-Imam Khomeini (RAH) Shahr-e-Rey Branch, Islamic Azad University, Tehran, Iran.

*Corresponding author: Elham Ghanbari

Published: June 28, 2022

Abstract

Plagiarism, a problem studied within text analysis, is the copying of text from an original source without citing it. It occurs in scientific and academic writing, computer programming, and other areas, and it is not only fraudulent but also undermines the creativity and ingenuity that might otherwise develop. Summary obfuscation, produced through text summarization and compression, is a type of plagiarism in which the perpetrator replaces words in the sentences of an original manuscript with synonyms. Detecting this type of plagiarism is therefore complex, and it is made harder still because the plagiarized passage is shorter than the original. The proposed system comprises three main steps: preprocessing, phrase selection, and filtering. By customizing the Okapi BM25 (Best Matching 25) ranking function and detecting semantic similarities with WordNet, the approach largely brings a sentence from the suspicious document and a sentence from the source document to a comparable level. The scores of several similarity measures are then combined with the proposed aggregation approach, and its outcome determines whether the examined text is plagiarized. When tested on the PAN data set, the proposed model correctly detected 78% of the documents plagiarized through summarization.
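The abstract describes a pipeline of preprocessing, phrase selection with a BM25-based matcher, and filtering by aggregating several similarity scores. The Python sketch below illustrates that general flow under stated assumptions; it is not the authors' implementation. It uses the standard Okapi BM25 formula (k1 = 1.5, b = 0.75, Lucene-style non-negative IDF) instead of the paper's customized variant, omits the WordNet semantic-similarity step, and uses Jaccard, cosine and containment as stand-in similarity measures. The helper names, the weights (0.3, 0.3, 0.4) and the 0.5 decision threshold are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical preprocessing: lowercase, replace punctuation with spaces, split."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


def bm25_scores(query, sentences, k1=1.5, b=0.75):
    """Standard (not customized) Okapi BM25: score each source sentence against the query."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Number of source sentences that contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))  # each distinct query term counted once
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style IDF, which is never negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def jaccard(a, b):
    """Shared distinct terms divided by all distinct terms."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def cosine(a, b):
    """Cosine similarity of raw term-frequency vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def containment(suspicious, source):
    """Fraction of the suspicious sentence's distinct terms that appear in the source."""
    ss, sr = set(tokenize(suspicious)), set(tokenize(source))
    return len(ss & sr) / len(ss) if ss else 0.0


def aggregate(scores, weights):
    """Hypothetical aggregation rule: weighted mean of scores in [0, 1]."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def is_plagiarized(suspicious, sources, weights=(0.3, 0.3, 0.4), threshold=0.5):
    """Return (decision, best-matching source sentence, aggregated score)."""
    # Phrase selection: keep the source sentence with the highest BM25 score.
    bm = bm25_scores(suspicious, sources)
    best = sources[max(range(len(sources)), key=bm.__getitem__)]
    # Filtering: combine several similarity measures and compare with a threshold.
    measures = [jaccard(suspicious, best), cosine(suspicious, best), containment(suspicious, best)]
    combined = aggregate(measures, weights)
    return combined >= threshold, best, combined
```

For example, with the source sentence "The committee approved the new budget after a long debate about education funding." and the shorter suspicious sentence "The committee approved the budget after debate.", BM25 selects that source sentence, and the aggregated score (about 0.78) exceeds the hypothetical threshold. Containment is included because it divides by the suspicious sentence's length only, so it is not penalized when a summary is much shorter than its source, whereas Jaccard is.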


How to cite this paper


Mohsen Safari, Elham Ghanbari. (2022). Detection of Summary Obfuscation Plagiarism Using an Aggregation Approach. Advances in Computer and Communication, 3(1), 34-52.

DOI: http://dx.doi.org/10.26855/acc.2022.06.004