Article title: Authorship Attribution of Short Historical Arabic Texts using Stylometric Features and a KNN Classifier with Limited Training Data
Fatma Howedi | 525
- Publication type
- Scientific journal
- Authors (Arabic)
- فاطمة هويدي ، مازنيزا موحد، زهرة أبوراوي ، صلاح جوان
- Authors (English)
- Fatma Howedi, Masnizah Mohd, Zahra Aborawi and Salah A. Jowan
- English abstract
- Authorship Attribution (AA) is the task of recognizing the author of an unknown text from its writing style. Among the various approaches to the AA problem, stylometry is a promising one. This paper explores the use of a K-Nearest Neighbor (KNN) classifier combined with stylometric features to perform AA. The study indicates the robustness of KNN in performing AA on short historical Arabic texts written by different authors. To classify the texts by author, KNN was trained on a set of stylometric features including rare words, character counts, and character-level 2-, 3- and 4-grams. Feature set sizes ranging from 34 to 2000 were tested. The experiments were conducted on limited training data, with datasets consisting of 3 short texts per author’s book. This method proved to be at least as effective as Information Gain (IG) when selecting the most significant n-grams. Moreover, the KNN classifier achieved high accuracy, with a best classification accuracy of up to 90%, except for 5-NN using character-level 4-grams. This work contributes towards utilizing KNN to identify distinctive stylometric features for robust AA in short historical Arabic texts.
- Publication date
- 03/10/2020
- Publisher
- Journal of Computer Science
- Volume number
- 16
- Issue number
- ISSN/ISBN
- 1549-3636/ 1552-6607
- DOI link
- https://doi.org/10.3844/jcssp.2020.1334.1345
- Pages
- 1334-1345
- External link
- https://thescipub.com/pdf/jcssp.2020.1334.1345.pdf
- Keywords
- Arabic, Authorship Attribution, Character Features, KNN, Lexical Stylometry
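
The abstract describes classifying short texts by author with a KNN classifier over character-level 2-, 3- and 4-gram features. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the cosine distance, the relative-frequency weighting, and the placeholder training texts are all assumptions made for illustration; rare-word features and feature-set size selection are left out.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count the overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def profile(text, n_values=(2, 3, 4)):
    """Relative-frequency profile over character 2-, 3- and 4-grams."""
    counts = Counter()
    for n in n_values:
        counts.update(char_ngrams(text, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine_distance(p, q):
    """1 - cosine similarity of two sparse profiles (assumed metric)."""
    dot = sum(v * q.get(gram, 0.0) for gram, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def knn_predict(train, text, k=1):
    """Majority vote among the k training texts nearest to `text`.

    `train` is a list of (text, author) pairs. Ties in the vote go to
    the author whose text is nearest, because the ranking is sorted by
    distance and Counter keeps insertion order among equal counts.
    """
    target = profile(text)
    ranked = sorted(
        train, key=lambda item: cosine_distance(profile(item[0]), target)
    )
    votes = Counter(author for _, author in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented placeholder data: three short texts per author, mirroring the
# limited-training-data setup in the abstract. These are not from the paper.
train = [
    ("the river runs slowly through the old valley", "author_a"),
    ("the valley sleeps while the river runs on", "author_a"),
    ("slowly the old river leaves the valley", "author_a"),
    ("markets open early and traders shout prices", "author_b"),
    ("traders count coins as the markets close", "author_b"),
    ("prices rise when traders leave the markets", "author_b"),
]

predicted = knn_predict(train, "the old river runs through the valley", k=3)
```

Varying `k` (for example 1, 3 and 5) and the n-gram sizes passed to `profile` gives a rough way to compare settings of the kind the abstract reports.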