احمد عبد الله احمد، محمد صبيح حمود التميمي، عمر اسماعيل السنجري، غزالي سولونج
Authors (English)
Ahmed Abdullah Ahmed, Mohammed Sabbih Al-Tamimi, Omar Ismael Al-Sanjary, Ghazali Sulong
Abstract (English)
Categorizing Arabic text is a challenging pattern recognition problem. We propose, for the first time, a holistic clustering-based method for classifying Arabic writers. The categorization proceeds in stages. First, the document images are segmented into lines, words, and characters. Second, structural and statistical features are extracted from the segmented portions. Third, the F-measure is used to evaluate the performance of the extracted features and their combinations under different linkage methods, for each distance measure and for different numbers of groups. Finally, experiments are conducted on the standard KHATT dataset of Arabic handwritten text, which comprises varying samples from 1000 writers. The results in the generation step are obtained from multiple runs of the individual clustering methods for each distance measure. The best results are achieved when the intensity and line slope features, and their combination, are applied. It is demonstrated that, with a good set of features, different numbers of clusters can deliver significant improvements in the clustering of handwritten structures.
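The evaluation loop described in the abstract (several linkage methods, several distance measures, scored with an F-measure) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the feature vectors are made-up (intensity, line slope) pairs rather than KHATT data, the clustering is a naive agglomerative procedure, and the F-measure is the common class-weighted best-match variant, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Each linkage reduces the list of pairwise distances between two clusters.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerative(points, k, dist, linkage):
    """Naive agglomerative clustering: merge the two closest clusters until k remain."""
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ds = [dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]]
            d = link(ds)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def clustering_f_measure(true_labels, pred_labels):
    """Class-weighted F-measure: each true class is matched to its best-scoring cluster."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Hypothetical (intensity, line slope) features for two writers; not real data.
features = [(0.20, 1.0), (0.22, 1.2), (0.21, 0.9), (0.80, 5.0), (0.82, 5.3), (0.79, 4.8)]
writers = ["A", "A", "A", "B", "B", "B"]

scores = {}
for dist_name, dist in (("euclidean", euclidean), ("manhattan", manhattan)):
    for linkage in LINKAGES:
        labels = agglomerative(features, 2, dist, linkage)
        scores[(dist_name, linkage)] = clustering_f_measure(writers, labels)

for combo, score in sorted(scores.items()):
    print(combo, round(score, 3))
```

The same loop could be extended over different numbers of clusters by varying `k`. At realistic dataset sizes, a library implementation of hierarchical clustering would be a better fit than this cubic-time sketch.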