
Finding Topics in Urdu: A Study of Applicability of Document Clustering on Urdu Language

Abstract

In this research, we present the results of a study conducted to ascertain the applicability of document clustering techniques to an Urdu-language corpus. This study, the first of its kind, employs a fully probabilistic Bayesian method, Latent Dirichlet Allocation, to cluster an Urdu-language corpus using features collected from the documents. The results are compared with those obtained from a simple classification technique. Analysis shows that both supervised and unsupervised techniques for grouping documents perform reasonably well on this corpus. The results further indicate that the Urdu document clustering technique outperforms the document classification technique in some cases, with an accuracy above 90%.
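As a rough illustration of how Latent Dirichlet Allocation can group documents, the following is a minimal sketch of collapsed Gibbs sampling in plain Python. It is not the authors' implementation: the function names `lda_gibbs` and `cluster_labels`, the hyperparameters `alpha` and `beta`, the iteration count, and the tiny romanized toy corpus are all assumptions made for illustration. Each document is assigned to the topic with the largest share in its estimated topic mixture.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of token lists. Returns one topic-proportion
    vector per document. alpha and beta are assumed Dirichlet priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def cluster_labels(theta):
    """Assign each document to its highest-probability topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


# Hypothetical toy corpus (romanized placeholder tokens, not the paper's data).
docs = [
    ["cricket", "match", "team", "cricket", "score"],
    ["team", "score", "match", "player"],
    ["election", "vote", "party", "election", "minister"],
    ["party", "minister", "vote", "parliament"],
]
theta = lda_gibbs(docs, n_topics=2)
labels = cluster_labels(theta)
print(labels)
```

With ground-truth categories available, cluster labels like these could be compared against the categories to compute an accuracy figure, although the exact evaluation procedure used in the study is not described here.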

Cite this article

Toqeer Ehsan, H. M. Shahzad Asif (2018). Finding Topics in Urdu: A Study of Applicability of Document Clustering on Urdu Language. Pakistan Journal of Engineering and Applied Sciences, Volume 23, Issue 1.

Article Details

Journal: Pakistan Journal of Engineering and Applied Sciences
Volume: 23
Issue: 1
Type: Regular
Language: English
