Abstract
Named Entity Recognition (NER) is the task of identifying names of persons, organizations, and locations, along with other expressions such as numbers, dates, and measures, in a given text. In this paper, we describe the development of an NER system for the Urdu language using a Hidden Markov Model (HMM). First, we compare the IOB2 and IOE2 tagging schemes. Second, we describe the preprocessing applied to Urdu text before it is used to train the HMM with the IOE2 tagging scheme. Finally, we use Part of Speech (POS) information, gazetteers, and rules to improve the accuracy of the system. Our system yields 66.71%, 71.70%, and 69.12% for precision, recall, and F-measure, respectively. This system will help us improve the results of Urdu Information Retrieval, Machine Translation, and Question Answering systems.
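
To make the two tagging schemes concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes entities are given as hypothetical (start, end, type) token spans with an exclusive end index, and it uses a transliterated example sentence chosen only for illustration. IOB2 marks the first token of every entity with B-, whereas IOE2 marks the last token of every entity with E-. The helper f_measure assumes the balanced F-measure (harmonic mean of precision and recall), which the abstract does not state explicitly.

```python
from typing import List, Tuple

# Hypothetical span format (an assumption, not from the paper):
# (start index, exclusive end index, entity type)
Span = Tuple[int, int, str]


def tag_iob2(n_tokens: int, spans: List[Span]) -> List[str]:
    """IOB2: the first token of every entity is B-, the rest are I-."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def tag_ioe2(n_tokens: int, spans: List[Span]) -> List[str]:
    """IOE2: the last token of every entity is E-, the rest are I-."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end - 1):
            tags[i] = f"I-{label}"
        tags[end - 1] = f"E-{label}"
    return tags


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure, assumed here to be the harmonic mean."""
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence (transliterated, not taken from the paper's corpus).
tokens = ["Muhammad", "Ali", "Jinnah", "visited", "Lahore"]
spans = [(0, 3, "PER"), (4, 5, "LOC")]

iob2_tags = tag_iob2(len(tokens), spans)
ioe2_tags = tag_ioe2(len(tokens), spans)

for token, b, e in zip(tokens, iob2_tags, ioe2_tags):
    print(f"{token:10s} {b:6s} {e:6s}")
```

Under the harmonic-mean assumption, f_measure(66.71, 71.70) comes out at roughly 69.12, which agrees with the F-measure reported above.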

Muhammad Kamran Malik, Syed Mansoor Sarwar (2017). Urdu Named Entity Recognition System using Hidden Markov Model. Pakistan Journal of Engineering and Applied Sciences, Volume 21, Issue 1.
