Abstract
This paper describes a two-pass POS-tagging system for extracting the first name and surname from a Pakistani full-name string. Full names in Pakistan do not follow a single fixed pattern: the order of the components is flexible, and the simple first-name middle-name last-name pattern does not apply. There are many peculiarities; for example, in the absence of a family name, the middle name serves as the surname. To extract the first name and surname, two sets of POS tags are designed. The first tagset consists of personal-name, family-name, religious-middle-name, particle and title. The second tagset consists of first-name, surname, title and middle-name. The output of the first POS-tagging subsystem is fed to the second subsystem. In the evaluation, the POS-tagging approach achieves over 90% accuracy.
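The two-pass design can be illustrated with a minimal sketch. It is not the authors' tagger: the abstract does not say how either subsystem is built, so this example swaps in a toy lexicon lookup for the first pass and a few hand-written rules for the second. The lexicon entries, the function names (first_pass, second_pass, tag_name), the default tag for unknown tokens, and the rule that the last non-title token serves as the surname when no family name is present are all assumptions made for illustration. Only the two tagsets and the pipeline order come from the abstract.

```python
# Hypothetical toy lexicon for the first tagset (lexical units).
# The entries are illustrative assumptions, not data from the paper.
LEXICON = {
    "dr": "title",
    "muhammad": "religious-middle-name",
    "ul": "particle",
    "ali": "personal-name",
    "khan": "family-name",
}


def first_pass(tokens):
    """Tag each token with a lexical tag; unknown tokens default to personal-name."""
    return [(t, LEXICON.get(t.lower().rstrip("."), "personal-name")) for t in tokens]


def second_pass(tagged):
    """Map first-pass tags to functional tags (assumed rules, see lead-in)."""
    out = [None] * len(tagged)
    non_title = []
    for i, (_, tag) in enumerate(tagged):
        if tag == "title":
            out[i] = "title"
        else:
            non_title.append(i)

    # Surname: the last family-name if one exists; otherwise the last
    # non-title token, provided at least one other token remains.
    family = [i for i in non_title if tagged[i][1] == "family-name"]
    if family:
        out[family[-1]] = "surname"
    elif len(non_title) > 1:
        out[non_title[-1]] = "surname"

    # First name: the first unassigned personal-name, else the first
    # unassigned non-title token.
    free = [i for i in non_title if out[i] is None]
    personal = [i for i in free if tagged[i][1] == "personal-name"]
    if personal:
        out[personal[0]] = "first-name"
    elif free:
        out[free[0]] = "first-name"

    # Everything still unassigned is treated as a middle name.
    return [(tok, out[i] or "middle-name") for i, (tok, _) in enumerate(tagged)]


def tag_name(full_name):
    """Run both passes: the output of the first pass feeds the second."""
    return second_pass(first_pass(full_name.split()))
```

With this toy lexicon, tag_name("Dr. Ali Khan") labels the tokens title, first-name and surname, while tag_name("Tafseer Ahmed"), which contains no family name, labels the last token as the surname.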

Tafseer Ahmed, Naila Ata. (2014) What's in a name? Automatic extraction of lexical and functional units of Pakistani names, Conference on Language and Technology 2014.
Publisher: Center for Language Engineering
Country: Pakistan
City: Karachi
From: 13-11-2014
To: 15-11-2014