Abstract
Part-of-Speech (POS) tagging is the process of assigning a unique grammatical tag to every word in a sentence. A POS tagset is the primary requirement of the POS tagging process. This research paper discusses the grammatical classes of Sindhi with reference to POS tagset design and tagging. Issues such as tagset design considerations, tagset size and granularity, and part-of-speech types, subtypes and their attributes for tagging are discussed in detail. General guidelines are given for designing a Sindhi POS tagset of any size and granularity. Obligatory and proposed tagsets for Sindhi are presented. These provide a basis for further research in part-of-speech tagging, tagged corpora, chunking, syntactic analysis, information retrieval, part-of-speech usage analysis and other natural language processing applications.

Mutee U Rahman (2012). Developing a Part of Speech Tagset for Sindhi. Conference on Language and Technology 2012.
Publisher: Center for Language Engineering
Country: Pakistan
City: Lahore
From: 09-11-2012
To: 10-11-2012