Abstract
Part-of-Speech (POS) tagging is the process of
assigning a unique grammatical tag to every word in a
sentence. A POS tagset is the primary requirement of the POS
tagging process. This research paper discusses various
grammatical classes of Sindhi with reference to POS
tagset design and tagging. Issues such as tagset
design considerations, tagset size and granularity, part
of speech types, subtypes and their attributes for
tagging are discussed in detail. General guidelines for
designing a Sindhi POS tagset of any size and
granularity are given. Obligatory and proposed tagsets
for Sindhi are presented, which provide a basis for
further research in part-of-speech tagging, tagged
corpora, chunking, syntax analysis, information
retrieval, part-of-speech usage analysis and other
natural language processing applications.
Mutee U Rahman (2012). Developing a Part of Speech Tagset for Sindhi. Conference on Language and Technology 2012.