Abstract
Annotated text corpora are an important resource for Natural Language Processing (NLP) applications such as machine
translation, information retrieval and text mining. Corpus annotation requires part-of-speech
(POS) tags that mark parts of a text with grammatical annotations in order to identify the linguistic properties of a word, sentence or
discourse. Marking a text item relies on two main features: 1) its grammatical category and 2) its context (word, sentence
or discourse), i.e. its relationship with adjacent and related text. Saraiki, although one of the oldest languages, remains resource-scarce
both in recorded literature and in computational contexts. According to our study, no tagset has yet been defined for the Saraiki
language. This work presents the first hierarchical POS (MPOST) tagset for the Saraiki language, designed for use in
morphological, syntactic and lexical annotation of Saraiki language corpora.
Farrukh Javed Saleemi, Muhammad Nabeel Asghar, Sajid Iqbal, Muhammad Umar Chaudhry, Muhammad Yasir, Sibghat Ullah Bazai, Muhammad Qasim Khan. (2021) A Novel Parts of Speech (POS) Tagset for morphological, syntactic and lexical annotations of Saraiki language, Journal of Applied and Emerging Sciences, Volume-11, Issue-1.