A Novel Parts of Speech (POS) Tagset for morphological, syntactic and lexical annotations of Saraiki language

Abstract

Annotated text corpora are an important resource for Natural Language Processing (NLP) applications such as machine translation, information retrieval, and text mining. Annotating a corpus requires part-of-speech (POS) tags, which mark parts of the text with grammatical annotations in order to identify the linguistic properties of a word, sentence, or discourse. Marking text items is based on two main features: (1) grammatical category and (2) the context of the text (word, sentence, or discourse), i.e. its relationship with adjacent and related text. Saraiki, although one of the oldest languages, remains resource-scarce both in recorded literature and in computational contexts. According to our study, no tagset is currently defined for the Saraiki language. This work presents the first hierarchical POS tagset (MPOST) for the Saraiki language, designed for morphological, syntactic, and lexical annotation of Saraiki corpora.



How to cite

Farrukh Javed Saleemi, Muhammad Nabeel Asghar, Sajid Iqbal, Muhammad Umar Chaudhry, Muhammad Yasir, Sibghat Ullah Bazai, Muhammad Qasim Khan. (2021) A Novel Parts of Speech (POS) Tagset for morphological, syntactic and lexical annotations of Saraiki language, , Volume-11, Issue-1.

