Extracting Arguments and Collocations for Urdu Complex Predicates

Abstract

The paper presents the automated extraction of arguments and collocations for Urdu Noun+Verb (N+V) complex predicates (CPs), e.g. safAI kar- ‘cleaning.noun do’, and Adjective+Verb (A+V) complex predicates, e.g. sAf kar- ‘clean.adj do’. An automatically POS-tagged corpus of 97 million words was processed, and pseudo-relations between nouns and complex predicates were extracted by a purpose-built algorithm that uses neither deep parsing nor chunking. The words in these pseudo-relations are then processed to suggest collocations for each complex predicate. For a given CP, the words commonly used as its subject, object, genitive modifier of N+V, non-canonical second argument (NCSA) and V+V light verb are extracted, provided that the argument exists for, or is relevant to, that CP. In the absence of a large, freely available Urdu treebank, the paper describes an alternative method for obtaining the argument structures and collocations of complex predicates. The pseudo-relation extractor can also be used in information extraction tasks.
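The general idea of extracting pseudo-relations from POS tags alone, without a parser, can be illustrated with a minimal sketch. This is not the paper's algorithm: the tag names (NN, JJ, VB, PSP), the light-verb list, the case-marker-to-role mapping, the transliterations and the assumption that tokens are already reduced to stems are all hypothetical choices made only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical resources, assumed for this sketch (not taken from the paper).
LIGHT_VERBS = {"kar", "ho"}
CASE_ROLES = {
    "ne": "subject",
    "ko": "object",
    "kA": "genitive",
    "kI": "genitive",
    "kE": "genitive",
}


def extract_pseudo_relations(sentence):
    """Yield (cp, role, noun) triples from one POS-tagged sentence.

    sentence is a list of (stem, tag) pairs; a complex predicate is taken
    to be a noun or adjective immediately followed by a light verb.
    """
    for i in range(1, len(sentence)):
        word, tag = sentence[i]
        host, host_tag = sentence[i - 1]
        if tag == "VB" and word in LIGHT_VERBS and host_tag in {"NN", "JJ"}:
            cp = f"{host} {word}-"
            # Look left of the predicate for noun + case-marker pairs.
            for j in range(i - 2):
                noun, noun_tag = sentence[j]
                marker, marker_tag = sentence[j + 1]
                if noun_tag == "NN" and marker_tag == "PSP" and marker in CASE_ROLES:
                    yield cp, CASE_ROLES[marker], noun


def collect_collocations(corpus):
    """Count, per complex predicate and role, the nouns seen in that role."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for cp, role, noun in extract_pseudo_relations(sentence):
            counts[(cp, role)][noun] += 1
    return counts


# Toy corpus of two stemmed, tagged sentences (invented for illustration).
corpus = [
    [("laRkA", "NN"), ("ne", "PSP"), ("kamrA", "NN"), ("ko", "PSP"),
     ("sAf", "JJ"), ("kar", "VB")],
    [("kamrA", "NN"), ("kI", "PSP"), ("safAI", "NN"), ("kar", "VB")],
]
collocations = collect_collocations(corpus)
```

Ranking the counters for each (predicate, role) pair, for example with `collocations[("sAf kar-", "object")].most_common(5)`, would then suggest candidate collocations. A real system would also need to handle inflected verb forms, intervening words and unmarked arguments, which this sketch ignores.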

Cite this article

Tafseer Ahmed. (2014) Extracting Arguments and Collocations for Urdu Complex Predicates, Conference on Language and Technology 2014.

Publisher

Center for Language Engineering

Country

Pakistan

City

Karachi

From

13-11-2014

15-11-2014