Abstract
The paper presents the automated extraction of arguments and collocations for two kinds of Urdu complex predicates (cp): Noun+Verb (N+V), e.g. safAI kar- ‘cleaning.noun do’, and Adjective+Verb (A+V), e.g. sAf kar- ‘clean.adj do’. An automatically POS-tagged corpus of 97 million words was processed, and pseudo-relations of nouns and complex predicates were extracted by a purpose-built algorithm, without deep parsing or chunking. The words in these pseudo-relations are then processed to suggest collocations for each complex predicate. For a given cp, the words commonly used as subject, object, genitive modifier of N+V, non-canonical second argument (NCSA) and V+V light verb are extracted, wherever that argument exists for (or is relevant to) the cp. In the absence of a large, freely available Urdu treebank, the paper describes an alternative method for obtaining the argument structures and collocations of complex predicates. The pseudo-relation extractor can also be used in other information extraction tasks.
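The idea of extracting pseudo-relations from POS-tagged text without a parser can be illustrated with a small sketch. This is not the paper's algorithm. It is a hypothetical heuristic under several stated assumptions. Tokens are lemmatized (word, tag) pairs. The tag names NN, JJ, VB and PSP, the light-verb list, and the case-marker mapping (ne for subject, ko for object, kA/kI/ke for genitive) are all illustrative. Scanning stops at the previous verb, which is used as a crude clause boundary. NCSA and V+V light verbs are not covered.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from case markers (same transliteration as the abstract)
# to pseudo-relation labels; a real extractor would need many more rules.
CASE_MARKERS = {
    "ne": "subject",   # ergative marker
    "ko": "object",    # accusative/dative marker
    "kA": "genitive",
    "kI": "genitive",
    "ke": "genitive",
}
LIGHT_VERBS = {"kar", "ho", "de"}  # assumed light-verb lemmas


def extract_pseudo_relations(sentence):
    """Return (cp, relation, noun) triples from a list of (lemma, tag) pairs.

    Assumed tags: NN noun, JJ adjective, VB verb, PSP postposition/case marker.
    """
    relations = []
    for i in range(1, len(sentence)):
        verb, verb_tag = sentence[i]
        host, host_tag = sentence[i - 1]
        # An N+V or A+V cp: noun/adjective immediately followed by a light verb.
        if verb_tag != "VB" or verb not in LIGHT_VERBS or host_tag not in ("NN", "JJ"):
            continue
        cp = f"{host} {verb}"
        # Scan leftwards for noun + case-marker pairs until a verb is reached.
        for j in range(i - 2, 0, -1):
            marker, marker_tag = sentence[j]
            if marker_tag == "VB":
                break
            noun, noun_tag = sentence[j - 1]
            if marker_tag != "PSP" or noun_tag != "NN" or marker not in CASE_MARKERS:
                continue
            label = CASE_MARKERS[marker]
            # A genitive counts only when it directly modifies the noun of an N+V cp.
            if label == "genitive" and not (j == i - 2 and host_tag == "NN"):
                continue
            relations.append((cp, label, noun))
    return relations


def collocations(corpus, top_n=3):
    """Most frequent fillers for each (cp, relation) pair across the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for cp, relation, noun in extract_pseudo_relations(sentence):
            counts[(cp, relation)][noun] += 1
    return {key: [w for w, _ in c.most_common(top_n)] for key, c in counts.items()}


# Toy lemmatized sentences (hypothetical data):
# 'the boy cleaned the room' (N+V) and 'the mother cleaned the child' (A+V).
s1 = [("laRkA", "NN"), ("ne", "PSP"), ("kamrA", "NN"), ("kI", "PSP"),
      ("safAI", "NN"), ("kar", "VB")]
s2 = [("mAN", "NN"), ("ne", "PSP"), ("bachchA", "NN"), ("ko", "PSP"),
      ("sAf", "JJ"), ("kar", "VB")]

print(extract_pseudo_relations(s1))
print(collocations([s1, s2, s1]))
```

Counting fillers per (cp, relation) pair is the simplest way to surface candidate collocations. A corpus-scale version would likely add an association measure and frequency thresholds rather than raw counts.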

Tafseer Ahmed (2014). Extracting Arguments and Collocations for Urdu Complex Predicates. Conference on Language and Technology 2014.
Publisher: Center for Language Engineering
Location: Karachi, Pakistan
Dates: 13-11-2014 to 15-11-2014