Abstract
We propose a simple matrix-based method with embedded rules for splitting input sentences into their constituents and phrases. Splitting a sentence into phrases is a preprocessing step in machine translation that addresses the difficulty of handling long sentences and improves the quality of automatic translation. We attempt to remove, or at least minimize, the recursion encountered during phrase splitting, which saves considerable processing time. The system is dynamic in design and should, in theory, work for any language that has some type of word order. We have, however, tested the system on Pashto, and this paper describes it from the perspective of that language. The system can achieve results of more than 90%, provided that the phrase rules are carefully captured in a table.
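
To make the idea of table-driven, non-recursive phrase splitting concrete, here is a minimal sketch. It is not the authors' implementation: the rule table `PHRASE_RULES`, the tag names, the `UNK` fallback label, and the function `split_phrases` are all hypothetical, and the longest-match scan is simply one plausible way to apply a table of phrase rules without recursion.

```python
# Hypothetical rule table: each row maps a sequence of part-of-speech tags
# to a phrase label. The rows are illustrative and not taken from the paper.
PHRASE_RULES = {
    ("ADJ", "NOUN"): "NP",
    ("NOUN",): "NP",
    ("ADV", "VERB"): "VP",
    ("VERB",): "VP",
}
MAX_RULE_LEN = max(len(pattern) for pattern in PHRASE_RULES)


def split_phrases(tagged):
    """Split a list of (word, tag) pairs into (label, words) phrases.

    A single left-to-right scan with longest-match table lookup is used,
    so the splitter itself involves no recursion.
    """
    phrases = []
    i = 0
    while i < len(tagged):
        # Try the longest candidate span first, then shorter ones.
        for size in range(min(MAX_RULE_LEN, len(tagged) - i), 0, -1):
            span = tagged[i:i + size]
            label = PHRASE_RULES.get(tuple(tag for _, tag in span))
            if label is not None:
                phrases.append((label, [word for word, _ in span]))
                i += size
                break
        else:
            # No rule matched: emit the token as an unlabelled chunk.
            phrases.append(("UNK", [tagged[i][0]]))
            i += 1
    return phrases
```

Under these assumptions, supporting another language would mean supplying a different rule table rather than changing the scanning code.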

Zaheer Ahmad, Mohammad Abid Khan, Jehan Zeb Khan Orakzai, Rahman Ali, Ibrar Ahmad. (2012) A Computational Multilingual Text Constituent Splitter and Phrasing: A Case of Pashto Language, Conference on Language and Technology 2012.
Publisher: Center for Language Engineering
Country: Pakistan
City: Lahore
From: 09-11-2012
To: 10-11-2012