Abstract
We propose a simple method, based on a matrix with embedded
rules, for splitting input sentences into their constituents
and phrases. Splitting a sentence into phrases is a
preprocessing step in machine translation that helps with
handling long sentences and improves the quality of automatic
translation. We attempt to remove, or at least minimize, the
recursion encountered during phrase splitting, thereby saving
processing time. The system is dynamic in design and should,
in theory, work for any language that has some type of word
order. However, we have tested the system on Pashto, and this
paper describes the system from the perspective of Pashto.
The system can achieve results of more than 90%, provided that
the phrase rules are carefully captured in a table.
Zaheer Ahmad, Mohammad Abid Khan, Jehan Zeb Khan Orakzai, Rahman Ali, Ibrar Ahmad. (2012) A Computational Multilingual Text Constituent Splitter and Phrasing: A Case of Pashto Language, Conference on Language and Technology 2012.