Multitier Annotation of Urdu Speech Corpus

Abstract

This paper describes the multi-level annotation process of an Urdu speech corpus and its quality assessment using PRAAT. The speech corpus has been annotated at the phoneme, word, syllable and break-index levels. Phoneme-, word- and break-index-level annotation has been done manually by trained linguists, whereas syllable-tier annotation has been done automatically using a template matching algorithm. On average, the accuracy achieved at the phoneme and break-index tiers is 79% and 89%, respectively. The quality assessment of the word and syllable tiers is still under investigation.
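
To illustrate what template-based syllabification can look like, the following is a minimal sketch and not the authors' algorithm. It assumes each phoneme has already been classified as a consonant (C) or vowel (V). The template inventory `TEMPLATES` and the right-to-left, longest-first matching order are hypothetical choices made for the example and are not taken from the paper.

```python
from functools import lru_cache
from typing import List, Optional

# Hypothetical template inventory (C = consonant, V = vowel).
# This is an assumption for illustration, not the inventory used in the paper.
TEMPLATES = ("CVCC", "CVC", "CV", "VCC", "VC", "V")


def syllabify(cv: str) -> Optional[List[str]]:
    """Split a C/V string into syllables that each match a template.

    Scans from the end of the word towards the start, trying templates
    in the order listed in TEMPLATES and backtracking when a choice
    leaves an unmatchable remainder. Returns None if no split exists.
    """

    @lru_cache(maxsize=None)
    def solve(end: int):
        # An empty prefix is trivially syllabified.
        if end == 0:
            return ()
        for template in TEMPLATES:
            start = end - len(template)
            if start >= 0 and cv[start:end] == template:
                rest = solve(start)
                if rest is not None:
                    return rest + (template,)
        return None

    result = solve(len(cv))
    return None if result is None else list(result)


if __name__ == "__main__":
    for word in ("CVCV", "CVCCVC", "VCV", "CC"):
        print(word, "->", syllabify(word))
```

Matching from the right means an intervocalic consonant is attached to the following vowel as an onset whenever the templates allow it, so `CVCV` is split as `CV` + `CV` rather than `CVC` + `V`. In a real pipeline the resulting boundaries would still have to be mapped back onto the time-aligned phoneme tier.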

Cite this article

Benazir Mumtaz, Amen Hussain, Sarmad Hussain, Afia Mahmood, Rashida Bhatti, Mahwish Farooq, Sahar Rauf. (2014) Multitier Annotation of Urdu Speech Corpus, Conference on Language and Technology 2014.

Publisher

Center for Language Engineering

Country

Pakistan

City

Karachi

From

13-11-2014

To

15-11-2014