Design of Speech Corpus for Open Domain Urdu Text to Speech System Using Greedy Algorithm

Abstract

Unit selection speech synthesis is one of the most widely used techniques for high-quality text-to-speech (TTS) systems. A unit selection TTS system requires a large database of recorded and annotated speech that covers both phonetic and prosodic variation. Designing a phonetically rich and balanced speech corpus with a minimum number of utterances is an intricate task. Several optimization methods are used for this purpose, and the greedy algorithm is one of them. This paper introduces a greedy algorithm that maximizes the coverage of high-frequency unigrams, bigrams and trigrams while selecting a minimal number of sentences from the input corpus. The algorithm has been applied to corpora collected from different domains, and a speech corpus for an Urdu TTS system has been designed. Significant triphone coverage has also been achieved.
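The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring function, so the code below assumes a generic frequency-weighted greedy set cover, in which each sentence is a list of phone symbols and the sentence that covers the most still-uncovered n-gram frequency mass is picked at every step. The names `ngram_list`, `greedy_select` and `top_k` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def ngram_list(tokens, max_n=3):
    """All unigrams, bigrams and trigrams (as tuples) in a token sequence."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def greedy_select(sentences, top_k=None, max_n=3):
    """Greedy, frequency-weighted sentence selection (illustrative assumption).

    sentences: list of token lists (e.g. phone symbols per sentence).
    top_k:     restrict the target set to the top_k most frequent n-grams;
               None means every n-gram in the corpus is a target.
    Returns (indices of chosen sentences, fraction of targets covered).
    """
    per_sentence = [ngram_list(s, max_n) for s in sentences]
    # Corpus-wide n-gram frequencies, counted over all occurrences.
    freq = Counter(g for grams in per_sentence for g in grams)
    targets = {g for g, _ in freq.most_common(top_k)}
    unit_sets = [set(grams) & targets for grams in per_sentence]

    uncovered = set(targets)
    remaining = set(range(len(sentences)))
    chosen = []

    def gain(i):
        # Frequency mass of target n-grams this sentence would newly cover.
        return sum(freq[g] for g in unit_sets[i] & uncovered)

    while uncovered and remaining:
        # sorted() makes ties resolve to the lowest sentence index.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        uncovered -= unit_sets[best]
        remaining.discard(best)

    coverage = 1.0 if not targets else 1.0 - len(uncovered) / len(targets)
    return chosen, coverage


if __name__ == "__main__":
    toy_corpus = [["a", "b", "c"], ["a", "b"], ["c", "d"], ["d"]]
    print(greedy_select(toy_corpus))
```

Weighting the gain by corpus frequency is one way to favour high-frequency units first; a real design would also need a stopping criterion tied to the desired corpus size, which is left out here.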

Cite this article

Wajiha Habib, Rida Hijab Basit, Sarmad Hussain, Farah Adeeba. (2014) Design of Speech Corpus for Open Domain Urdu Text to Speech System Using Greedy Algorithm, Conference on Language and Technology 2014.

Publisher

Center for Language Engineering

Country

Pakistan

City

Karachi

From

13-11-2014 to 15-11-2014