Abstract
Phonetically rich and balanced corpora are generally popular for training speech recognition
systems, but these corpora are costly to develop. Different greedy algorithms have been developed to
collect such corpora. Even so, recording and transcribing such speech corpora requires significant effort,
so there is motivation to reduce their size further. This paper presents such an algorithm.
Earlier work shows that different amounts of training data are required to train different phonemes. The
current work builds on these findings to reduce the amount of phonetically rich training data. Experiments
show that this algorithm reduces the size of an Urdu speech corpus by 56.49% without degradation in
accuracy.
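The abstract does not spell out the selection procedure. As a rough illustration of the general idea it describes (greedy sentence selection driven by how much data each phoneme needs), here is a minimal Python sketch. It assumes each candidate sentence is already transcribed into a list of phoneme symbols and that a required occurrence count per phoneme is known in advance; the function name `select_sentences`, the input format and the tie-breaking rule are hypothetical and are not taken from the authors' method.

```python
from collections import Counter


def select_sentences(sentences, quotas):
    """Greedy sentence selection under per-phoneme quotas (hypothetical sketch).

    sentences: dict of sentence id -> list of phoneme symbols (assumed input format)
    quotas: dict of phoneme -> occurrences still required (assumed to be known)
    Returns selected sentence ids in the order they were picked.
    """
    remaining = dict(quotas)  # occurrences still needed per phoneme
    pool = {sid: Counter(phones) for sid, phones in sentences.items()}
    selected = []

    def gain(counts):
        # Only occurrences that still count toward an unmet quota are useful.
        return sum(min(n, remaining.get(p, 0)) for p, n in counts.items())

    while pool and any(v > 0 for v in remaining.values()):
        # Highest gain first; ties go to the shorter sentence, then the smaller id.
        best = min(pool, key=lambda sid: (-gain(pool[sid]), sum(pool[sid].values()), sid))
        if gain(pool[best]) == 0:
            break  # no remaining sentence helps any unmet quota
        for p, n in pool[best].items():
            if p in remaining:
                remaining[p] = max(0, remaining[p] - n)
        selected.append(best)
        del pool[best]
    return selected


# Toy usage with made-up phoneme strings and quotas.
sentences = {
    "s1": ["a", "b", "a"],
    "s2": ["b", "c"],
    "s3": ["a", "c", "c"],
}
quotas = {"a": 2, "b": 1, "c": 2}
print(select_sentences(sentences, quotas))
```

Capping each phoneme's contribution at its remaining quota is what separates this from plain coverage maximization: once a phoneme has enough examples, further occurrences of it no longer attract selections, so sentences are added only where some phoneme still lacks data.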
Saad Irtza and Sarmad Hussain (2015). An Efficient Algorithm To Collect Minimal Speech Corpora. Pakistan Journal of Engineering and Applied Sciences, Volume 17, Issue 1.