First Automatic Fongbe Continuous Speech Recognition System: Development of Acoustic Models and Language Models

Fréjus Laleye, Laurent Besacier, Eugène C. Ezin, Cina Motamed

DOI: http://dx.doi.org/10.15439/2016F153

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 477–482 (2016)


Abstract. This paper reports our efforts toward an automatic speech recognition (ASR) system for Fongbe, a new under-resourced language. The aim of this work is to build acoustic models and language models for continuous speech decoding in Fongbe. The main obstacle is that Fongbe, an African language spoken mainly in Benin, Togo, and Nigeria, has no existing language resources for building an ASR system. We therefore first collected Fongbe text and speech corpora, which are described in the paper. Acoustic modeling was carried out at the grapheme level, and two language models were built for performance comparison. We also simplified vowels by removing tone diacritics in order to investigate their impact on the language models.
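The vowel simplification mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `strip_tones` and the set of combining marks treated as tone diacritics (grave, acute, circumflex, caron, macron) are assumptions made here for illustration, since the abstract does not list the exact marks that were removed.

```python
import unicodedata

# Assumed set of tone marks (not specified in the abstract):
# combining grave, acute, circumflex, caron, and macron.
TONE_MARKS = {"\u0300", "\u0301", "\u0302", "\u030C", "\u0304"}


def strip_tones(text: str) -> str:
    """Remove the assumed tone diacritics while keeping base letters."""
    # NFD splits precomposed characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop only the marks listed in TONE_MARKS; other marks are preserved.
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC recomposes whatever base-plus-mark sequences remain.
    return unicodedata.normalize("NFC", kept)
```

Applied to both the language model training text and the pronunciation lexicon, a function of this kind would merge word forms that differ only in tone marking, which is the effect the comparison of language models would need. Letters such as ɛ, ɔ, and ɖ have no Unicode decomposition and pass through unchanged.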
