Implementation and verification of speech database for unit selection speech synthesis

Krzysztof Szklanny; Sebastian Koszuta

DOI: http://dx.doi.org/10.15439/2017F395

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 1263–1267 (2017)

Abstract. The main aim of this study was to prepare a new speech database for unit selection speech synthesis. The objective was to design a database with improved parameters compared with the existing database [1], building on the findings of studies [2]-[4]. The quality of the corpus, the selection of a suitable speaker, and the quality of the speech database are all crucial to the quality of synthesized speech. The considerably larger text corpora used in the study, together with broader multiple balancing of the database, yielded a greater number of varied acoustic units. For the recording, one voice talent was selected from a group of 30 professional speakers. The next stage involved database segmentation. The resulting database was then verified with a prototype speech synthesizer, and the quality of its synthetic speech was compared with that of other Polish unit selection speech synthesis systems. The result proved better than that obtained in the previous study [4]: the database was supplemented and extended, which significantly enhanced the quality of synthesized speech.

References

D. Oliver, K. Szklanny (2006). Creation and analysis of a Polish speech database for use in unit selection synthesis. In LREC-2006: Fifth International Conference on Language Resources and Evaluation.
K. Szklanny „Optymalizacja funkcji kosztu w korpusowej syntezie mowy polskiej”. Diss. Polsko-Japońska Wyższa Szkoła Technik Komputerowych, 2009.
K. Szklanny "System Korpusowej Syntezy Mowy Dla Języka Polskiego." XI International PhD Workshop OWD 2009, 17–20 October 2009.
K. Szklanny (2014). Multimodal Speech Synthesis for Polish Language. In Man-Machine Interactions 3 (pp. 325-333). Springer International Publishing. http://dx.doi.org/10.1007/978-3-319-02309-0_35
B. Bozkurt, T. Dutoit, O. Ozturk: Text Design For TTS Speech Corpus Building Using A Modified Greedy Selection, Proc. Eurospeech, Geneva 2003, pp 277-280.
J.C. Wells (1997) SAMPA computer readable phonetic alphabet, in Gibbon, D., Moore, R. and Winski, R. (eds.), 1997. Handbook of Standards and Resources for Spoken Language Systems. Berlin and New York: Mouton de Gruyter. Part IV, section B.
D. Koržinek, K. Marasek, Ł. Brocki, 2016, Polish Speech Services, CLARIN-PL digital repository, http://hdl.handle.net/11321/296.
A. S. Bailador. 1998. CorpusCrt. Technical report, Polytechnic University of Catalonia (UPC).
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, K. Vesely, The Kaldi Speech Recognition Toolkit.
Marasek, K., Koržinek, D. and Brocki, Ł. (2015). System for Automatic Transcription of Sessions of the Polish Senate. Archives of Acoustics, 39(4). http://dx.doi.org/10.2478/aoa-2014-0054
A. J. Viterbi (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, 13(2):260-269.
ITU-T Recommendation P.85 (https://www.itu.int/rec/T-REC-P.85-199406-I/en).
E. Klabbers, K. Stöber, R. Veldhuis, P. Wagner, S. Breuer (2001 B) Speech synthesis development made easy: The Bonn Open Synthesis System, Eurospeech 2001, Aalborg.
G. Demenko, K. Klessa, M. Szymański, J. Bachan (2007) The design of Polish speech corpora for speech synthesis in BOSS system, Mat. XII Sympozjum Podstawowe Problemy Energoelektroniki, Elektromechaniki i Mechatroniki (PPEEm’2007), Wisła, Poland, pp. 253-258.
G. Demenko, A. Wagner (2007) Prosody annotation for unit selection text-to-speech synthesis, Archives of Acoustics, 32(1):25-40.
G. Demenko, J. Bachan, B. Möbius, K. Klessa, M. Szymański, S. Grocholewski, (2008). Development and evaluation of Polish speech corpus for unit selection speech synthesis systems. In Ninth Annual Conference of the International Speech Communication Association.
M. Szymański, K. Klessa, and G. Demenko. "Optimization of unit selection speech synthesis." Proceedings of 17th International Congress of Phonetic Sciences (ICPhS 2011). 2011.
G. Demenko, K. Klessa, M. Szymański, S. Breuer, & W. Hess, (2010). Polish unit selection speech synthesis with BOSS: extensions and speech corpora. International Journal of Speech Technology, 13(2), 85-99. http://dx.doi.org/10.1007/s10772-010-9071-3
M. Kaszczuk, L. Osowski. "Evaluating Ivona speech synthesis system for Blizzard Challenge 2006." Blizzard Workshop, Pittsburgh. 2006.
M. Kaszczuk, L. Osowski. "The IVO Software Blizzard 2007 Entry: Improving Ivona Speech Synthesis System." Sixth ISCA Workshop on Speech Synthesis, Bonn. 2007.
M. Kaszczuk, L. Osowski. "The IVO software Blizzard Challenge 2009 entry: Improving IVONA text-to-speech." Blizzard Challenge Workshop. 2009.
R. Clark, K. Richmond, & S. King, (2007). Multisyn: Open-domain unit selection for the Festival speech synthesis system. Speech Communication, 49(4), 317-330. http://dx.doi.org/10.1016/j.specom.2007.01.014
K. Szklanny, M. Wojtowski, (2008, May). Automatic segmentation quality improvement for realization of unit selection speech synthesis. In 2008 Conference on Human System Interactions (pp. 251-256). IEEE, http://dx.doi.org/10.1109/HSI.2008.4581443.
ELDA: Evaluations and Language resources Distribution Agency. Online: http://www.elda.org/, accessed on 21 April 2017.