Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 15

Proceedings of the 2018 Federated Conference on Computer Science and Information Systems

Voice control in mixed reality

DOI: http://dx.doi.org/10.15439/2018F13

Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 497–500


Abstract. Gameplay in augmented or virtual reality relies on external equipment such as glasses or phones, possibly supplemented by additional sensors or controllers. In both cases, interaction involves pressing keys on the phone or controller. An interesting alternative is to control virtual objects with voice commands. In this paper, I propose a solution for manipulating objects in augmented reality using the player's voice. The user can move an object using pre-programmed commands. The solution is based on speech processing and artificial neural networks. The technique has been tested, and the results are presented and discussed.

