Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 11

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

PitchKeywordExtractor: Prosody-based Automatic Keyword Extraction for Speech Content


DOI: http://dx.doi.org/10.15439/2017F326

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 265–269


Abstract. Keyword extraction is widely used for information indexing, compression, summarization, etc. Existing keyword extraction techniques apply various text-based algorithms and metrics to locate keywords. At the same time, some types of audio and audiovisual content, e.g. lectures, talks, interviews and other speech-oriented information, make it possible to search for keywords by the prosodic accents made by a speaker. This paper presents PitchKeywordExtractor, an algorithm and its software prototype for prosody-based automatic keyword extraction from speech content. It operates together with a third-party automatic speech recognition system, processes speech prosody with a pitch detection algorithm, and locates keywords by cross-correlating the pitch contour with four tone units taken from D. Brazil's discourse intonation model.
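The abstract's core idea, matching a word's pitch contour against tone-unit templates by cross-correlation, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the template shapes, the fixed resampling length, and the normalization scheme are all assumptions, and the tone-unit names are borrowed loosely from Brazil's model.

```python
import numpy as np

def make_templates(n=32):
    """Idealized tone-unit contour templates (illustrative shapes only)."""
    t = np.linspace(0.0, 1.0, n)
    return {
        "fall":      1.0 - t,                       # steady descent
        "rise":      t,                             # steady ascent
        "fall-rise": np.abs(t - 0.5) * 2.0,         # dips, then recovers
        "rise-fall": 1.0 - np.abs(t - 0.5) * 2.0,   # peaks mid-unit
    }

def normalize(x):
    """Zero-mean, unit-norm vector, so dot products act as correlation."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

def classify_tone(pitch_contour, templates):
    """Resample a word's pitch contour to the template length and pick
    the best-matching tone unit by correlation score."""
    n = len(next(iter(templates.values())))
    # simple linear resampling to a fixed length
    resampled = np.interp(np.linspace(0, 1, n),
                          np.linspace(0, 1, len(pitch_contour)),
                          pitch_contour)
    query = normalize(resampled)
    scores = {name: float(np.dot(query, normalize(tpl)))
              for name, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# A rising pitch contour (Hz values per frame) matches the "rise" template.
best, scores = classify_tone([110, 120, 135, 150, 170], make_templates())
```

In the paper's pipeline the contour itself would come from a pitch detector such as YIN, aligned to word boundaries produced by the ASR system; here the contour is just a hand-written list of frequency values.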


  1. M. Scott and C. Tribble, Textual patterns: Key words and corpus analysis in language education. John Benjamins Publishing, 2006, vol. 22.
  2. B. Lott, “Survey of keyword extraction techniques,” UNM Education, 2012.
  3. S. K. Bharti and K. S. Babu, “Automatic keyword extraction for text summarization: A survey,” arXiv preprint arXiv:1704.03242, 2017. [Online]. Available: https://arxiv.org/abs/1704.03242
  4. S. Rose, D. Engel, N. Cramer, and W. Cowley, “Automatic keyword extraction from individual documents,” Text Mining, pp. 1–20, 2010. http://dx.doi.org/10.1002/9780470689646.ch1
  5. Z. Xue, D. Zhang, J. Guo, and J. Hao, “Apparatus and method for extracting keywords from a single document,” Mar. 30, 2017, US Patent App. 2017/0091318.
  6. T. Ö. Suzek, “Using latent semantic analysis for automated keyword extraction from large document corpora.”
  7. S. K. B. Reddy Naidu, K. S. Babu, and R. K. Mohapatra, “Text summarization with automatic keyword extraction in Telugu e-newspapers.” http://dx.doi.org/10.1145/2980258.2980442. [Online]. Available: https://doi.org/10.1145/2980258.2980442
  8. T. Weerasooriya, N. Perera, and S. Liyanage, “A method to extract essential keywords from a tweet using NLP tools,” in Advances in ICT for Emerging Regions (ICTer), 2016 Sixteenth International Conference on. IEEE, 2016, pp. 29–34. http://dx.doi.org/10.1109/ICTER.2016.7829895. [Online]. Available: https://doi.org/10.1109/ICTER.2016.7829895
  9. W. I. Grosky and T. L. Ruas, “The continuing reinvention of content-based retrieval: Multimedia is not dead,” IEEE MultiMedia, vol. 24, no. 1, pp. 6–11, 2017. http://dx.doi.org/10.1109/MMUL.2017.7. [Online]. Available: https://doi.org/10.1109/MMUL.2017.7
  10. E. Pyshkin and V. Klyuev, “On document evaluation for better context-aware summary generation,” in Aware Computing (ISAC), 2010 2nd International Symposium on. IEEE, 2010, pp. 116–121. http://dx.doi.org/10.1109/ISAC.2010.5670465. [Online]. Available: https://doi.org/10.1109/ISAC.2010.5670465
  11. S. Beliga, “Keyword extraction techniques,” 2016.
  12. P. Meladianos, A. J.-P. Tixier, G. Nikolentzos, and M. Vazirgiannis, “Real-time keyword extraction from conversations,” EACL 2017, p. 462, 2017.
  13. K. Elakiya and A. Sahayadhas, “Keyword extraction from multiple words for report recommendations in MediaWiki,” in IOP Conference Series: Materials Science and Engineering, vol. 183, no. 1. IOP Publishing, 2017, p. 012029. http://dx.doi.org/10.1088/1757-899X/183/1/012029. [Online]. Available: http://dx.doi.org/10.1088/1757-899X/183/1/012029
  14. G. Alharbi, “Metadiscourse tagging in academic lectures,” Ph.D. dissertation, University of Sheffield, 2016.
  15. D. Brazil et al., Discourse intonation and language teaching. ERIC, 1980.
  16. D. Brazil, “Phonology: Intonation in discourse,” Handbook of discourse analysis, vol. 2, pp. 57–75, 1985.
  17. M. Coulthard and D. Brazil, The place of intonation in the description of interaction. Linguistic Agency University of Trier, 1981.
  18. J. Six, O. Cornelis, and M. Leman, “TarsosDSP, a real-time audio processing framework in Java,” in Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society, 2014. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=17089
  19. D. M. Chun, “Signal analysis software for teaching discourse intonation,” Language Learning & Technology, vol. 2, no. 1, pp. 61–77, 1998.
  20. A. Klapuri, “A method for visualizing the pitch content of polyphonic music signals.” in ISMIR. Citeseer, 2009, pp. 615–620.
  21. Á. Abuczki, “Annotation procedures, feature extraction and query options,” of Electronic Information and Document Processing, p. 81. http://dx.doi.org/10.1109/IEMBS.2008.4649799. [Online]. Available: https://doi.org/10.1109/IEMBS.2008.4649799
  22. P. Roach, “Techniques for the phonetic description of emotional speech,” in ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, 2000. http://dx.doi.org/10.1016/S0167-6393(02)00070-5. [Online]. Available: http://dx.doi.org/10.1016/S0167-6393(02)00070-5
  23. A. Meftah, Y. Alotaibi, and S.-A. Selouani, “Emotional speech recognition: A multilingual perspective,” in Bio-engineering for Smart Technologies (BioSMART), 2016 International Conference on. IEEE, 2016, pp. 1–4. http://dx.doi.org/10.1109/BIOSMART.2016.7835600. [Online]. Available: https://doi.org/10.1109/BIOSMART.2016.7835600
  24. M. Warren, “A corpus-driven analysis of the use of intonation to assert dominance and control,” Language and Computers, vol. 52, no. 1, pp. 21–33, 2004. http://dx.doi.org/10.1163/9789004333772_003. [Online]. Available: https://doi.org/10.1163/9789004333772_003
  25. J. K. Bock and J. R. Mazzella, “Intonational marking of given and new information: Some consequences for comprehension,” Memory & Cognition, vol. 11, no. 1, pp. 64–76, 1983. http://dx.doi.org/10.3758/BF03197663. [Online]. Available: https://doi.org/10.3758/BF03197663
  26. A. De Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917–1930, 2002. http://dx.doi.org/10.1121/1.1458024. [Online]. Available: https://doi.org/10.1121/1.1458024
  27. K.-F. Lee, H.-W. Hon, and R. Reddy, “An overview of the SPHINX speech recognition system,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 1, pp. 35–45, 1990. http://dx.doi.org/10.1109/29.45616. [Online]. Available: https://doi.org/10.1109/29.45616
  28. T. Huang, G. Yang, and G. Tang, “A fast two-dimensional median filtering algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 1, pp. 13–18, 1979. http://dx.doi.org/10.1109/TASSP.1979.1163188. [Online]. Available: https://doi.org/10.1109/TASSP.1979.1163188