PitchKeywordExtractor: Prosody-based Automatic Keyword Extraction for Speech Content
Yurij Lezhenin, Artyom Zhuikov, Natalia Bogach, Elena Boitsova, Evgeny Pyshkin
DOI: http://dx.doi.org/10.15439/2017F326
Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 265–269 (2017)
Abstract. Keyword extraction is widely used for information indexing, compression, summarization, and similar tasks. Existing keyword extraction techniques apply various text-based algorithms and metrics to locate keywords. At the same time, some types of audio and audiovisual content, e.g. lectures, talks, interviews and other speech-oriented material, make it possible to search for keywords through the prosodic accents made by a speaker. This paper presents PitchKeywordExtractor, an algorithm and its software prototype for prosody-based automatic keyword extraction from speech content. It operates together with a third-party automatic speech recognition system, processes speech prosody with a pitch detection algorithm, and locates keywords by cross-correlating the pitch contour with the four tone units taken from D. Brazil's discourse intonation model.
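The matching step described in the abstract, comparing a word's pitch contour against tone-unit templates by cross-correlation, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the template shapes, the names `TONE_TEMPLATES` and `best_tone`, and the fixed resampling length are all assumptions made for the example.

```python
import numpy as np

# Illustrative tone-unit templates (hypothetical shapes): four tones from
# Brazil's discourse intonation model modeled as pitch trajectories over
# a unit interval, resampled to a common length N.
N = 32
t = np.linspace(0.0, 1.0, N)
TONE_TEMPLATES = {
    "rise":      t,
    "fall":      1.0 - t,
    "rise-fall": np.sin(np.pi * t),
    "fall-rise": 1.0 - np.sin(np.pi * t),
}

def normalize(x):
    """Zero-mean, unit-norm, so correlation compares contour shape,
    not absolute pitch level or range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = np.linalg.norm(x)
    return x / n if n > 0 else x

def best_tone(pitch_segment):
    """Resample a word-aligned pitch contour (Hz values from a pitch
    detector such as YIN) to length N, then score it against each
    template by normalized cross-correlation at zero lag.
    Returns (best_tone_name, score in [-1, 1])."""
    seg = normalize(np.interp(np.linspace(0.0, 1.0, N),
                              np.linspace(0.0, 1.0, len(pitch_segment)),
                              pitch_segment))
    scores = {name: float(np.dot(seg, normalize(tpl)))
              for name, tpl in TONE_TEMPLATES.items()}
    tone = max(scores, key=scores.get)
    return tone, scores[tone]

# A word whose pitch rises and then falls should match the rise-fall tone.
tone, score = best_tone([110, 150, 180, 150, 115])
```

Normalizing both contour and template to zero mean and unit norm makes the dot product a shape-similarity score, which is what matters here: a keyword is signalled by the trajectory of the accent, not by the speaker's absolute pitch.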
References
- M. Scott and C. Tribble, Textual patterns: Key words and corpus analysis in language education. John Benjamins Publishing, 2006, vol. 22.
- B. Lott, “Survey of keyword extraction techniques,” UNM Education, 2012.
- S. K. Bharti and K. S. Babu, “Automatic keyword extraction for text summarization: A survey,” arXiv preprint arXiv:1704.03242, 2017. [Online]. Available: https://arxiv.org/abs/1704.03242
- S. Rose, D. Engel, N. Cramer, and W. Cowley, “Automatic keyword extraction from individual documents,” Text Mining, pp. 1–20, 2010. http://dx.doi.org/10.1002/9780470689646.ch1
- Z. Xue, D. Zhang, J. Guo, and J. Hao, “Apparatus and method for extracting keywords from a single document,” US Patent App. 2017/0091318, Mar. 30, 2017.
- T. Ö. Suzek, “Using latent semantic analysis for automated keyword extraction from large document corpora.”
- R. Naidu, S. K. Bharti, K. S. Babu, and R. K. Mohapatra, “Text summarization with automatic keyword extraction in Telugu e-newspapers.” http://dx.doi.org/10.1145/2980258.2980442. [Online]. Available: https://doi.org/10.1145/2980258.2980442
- T. Weerasooriya, N. Perera, and S. Liyanage, “A method to extract essential keywords from a tweet using NLP tools,” in Advances in ICT for Emerging Regions (ICTer), 2016 Sixteenth International Conference on. IEEE, 2016. http://dx.doi.org/10.1109/ICTER.2016.7829895 pp. 29–34. [Online]. Available: https://doi.org/10.1109/ICTER.2016.7829895
- W. I. Grosky and T. L. Ruas, “The continuing reinvention of content-based retrieval: Multimedia is not dead,” IEEE MultiMedia, vol. 24, no. 1, pp. 6–11, 2017. http://dx.doi.org/10.1109/MMUL.2017.7. [Online]. Available: https://doi.org/10.1109/MMUL.2017.7
- E. Pyshkin and V. Klyuev, “On document evaluation for better context-aware summary generation,” in Aware Computing (ISAC), 2010 2nd International Symposium on. IEEE, 2010. http://dx.doi.org/10.1109/ISAC.2010.5670465 pp. 116–121. [Online]. Available: https://doi.org/10.1109/ISAC.2010.5670465
- S. Beliga, “Keyword extraction techniques,” 2016.
- P. Meladianos, A. J.-P. Tixier, G. Nikolentzos, and M. Vazirgiannis, “Real-time keyword extraction from conversations,” EACL 2017, p. 462, 2017.
- K. Elakiya and A. Sahayadhas, “Keyword extraction from multiple words for report recommendations in MediaWiki,” in IOP Conference Series: Materials Science and Engineering, vol. 183, no. 1. IOP Publishing, 2017. http://dx.doi.org/10.1088/1757-899X/183/1/012029 p. 012029. [Online]. Available: http://dx.doi.org/10.1088/1757-899X/183/1/012029
- G. Alharbi, “Metadiscourse tagging in academic lectures,” Ph.D. dissertation, University of Sheffield, 2016.
- D. Brazil et al., Discourse intonation and language teaching. ERIC, 1980.
- D. Brazil, “Phonology: Intonation in discourse,” Handbook of discourse analysis, vol. 2, pp. 57–75, 1985.
- M. Coulthard and D. Brazil, The place of intonation in the description of interaction. Linguistic Agency University of Trier, 1981.
- J. Six, O. Cornelis, and M. Leman, “TarsosDSP, a real-time audio processing framework in Java,” in Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society, 2014. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=17089
- D. M. Chun, “Signal analysis software for teaching discourse intonation,” Language Learning & Technology, vol. 2, no. 1, pp. 61–77, 1998.
- A. Klapuri, “A method for visualizing the pitch content of polyphonic music signals.” in ISMIR. Citeseer, 2009, pp. 615–620.
- Á. Abuczki, “Annotation procedures, feature extraction and query options,” of Electronic Information and Document Processing, p. 81. http://dx.doi.org/10.1109/IEMBS.2008.4649799. [Online]. Available: https://doi.org/10.1109/IEMBS.2008.4649799
- P. Roach, “Techniques for the phonetic description of emotional speech,” in ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, 2000. http://dx.doi.org/10.1016/S0167-6393(02)00070-5. [Online]. Available: http://dx.doi.org/10.1016/S0167-6393(02)00070-5
- A. Meftah, Y. Alotaibi, and S.-A. Selouani, “Emotional speech recognition: A multilingual perspective,” in Bio-engineering for Smart Technologies (BioSMART), 2016 International Conference on. IEEE, 2016. http://dx.doi.org/10.1109/BIOSMART.2016.7835600 pp. 1–4. [Online]. Available: https://doi.org/10.1109/BIOSMART.2016.7835600
- M. Warren, “A corpus-driven analysis of the use of intonation to assert dominance and control,” Language and Computers, vol. 52, no. 1, pp. 21–33, 2004. http://dx.doi.org/10.1163/9789004333772_003. [Online]. Available: https://doi.org/10.1163/9789004333772_003
- J. K. Bock and J. R. Mazzella, “Intonational marking of given and new information: Some consequences for comprehension,” Memory & Cognition, vol. 11, no. 1, pp. 64–76, 1983. http://dx.doi.org/10.3758/BF03197663. [Online]. Available: https://doi.org/10.3758/BF03197663
- A. De Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917–1930, 2002. http://dx.doi.org/10.1121/1.1458024. [Online]. Available: https://doi.org/10.1121/1.1458024
- K.-F. Lee, H.-W. Hon, and R. Reddy, “An overview of the SPHINX speech recognition system,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 1, pp. 35–45, 1990. http://dx.doi.org/10.1109/29.45616. [Online]. Available: https://doi.org/10.1109/29.45616
- T. Huang, G. Yang, and G. Tang, “A fast two-dimensional median filtering algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 1, pp. 13–18, 1979. http://dx.doi.org/10.1109/TASSP.1979.1163188. [Online]. Available: https://doi.org/10.1109/TASSP.1979.1163188