Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 11

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

A Hierarchical Approach for Sentiment Analysis and Categorization of Turkish Written Customer Relationship Management Data


DOI: http://dx.doi.org/10.15439/2017F204

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 361–365 (2017)


Abstract. Today, large-scale companies receive tens of thousands of feedback messages from their customers every day, which makes manual evaluation impossible. As the sentiments expressed by customers are vitally important to companies, an accurate and swift analysis is needed. In this paper, a hierarchical approach is proposed for sentiment analysis and further categorization of Turkish-language customer feedback to a private airline company. First, the word embeddings of each customer feedback are computed using Word2Vec and then averaged, with each word weighted in proportion to the inverse of its frequency in the document. For binary sentiment analysis, i.e., the determination of 'positive' and 'negative' sentiments, an extreme gradient boosting (xgboost) classifier is trained on the averaged review vectors, and an overall accuracy of 92.5% is obtained, which is 16.8% higher than that of the baseline model. For further categorization of negative sentiments into one of twelve predetermined classes, an xgboost classifier is trained on the document embeddings of negatively classified comments, which are calculated using Doc2Vec. An overall accuracy of 71.16% is obtained for this 12-class categorization task using the Doc2Vec approach, a classification accuracy 19.1% higher than that of the baseline model.
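The inverse-frequency averaging of word vectors described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each distinct word in a review is weighted by 1 divided by its raw count in that review, that words without an embedding are skipped, and that the result is normalized to a weighted mean. The function name `weighted_review_vector` and the toy two-dimensional vectors (standing in for trained Word2Vec embeddings) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def weighted_review_vector(tokens: Sequence[str],
                           embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of the distinct in-vocabulary words of a review,
    weighting each word by 1 / (its count in the review).
    The exact weighting scheme is an assumption for illustration."""
    # Count only tokens that have an embedding; OOV tokens are skipped.
    counts = Counter(t for t in tokens if t in embeddings)
    if not counts:
        raise ValueError("review has no in-vocabulary tokens")
    first_word = next(iter(counts))
    total = [0.0] * len(embeddings[first_word])
    weight_sum = 0.0
    for word, count in counts.items():
        weight = 1.0 / count  # inverse of the in-document frequency
        weight_sum += weight
        for i, x in enumerate(embeddings[word]):
            total[i] += weight * x
    # Normalize so the weights sum to one (a weighted mean).
    return [x / weight_sum for x in total]


if __name__ == "__main__":
    # Toy 2-d vectors standing in for trained Word2Vec embeddings.
    toy = {"uçuş": [1.0, 0.0], "gecikti": [0.0, 1.0]}
    print(weighted_review_vector(["uçuş", "gecikti", "gecikti", "xyz"], toy))
```

In the usage line, the repeated word "gecikti" receives weight 1/2 while "uçuş" receives weight 1, and the unknown token "xyz" is ignored, so a word that is repeated many times does not dominate the review vector.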

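The hierarchical scheme in the abstract, where only reviews predicted as negative are passed on to the twelve-class categorizer, can be sketched as a two-stage router. The classifier interface here (plain callables that map a feature vector to a label) is a hypothetical stand-in for the trained xgboost models, and the keyword-based stub features, stub classifiers, and the category name "flight delay" in the usage lines are invented for illustration. Because the abstract uses averaged Word2Vec vectors for the first stage and Doc2Vec vectors for the second, the sketch takes a separate feature function for each stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Features = List[float]


@dataclass
class Routed:
    sentiment: str           # "positive" or "negative"
    category: Optional[str]  # set only for reviews routed as negative


def route_review(
    tokens: Sequence[str],
    sentiment_features: Callable[[Sequence[str]], Features],
    sentiment_clf: Callable[[Features], str],
    category_features: Callable[[Sequence[str]], Features],
    category_clf: Callable[[Features], str],
) -> Routed:
    """Stage 1: binary sentiment. Stage 2: categorize negatives only."""
    sentiment = sentiment_clf(sentiment_features(tokens))
    if sentiment != "negative":
        return Routed(sentiment, None)
    # Stage 2 uses its own document representation (e.g. a Doc2Vec vector).
    category = category_clf(category_features(tokens))
    return Routed(sentiment, category)


if __name__ == "__main__":
    # Hypothetical stubs in place of real embeddings and trained models.
    def keyword_features(tokens: Sequence[str]) -> Features:
        return [1.0 if "gecikti" in tokens else 0.0]

    def stub_sentiment(x: Features) -> str:
        return "negative" if x[0] > 0.5 else "positive"

    def stub_category(x: Features) -> str:
        return "flight delay"

    for review in (["uçuş", "gecikti"], ["teşekkürler"]):
        print(route_review(review, keyword_features, stub_sentiment,
                           keyword_features, stub_category))
```

One consequence of this design is that errors compound: a negative review misclassified as positive in the first stage never reaches the categorizer.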