Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 18

Proceedings of the 2019 Federated Conference on Computer Science and Information Systems

An Approach to Customer Community Discovery

DOI: http://dx.doi.org/10.15439/2019F308

Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 675-683

Abstract. This paper proposes a new multi-level hybrid method of community detection that combines density-based clustering with label propagation. Many algorithms have been applied to preprocess, visualize, cluster, and interpret data describing customer behavior, among them DBSCAN, RFM, k-NN, UMAP, and LPA. Two key algorithms are detailed here: DBSCAN and LPA. DBSCAN is a density-based clustering algorithm; however, managers usually find its clustering results too difficult to interpret and apply. To enhance the business value of clustering and to create customer communities, the label propagation algorithm (LPA) is proposed because of its quality and low computational complexity. The approach is validated on a real-life marketing database using the advanced analytics platform Upsaily.
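
As an illustration of how a density-based step and a label propagation step can be chained, the following Python sketch first clusters points with a plain DBSCAN and then assigns the points that DBSCAN marked as noise to a cluster by majority vote over their nearest neighbours. It is a minimal sketch under stated assumptions, not the pipeline implemented in Upsaily: the k-nearest-neighbour graph, the seeded majority-vote propagation with DBSCAN labels kept fixed, the tie-breaking rule, the toy coordinates, and all function and parameter names (dbscan, knn_graph, propagate_labels, discover_communities, eps, min_pts, k) are hypothetical choices made for the example.

```python
from collections import Counter
from math import dist

NOISE = -1  # marker for points that DBSCAN leaves unclustered


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster id per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels


def knn_graph(points, k):
    """Symmetric graph linking each point to its k nearest neighbours."""
    adj = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def propagate_labels(adj, labels, max_rounds=100):
    """Seeded label propagation (assumed variant): DBSCAN labels stay fixed,
    each NOISE node adopts the most common label among its labeled neighbours.
    Ties are broken by the smallest label so the result is deterministic."""
    labels = list(labels)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(labels)):
            if labels[i] != NOISE:
                continue
            votes = Counter(labels[j] for j in adj[i] if labels[j] != NOISE)
            if votes:
                best = max(votes.values())
                labels[i] = min(l for l, c in votes.items() if c == best)
                changed = True
        if not changed:
            break
    return labels


def discover_communities(points, eps, min_pts, k):
    """DBSCAN first, then propagate cluster labels to the noise points."""
    seeds = dbscan(points, eps, min_pts)
    return propagate_labels(knn_graph(points, k), seeds)


if __name__ == "__main__":
    # Toy data: two dense groups plus two stragglers (illustrative only).
    customers = [(0, 0), (0, 1), (1, 0), (1, 1),
                 (10, 10), (10, 11), (11, 10), (11, 11),
                 (3, 3), (8, 8)]
    print(dbscan(customers, eps=1.5, min_pts=3))
    print(discover_communities(customers, eps=1.5, min_pts=3, k=2))
```

In this sketch the density-based step decides how many groups exist, while the propagation step only extends those groups to points that were left unassigned, so every point with at least one labeled neighbour ends up in a community.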

