An Approach to Customer Community Discovery
Jerzy Korczak, Maciej Pondel, Wiktor Sroka
DOI: http://dx.doi.org/10.15439/2019F308
Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 675–683 (2019)
Abstract. This paper proposes a new multi-level hybrid method of community detection that combines density-based clustering with a label propagation method. Many algorithms have been applied to preprocess, visualize, cluster, and interpret the data describing customer behavior, among them DBSCAN, RFM, k-NN, UMAP, and LPA. Two key algorithms are described in detail: DBSCAN and LPA. DBSCAN is a density-based clustering algorithm; however, managers usually find its clustering results too difficult to interpret and apply. To enhance the business value of clustering and to create customer communities, the label propagation algorithm (LPA) is proposed because of its quality and low computational complexity. The approach is validated on a real-life marketing database using the advanced analytics platform Upsaily.
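To illustrate the label propagation step named in the abstract, the following is a minimal sketch of basic asynchronous label propagation on an undirected graph. It is not the authors' implementation and does not reproduce the Upsaily pipeline; the function name `label_propagation`, its parameters, and the toy customer graph are assumptions made only for this example.

```python
import random
from collections import Counter


def label_propagation(adjacency, seed=0, max_iter=100):
    """Basic asynchronous label propagation (illustrative sketch only).

    adjacency: dict mapping each node to a list of its neighbours
               (hypothetical input format, assumed undirected).
    Returns a dict mapping each node to a community label.
    """
    rng = random.Random(seed)
    # Every node starts in its own community.
    labels = {node: node for node in adjacency}
    nodes = list(adjacency)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # visit nodes in random order on each pass
        changed = False
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue  # current label is already among the most frequent
            labels[node] = rng.choice(candidates)  # break ties at random
            changed = True
        if not changed:
            break  # every node holds a most-frequent neighbour label
    return labels


# Hypothetical toy graph: two separate groups of three customers each.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
communities = label_propagation(graph)
```

Each update only looks at a node's direct neighbours, which is the source of the low computational cost mentioned in the abstract. In a hybrid setting, the graph would presumably be built from customer similarity, for instance from the output of a density-based clustering step; that construction is not specified here and is left out of the sketch.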