An Experimental Analysis on Scalable Implementations of the Alternating Least Squares Algorithm
Dânia Meira, José Viterbo, Flavia Bernardini
DOI: http://dx.doi.org/10.15439/2018F166
Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 351–359 (2018)
Abstract. The use of latent factor models technique overcomes two major problems of most collaborative filtering approaches: scalability and sparseness of the user's profile matrix. The most successful realizations of latent factor models are based on matrix factorization. Among the algorithms for matrix factorization, alternating least squares (ALS) stands out due to its easily parallelizable computations. In this work we propose a methodology for comparing the performance of two parallel implementations of the ALS algorithm, one executed with MapReduce in Apache Hadoop framework and another executed in Apache Spark framework. We performed experiments to evaluate the accuracy of generated recommendations and the execution time of both algorithms, using publicly available datasets with different sizes and from different recommendation domains. Experimental results show that running the recommendation algorithm on Spark framework is in fact more efficient, once it provides in-memory processing, in contrast to Hadoop's two-stage disk-based MapReduce paradigm.
References
- F. Cacheda, V. Carneiro, D. Fernández, and V. Formoso, “Comparison of collaborative filtering algorithms: Limitations of current techniques and proposals for scalable, high-performance recommender systems,” ACM Trans. Web, vol. 5, no. 1, pp. 2:1–2:33, Feb. 2011. http://dx.doi.org/10.1145/1921591.1921593.
- B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th International Conference on World Wide Web, ser. WWW ’01. New York, NY, USA: ACM, 2001. http://dx.doi.org/10.1145/371920.372071. ISBN 1-58113-348-0 pp. 285–295.
- Z. Huang, H. Chen, and D. Zeng, “Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering,” ACM Trans. Inf. Syst., vol. 22, no. 1, pp. 116–142, Jan. 2004. http://dx.doi.org/10.1145/963770.963775.
- J. S. Breese, D. Heckerman, and C. Kadie, “Empirical analysis of predictive algorithms for collaborative filtering,” in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, ser. UAI’98. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998. ISBN 1-55860-555-X pp. 43–52. [Online]. Available: http://dl.acm.org/citation.cfm?id=2074094.2074100
- R. Salakhutdinov, A. Mnih, and G. Hinton, “Restricted boltzmann machines for collaborative filtering,” in Proceedings of the 24th International Conference on Machine Learning, ser. ICML ’07. New York, NY, USA: ACM, 2007. http://dx.doi.org/10.1145/1273496.1273596. ISBN 978-1-59593-793-3 pp. 791–798. [Online]. Available: http: //doi.acm.org/10.1145/1273496.1273596
- B. M. Sarwar, G. Karypis, J. A. Konstan, and J. T. Riedl, “Application of dimensionality reduction in recommender system – a case study,” in WebKDD Workshop, held in conjunction with the ACM-SIGKDD Conference on Knowledge Discovery in Databases (KDD’2000), 2000.
- Spotify, “Spotify,” n.d., available at https://www.spotify.com/.
- C. Johnson, “Scala data pipelines for music recommendations,” 2015, available at http://www.slideshare.net/MrChrisJohnson/scala-data-pipelines-for-music-recommendations.
- Amazon.com, “Amazon.com,” n.d., available at http://www.amazon.com/.
- ExportX, “How many (more) products does amazon sell?” 2014, available at http://export-x.com/2014/08/14/many-products-amazon-sell-2.
- Statista, “Number of worldwide active amazon customer accounts from 1997 to 2014 (in millions),” 2014, available at http://www.statista.com/statistics/237810/number-of-active-amazon-customer-accounts-worldwide/.
- Y. Koren and R. Bell, “Advances in collaborative filtering,” in Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. Kantor, Eds. Springer, 2011, pp. 33–48.
- G. Takács, I. Pilászy, B. Németh, and D. Tikk, “Using visual representations of data to enhance sensemaking in data exploration tasks.” Journal of Machine Learning Research, vol. 10, pp. 623–656, 2009.
- Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, Aug. 2009. http://dx.doi.org/10.1109/MC.2009.263. [Online]. Available: http://dx.doi.org/10.1109/MC.2009.263
- R. Bell, Y. Koren, and C. Volinsky, “Modeling relationships at multiple scales to improve accuracy of large recommender systems,” in Proc. 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’07. New York, NY, USA: ACM, 2007. http://dx.doi.org/10.1145/1281192.1281206. ISBN 978-1-59593-609-7 pp. 95–104. [Online]. Available: http://doi.acm.org/10.1145/1281192.1281206
- S. Schelter, C. Boden, M. Schenck, A. Alexandrov, and V. Markl, “Distributed matrix factorization with mapreduce using a series of broadcast-joins,” in Proceedings of the 7th ACM Conference on Recommender Systems, ser. RecSys ’13. New York, NY, USA: ACM, 2013. http://dx.doi.org/10.1145/2507157.2507195. ISBN 978-1-4503-2409-0 pp. 281–284. [Online]. Available: http://doi.acm.org/10.1145/2507157.2507195
- The Apache Software Foundation, “Apache hadoop,” n.d., available at http://hadoop.apache.org/.
- Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan, “Large-scale parallel collaborative filtering for the netflix prize,” in Algorithmic Aspects in Information and Management. AAIM 2008. LNCS, vol 5034, R. Fleischer and J. Xu, Eds. Springer, 2008.
- The Apache Software Foundation, “MLlib,” n.d., available at http://spark.apache.org/mllib/.
- J. Chen, J. Fang, W. Liu, T. Tang, X. Chen, and C. Yang, “Efficient and portable als matrix factorization for recommender systems,” in 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2017. http://dx.doi.org/10.1109/IPDPSW.2017.91 pp. 409–418.
- C. Enrique, T. Alexander, C. Héctor, J. G. Francisco, G. Felipe, B. Belén, and A. Diego, “In-memory distributed software solution to improve the performance of recommender systems,” Software: Practice and Experience, vol. 47, no. 6, pp. 867–889, 2017. http://dx.doi.org/10.1002/spe.2467. [Online]. Available: https://onlinelibrary.wiley.com/ doi/abs/10.1002/spe.2467
- D. Kim and B.-J. Yum, “Collaborative filtering based on iterative principal component analysis,” Expert Syst. Appl., vol. 28, no. 4, pp. 823–830, May 2005. http://dx.doi.org/10.1016/j.eswa.2004.12.037. [Online]. Available: http://dx.doi.org/10.1016/j.eswa.2004.12.037
- R. Salakhutdinov and A. Mnih, “Probabilistic matrix factorization,” in Proceedings of the 20th International Conference on Neural Information Processing Systems, ser. NIPS’07. USA: Curran Associates Inc., 2007. ISBN 978-1-60560-352-0 pp. 1257–1264. [Online]. Available: http://dl.acm.org/citation.cfm?id=2981562.2981720
- A. N. Tikhonov, “Solution of incorrectly formulated problems and the regularization method,” Soviet Mathematics, vol. 4, p. 1035–1038, 1963.
- S. Funk, “Netflix update: Try this at home,” 2006, available at http://sifter.org/~simon/journal/20061211.html.
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, ser. HotCloud’10. Berkeley, CA, USA: USENIX Association, 2010, pp. 10–10. [Online]. Available: http://dl.acm.org/citation.cfm?id=1863103.1863113
 
