
Annals of Computer Science and Information Systems, Volume 11

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

On Memory Footprints of Partitioned Sparse Matrices


DOI: http://dx.doi.org/10.15439/2017F70

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 513–521 (2017)


Abstract. The presented study analyses 563 representative benchmark sparse matrices with respect to their partitioning into uniformly-sized blocks, with the aim of minimizing the memory footprints of the matrices. Different block sizes and different ways of storing blocks in memory are considered and statistically evaluated. The memory footprints of the partitioned matrices are additionally compared with their lower bounds and with the CSR storage format. The average measured memory savings over CSR are 42.3 percent for single precision and 28.7 percent for double precision; the corresponding worst-case savings are 25.5 and 17.1 percent. Moreover, the memory footprints of the partitioned matrices were on average 5 times closer to their lower bounds than those of CSR. Based on the obtained results, we provide generic suggestions for the efficient partitioning and storage of sparse matrices in computer memory.
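To make the comparison concrete, the sketch below shows how such footprints can be computed and compared. It is not the paper's model: the index widths (32-bit), the bitmap-per-block storage scheme, and all matrix parameters in main() are illustrative assumptions; the paper evaluates several block storage schemes and block sizes statistically.

```cpp
#include <cstddef>
#include <iostream>

// CSR footprint: one value and one column index per nonzero,
// plus (nrows + 1) row pointers.
std::size_t csr_bytes(std::size_t nrows, std::size_t nnz,
                      std::size_t value_bytes, std::size_t index_bytes) {
    return nnz * (value_bytes + index_bytes) + (nrows + 1) * index_bytes;
}

// One possible blocked footprint (an assumed bitmap scheme): per nonempty
// r x c block, two block coordinates plus an r*c-bit occupancy bitmap;
// the values of all nonzeros are stored contiguously.
std::size_t bitmap_block_bytes(std::size_t nonempty_blocks, std::size_t nnz,
                               std::size_t r, std::size_t c,
                               std::size_t value_bytes,
                               std::size_t coord_bytes) {
    const std::size_t bitmap_bytes = (r * c + 7) / 8;  // bits rounded up to bytes
    return nonempty_blocks * (2 * coord_bytes + bitmap_bytes)
         + nnz * value_bytes;
}

int main() {
    // Hypothetical matrix for illustration only (not from the paper):
    // 1M rows, 50M nonzeros, 2M nonempty 8x8 blocks after partitioning.
    const std::size_t nrows = 1000000, nnz = 50000000, blocks = 2000000;
    const std::size_t csr = csr_bytes(nrows, nnz, 8, 4);  // double values, 32-bit indices
    const std::size_t blk = bitmap_block_bytes(blocks, nnz, 8, 8, 8, 4);
    std::cout << "CSR:     " << csr << " bytes\n"
              << "blocked: " << blk << " bytes\n"
              << "saving:  " << 100.0 * (1.0 - static_cast<double>(blk) / csr)
              << " %\n";
}
```

Note how the bitmap scheme replaces per-nonzero column indices with per-block metadata, which is where the savings come from when blocks are reasonably dense; for very sparse blocks the bitmap overhead can instead exceed CSR's, which is why block size and per-block storage scheme must be chosen per matrix.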
