Block Iterators for Sparse Matrices

Daniel Langr, Ivan Šimeček, Tomáš Dytrych

DOI: http://dx.doi.org/10.15439/2016F35

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 695–704 (2016)

Abstract. Finding an optimal block size for a given sparse matrix is an important problem for storage formats that partition matrices into uniformly sized blocks. Solving this problem can take a significant amount of time, which may effectively negate the benefits that such a format brings to sparse-matrix computations. The key to an efficient solution is the ability to iterate quickly over the nonzero blocks of a matrix for a particular block size. This work proposes an efficient parallel algorithm for this task and evaluates it experimentally on modern multi-core and many-core high performance computing (HPC) architectures.
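
To make the task concrete, the following is a minimal sequential sketch of what iterating over nonzero blocks for a candidate block size can look like. It is not the parallel algorithm proposed in the paper. It assumes the matrix is given as coordinate (row, column) index lists, and the names `nonzero_blocks` and `count_nonzero_blocks` are hypothetical, introduced only for illustration.

```python
def nonzero_blocks(rows, cols, block_h, block_w):
    """Return the sorted (block_row, block_col) pairs that hold at least one nonzero.

    Assumption: the matrix is given in coordinate form, as parallel lists of
    zero-based row and column indices of its nonzero elements.
    """
    # Integer division maps an element position to the block that contains it;
    # the set removes duplicates when several nonzeros fall into the same block.
    return sorted({(r // block_h, c // block_w) for r, c in zip(rows, cols)})


def count_nonzero_blocks(rows, cols, candidate_sizes):
    """Count nonzero blocks for each candidate (block_h, block_w) size."""
    return {
        size: len(nonzero_blocks(rows, cols, size[0], size[1]))
        for size in candidate_sizes
    }


# A small 4 x 4 example with six nonzero elements.
rows = [0, 0, 1, 2, 3, 3]
cols = [0, 1, 1, 2, 0, 3]

counts = count_nonzero_blocks(rows, cols, [(1, 1), (2, 2), (4, 4)])
for size, n in counts.items():
    print(size, n)
```

A search for the best block size would evaluate a cost function on such counts for every candidate size, which is why the speed of this iteration dominates the total search time.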
