Efficient parallel evaluation of block properties of sparse matrices

Ivan Šimeček, Daniel Langr

DOI: http://dx.doi.org/10.15439/2016F366

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 709–716 (2016)

Full text

Abstract. Many storage formats for sparse matrices have been developed. Majority of these formats can be parametrized, so the algorithm for finding optimal parameters is crucial. For overall efficiency, it is important to reduce the execution time of this preprocessing. In this paper, we propose a new algorithm for the determination of the number of nonzero blocks of the given size in a sparse matrix. The proposed algorithm requires relatively a small amount of auxiliary memory. Our approach is based on the Morton reordering and bitwise manipulations. We also present a parallel (multithreaded) version and evaluate its performance and space complexity.

References

G. H. Golub and C. F. Van Loan, Matrix Computations (3rd ed.). Baltimore: Johns Hopkins, 1996.
Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2003.
R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. V. der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd ed. Philadelphia, PA: SIAM, 1994.
I. Šimeček and P. Tvrdı́k, “A new approach for accelerating the sparse matrix-vector multiplication,” in Proceedings of 8th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC ’06). Los Alamitos: IEEE Computer Society, 2006, pp. 156–163. [Online]. Available: http://dl.acm.org/citation.cfm?id=1264261
I. Šimeček and P. Tvrdı́k, “Sparse matrix-vector multiplication — final solution?” in Parallel Processing and Applied Mathematics, ser. PPAM’07, vol. 4967. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 156–165. [Online]. Available: http://www.springerlink.com/content/48x1345471067304/
D. Langr and P. Tvrdı́k, “Evaluation criteria for sparse matrix storage formats,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 2, pp. 428–440, 2016.
B. Bylina, J. Bylina, P. Stpiczyński, and D. Szałkowski, “Performance analysis of multicore and multinodal implementation of spmv operation,” in Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. P. M. Ganzha, L. Maciaszek, Ed., vol. 2. IEEE, 2014, pp. pages 569–576. [Online]. Available: http://dx.doi.org/10.15439/2014F313
D. Langr, I. Šimeček, P. Tvrdı́k, T. Dytrych, and J. P. Draayer, “Adaptive-blocking hierarchical storage format for sparse matrices,” in Federated Conference on Computer Science and Information Systems (FedCSIS). 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE Xplore Digital Library, September 2012, pp. 545–551.
D. Langr, I. Šimeček, and P. Tvrdı́k, “Storing sparse matrices in the adaptive-blocking hierarchical storage format,” in Proceedings of the Federated Conference on Computer Science and Information Systems (FedCSIS 2013). IEEE Xplore Digital Library, September 2013, pp. 479–486.
I. Šimeček, D. Langr, and P. Tvrdı́k, “Space-efficient sparse matrix storage formats for massively parallel systems,” in High Performance Computing and Communication and 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), ser. HPCC’12, Liverpool, Great Britain, june 2012, pp. 54–60.
I. Šimeček and D. Langr, “Space and execution efficient formats for modern processor architectures,” in 2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Sept 2015, pp. 98–105.
E. Im, Optimizing the Performance of Sparse Matrix-Vector Multiplication—dissertation thesis, University of California at Berkeley, 2001.
M. Martone, M. Paprzycki, and S. Filippone, “An improved sparse matrix-vector multiply based on recursive sparse blocks layout,” in Large-Scale Scientific Computing, ser. Lecture Notes in Computer Science, I. Lirkov, S. Margenov, and J. Waśniewski, Eds. Springer Berlin Heidelberg, 2012, vol. 7116, pp. 606–613. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-29843-1_69
P. Tvrdı́k and I. Šimeček, “A new diagonal blocking format and model of cache behavior for sparse matrices,” in Proceedings of the 6th International Conference on Parallel Processing and Applied Mathematics, ser. PPAM’05, vol. 12, no. 4. Poznan, Poland: Springer-Verlag, 2005, pp. 164–171. [Online]. Available: http://dl.acm.org/citation.cfm?id=2096870.2096894
I. Šimeček and D. Langr, “Space-efficient sparse matrix storage formats with 8-bit indices,” in Seminar on Numerical Analysis. Liberec: Technical University of Liberec, 2012, pp. 161–164. [Online]. Available: http://shimi.webzdarma.cz/vyzkum/sna12/article.pdf
M. Martone et al., “On the usage of 16 bit indices in recursively stored sparse matrices,” Symbolic and Numeric Algorithms for Scientific Computing, vol. 0, pp. 57–64, 2010.
M. Martone et al., “Use of hybrid recursive CSR/COO data structures in sparse matrices-vector multiplication,” in Proceedings of the International Multiconference on Computer Science and Information Technology, Wisla, Poland, October 2010.
G. M. Morton, A computer Oriented Geodetic Data Base; and a New Technique in File Sequencing. IBM Ltd., 1966.
OpenMP Architecture Review Board, “Openmp application program interface,” online, 2013. [Online]. Available: http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
J. Singler and B. Konsik, “The gnu libstdc++ parallel mode: Software engineering considerations,” in Proceedings of the 1st International Workshop on Multicore Software Engineering, ser. IWMSE ’08. New York, NY, USA: ACM, 2008, pp. 15–22. [Online]. Available: http://doi.acm.org/10.1145/1370082.1370089
D. Langr, “Parallel multi-array in-place sort with openmp.” [Online]. Available: https://github.com/DanielLangr/AQsort
J. Cáceres, B. Barán, and C. Schaerer, “Implementation of a distributed parallel in time scheme using petsc for a parabolic optimal control problem,” in Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. P. M. Ganzha, L. Maciaszek, Ed., vol. 2. IEEE, 2014, pp. pages 577–586. [Online]. Available: http://dx.doi.org/10.15439/2014F340
S. Fialko, “Parallel finite element solver for multi-core computers,” in Computer Science and Information Systems (FedCSIS), 2012 Federated Conference on, Sept 2012, pp. 525–532.
T. A. Davis, “The university of florida sparse matrix collection,” NA DIGEST, vol. 92, 1994.