A Non-Speculative Parallelization of Reverse Cuthill-McKee Algorithm for Sparse Matrices Reordering

Thiago Nascimento Rodrigues; Maria Claudia Silva Boeres; Lucia Catabriga

A Non-Speculative Parallelization of Reverse Cuthill-McKee Algorithm for Sparse Matrices Reordering

Thiago Nascimento Rodrigues, Maria Claudia Silva Boeres, Lucia Catabriga

DOI: http://dx.doi.org/10.15439/2017F179

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 527–536 (2017)

Full text

Abstract. This work presents a new parallel non-speculative implementation of the Unordered Reverse Cuthill-McKee algorithm. Reordering quality (bandwidth reduction) and reordering performance (CPU time) are evaluated in comparison with a serial implementation of the algorithm made available by the state-of-the-art mathematical software library HSL. The bandwidth reductions reached by our parallel RCM were more than 90\% for several large matrices out of the ones tested, and the time reordering improvement was up to 57.82\%. Speedups higher than 3.0X were achieved with the parallel RCM. The underlying parallelism was supported by the OpenMP framework and three strategies for reducing idle threads were incorporated into the algorithm.

References

Y. Saad, Iterative methods for sparse linear systems, 2nd ed. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2003. ISBN 0898715342
C. H. Papadimitriou, “The np-completeness of the bandwidth minimization problem,” Computing, vol. 16, no. 3, pp. 263–270, 1976. http://dx.doi.org/10.1007/BF02280884. [Online]. Available: http://dx.doi.org/10.1007/BF02280884
E. Cuthill and J. McKee, “Reducing the bandwidth of sparse symmetric matrices,” in Proceedings of the 1969 24th National Conference, ser. ACM ’69. New York, NY, USA: ACM, 1969. http://dx.doi.org/10.1145/800195.805928 pp. 157–172. [Online]. Available: http://doi.acm.org/10.1145/800195.805928
W. Liu and A. H. Sherman, “Comparative analysis of the Cuthill-McKee and the Reverse Cuthill-McKee ordering algorithms for sparse matrices,” SIAM Journal on Numerical Analysis, vol. 13, no. 2, pp. 198–213, May 1974. http://dx.doi.org/10.1137/0713020. [Online]. Available: http://dx.doi.org/10.1137/0713020
S. W. Sloan, “An algorithm for profile and wavefront reduction of sparse matrices,” International Journal for Numerical Methods in Engineering, vol. 23, no. 2, pp. 239–251, 1986. http://dx.doi.org/10.1002/nme.1620230208
N. E. Gibbs, W. G. Poole, and P. K. Stockmeyer, “An algorithm for reducing the bandwidth and profile of a sparse matrix,” SIAM Journal on Numerical Analysis, vol. 13, no. 2, pp. 236–250, 1976. [Online]. Available: http://www.jstor.org/stable/2156090
K. I. Karantasis, A. Lenharth, D. Nguyen, M. Garzarán, and K. Pingali, “Parallelization of reordering algorithms for bandwidth and wavefront reduction,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’14. Piscataway, NJ, USA: IEEE Press, 2014. doi: 10.1109/SC.2014.80. ISBN 978-1-4799-5500-8 pp. 921–932. [Online]. Available: http://dx.doi.org/10.1109/SC.2014.80
D. Padua, Encyclopedia of parallel computing. Springer Publishing Company, Incorporated, 2011. ISBN 0387097651
K. Pingali et al., “The tao of parallelism in algorithms,” in Proceedings of the 32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’11. New York, NY, USA: ACM, 2011. http://dx.doi.org/10.1145/1993498.1993501. ISBN 978-1-4503-0663-8 pp. 12–25. [Online]. Available: http://doi.acm.org/10.1145/1993498.1993501
M. Kulkarni, M. Burtscher, R. Inkulu, K. Pingali, and C. Casçaval, “How much parallelism is there in irregular applications?” SIGPLAN Not., vol. 44, no. 4, pp. 3–14, Feb. 2009. http://dx.doi.org/10.1145/1594835.1504181. [Online]. Available: http://doi.acm.org/10.1145/1594835.1504181
M. Kulkarni et al., “Optimistic parallelism requires abstractions,” in Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’07. New York, NY, USA: ACM, 2007. http://dx.doi.org/10.1145/1250734.1250759. ISBN 978-1-59593-633-2 pp. 211–222. [Online]. Available: http://doi.acm.org/10. 1145/1250734.1250759
L. Dagum and R. Menon, “Openmp: An industry-standard api for shared-memory programming,” IEEE Comput. Sci. Eng., vol. 5, no. 1, pp. 46–55, Jan. 1998. http://dx.doi.org/10.1109/99.660313. [Online]. Available: http://dx.doi.org/10.1109/99.660313
HSL, “A collection of fortran codes for large scale scientific computation,” 2011. [Online]. Available: http://www.hsl.rl.ac.uk/
A. Farzaneh, H. Kheiri, and M. A. Shahmersi, “An efficient storage format for large sparse matrices,” Communications Series A1 Mathematics & Statistics, vol. 58, no. 2, pp. 1–10, Jul. 2009.
J. K. Reid and J. A. Scott, “Ordering symmetric sparse matrices for small profile and wavefront,” International Journal for Numerical Methods in Engineering, vol. 45, pp. 1737–1755, Feb. 1999.
G. K. Kumfert, “An object-oriented algorithmic laboratory for ordering sparse matrices,” Ph.D. dissertation, Lawrence Livermore National Laboratory and United States. Department of Energy and United States. Department of Energy. Office of Scientific and Technical Information, 2000.
I. S. Duff, J. K. Reid, and J. A. Scott, “The use of profile reduction algorithms with a frontal code,” International Journal for Numerical Methods in Engineering, vol. 28, no. 11, pp. 2555–2568, 1989. http://dx.doi.org/10.1002/nme.1620281106. [Online]. Available: http://dx.doi.org/10.1002/nme.1620281106
M. A. Hassaan, M. Burtscher and K. Pingali, “Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms,” in Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, ser. PPoPP ’11. New York, NY, USA: ACM, 2011. http://dx.doi.org/10.1145/1941553.1941557. ISBN 978-1-4503-0119-0 pp. 3–12. [Online]. Available: http://doi.acm.org/10.1145/1941553.1941557
B. S. W. Schröder, “Algorithms for the fixed point property,” Theoretical Computer Science, vol. 217, no. 2, pp. 301 – 358, 1999. doi: http://dx.doi.org/10.1016/S0304-3975(98)00273-4. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0304397598002734
D. Chazan and W. Miranker, “Chaotic relaxation,” Linear Algebra and its Applications, vol. 2, no. 2, pp. 199 – 222, 1969. doi: http://dx.doi.org/10.1016/0024-3795(69)90028-7. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0024379569900287
S. Aluru, “Teaching parallel computing through parallel prefix,” in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, ser. SC12, 2012. [Online]. Available: http://sc12.supercomputing.org/hpceducator/ParallelPrefix/ParallelPrefix.pdf
T. N. Rodrigues, “tnas/reordering-library: Federated Conference on Computer Science and Information Systems 2017,” May 2017. [Online]. Available: https://doi.org/10.5281/zenodo.570225
T. A. Davis and Y. Hu, “The university of florida sparse matrix collection,” ACM Trans. Math. Softw., vol. 38, no. 1, pp. 1:1–1:25, Dec. 2011. http://dx.doi.org/10.1145/2049662.2049663. [Online]. Available: http://doi.acm.org/10.1145/2049662.2049663
OpenMP Language Working Group, “Openmp technical report 4,” OpenMP Architecture Review Board, Tech. Rep. TR-4 Version 5 Preview 1, 2016.
C. E. Leiserson and T. B. Schardl, “A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers),” in Proceedings of the Twenty-second Annual ACM Symposium on Parallelism in Algorithms and Architectures, ser. SPAA’10. New York, NY, USA: ACM, 2010. http://dx.doi.org/10.1145/1810479.1810534. ISBN 978-1-4503-0079-7 pp. 303–314. [Online]. Available: http://doi.acm.org/10.1145/1810479.1810534