Logo PTI
Polish Information Processing Society
Logo FedCSIS

Annals of Computer Science and Information Systems, Volume 11

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

A Non-Speculative Parallelization of Reverse Cuthill-McKee Algorithm for Sparse Matrices Reordering

, ,

DOI: http://dx.doi.org/10.15439/2017F179

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 527536 ()

Full text

Abstract. This work presents a new parallel non-speculative implementation of the Unordered Reverse Cuthill-McKee algorithm. Reordering quality (bandwidth reduction) and reordering performance (CPU time) are evaluated in comparison with a serial implementation of the algorithm made available by the state-of-the-art mathematical software library HSL. The bandwidth reductions reached by our parallel RCM were more than 90\% for several large matrices out of the ones tested, and the time reordering improvement was up to 57.82\%. Speedups higher than 3.0X were achieved with the parallel RCM. The underlying parallelism was supported by the OpenMP framework and three strategies for reducing idle threads were incorporated into the algorithm.

References

  1. Y. Saad, Iterative methods for sparse linear systems, 2nd ed. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2003. ISBN 0898715342
  2. C. H. Papadimitriou, “The np-completeness of the bandwidth minimization problem,” Computing, vol. 16, no. 3, pp. 263–270, 1976. http://dx.doi.org/10.1007/BF02280884. [Online]. Available: http://dx.doi.org/10.1007/BF02280884
  3. E. Cuthill and J. McKee, “Reducing the bandwidth of sparse symmetric matrices,” in Proceedings of the 1969 24th National Conference, ser. ACM ’69. New York, NY, USA: ACM, 1969. http://dx.doi.org/10.1145/800195.805928 pp. 157–172. [Online]. Available: http://doi.acm.org/10.1145/800195.805928
  4. W. Liu and A. H. Sherman, “Comparative analysis of the Cuthill-McKee and the Reverse Cuthill-McKee ordering algorithms for sparse matrices,” SIAM Journal on Numerical Analysis, vol. 13, no. 2, pp. 198–213, May 1974. http://dx.doi.org/10.1137/0713020. [Online]. Available: http://dx.doi.org/10.1137/0713020
  5. S. W. Sloan, “An algorithm for profile and wavefront reduction of sparse matrices,” International Journal for Numerical Methods in Engineering, vol. 23, no. 2, pp. 239–251, 1986. http://dx.doi.org/10.1002/nme.1620230208
  6. N. E. Gibbs, W. G. Poole, and P. K. Stockmeyer, “An algorithm for reducing the bandwidth and profile of a sparse matrix,” SIAM Journal on Numerical Analysis, vol. 13, no. 2, pp. 236–250, 1976. [Online]. Available: http://www.jstor.org/stable/2156090
  7. K. I. Karantasis, A. Lenharth, D. Nguyen, M. Garzarán, and K. Pingali, “Parallelization of reordering algorithms for bandwidth and wavefront reduction,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’14. Piscataway, NJ, USA: IEEE Press, 2014. doi: 10.1109/SC.2014.80. ISBN 978-1-4799-5500-8 pp. 921–932. [Online]. Available: http://dx.doi.org/10.1109/SC.2014.80
  8. D. Padua, Encyclopedia of parallel computing. Springer Publishing Company, Incorporated, 2011. ISBN 0387097651
  9. K. Pingali et al., “The tao of parallelism in algorithms,” in Proceedings of the 32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’11. New York, NY, USA: ACM, 2011. http://dx.doi.org/10.1145/1993498.1993501. ISBN 978-1-4503-0663-8 pp. 12–25. [Online]. Available: http://doi.acm.org/10.1145/1993498.1993501
  10. M. Kulkarni, M. Burtscher, R. Inkulu, K. Pingali, and C. Casçaval, “How much parallelism is there in irregular applications?” SIGPLAN Not., vol. 44, no. 4, pp. 3–14, Feb. 2009. http://dx.doi.org/10.1145/1594835.1504181. [Online]. Available: http://doi.acm.org/10.1145/1594835.1504181
  11. M. Kulkarni et al., “Optimistic parallelism requires abstractions,” in Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’07. New York, NY, USA: ACM, 2007. http://dx.doi.org/10.1145/1250734.1250759. ISBN 978-1-59593-633-2 pp. 211–222. [Online]. Available: http://doi.acm.org/10. 1145/1250734.1250759
  12. L. Dagum and R. Menon, “Openmp: An industry-standard api for shared-memory programming,” IEEE Comput. Sci. Eng., vol. 5, no. 1, pp. 46–55, Jan. 1998. http://dx.doi.org/10.1109/99.660313. [Online]. Available: http://dx.doi.org/10.1109/99.660313
  13. HSL, “A collection of fortran codes for large scale scientific computation,” 2011. [Online]. Available: http://www.hsl.rl.ac.uk/
  14. A. Farzaneh, H. Kheiri, and M. A. Shahmersi, “An efficient storage format for large sparse matrices,” Communications Series A1 Mathematics & Statistics, vol. 58, no. 2, pp. 1–10, Jul. 2009.
  15. J. K. Reid and J. A. Scott, “Ordering symmetric sparse matrices for small profile and wavefront,” International Journal for Numerical Methods in Engineering, vol. 45, pp. 1737–1755, Feb. 1999.
  16. G. K. Kumfert, “An object-oriented algorithmic laboratory for ordering sparse matrices,” Ph.D. dissertation, Lawrence Livermore National Laboratory and United States. Department of Energy and United States. Department of Energy. Office of Scientific and Technical Information, 2000.
  17. I. S. Duff, J. K. Reid, and J. A. Scott, “The use of profile reduction algorithms with a frontal code,” International Journal for Numerical Methods in Engineering, vol. 28, no. 11, pp. 2555–2568, 1989. http://dx.doi.org/10.1002/nme.1620281106. [Online]. Available: http://dx.doi.org/10.1002/nme.1620281106
  18. M. A. Hassaan, M. Burtscher and K. Pingali, “Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms,” in Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, ser. PPoPP ’11. New York, NY, USA: ACM, 2011. http://dx.doi.org/10.1145/1941553.1941557. ISBN 978-1-4503-0119-0 pp. 3–12. [Online]. Available: http://doi.acm.org/10.1145/1941553.1941557
  19. B. S. W. Schröder, “Algorithms for the fixed point property,” Theoretical Computer Science, vol. 217, no. 2, pp. 301 – 358, 1999. doi: http://dx.doi.org/10.1016/S0304-3975(98)00273-4. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0304397598002734
  20. D. Chazan and W. Miranker, “Chaotic relaxation,” Linear Algebra and its Applications, vol. 2, no. 2, pp. 199 – 222, 1969. doi: http://dx.doi.org/10.1016/0024-3795(69)90028-7. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0024379569900287
  21. S. Aluru, “Teaching parallel computing through parallel prefix,” in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, ser. SC12, 2012. [Online]. Available: http://sc12.supercomputing.org/hpceducator/ParallelPrefix/ParallelPrefix.pdf
  22. T. N. Rodrigues, “tnas/reordering-library: Federated Conference on Computer Science and Information Systems 2017,” May 2017. [Online]. Available: https://doi.org/10.5281/zenodo.570225
  23. T. A. Davis and Y. Hu, “The university of florida sparse matrix collection,” ACM Trans. Math. Softw., vol. 38, no. 1, pp. 1:1–1:25, Dec. 2011. http://dx.doi.org/10.1145/2049662.2049663. [Online]. Available: http://doi.acm.org/10.1145/2049662.2049663
  24. OpenMP Language Working Group, “Openmp technical report 4,” OpenMP Architecture Review Board, Tech. Rep. TR-4 Version 5 Preview 1, 2016.
  25. C. E. Leiserson and T. B. Schardl, “A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers),” in Proceedings of the Twenty-second Annual ACM Symposium on Parallelism in Algorithms and Architectures, ser. SPAA’10. New York, NY, USA: ACM, 2010. http://dx.doi.org/10.1145/1810479.1810534. ISBN 978-1-4503-0079-7 pp. 303–314. [Online]. Available: http://doi.acm.org/10.1145/1810479.1810534