Logo PTI Logo FedCSIS

Proceedings of the 18th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 35

Resilient s-ACD for Asynchronous Collaborative Solutions of Systems of Linear Equations

, , , , ,

DOI: http://dx.doi.org/10.15439/2023F8932

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 441450 ()

Full text

Abstract. Solving systems of linear equations is a critical component of nearly all scientific computing methods. Traditional algorithms that rely on synchronization become prohibitively expensive in computing paradigms where communication is costly, such as heterogeneous hardware, edge computing, and unreliable environments. In this paper, we introduce an s-step Approximate Conjugate Directions (s-ACD) method and develop resiliency measures that can address a variety of different data error scenarios. This method leverages a Conjugate Gradient (CG) approach locally while using Conjugate Directions (CD) globally to achieve asynchronicity. We demonstrate with numerical experiments that s-ACD admits scaling with respect to the condition number that is comparable with CG on the tested 2D Poisson problem. Furthermore, through the addition of resiliency measures, our method is able to cope with data errors, allowing it to be used effectively in unreliable environments.

References

  1. Y. Saad, Iterative methods for sparse linear systems. SIAM, 2003.
  2. C. T. Kelley, Iterative methods for linear and nonlinear equations. SIAM, 1995.
  3. I. S. Duff, A. M. Erisman, and J. K. Reid, Direct methods for sparse matrices. Oxford University Press, 2017.
  4. T. A. Davis, Direct methods for sparse linear systems. SIAM, 2006.
  5. J. J. Dongarra, I. S. Duff, D. C. Sorensen, H. A. Van der Vorst, and others, Solving linear systems on vector and shared memory computers. SIAM Philadelphia, 1991.
  6. J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. Van der Vorst, Numerical linear algebra for high-performance computers. SIAM, 1998.
  7. Y. Saad, “Krylov subspace methods on supercomputers,” SIAM Journal on Scientific and Statistical Computing, vol. 10, no. 6, pp. 1200–1232, 1989. https://doi.org/10.1137/0910073
  8. C. Lanczos, “Solution of systems of linear equations by minimized iterations,” J. Res. Nat. Bur. Standards, vol. 49, no. 1, pp. 33–53, 1952.
  9. M. R. Hestenes, E. Stiefel, and others, “Methods of conjugate gradients for solving linear systems,” Journal of Research of the National Bureau of Standards, vol. 49, no. 6, pp. 409–436, 1952.
  10. C. Ponce, K. Harter, A. Fox, and C. Vogl, “Skywing,” [Computer Software] https://doi.org/10.11578/dc.20221110.2, nov 2022. [Online]. Available: https://doi.org/10.11578/dc.20221110.2
  11. E. C. Carson, “An adaptive s-step conjugate gradient algorithm with dynamic basis updating,” Applications of Mathematics, vol. 65, no. 2, pp. 123–151, 2020. https://doi.org/10.21136/AM.2020.0136-19
  12. A. Chronopoulos and C. Gear, “s-step iterative methods for symmetric linear systems,” Journal of Computational and Applied Mathematics, vol. 25, no. 2, pp. 153–168, 1989. https://doi.org/10.1016/0377-0427(89)90045-9. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0377042789900459
  13. S. Cools, J. Cornelis, P. Ghysels, and W. Vanroose, “Improving strong scaling of the conjugate gradient method for solving large linear systems using global reduction pipelining,” arXiv preprint https://arxiv.org/abs/1905.06850, 2019. https://doi.org/10.48550/arXiv.1905.06850
  14. P. R. Eller and W. Gropp, “Scalable non-blocking preconditioned conjugate gradient methods,” in SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2016. https://doi.org/10.1109/SC.2016.17 pp. 204–215.
  15. M. Tiwari and S. Vadhiyar, “Pipelined Preconditioned s-step Conjugate Gradient Methods for Distributed Memory Systems,” in 2021 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2021. https://doi.org/10.1109/Cluster48925.2021.00061 pp. 215–225.
  16. M. Shantharam, S. Srinivasmurthy, and P. Raghavan, “Fault tolerant preconditioned conjugate gradient for sparse linear system solution,” in Proceedings of the 26th ACM international conference on Supercomputing, 2012. https://doi.org/10.1145/2304576.2304588 pp. 69–78.
  17. M. E. Ozturk, M. Renardy, Y. Li, G. Agrawal, and C.-S. Chou, “A Novel Approach for Handling Soft Error in Conjugate Gradients,” in 2018 IEEE 25th International Conference on High Performance Computing (HiPC), 2018. http://dx.doi.org/10.1109/HiPC.2018.00030 pp. 193–202.
  18. A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, “Efficient online fault-tolerance for the preconditioned conjugate gradient method,” in 2015 IEEE 21st International On-Line Testing Symposium (IOLTS), 2015. http://dx.doi.org/10.1109/IOLTS.2015.7229839 pp. 95–100.
  19. D. Chazan and W. Miranker, “Chaotic relaxation,” Linear algebra and its applications, vol. 2, no. 2, pp. 199–222, 1969. https://doi.org/10.1016/0024-3795(69)90028-7
  20. A. Frommer and D. B. Szyld, “On asynchronous iterations,” Journal of computational and applied mathematics, vol. 123, no. 1-2, pp. 201–216, 2000. https://doi.org/10.1016/S0377-0427(00)00409-X
  21. J. M. Bahi, S. Contassot-Vivier, and R. Couturier, Parallel iterative algorithms: from sequential to grid computing. CRC Press, 2007.
  22. J. Hook and N. Dingle, “Performance analysis of asynchronous parallel Jacobi,” Numerical Algorithms, vol. 77, pp. 831–866, 2018. http://dx.doi.org/https://doi.org/10.1007/s11075-017-0342-9
  23. J. Wolfson-Pou and E. Chow, “Convergence models and surprising results for the asynchronous Jacobi method,” in 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2018. http://dx.doi.org/10.1109/IPDPS.2018.00103 pp. 940–949.
  24. ——, “Modeling the asynchronous Jacobi method without communication delays,” Journal of Parallel and Distributed Computing, vol. 128, pp. 84–98, 2019. https://doi.org/10.1016/j.jpdc.2019.02.002. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0743731518304751
  25. H. Anzt, J. Dongarra, and E. S. Quintana-Ortí, “Tuning stationary iterative solvers for fault resilience,” in Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015. https://doi.org/10.1145/2832080.2832081 pp. 1–8.
  26. Apache Software Foundation, “Hadoop,” 2021, version Number: 3.3.03. [Online]. Available: https://hadoop.apache.org
  27. ——, “Spark,” 2021, version Number: 3.3.0. [Online]. Available: https://spark.apache.org
  28. A. Moody, G. Bronevetsky, K. Mohror, and B. R. De Supinski, “Design, modeling, and evaluation of a scalable multi-level checkpointing system,” in SC’10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2010. https://doi.org/10.1109/SC.2010.18 pp. 1–11.