Parallelizing nested loops on the Intel Xeon Phi on the example of the dense WZ factorization

Jarosław Bylina, Beata Bylina

DOI: http://dx.doi.org/10.15439/2016F436

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 655–664 (2016)

Abstract. In this article we evaluate several strategies for parallelizing nested loops on the Intel Xeon Phi, using the WZ factorization of dense matrices as an example. We employ both parallelism and vectorization to accelerate nested loops on this manycore coprocessor.
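The following is a minimal C/OpenMP sketch of an unpivoted, in-place dense WZ factorization. It makes the nested-loop structure concrete, but it is not the authors' code. The row-major layout, the hypothetical function name `wz_factorize`, and the strategy of parallelizing the outer row loop while vectorizing the inner column loop are all assumptions. At step k, rows k and n-1-k eliminate columns k and n-1-k from every row strictly between them. Because those middle rows can be updated independently, the loop nest is a natural target for both threads and SIMD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): in-place WZ factorization of an
 * n x n row-major matrix a, without pivoting. On return, positions where Z
 * is structurally zero hold the off-diagonal entries of W (its unit diagonal
 * is implicit), and all remaining positions hold Z.
 * Returns -1 if a 2x2 pivot block is singular, 0 otherwise. */
int wz_factorize(double *a, int n)
{
    assert(a != NULL && n > 0);
    for (int k = 0; n - 1 - k - k > 1; ++k) {    /* while rows remain between k and k2 */
        const int k2 = n - 1 - k;                /* mirror index of k */
        const double akk  = a[(size_t)k * n + k],  akk2  = a[(size_t)k * n + k2];
        const double ak2k = a[(size_t)k2 * n + k], ak2k2 = a[(size_t)k2 * n + k2];
        const double det = akk * ak2k2 - ak2k * akk2;
        if (det == 0.0)
            return -1;

        /* Rows k+1..k2-1 depend only on the unchanged pivot rows k and k2,
         * so the outer loop can be shared among threads. Without OpenMP
         * enabled, the pragma is ignored and the loop runs sequentially. */
        #pragma omp parallel for schedule(static)
        for (int i = k + 1; i < k2; ++i) {
            double *restrict ai = &a[(size_t)i * n];
            const double *restrict rk  = &a[(size_t)k * n];
            const double *restrict rk2 = &a[(size_t)k2 * n];
            /* Solve the 2x2 system that zeroes columns k and k2 of row i. */
            const double wk  = (ak2k2 * ai[k] - ak2k * ai[k2]) / det;
            const double wk2 = (akk * ai[k2] - akk2 * ai[k]) / det;
            ai[k]  = wk;                         /* store W where Z is zero */
            ai[k2] = wk2;
            /* Unit-stride inner loop over columns: a candidate for SIMD. */
            #pragma omp simd
            for (int j = k + 1; j < k2; ++j)
                ai[j] -= wk * rk[j] + wk2 * rk2[j];
        }
    }
    return 0;
}
```

With GCC or Clang the OpenMP pragmas take effect only when the code is compiled with `-fopenmp`. Without that flag, the same code runs sequentially and gives the same result. Threading the outer loop and vectorizing the inner one is only one possible mapping of this loop nest. Alternatives such as parallelizing the inner loop instead, or using nested parallelism, are the kind of strategies a comparison like this would weigh against it.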
