Optimizing Numerical Code by means of the Transitive Closure of Dependence Graphs

Marek Palkowski; Wlodzimierz Bielecki

Optimizing Numerical Code by means of the Transitive Closure of Dependence Graphs

Marek Palkowski, Wlodzimierz Bielecki

DOI: http://dx.doi.org/10.15439/2017F19

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 523–526 (2017)

Full text

Abstract. A significant challenge in numerical programming modern architectures is to effectively exploit the parallelism available in the architecture and manage the fast memories to maximize performance. Loop nest tiling allows for both coarsening parallel code and improving code locality. In this paper, we explore a new way to generate tiled code and derive the free schedule of tiles. It is based on the transitive closure of loop nest dependence graphs. Tiles are executed as soon as their operands are available. To describe and implement the approach, loop dependences are presented in the form of tuple relations. Discussed techniques are implemented in the open source TRACO compiler. Experimental results carried out on multi-core architectures demonstrate the considerable speed-ups of tiled numerical codes generated by the presented approach.

References

U. Bondhugula et al., “A practical automatic polyhedral parallelizer and locality optimizer,” SIGPLAN Not., vol. 43, no. 6, pp. 101–113, Jun. 2008. http://dx.doi.org/10.1145/1379022.1375595
M. Griebl, “Automatic parallelization of loop programs for distributed memory architectures,” Habilitation thesis, Department of Informatics and Mathematics, University of Passau., 2004. [Online]. Available: http://www.fim.unipassau.de/cl/publications/docs/Gri04.pdf
F. Irigoin and R. Triolet, “Supernode partitioning,” in Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, ser. POPL ’88. New York, NY, USA: ACM, 1988. http://dx.doi.org/10.1145/73560.73588 pp. 319–329.
A. Lim, G. I. Cheong, and M. S. Lam, “An affine partitioning algorithm to maximize parallelism and minimize communication,” in Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing. ACM Press, 1999. http://dx.doi.org/10.1145/305138.305197 pp. 228–237.
J. Xue, “On tiling as a loop transformation,” Parallel Processing Letters, vol. 7, no. 04, pp. 409–424, 1997. http://dx.doi.org/10.1142/s0129626497000401
D. G. Wonnacott and M. M. Strout, “On the scalability of loop tiling techniques,” in Proceedings of the 3rd International Workshop on Polyhedral Compilation Techniques (IMPACT), January 2013.
R. T. Mullapudi and U. Bondhugula, “Tiling for dynamic scheduling,” in Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, Vienna, Austria, Jan. 2014.
D. Wonnacott, T. Jin, and A. Lake, “Automatic tiling of “mostly-tileable” loop nests,” in 5th International Workshop on Polyhedral Compilation Techniques, Amsterdam, 2015.
W.Bielecki and M. Palkowski, “Tiling arbitrarily nested loops by means of the transitive closure of dependence graphs,” International Journal of Applied Mathematics and Computer Science (AMCS), vol. Vol. 26, no. 4, pp. 919–939, December 2016. http://dx.doi.org/10.1515/amcs-2016-0065
W. Pugh and E. Rosser, “Iteration space slicing and its application to communication optimization,” in International Conference on Supercomputing, 1997. http://dx.doi.org/10.1145/263580.263637 pp. 221–228.
W. Bielecki, M. Palkowski, and T. Klimek, “Free scheduling of tiles based on the transitive closure of dependence graphs,” vol. 11TH International Conference on Parallel Processing and Applied Mathematics, Part II, LNCS 9574 proceedings, 2015. http://dx.doi.org/10.1007/978-3-319-32152-3.13 pp. 133–142.
S. Verdoolaege, “Integer set library - manual,” Tech. Rep., 2011. [Online]. Available: www.kotnet.org/~skimo//isl/manual.pdf,
W. Kelly and W. Pugh, “A framework for unifying reordering transformations,” Univ. of Maryland Institute for Advanced Computer Studies Report No. UMIACS-TR-92-126.1, College Park, MD, USA, Tech. Rep., 1993.
W. Bielecki and M. Palkowski, Facing the Multicore - Challenge II: Aspects of New Paradigms and Technologies in Parallel Computing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, ch. Using Free Scheduling for Programming Graphic Cards, pp. 72–83.
W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott, New User Interface for Petit and Other Extensions, 1996.
W. Bielecki, T. Klimek, M. Palkowski, and A. Beletska, “An iterative algorithm of computing the transitive closure of a union of parameterized affine integer tuple relations,” in COCOA 2010: LNCS, vol. 6508/2010, 2010. http://dx.doi.org/10.1007/978-3-642-17458-2.10 pp. 104–113.
C. Lengauer, Loop parallelization in the polytope model. CONCUR’93, Springer Berlin Heidelberg, 1993, pp. 398–416.
C. Bastoul, “Code generation in the polyhedral model is easier than you think,” in PACT’13 IEEE Int. Conference on Parallel Architecture and Compilation Techniques, Juan-les-Pins, 2004. http://dx.doi.org/10.1109/pact.2004.1342537 pp. 7–16.
T. Grosser, S. Verdoolaege, and A. Cohen, “Polyhedral ast generation is more than scanning polyhedra,” ACM Trans. Program. Lang. Syst., vol. 37, no. 4, pp. 12:1–12:50, Jul. 2015. http://dx.doi.org/10.1145/2743016
J. Bylina and B. Bylina, “Parallelizing nested loops on the Intel Xeon Phi on the example of the dense WZ factorization,” in 2016 Federated Conference on Computer Science and Information Systems (FedCSIS), Sept 2016. http://dx.doi.org/10.15439/2016f436 pp. 655–664.