Accelerating Minimum Cost Polygon Triangulation Code with the TRACO Compiler

Marek Pałkowski; Wlodzimierz Bielecki

Accelerating Minimum Cost Polygon Triangulation Code with the TRACO Compiler

Marek Pałkowski, Wlodzimierz Bielecki

DOI: http://dx.doi.org/10.15439/2018F8

Citation: Communication Papers of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 17, pages 111–114 (2018)

Full text

Abstract. In this paper, we present automatic loop tiling and parallelization for the minimum cost polygon triangulation (MCPT) task. For this purpose, we use the authorial source-to-source TRACO compiler. MCPT is a recursive algorithm encountering each subproblem many times in different branches of its recursion tree. The most intensive computing part is a triple nested polyhedral program loop nest filling a cost table using the MCPT recursive. First, the code is tiled by means of the transitive closure of a dependence graph. TRACO allows for tiling of the innermost loop nest that is not possible by means of other closely related compilers. We tile only the two innermost loops and apply skewing to serialize the outermost one and parallelize the innermost ones. An experimental study carried out on multi-core computers demonstrates considerable speed-up of tiled code, which overcomes that obtained for code generated with the closely related PLuTo compiler based on the affine transformations framework.

References

W. Kelly and W. Pugh, “A framework for unifying reordering transformations,” Univ. of Maryland Institute for Advanced Computer Studies Report No. UMIACS-TR-92-126.1, College Park, MD, USA, Tech. Rep., 1993.
U. Bondhugula et al., “A practical automatic polyhedral parallelizer and locality optimizer,” SIGPLAN Not., vol. 43, no. 6, pp. 101–113, Jun. 2008. http://dx.doi.org/10.1145/1379022.1375595 Http://pluto-compiler.sourceforge.net/.
M. Griebl, “Automatic parallelization of loop programs for distributed memory architectures,” 2004.
F. Irigoin and R. Triolet, “Supernode partitioning,” in Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, ser. POPL ’88. New York, NY, USA: ACM, 1988. doi: 10.1145/73560.73588. ISBN 0-89791-252-7 pp. 319–329.
A. Lim, G. I. Cheong, and M. S. Lam, “An affine partitioning algorithm to maximize parallelism and minimize communication,” in In Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing. ACM Press, 1999. http://dx.doi.org/10.1145/305138.305197 pp. 228–237.
J. Xue, “On tiling as a loop transformation,” 1997.
U. Bondhugula, V. Bandishti, and I. Pananilath, “Diamond tiling: Tiling techniques to maximize parallelism for stencil computations,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 5, pp. 1285–1298, May 2017. http://dx.doi.org/10.1109/tpds.2016.2615094
U. Bondhugula, A. Acharya, and A. Cohen, “The pluto+ algorithm: A practical approach for parallelization and locality optimization of affine loop nests,” ACM Trans. Program. Lang. Syst., vol. 38, no. 3, pp. 12:1–12:32, Apr. 2016. http://dx.doi.org/10.1145/2896389
D. G. Wonnacott and M. M. Strout, “On the scalability of loop tiling