Parallel cache-efficient code for computing the McCaskill partition functions

Marek Pałkowski; Włodzimierz Bielecki

Parallel cache-efficient code for computing the McCaskill partition functions

Marek Pałkowski, Włodzimierz Bielecki

DOI: http://dx.doi.org/10.15439/2019F8

Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 207–210 (2019)

Full text

Abstract. We present parallel tiled optimized McCaskill's partition functions computation code. That CPU and memory intensive dynamic programming task is within computational biology. To optimize code, we use the authorial source-to-source TRACO compiler and compare obtained code performance with that generated with the state-of-the-art PluTo compiler based on the affine transformations framework (ATF). For the considered task, PluTo is able to generate only serial highly cache efficient code without any parallelism. A TRACO tiling and parallelizing strategy uses the transitive closure of a dependence graph to avoid affine function calculation. First, for each loop nest statement, rectangular tiles are formed. Then those tiles are corrected to be valid under lexicographical order if necessary. A correction is carried out by means of applying transitive closure. The validity of tiles guarantees that the inter-tile dependence graph is acyclic. So, a valid schedule for target tiles can be derived and applied to generate parallel tiled code. For this purpose, the ISL scheduler is used. An experimental study carried out on a multi-core computer demonstrates considerable speed-up of generated code for the larger number of threads. Generated parallel tiled code overcomes that generated with the PluTo compiler.

References

U. Bondhugula, V. Bandishti, and I. Pananilath, “Diamond tiling: Tiling techniques to maximize parallelism for stencil computations,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 5, pp. 1285–1298, May 2017. http://dx.doi.org/10.1109/tpds.2016.2615094
U. Bondhugula, A. Acharya, and A. Cohen, “The pluto+ algorithm: A practical approach for parallelization and locality optimization of affine loop nests,” ACM Trans. Program. Lang. Syst., vol. 38, no. 3, pp. 12:1–12:32, Apr. 2016. http://dx.doi.org/10.1145/2896389
U. Bondhugula et al., “A practical automatic polyhedral parallelizer and locality optimizer,” SIGPLAN Not., vol. 43, no. 6, pp. 101–113, Jun. 2008. http://dx.doi.org/10.1145/1379022.1375595 Http://pluto-compiler.sourceforge.net/.
M. Palkowski and W. Bielecki, “Parallel tiled Nussinov RNA folding loop nest generated using both dependence graph transitive closure and loop skewing,” BMC Bioinformatics, vol. 18, no. 1, p. 290, 2017. doi: 10.1186/s12859-017-1707-8
D. Wonnacott, T. Jin, and A. Lake, “Automatic tiling of “mostly-tileable” loop nests,” in 5th International Workshop on Polyhedral Compilation Techniques, Amsterdam, 2015.
R. T. Mullapudi and U. Bondhugula, “Tiling for dynamic scheduling,” in Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, Vienna, Austria, Jan. 2014.
M. Fekete, I. L. Hofacker, and P. F. Stadler, “Prediction of rna base pairing probabilities on massively parallel computers,” Journal of Computational Biology, vol. 7, no. 1-2, pp. 171–182, 2000. doi: 10.1089/10665270050081441
J. Li, S. Ranka, and S. Sahni, “Multicore and GPU algorithms for Nussinov RNA folding,” BMC Bioinformatics, vol. 15, no. 8, p. S1, 2014. http://dx.doi.org/10.1186/1471-2105-15-S8-S1. [Online]. Available: http://dx.doi.org/10.1186/1471-2105-15-S8-S1
M. Palkowski and W. Bielecki, “Tuning iteration space slicing based tiled multi-core code implementing Nussinov’s rna folding,” BMC Bioinformatics, vol. 19, no. 1, p. 12, Jan 2018.
C. Zhao and S. Sahni, “Cache and energy efficient algorithms for nussinov’s rna folding,” BMC Bioinformatics, vol. 18, no. 15, p. 518, Dec 2017.
W.Bielecki and M. Palkowski, “Tiling of arbitrarily nested loops by means of the transitive closure of dependence graphs,” International Journal of Applied Mathematics and Computer Science (AMCS), vol. Vol. 26, no. 4, p. 919939, December 2016. http://dx.doi.org/10.1515/amcs-2016-0065
S. Verdoolaege, “Integer set library - manual,” Tech. Rep., 2011. [Online]. Available: www.kotnet.org/~skimo//isl/manual.pdf,
W. Bielecki and M. Palkowski, “A parallelizing and optimizing compiler - traco,” 2013. [Online]. Available: http://traco.sourceforge.net
W. Pugh and D. Wonnacott, “An exact method for analysis of value-based array data dependences,” in Sixth Annual Workshop on Programming Languages and Compilers for Parallel Computing. Springer-Verlag, 1993.
M. Raden, S. M. Ali, O. S. Alkhnbashi, A. Busch, F. Costa, J. A. Davis, F. Eggenhofer, R. Gelhausen, J. Georg, S. Heyne, M. Hiller, K. Kundu, R. Kleinkauf, S. C. Lott, M. M. Mohamed, A. Mattheis, M. Miladi, A. S. Richter, S. Will, J. Wolff, P. R. Wright, and R. Backofen, “Freiburg RNA tools: a central online resource for RNA-focused research and teaching,” Nucleic Acids Research, vol. 46, no. W1, pp. W25–W29, 2018. doi: 10.1093/nar/gky329
J. S. McCaskill, “The equilibrium partition function and base pair binding probabilities for rna secondary structure,” Biopolymers, vol. 29, no. 6-7, pp. 1105–1119. http://dx.doi.org/10.1002/bip.360290621. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/bip. 360290621
W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott, New User Interface for Petit and Other Extensions, 1996.
W. Bielecki, K. Kraska, and T. Klimek, “Using basis dependence distance vectors in the modified Floyd–Warshall algorithm,” Journal of Combinatorial Optimization, vol. 30, no. 2, pp. 253–275, 2014.
P. Feautrier, “Some efficient solutions to the affine scheduling problem: I. one-dimensional time,” Int. J. Parallel Program., vol. 21, no. 5, pp. 313–348, 1992.
——, “Some efficient solutions to the affine scheduling problem: II. multidimensional time,” Int. J. Parallel Program., vol. 21, no. 5, pp. 389–420, 1992.
M. Wolfe, “Loops skewing: The wavefront method revisited,” International Journal of Parallel Programming, vol. 15, no. 4, pp. 279–293, 1986.
OpenMP Architecture Review Board, “OpenMP application program interface version 4.0,” 2012. [Online]. Available: http://www.openmp.org/mp-documents/OpenMP4.0RC1 final.pdf