Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 15

Proceedings of the 2018 Federated Conference on Computer Science and Information Systems

On the Autotuning Potential of Time-stepping methods from Scientific Computing


DOI: http://dx.doi.org/10.15439/2018F169

Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 329–338 (2018)


Abstract. Because the characteristics of newly provided hardware change continually, software must permanently be designed and re-designed to match the underlying hardware conditions. Especially for well-established software, it would be beneficial if the functional as well as the non-functional properties, such as runtime performance or energy efficiency, were easily portable, so that the software adapts automatically to the given hardware. In this article, we explore the autotuning potential of several methods from scientific computing. In particular, we consider time-stepping methods and investigate the effect of the relevant tuning parameters of the different methods. We also address the question whether an offline or an online autotuning approach is appropriate for each specific method. The methods from scientific computing considered are particle simulation methods, solution methods for differential equations, and sparse matrix computations.
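
To illustrate the online autotuning idea mentioned in the abstract, the following C++ sketch (not taken from the paper; all names, the model problem, and the block size are assumptions for illustration) times a small set of interchangeable time-step implementations during the first steps of a simulation and then continues with the fastest one. Offline autotuning would instead perform such measurements in a separate installation or benchmarking phase and store the selection for later production runs.

// A minimal sketch of an online autotuning selection loop (hypothetical code,
// not the authors' implementation). Two variants of an explicit Euler step
// for y' = -y stand in for time-stepping kernels that differ only in their
// loop structure; the loop times each candidate during the first steps and
// then keeps the fastest one for the remaining steps.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <limits>
#include <string>
#include <utility>
#include <vector>

using Step = std::function<void(std::vector<double>&, double)>;

// Straightforward sweep over the state vector.
static void euler_plain(std::vector<double>& y, double dt) {
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] += dt * (-y[i]);
}

// Same computation, processed in fixed-size blocks (block size assumed).
static void euler_blocked(std::vector<double>& y, double dt) {
    const std::size_t block = 4096;
    for (std::size_t b = 0; b < y.size(); b += block)
        for (std::size_t i = b; i < std::min(b + block, y.size()); ++i)
            y[i] += dt * (-y[i]);
}

int main() {
    const std::vector<std::pair<std::string, Step>> variants = {
        {"plain", euler_plain}, {"blocked", euler_blocked}};

    std::vector<double> y(1u << 20, 1.0);   // initial state y(0) = 1
    const double dt = 1.0e-3;
    const int steps = 1000;

    std::size_t best = 0;
    double best_time = std::numeric_limits<double>::max();

    for (int s = 0; s < steps; ++s) {
        // Exploration phase: each variant is tried once; afterwards the
        // currently fastest variant is used (and its timing refined).
        const std::size_t v = (s < static_cast<int>(variants.size()))
                                  ? static_cast<std::size_t>(s) : best;

        const auto t0 = std::chrono::steady_clock::now();
        variants[v].second(y, dt);
        const double elapsed =
            std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();

        if (elapsed < best_time) {
            best_time = elapsed;
            best = v;
        }
    }
    std::cout << "selected variant: " << variants[best].first << "\n";
    return 0;
}

In a real setting, the candidate set would contain the method-specific implementation variants discussed in the paper (for example, different loop structures of a Runge-Kutta stage or of a sparse matrix-vector product), and the selection could be repeated when the problem characteristics change.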
