
Proceedings of the 18th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 35

Performance assessment of OpenMP constructs and benchmarks using modern compilers and multi-core CPUs


DOI: http://dx.doi.org/10.15439/2023F7822

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 973–978 (2023)


Abstract. Given the ongoing development of both modern CPUs, especially the growing numbers of cores, larger caches and new architectures, and of compilers, there is a constant need to benchmark representative and frequently run workloads. The key metric is speed-up, as the computational power of modern CPUs stems mainly from the use of multiple cores. In this paper, we show and discuss results from running codes such as batch normalization, convolution, a linear function, matrix multiplication, a prime number test and a wave equation solver, compiled with GNU gcc, LLVM clang, icx and icc, and run on four different 1- or 2-socket systems: 1 x Intel Core i7-5960X, 1 x Intel Core i9-9940X, 2 x Intel Xeon Platinum 8280L, and 2 x Intel Xeon Gold 6130. The results can be regarded as suggestions concerning scaling on particular CPUs, including recommended thread-count configurations.
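For illustration, the following is a minimal sketch of the kind of OpenMP workload considered here: a naive matrix multiplication parallelized with a #pragma omp parallel for construct and timed with omp_get_wtime(). The matrix size, scheduling clause and timing approach are assumptions made for this example; it is not the benchmark code actually evaluated in the paper.

/* Illustrative OpenMP matrix multiplication (not the paper's benchmark code).
 * N, the scheduling clause and the timing method are assumptions. */
#include <omp.h>
#include <stdio.h>

#define N 1024  /* assumed matrix dimension */

static double a[N][N], b[N][N], c[N][N];

int main(void) {
    /* Fill inputs with arbitrary deterministic values. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(i - j);
        }

    double start = omp_get_wtime();

    /* The construct under test: parallelizing the outer loop. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }

    double elapsed = omp_get_wtime() - start;
    printf("threads=%d time=%.3f s (c[0][0]=%f)\n",
           omp_get_max_threads(), elapsed, c[0][0]);
    return 0;
}

Compiled with, e.g., gcc -O3 -fopenmp, the program can be run with different OMP_NUM_THREADS values; speed-up is then the single-thread time divided by the multi-thread time.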
