Impact of processor frequency scaling on performance and energy consumption for WZ factorization on multicore architecture

Beata Bylina; Jarosław Bylina; Monika Piekarz

DOI: http://dx.doi.org/10.15439/2023F6213

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 377–383 (2023)

Abstract. With the growing demand for computing power, new multicore architectures have emerged to provide better performance. Reducing their energy consumption is one of the main challenges in high-performance computing. Current research develops new software and hardware techniques to achieve the best compromise between performance and energy. In this work, we investigate the effect of processor frequency scaling, using Dynamic Voltage and Frequency Scaling (DVFS), on performance and energy consumption for the WZ factorization. The factorization is implemented both without optimization techniques and with strip mining, a technique that transforms a program loop to improve performance. Based on time and energy tests, we show that for the WZ factorization algorithm, regardless of the presence of manual optimization, it pays to reduce the frequency to save energy without losing performance. The conclusion can be extended to analogous algorithms that also have a high ratio of memory accesses to computational operations.
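
As an illustration of the loop transformation named in the abstract, the following is a minimal sketch of strip mining applied to a simple scaling loop. It is not the authors' WZ factorization code: the function names, the `strip` parameter, and its default value of 4 are hypothetical and chosen only for the example. The sketch shows a single loop split into an outer loop over fixed-size strips and an inner loop over the elements of one strip, producing the same result as the untransformed loop.

```python
def scale_plain(a, factor):
    """Reference version: one loop over all elements."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] * factor
    return out


def scale_strip_mined(a, factor, strip=4):
    """Strip-mined version (strip size is an illustrative assumption).

    The single loop is split into an outer loop that steps through the
    index range one strip at a time and an inner loop that visits the
    elements inside the current strip.
    """
    n = len(a)
    out = [0.0] * n
    for start in range(0, n, strip):      # outer loop: one iteration per strip
        end = min(start + strip, n)       # the last strip may be shorter
        for i in range(start, end):       # inner loop: elements of this strip
            out[i] = a[i] * factor
    return out
```

In compiled code the strip size would typically be matched to a vector width or cache-line size; here it only demonstrates that the transformation preserves the result.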
