Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 319–327 (2018)
Abstract. The energy efficiency of program executions has been an active research field in recent years, and the influence of different programming styles on energy consumption is part of this research effort. In this article, we concentrate on SIMD programming and study the effect of vectorization on performance as well as on power and energy consumption. In particular, SIMD programs using AVX instructions are considered, with a focus on the load and store instructions of the AVX instruction set. Several load and store instructions that are semantically similar but not identical are selected and used to build different program versions of the same algorithm. As an example application, Gaussian elimination has been chosen because it operates on arrays whose length varies in each factorization step. Five different SIMD program versions of Gaussian elimination have been implemented, each of which uses different load and store instructions. Performance, power, and energy measurements for all program versions are provided for the Intel Sandy Bridge, Haswell, and Skylake architectures, and the results are discussed and analyzed.
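To illustrate why Gaussian elimination exercises load and store instructions in an interesting way, the following plain-Python sketch models the row update of each factorization step. It is not the authors' implementation and uses no AVX intrinsics. The names `row_segments` and `gaussian_elimination`, the vector width of four doubles (one 256-bit register), the omission of pivoting, and the assumption that every matrix row starts on a 32-byte boundary are all choices made here for illustration. In step k the update covers columns k+1 to n-1, so the segment shrinks by one element per step and its start is a multiple of the vector width only in some steps, which is where the choice between aligned and unaligned access variants becomes relevant.

```python
VECTOR_WIDTH = 4  # assumption: four doubles fit into one 256-bit AVX register


def row_segments(n, width=VECTOR_WIDTH):
    """Describe the row segment updated in each elimination step k.

    Returns tuples (k, start, length, aligned, full_vectors, remainder).
    'aligned' assumes each row begins on a vector-width boundary.
    """
    segments = []
    for k in range(n - 1):
        start = k + 1                  # first column updated in step k
        length = n - start             # columns k+1 .. n-1
        aligned = start % width == 0   # segment start on a vector boundary?
        segments.append(
            (k, start, length, aligned, length // width, length % width)
        )
    return segments


def gaussian_elimination(a, width=VECTOR_WIDTH):
    """In-place forward elimination without pivoting (nonzero pivots assumed).

    The inner update is split into chunks of 'width' elements, each standing
    in for one vector load, multiply-subtract, and store, followed by a
    scalar loop for the leftover elements.
    """
    n = len(a)
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            a[i][k] = 0.0
            j = k + 1
            # Chunked part: models full-width vector operations.
            while j + width <= n:
                a[i][j:j + width] = [
                    x - factor * p
                    for x, p in zip(a[i][j:j + width], a[k][j:j + width])
                ]
                j += width
            # Scalar remainder: fewer than 'width' elements are left.
            while j < n:
                a[i][j] -= factor * a[k][j]
                j += 1
    return a
```

Calling `row_segments(8)` lists, for an 8×8 matrix, how many full vectors and leftover elements each step contains and whether the segment start falls on a vector boundary under the stated assumption. A real SIMD version would replace the chunked part with AVX load, arithmetic, and store intrinsics; which specific instructions the five program versions use is not stated in the abstract.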