NPDP programming for RISC multi-core processors

Marek Palkowski; Mateusz Grużewski

DOI: http://dx.doi.org/10.15439/2025F8280

Citation: Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 43, pages 753–758 (2025)

Abstract. Parallel architectures have become ubiquitous in recent years, driven by advances in AI and cloud computing. Parallel processing is not limited to x86 CISC CPUs and GPUs, however; it also covers computation on ARM-based and RISC-V devices. ARM processors have steadily gained cores, and today's mobile devices feature at least eight execution units, typically divided into energy-efficient and performance-oriented groups. RISC-based parallel processors are also integrated on development boards supported by the Linux kernel. In this article, we use our NPDP Benchmark Suite for non-serial polyadic dynamic programming, whose kernels come primarily from computer algorithms and bioinformatics, to evaluate the performance of the studied RISC processors as well as code locality and cache efficiency. The benchmark consists of 10 kernels written in C++ and OpenMP. For the Android environment, we ported the application using the Android NDK (Native Development Kit). For Apple machines, we used an OpenMP port for parallelization. On RISC-V, the kernels ran in a native Linux environment. Finally, we summarize the article and outline future work.
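
To make the term "non-serial polyadic dynamic programming" concrete, the following is a minimal sketch of a textbook Nussinov-style RNA folding recurrence in C++ with an OpenMP pragma. It is an illustration only and is not taken from the benchmark suite: the function names `can_pair` and `nussinov_max_pairs`, the restriction to A-U and C-G pairs, and the diagonal-by-diagonal parallelization are all assumptions made for this example. Each cell depends on many earlier cells (polyadic) that are not in the immediately preceding stage (non-serial), which is what makes such kernels demanding on cache locality.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: 1 if bases a and b form an A-U or C-G pair, else 0.
static int can_pair(char a, char b) {
    return ((a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
            (a == 'C' && b == 'G') || (a == 'G' && b == 'C')) ? 1 : 0;
}

// Hypothetical name: maximum number of nested base pairs in `rna`,
// computed with a Nussinov-style recurrence over intervals [i, j].
int nussinov_max_pairs(const std::string& rna) {
    const int n = static_cast<int>(rna.size());
    if (n < 2) return 0;
    std::vector<std::vector<int>> S(n, std::vector<int>(n, 0));

    // Walk the upper triangle diagonal by diagonal (d = j - i).
    for (int d = 1; d < n; ++d) {
        // Cells on one diagonal read only cells from shorter diagonals,
        // so the iterations of this loop are independent of each other.
        // Without OpenMP enabled the pragma is ignored and the loop runs serially.
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n - d; ++i) {
            const int j = i + d;
            // Option 1: pair i with j around the inner interval (empty when d == 1).
            int best = (d >= 2 ? S[i + 1][j - 1] : 0) + can_pair(rna[i], rna[j]);
            // Option 2 (the non-serial, polyadic part): split [i, j] at every k.
            for (int k = i; k < j; ++k)
                best = std::max(best, S[i][k] + S[k + 1][j]);
            S[i][j] = best;
        }
    }
    return S[0][n - 1];
}
```

Under these assumptions, `nussinov_max_pairs("GGCC")` evaluates to 2, since G at position 0 pairs with C at position 3 and G at position 1 pairs with C at position 2. The diagonal traversal is the simplest parallel schedule for this dependence pattern; tiled schedules trade that simplicity for better cache reuse.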
