Logo PTI Logo FedCSIS

Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 39

Automatic Generation of OpenCL Code through Polyhedral Compilation with LLM

,

DOI: http://dx.doi.org/10.15439/2024F6469

Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 671676 ()

Full text

Abstract. In recent years, a multitude of AI solutions has emerged to facilitate code generation, commonly known as Language Model-based Programming (LLM). These tools empower programmers to automate their work. Automatic programming also falls within the domain of optimizing compilers, primarily based on the polyhedral model, which processes loop nests concentrating most computations. This article focuses on harnessing LLM tools to generate OpenCL code for non-serial polyadic dynamic programming kernels. We have chosen the Nussinov RNA folding computational task, previously employed to test polyhedral compilers in optimizing kernels with non-uniform dependences. The code generated in OpenMP by polyhedral optimizers is limited to CPU computations. We automatically convert it into the OpenCL standard using ChatGPT-3.5 through its source-to-source queries to extend the number of possible platforms. The validity and efficiency of the generated code were verified on various CPUs and GPUs from different manufacturers.

References

  1. Khronos OpenCL Working Group, The OpenCL Specification, Version 1.1, 2011. [Online]. Available: https://www.khronos.org/registry/cl/specs/opencl-1.1.pdf
  2. OpenMP Architecture Review Board, “OpenMP application program interface version 5.2,” https://www.openmp.org/specifications, 2021, accessed on: 2023-10-22.
  3. R. Nussinov et al., “Algorithms for loop matchings,” SIAM Journal on Applied mathematics, vol. 35, no. 1, pp. 68–82, 1978.
  4. R. T. Mullapudi and U. Bondhugula, “Tiling for dynamic scheduling,” in Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, S. Rajopadhye and S. Verdoolaege, Eds., Vienna, Austria, Jan. 2014.
  5. D. Wonnacott, T. Jin, and A. Lake, “Automatic tiling of “mostly-tileable” loop nests,” in 5th International Workshop on Polyhedral Compilation Techniques, Amsterdam, 2015.
  6. R. Chowdhury, , and et. al., “Autogen: Automatic discovery of efficient recursive divide-8-conquer algorithms for solving dynamic programming problems,” ACM Transactions on Parallel Computing, vol. 4, no. 1, pp. 1–30, oct 2017. http://dx.doi.org/10.1145/3125632
  7. W. Bielecki, P. Blaszynski, and M. Poliwoda, “3d parallel tiled code implementing a modified Knuth's optimal binary search tree algorithm,” Journal of Computational Science, vol. 48, p. 101246, jan 2021. http://dx.doi.org/10.1016/j.jocs.2020.101246
  8. W. Bielecki and M. Palkowski, “A parallelizing and optimizing compiler - traco,” http://traco.sourceforge.net, 2013, accessed on: 2024-01-11.
  9. W. Bielecki and M. Poliwoda, “Automatic parallel tiled code generation based on dependence approximation,” in Parallel Computing Technologies, V. Malyshkin, Ed. Cham: Springer International Publishing, 2021, pp. 260–275.
  10. U. Bondhugula et al., “A practical automatic polyhedral parallelizer and locality optimizer,” SIGPLAN Not., vol. 43, no. 6, pp. 101–113, Jun. 2008. [Online]. Available: http://pluto-compiler.sourceforge.net
  11. J. Xue, Loop Tiling for Parallelism. Norwell, MA, USA: Kluwer Academic Publishers, 2000. ISBN 0-7923-7933-0
  12. M. Palkowski and W. Bielecki, “NPDP benchmark suite for the evaluation of the effectiveness of automatic optimizing compilers,” Parallel Computing, vol. 116, p. 103016, Jul. 2023. http://dx.doi.org/10.1016/j.parco.2023.103016. [Online]. Available: https://doi.org/10.1016/j.parco.2023.103016
  13. A. Mehdi, Par4All User Guide, 2012. [Online]. Available: http://www.par4all.org
  14. C. Dave, H. Bae, S.-J. Min, S. Lee, R. Eigenmann, and S. Midkiff, “Cetus: A source-to-source compiler infrastructure for multicores,” Computer, vol. 42, pp. 36–42, 2009.
  15. S. Verdoolaege, J. Carlos Juega, A. Cohen, J. Ignacio Gómez, C. Tenllado, and F. Catthoor, “Polyhedral parallel code generation for cuda,” ACM Transactions on Architecture and Code Optimization, vol. 9, no. 4, p. 1–23, Jan. 2013. http://dx.doi.org/10.1145/2400682.2400713. [Online]. Available: http://dx.doi.org/10.1145/2400682.2400713
  16. “Nvidia corporation, cuda programming guide 12.3,” https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html, 2023, accessed on: 2023-10-22.
  17. K. Thouti and S. R. Sathe, “Comparison of openmp amp; opencl parallel processing technologies,” 2012. http://dx.doi.org/10.48550/ARXIV.1211.2038. [Online]. Available: https://arxiv.org/abs/1211.2038
  18. M. Khalilov and A. Timoveev, “Performance analysis of cuda, openacc and openmp programming models on tesla v100 gpu,” Journal of Physics: Conference Series, vol. 1740, no. 1, p. 012056, Jan. 2021. http://dx.doi.org/10.1088/1742-6596/1740/1/012056. [Online]. Available: http://dx.doi.org/10.1088/1742-6596/1740/1/012056
  19. G. Kan, X. He, L. Ding, J. Li, K. Liang, and Y. Hong, “A heterogeneous computing accelerated sce-ua global optimization method using openmp, opencl, cuda, and openacc,” Water Science and Technology, vol. 76, no. 7, p. 1640–1651, Jun. 2017. http://dx.doi.org/10.2166/wst.2017.322. [Online]. Available: http://dx.doi.org/10.2166/wst.2017.322
  20. D. Nichols, A. Marathe, H. Menon, T. Gamblin, and A. Bhatele, “Modeling parallel programs using large language models,” 2023, accessed on: 2024-01-11.
  21. L. Chen, P.-H. Lin, T. Vanderbruggen, C. Liao, M. Emani, and B. de Supinski, LM4HPC: Towards Effective Language Model Application in High-Performance Computing. Springer Nature Switzerland, 2023, p. 18–33. ISBN 9783031407444. [Online]. Available: http://dx.doi.org/10.1007/978-3-031-40744-4 2
  22. W. Godoy, P. Valero-Lara, K. Teranishi, P. Balaprakash, and J. Vetter, “Evaluation of openai codex for hpc parallel programming models kernel generation,” in Proceedings of the 52nd International Conference on Parallel Processing Workshops, ser. ICPP-W 2023. ACM, Aug. 2023. http://dx.doi.org/10.1145/3605731.3605886. [Online]. Available: http://dx.doi.org/10.1145/3605731.3605886
  23. G. C. Team, “Github copilot,” https://copilot.github.com/, 2022, an AI pair programmer for GitHub, Accessed on: 2023-10-22.
  24. M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Voss, W. H. Guss, A. Nichol, A. Paino, N. Tezak, J. Tang, I. Babuschkin, S. Balaji, S. Jain, W. Saunders, C. Hesse, A. N. Carr, J. Leike, J. Achiam, V. Misra, E. Morikawa, A. Radford, M. Knight, M. Brundage, M. Murati, K. Mayer, P. Welinder, B. McGrew, D. Amodei, S. McCandlish, I. Sutskever, and W. Zaremba, “Evaluating large language models trained on code,” https://arxiv.org/abs/2107.03374, 2021, accessed on: 2023-10-22.
  25. P. Valero-Lara, A. Huante, M. A. Lail, W. F. Godoy, K. Teranishi, P. Balaprakash, and J. S. Vetter, “Comparing llama-2 and gpt- 3 llms for hpc kernels generation,” 2023. [Online]. Available: https://arxiv.org/abs/2309.07103
  26. “Introducing llama 2, the next generation of our open source large language model,” https://ai.meta.com/llama/, 2023, accessed on: 2023-10-22.
  27. D. Wonnacott, T. Jin, and A. Lake, “Automatic tiling of “mostly-tileable” loop nests,” in IMPACT 2015: 5th International Workshop on Polyhedral Compilation Techniques, At Amsterdam, The Netherlands, 2015.
  28. S. Verdoolaege, “Integer set library - manual,” www.kotnet.org/~skimo//isl/manual.pdf, 2011, accessed on: 2024-01-11.
  29. U. Bondhugula et al., “A practical automatic polyhedral parallelizer and locality optimizer,” SIGPLAN Not., vol. 43, no. 6, pp. 101–113, Jun. 2008. http://dx.doi.org/10.1145/1379022.1375595
  30. M. Palkowski and W. Bielecki, “Parallel tiled Nussinov RNA folding loop nest generated using both dependence graph transitive closure and loop skewing,” BMC Bioinformatics, vol. 18, no. 1, p. 290, 2017. http://dx.doi.org/10.1186/s12859-017-1707-8
  31. M. Palkowski and M. Gruzewski, “Time and energy benefits of using automatic optimization compilers for NPDP tasks,” Electronics, vol. 12, no. 17, p. 3579, Aug. 2023. http://dx.doi.org/10.3390/electronics12173579. [Online]. Available: http://dx.doi.org/10.3390/electronics12173579