
Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 43

Optimizing the Optimizer: An Example Showing the Power of LLM Code Generation


DOI: http://dx.doi.org/10.15439/2025F1481

Citation: Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 43, pages 47–57 (2025)


Abstract. The integration of Large Language Models (LLMs) into optimization has created a powerful synergy, opening exciting research opportunities. This paper investigates how LLMs can enhance existing optimization algorithms. We demonstrate that, drawing on their pre-trained knowledge, LLMs can propose innovative heuristic variations based on a semantic understanding of an algorithm's components. To evaluate this, we use a nontrivial optimization algorithm as a case study: Construct, Merge, Solve & Adapt (CMSA), a hybrid metaheuristic for combinatorial optimization problems that incorporates a heuristic in its solution construction phase. Our results show that an alternative heuristic proposed by GPT-4o outperforms the expert-designed heuristic of CMSA, and that the performance gap widens on larger and denser graphs.
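For readers unfamiliar with CMSA, the following minimal Python sketch outlines the generic loop described in references 1 and 3 below: probabilistically construct solutions, merge their components into a sub-instance, solve that sub-instance (typically with an exact, time-limited solver), and adapt the sub-instance by ageing out unused components. This is an illustrative sketch rather than the authors' implementation; the names construct, solve_subinstance, objective, n_constructions, age_max, and n_iterations are placeholders chosen here, and the heuristic studied in this paper would correspond to the construct step.

    # Illustrative CMSA skeleton (maximization); names and defaults are placeholders.
    import random
    from typing import Callable, Dict, Hashable, Set

    Component = Hashable          # a solution component, e.g., a vertex or an edge
    Solution = Set[Component]     # a solution is modelled as a set of components

    def cmsa(construct: Callable[[], Solution],
             solve_subinstance: Callable[[Set[Component]], Solution],
             objective: Callable[[Solution], float],
             n_constructions: int = 5,
             age_max: int = 3,
             n_iterations: int = 50) -> Solution:
        best: Solution = set()
        age: Dict[Component, int] = {}   # component -> age in the current sub-instance

        for _ in range(n_iterations):
            # Construct & Merge: build randomized heuristic solutions and add
            # their components to the sub-instance (new components get age 0).
            for _ in range(n_constructions):
                for comp in construct():
                    age.setdefault(comp, 0)

            # Solve: apply the sub-instance solver to the stored components.
            candidate = solve_subinstance(set(age))
            if objective(candidate) > objective(best):
                best = candidate

            # Adapt: reset the age of components used by the solver's solution,
            # age the rest, and discard components older than age_max.
            for comp in list(age):
                age[comp] = 0 if comp in candidate else age[comp] + 1
                if age[comp] > age_max:
                    del age[comp]

        return best

    if __name__ == "__main__":
        # Toy usage: components are integers and the "solver" keeps the five largest.
        items = list(range(20))
        heuristic = lambda: set(random.sample(items, 5))
        solver = lambda sub: set(sorted(sub)[-5:])
        print(cmsa(heuristic, solver, objective=sum))

In the setting described in the abstract, both the expert-designed heuristic and the GPT-4o-proposed alternative would plug into the construction step, while the rest of the loop remains unchanged.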

References

  1. C. Blum. Construct, Merge, Solve & Adapt: A Hybrid Metaheuristic for Combinatorial Optimization. Computational Intelligence Methods and Applications. Springer Nature Switzerland, 2024. ISBN 9783031601026. URL https://books.google.es/books?id=ENCt0AEACAAJ.
  2. C. Blum, J. Puchinger, G. R. Raidl, and A. Roli. Hybrid metaheuristics in combinatorial optimization: A survey. Applied Soft Computing, 11(6):4135–4151, 2011. ISSN 1568-4946. https://doi.org/10.1016/j.asoc.2011.02.032. URL https://www.sciencedirect.com/science/article/pii/S1568494611000962.
  3. C. Blum, P. Pinacho, M. López-Ibáñez, and J. A. Lozano. Construct, merge, solve & adapt: A new general algorithm for combinatorial optimization. Computers & Operations Research, 68:75–88, 2016. ISSN 0305-0548. https://doi.org/10.1016/j.cor.2015.10.014. URL https://www.sciencedirect.com/science/article/pii/S0305054815002452.
  4. C. Chacón Sartori. Architectures of error: A philosophical inquiry into AI-generated and human-generated code, May 2025. URL https://ssrn.com/abstract=5265751. Available at SSRN.
  5. A. Chen, J. Scheurer, T. Korbak, J. A. Campos, J. S. Chan, S. R. Bowman, K. Cho, and E. Perez. Improving code generation by training with natural language feedback. arXiv [cs.SE], Mar. 2023.
  6. L. Chen, Q. Guo, H. Jia, Z. Zeng, X. Wang, Y. Xu, J. Wu, Y. Wang, Q. Gao, J. Wang, W. Ye, and S. Zhang. A survey on evaluating large language models in code generation tasks, 2024. URL https://arxiv.org/abs/2408.16498.
  7. M. Chen et al. Evaluating large language models trained on code, 2021. URL https://arxiv.org/abs/2107.03374.
  8. W. Chen, J. Zhu, Q. Fan, Y. Ma, and A. Zou. CUDA-LLM: LLMs can write efficient CUDA kernels, 2025. URL https://arxiv.org/abs/2506.09092.
  9. W.-L. Chiang, L. Zheng, Y. Sheng, A. N. Angelopoulos, T. Li, D. Li, H. Zhang, B. Zhu, M. Jordan, J. E. Gonzalez, and I. Stoica. Chatbot Arena: An open platform for evaluating LLMs by human preference, 2024. URL https://arxiv.org/abs/2403.04132.
  10. C. Cummins, V. Seeker, D. Grubisic, M. Elhoushi, Y. Liang, B. Roziere, J. Gehring, F. Gloeckle, K. Hazelwood, G. Synnaeve, and H. Leather. Large language models for compiler optimization. arXiv [cs.PL], Sept. 2023.
  11. DeepSeek-AI et al. DeepSeek-V3 technical report, 2024. URL https://arxiv.org/abs/2412.19437.
  12. Q. Dong, L. Li, D. Dai, C. Zheng, J. Ma, R. Li, H. Xia, J. Xu, Z. Wu, T. Liu, B. Chang, X. Sun, L. Li, and Z. Sui. A survey on in-context learning, 2024. URL https://arxiv.org/abs/2301.00234.
  13. E. Hemberg, S. Moskal, and U.-M. O’Reilly. Evolving code with a large language model, 2024. URL https://arxiv.org/abs/2401.07102.
  14. J. Jiang, F. Wang, J. Shen, S. Kim, and S. Kim. A survey on large language models for code generation. arXiv [cs.CL], June 2024.
  15. X. Jiang, Y. Dong, L. Wang, Z. Fang, Q. Shang, G. Li, Z. Jin, and W. Jiao. Self-planning code generation with large language models. ACM Trans. Softw. Eng. Methodol., 33(7):1–30, Sept. 2024.
  16. S. Joel, J. J. Wu, and F. H. Fard. A survey on LLM-based code generation for low-resource and domain-specific programming languages, 2024. URL https://arxiv.org/abs/2410.03981.
  17. S. Joel, J. J. W. Wu, and F. H. Fard. A survey on LLM-based code generation for low-resource and domain-specific programming languages. arXiv [cs.SE], Oct. 2024.
  18. U. Kamath, K. Keenan, G. Somers, and S. Sorenson. Large Language Models: A Deep Dive: Bridging Theory and Practice. Springer Nature Switzerland, 2024. ISBN 9783031656477. URL https://books.google.es/books?id=kDobEQAAQBAJ.
  19. M.-A. Lachaux, B. Roziere, L. Chanussot, and G. Lample. Unsupervised translation of programming languages. arXiv [cs.CL], June 2020.
  20. J. Li, G. Li, C. Tao, J. Li, H. Zhang, F. Liu, and Z. Jin. Large language model-aware in-context learning for code generation. arXiv [cs.SE], Oct. 2023.
  21. F. Liu, R. Zhang, Z. Xie, R. Sun, K. Li, X. Lin, Z. Wang, Z. Lu, and Q. Zhang. LLM4AD: A platform for algorithm design with large language model, 2024. URL https://arxiv.org/abs/2412.17287.
  22. M. López-Ibáñez, J. Dubois-Lacoste, L. Pérez Cáceres, T. Stützle, and M. Birattari. The irace package: Iterated racing for automatic algorithm configuration. Operations Research Perspectives, 3:43–58, 2016. https://dx.doi.org/10.1016/j.orp.2016.09.002.
  23. S. Mirchandani, F. Xia, P. Florence, B. Ichter, D. Driess, M. G. Arenas, K. Rao, D. Sadigh, and A. Zeng. Large language models as general pattern machines, 2023. URL https://arxiv.org/abs/2307.04721.
  24. OpenAI et al. GPT-4 technical report, 2024. URL https://arxiv.org/abs/2303.08774.
  25. R. Pan, A. R. Ibrahimzada, R. Krishna, D. Sankar, L. P. Wassi, M. Merler, B. Sobolev, R. Pavuluri, S. Sinha, and R. Jabbarvand. Lost in translation: A study of bugs introduced by large language models while translating code. arXiv [cs.SE], Aug. 2023.
  26. M. Pluhacek, A. Kazikova, T. Kadavy, A. Viktorin, and R. Senkerik. Leveraging large language models for the generation of novel metaheuristic optimization algorithms. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation, GECCO ’23 Companion, pages 1812–1820, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701207. https://dx.doi.org/10.1145/3583133.3596401. URL https://doi.org/10.1145/3583133.3596401.
  27. B. Romera-Paredes, M. Barekatain, A. Novikov, M. Balog, M. P. Kumar, E. Dupont, F. J. R. Ruiz, J. S. Ellenberg, P. Wang, O. Fawzi, P. Kohli, A. Fawzi, J. Grochow, A. Lodi, J.-B. Mouret, T. Ringer, and T. Yu. Mathematical discoveries from program search with large language models. Nature, 625:468–475, 2023. URL https://api.semanticscholar.org/CorpusID:266223700.
  28. C. C. Sartori, C. Blum, F. Bistaffa, and G. Rodríguez Corominas. Metaheuristics and large language models join forces: Toward an integrated optimization approach. IEEE Access, 13:2058–2079, 2025. https://dx.doi.org/10.1109/ACCESS.2024.3524176.
  29. K. Sim, Q. Renau, and E. Hart. Beyond the hype: Benchmarking LLM-evolved heuristics for bin packing, 2025. URL https://arxiv.org/abs/2501.11411.
  30. N. van Stein and T. Bäck. LLaMEA: A large language model evolutionary algorithm for automatically generating metaheuristics. IEEE Transactions on Evolutionary Computation, 29(2):331–345, 2025. https://dx.doi.org/10.1109/TEVC.2024.3497793.
  31. Anthropic. Introducing Claude 3.5 Sonnet. https://www.anthropic.com/news/claude-3-5-sonnet, 2024. [Accessed 02-11-2024].
  32. Gemini Team et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, 2024. URL https://arxiv.org/abs/2403.05530.
  33. Llama Team, Meta AI. The Llama 3 herd of models, 2024. URL https://arxiv.org/abs/2407.21783.
  34. H. Wang, T. Fu, Y. Du, W. Gao, K. Huang, Z. Liu, P. Chandak, S. Liu, P. Van Katwyk, A. Deac, A. Anandkumar, K. Bergen, C. P. Gomes, S. Ho, et al. Scientific discovery in the age of artificial intelligence. Nature, 620(7972):47–60, August 2023. https://dx.doi.org/10.1038/s41586-023-06221-2. URL https://ideas.repec.org/a/nat/nature/v620y2023i7972d10.1038_s41586-023-06221-2.html.
  35. Y. Wen, P. Yin, K. Shi, H. Michalewski, S. Chaudhuri, and A. Polozov. Grounding data science code generation with input-output specifications, 2024. URL https://arxiv.org/abs/2402.08073.
  36. D. Wolpert and W. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997. https://dx.doi.org/10.1109/4235.585893.
  37. S. Xiao, Y. Chen, J. Li, L. Chen, L. Sun, and T. Zhou. Prototype2Code: End-to-end front-end code generation from UI design prototypes, 2024. URL https://arxiv.org/abs/2405.04975.
  38. D. Zan, B. Chen, F. Zhang, D. Lu, B. Wu, B. Guan, Y. Wang, and J.-G. Lou. Large language models meet NL2Code: A survey, 2023. URL https://arxiv.org/abs/2212.09420.
  39. Y. Zhou, A. I. Muresanu, Z. Han, K. Paster, S. Pitis, H. Chan, and J. Ba. Large language models are human-level prompt engineers, 2023. URL https://arxiv.org/abs/2211.01910.