
Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 43

Pretraining Transformers for Chess Puzzle Difficulty Prediction

DOI: http://dx.doi.org/10.15439/2025F7603

Citation: Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 43, pages 831–835 (2025)


Abstract. This paper presents our third-place solution for the FedCSIS 2025 Challenge: Predicting Chess Puzzle Difficulty - Second Edition. Building on our prior GlickFormer architecture, we develop a transformer-based approach featuring a novel multitask pretraining strategy that combines masked-square reconstruction with solution policy prediction. Our spatial-only architecture directly embeds solution moves, eliminating temporal modules, while integrating human-centric priors through Maia-2 engine solve-rate predictions. Evaluated on the Lichess puzzle corpus, our approach reduces validation MSE by 30.4% compared to from-scratch training and achieves competitive results (test MSE: 55.9k) despite distribution shifts in the competition environment.
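To make the multitask pretraining objective concrete, below is a minimal PyTorch-style sketch of a loss that combines masked-square reconstruction with solution policy prediction over the 64 board squares. All module names, tensor shapes, the tokenization (one token per square), and the loss weighting are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch: multitask pretraining = masked-square reconstruction + solution policy prediction.
# Shapes, vocabulary sizes, and the alpha weighting are assumptions for illustration only.
import torch
import torch.nn as nn

NUM_SQUARES = 64          # 8x8 board, one token per square (assumed tokenization)
NUM_PIECE_CLASSES = 13    # 6 white + 6 black piece types + empty square (assumed)

class PuzzlePretrainer(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=6):
        super().__init__()
        self.square_embed = nn.Embedding(NUM_PIECE_CLASSES + 1, d_model)  # +1 for a [MASK] token
        self.pos_embed = nn.Parameter(torch.zeros(1, NUM_SQUARES, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.recon_head = nn.Linear(d_model, NUM_PIECE_CLASSES)  # piece class per masked square
        self.policy_head = nn.Linear(d_model, 2)                 # per-square "from"/"to" logits

    def forward(self, board_tokens):
        # board_tokens: (batch, 64) long tensor with some squares replaced by the mask id
        x = self.square_embed(board_tokens) + self.pos_embed
        h = self.encoder(x)
        return self.recon_head(h), self.policy_head(h)

def pretraining_loss(model, board_tokens, piece_targets, mask, from_sq, to_sq, alpha=1.0):
    """Reconstruction loss on masked squares plus policy loss over the 64 squares."""
    recon_logits, policy_logits = model(board_tokens)
    recon_loss = nn.functional.cross_entropy(recon_logits[mask], piece_targets[mask])
    from_loss = nn.functional.cross_entropy(policy_logits[..., 0], from_sq)  # origin square of solution move
    to_loss = nn.functional.cross_entropy(policy_logits[..., 1], to_sq)      # destination square
    return recon_loss + alpha * (from_loss + to_loss)
```

After pretraining on these two auxiliary tasks, the encoder would be fine-tuned with a regression head on puzzle ratings; the human-centric Maia-2 solve-rate signal can be concatenated as an additional input feature at that stage (again, an assumption about wiring, not the paper's exact design).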

References

  1. S. Miłosz and P. Kapusta, “Predicting chess puzzle difficulty with transformers,” in 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 8377–8384. DOI: 10.1109/BigData62323.2024.10825919.
  2. J. Zyśko, M. Ślęzak, D. Ślęzak, and M. Świechowski, “FedCSIS 2025 knowledgepit.ai Competition: Predicting Chess Puzzle Difficulty Part 2 & A Step Toward Uncertainty Contests,” in Proceedings of the 20th Conference on Computer Science and Intelligence Systems, M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, and D. Ślęzak, Eds., ser. Annals of Computer Science and Information Systems, vol. 43, Polish Information Processing Society, 2025. DOI: 10.15439/2025F5937.
  3. Z. Tang, D. Jiao, R. McIlroy-Young, J. Kleinberg, S. Sen, and A. Anderson, “Maia-2: A unified model for human-AI alignment in chess,” in Advances in Neural Information Processing Systems 37 (NeurIPS 2024), 2024. DOI: 10.48550/arXiv.2409.20553. arXiv:2409.20553.
  4. T. Woodruff, O. Filatov, and M. Cognetta, “The Bread Emoji Team’s Submission to the IEEE BigData 2024 Cup: Predicting Chess Puzzle Difficulty Challenge,” in 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 8415–8422. DOI: 10.1109/BigData62323.2024.10826037.
  5. J. Zyśko, M. Świechowski, S. Stawicki, K. Jagieła, A. Janusz, and D. Ślęzak, “IEEE Big Data Cup 2024 Report: Predicting Chess Puzzle Difficulty at KnowledgePit.ai,” in 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 8423–8429. DOI: 10.1109/BigData62323.2024.10825289.
  6. S. Bjorkqvist, “Estimating the puzzlingness of chess puzzles,” in 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 8370–8376. DOI: 10.1109/BigData62323.2024.10825991.
  7. A. Schutt, T. Huber, and E. André, “Estimating chess puzzle difficulty without past game records using a human problem-solving inspired neural network architecture,” in 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 8396–8402. DOI: 10.1109/BigData62323.2024.10826087.
  8. A. Rafaralahy, “Pairwise learning to rank for chess puzzle difficulty prediction,” in 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 8385–8389. DOI: 10.1109/BigData62323.2024.10825356.
  9. J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  10. A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020. DOI: 10.48550/arXiv.2010.11929.
  11. A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” OpenAI, Tech. Rep., 2018. [Online]. Available: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
  12. C. Raffel, N. Shazeer, A. Roberts, et al., “Exploring the limits of transfer learning with a unified text-to-text transformer,” Journal of Machine Learning Research, vol. 21, no. 140, pp. 1–67, 2020. arXiv:1910.10683.
  13. D. Noever, “Chess transformer: Mastering the game of chess with attention,” arXiv preprint arXiv:2008.04057, 2020.
  14. D. Misra, “Mish: A self regularized non-monotonic activation function,” arXiv preprint arXiv:1908.08681, 2019.
  15. D. Monroe and P. Chalmers, “Mastering chess with a transformer model,” arXiv preprint arXiv:2409.12272, 2024. Describes Lc0’s transformer architecture and smolgen position encoding.
  16. J. Hoffmann, S. Borgeaud, A. Mensch, et al., “Training compute-optimal large language models,” arXiv preprint arXiv:2203.15556, 2022.
  17. I. Loshchilov and F. Hutter, “Fixing weight decay regularization in Adam,” arXiv preprint arXiv:1711.05101, 2017.