
Communication Papers of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 41

Gradient Boosting Trees and Large Language Models for Tabular Data Few-Shot Learning

DOI: http://dx.doi.org/10.15439/2024F1407

Citation: Communication Papers of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 41, pages 53–59 (2024)


Abstract. Large Language Models (LLMs) have brought numerous new applications to Machine Learning (ML). In the context of tabular data (TD), recent studies show that TabLLM is a very powerful mechanism for few-shot learning (FSL) applications, even though gradient boosting decision trees (GBDT) have historically dominated the TD field. In this work we demonstrate that although LLMs are a viable alternative, the evidence suggests that the baselines used to gauge performance can be improved. We replicate public benchmarks, and our methodology improves LightGBM by 290%; this gain is mainly driven by forcing node splitting with few samples, a critical step in FSL with GBDT. Our results show an advantage for TabLLM with 8 or fewer shots, but as the number of samples increases, GBDT provides competitive performance at a fraction of the runtime. For other real-life applications with a vast number of samples, we found FSL still useful for improving model diversity; when combined with ExtraTrees, it provides strong resilience to overfitting. Our proposal was validated in an ML competition setting, ranking 1st place.
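A minimal sketch of the node-splitting idea mentioned in the abstract: with its default settings LightGBM refuses to split nodes when a leaf would hold too few samples, which makes the stock baseline degenerate in a few-shot regime. The snippet below (Python, LightGBM's scikit-learn interface) relaxes those constraints; the parameter values and the toy data are illustrative assumptions, not the authors' exact configuration.

    # Sketch: relax LightGBM's minimum-sample constraints so trees can
    # actually split when only a handful of training rows are available.
    # Values are illustrative, not the paper's exact settings.
    import numpy as np
    from lightgbm import LGBMClassifier

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(8, 5))          # 8 "shots"
    y_train = rng.integers(0, 2, size=8)
    X_test = rng.normal(size=(100, 5))

    model = LGBMClassifier(
        n_estimators=50,
        num_leaves=4,          # small trees to limit overfitting on few rows
        min_child_samples=1,   # allow leaves holding a single sample
        min_child_weight=1e-3, # relax the hessian-based split constraint
        min_split_gain=0.0,    # do not require a minimum gain to split
        min_data_in_bin=1,     # let histogram bins form from single samples
    )
    model.fit(X_train, y_train)
    probs = model.predict_proba(X_test)[:, 1]

Without the relaxed minimums the trees collapse to a single root node on 8 samples, which is one plausible reason few-shot GBDT baselines look weak in published comparisons.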

References

  1. Ravid Shwartz-Ziv and Amitai Armon, “Tabular data: Deep learning is not all you need,” Information Fusion, vol. 81, pp. 84–90, 2022.
  2. Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang, “Deep & cross network for ad click predictions,” 2017.
  3. Jinsung Yoon, Yao Zhang, James Jordon, and Mihaela van der Schaar, “Vime: Extending the success of self- and semi-supervised learning to tabular domain,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, Eds. 2020, vol. 33, pp. 11033–11043, Curran Associates, Inc.
  4. Yixuan Zhang, Jialiang Tong, Ziyi Wang, and Fengqiang Gao, “Customer transaction fraud detection using xgboost model,” in 2020 International Conference on Computer Engineering and Application (ICCEA), 2020, pp. 554–558.
  5. Zifeng Wang and Suzhen Li, “Data-driven risk assessment on urban pipeline network based on a cluster model,” Reliability Engineering & System Safety, vol. 196, pp. 106781, 2020.
  6. Xi Fang, Weijie Xu, Fiona Anting Tan, Jiani Zhang, Ziqing Hu, Yanjun Qi, Scott Nickleach, Diego Socolinsky, Srinivasan Sengamedu, and Christos Faloutsos, “Large language models (llms) on tabular data: Prediction, generation, and understanding – a survey,” 2024.
  7. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” CoRR, vol. abs/1706.03762, 2017.
  8. Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen, “Deberta: Decoding-enhanced BERT with disentangled attention,” CoRR, vol. abs/2006.03654, 2020.
  9. Mingxing Tan and Quoc V. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” CoRR, vol. abs/1905.11946, 2019.
  10. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 2021, OpenReview.net.
  11. Jiashi Li, Xin Xia, Wei Li, Huixia Li, Xing Wang, Xuefeng Xiao, Rui Wang, Min Zheng, and Xin Pan, “Next-vit: Next generation vision transformer for efficient deployment in realistic industrial scenarios,” 2022.
  12. Vadim Borisov, Tobias Leemann, Kathrin Seßler, Johannes Haug, Martin Pawelczyk, and Gjergji Kasneci, “Deep neural networks and tabular data: A survey,” CoRR, vol. abs/2110.01889, 2021.
  13. Dugang Liu, Pengxiang Cheng, Hong Zhu, Xing Tang, Yanyu Chen, Xiaoting Wang, Weike Pan, Zhong Ming, and Xiuqiang He, “Diwift: Discovering instance-wise influential features for tabular data,” 2022.
  14. Tianqi Chen and Carlos Guestrin, “Xgboost: A scalable tree boosting system,” CoRR, vol. abs/1603.02754, 2016.
  15. Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” in Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. 2017, vol. 30, Curran Associates, Inc.
  16. Anna Veronika Dorogush, Andrey Gulin, Gleb Gusev, Nikita Kazeev, Liudmila Ostroumova Prokhorenkova, and Aleksandr Vorobev, “Fighting biases with dynamic boosting,” CoRR, vol. abs/1706.09516, 2017.
  17. Ravid Shwartz-Ziv and Amitai Armon, “Tabular data: Deep learning is not all you need,” 2021.
  18. Tomaso Poggio, Andrzej Banburski, and Qianli Liao, “Theoretical issues in deep networks,” Proceedings of the National Academy of Sciences, vol. 117, no. 48, pp. 30039–30045, 2020.
  19. Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux, “Why do tree-based models still outperform deep learning on tabular data?,” 2022.
  20. Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian, “A comprehensive overview of large language models,” 2024.
  21. Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” 2023.
  22. Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang, “Sparks of artificial general intelligence: Early experiments with gpt-4,” 2023.
  23. Xue Jiang, Yihong Dong, Lecheng Wang, Zheng Fang, Qiwei Shang, Ge Li, Zhi Jin, and Wenpin Jiao, “Self-planning code generation with large language models,” 2023.
  24. Ian J. Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016, http://www.deeplearningbook.org.
  25. Vadim Borisov, Tobias Leemann, Kathrin Seßler, Johannes Haug, Martin Pawelczyk, and Gjergji Kasneci, “Deep neural networks and tabular data: A survey,” IEEE Transactions on Neural Networks and Learning Systems, p. 1–21, 2024.
  26. Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah, “Wide and deep learning for recommender systems,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, New York, NY, USA, 2016, DLRS 2016, p. 7–10, Association for Computing Machinery.
  27. Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He, and Zhenhua Dong, “Deepfm: An end-to-end wide & deep learning framework for ctr prediction,” 2018.
  28. Haoran Luo, Fan Cheng, Heng Yu, and Yuqi Yi, “Sdtr: Soft decision tree regressor for tabular data,” IEEE Access, vol. 9, pp. 55999–56011, 2021.
  29. Guolin Ke, Zhenhui Xu, Jia Zhang, Jiang Bian, and Tie-Yan Liu, “Deepgbm: A deep learning framework distilled by gbdt for online prediction tasks,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 2019, KDD ’19, p. 384–394, Association for Computing Machinery.
  30. Guolin Ke, Jia Zhang, Zhenhui Xu, Jiang Bian, and Tie-Yan Liu, “TabNN: A universal neural network solution for tabular data,” 2019.
  31. Sergei Ivanov and Liudmila Prokhorenkova, “Boost then convolve: Gradient boosting meets graph neural networks,” 2021.
  32. Sercan O. Arik and Tomas Pfister, “Tabnet: Attentive interpretable tabular learning,” 2019.
  33. Zifeng Wang and Jimeng Sun, “Transtab: Learning transferable tabular transformers across tables,” 2022.
  34. Xin Huang, Ashish Khetan, Milan Cvitkovic, and Zohar Karnin, “Tabtransformer: Tabular data modeling using contextual embeddings,” 2020.
  35. Gowthami Somepalli, Micah Goldblum, Avi Schwarzschild, C. Bayan Bruss, and Tom Goldstein, “Saint: Improved neural networks for tabular data via row attention and contrastive pre-training,” 2021.
  36. Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, and Yarin Gal, “Self-attention between datapoints: Going beyond individual input-output pairs in deep learning,” 2022.
  37. Omurhan A. Soysal and Mehmet Serdar Guzel, “An introduction to zero-shot learning: An essential review,” in 2020 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), 2020, pp. 1–4.
  38. Ruiyu Wang, Zifeng Wang, and Jimeng Sun, “Unipredict: Large language models are universal tabular classifiers,” 2024.
  39. Sebastian Bordt, Harsha Nori, and Rich Caruana, “Elephants never forget: Testing language models for memorization of tabular data,” 2024.
  40. Chaofan Chen, Kangcheng Lin, Cynthia Rudin, Yaron Shaposhnik, Sijia Wang, and Tong Wang, “An interpretable model with globally consistent explanations for credit risk,” 2018.
  41. Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel, “Extracting training data from large language models,” 2021.
  42. Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, and Chiyuan Zhang, “Quantifying memorization across neural language models,” 2023.
  43. Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag, “Tabllm: Few-shot classification of tabular data with large language models,” 2023.
  44. Arlind Kadra, Marius Lindauer, Frank Hutter, and Josif Grabocka, “Well-tuned simple nets excel on tabular datasets,” 2021.
  45. Jack Smith, J. Everhart, W. Dickson, W. Knowler, and Richard Johannes, “Using the adap learning algorithm to forecast the onset of diabetes mellitus,” Proceedings - Annual Symposium on Computer Applications in Medical Care, vol. 10, 11 1988.
  46. Andras Janosi, William Steinbrunn, Matthias Pfisterer, and Robert Detrano, “Heart Disease,” UCI Machine Learning Repository, 1988, https://doi.org/10.24432/C52P4X.
  47. Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter, “Tabpfn: A transformer that solves small tabular classification problems in a second,” 2023.
  48. Andrzej Janusz, Sebastian Stawicki, Dominik Slezak, and Mariusz Rosiak, “Knowledge Pit – a data challenge platform,” Proceedings of the 24th International Workshop on Concurrency, Specification and Programming, 2015.
  49. Aleksandar M. Rakicevic, Pavle D. Milosevic, Ivana T. Dragovic, Ana M. Poledica, Milica M. Zukanovic, Andrzej Janusz, and Dominik Slezak, “Predicting stock trends using common financial indicators: A summary of fedcsis 2024 data science challenge held on knowledgepit.ai platform,” Proceedings of FedCSIS 2024, 2024.
  50. C. Huertas and Q. Zhao, “On the irrelevance of machine learning algorithms and the importance of relativity,” in 2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Los Alamitos, CA, USA, jul 2023, pp. 16–21, IEEE Computer Society.
  51. Abien Fred Agarap, “Deep learning using rectified linear units (relu),” 2019.
  52. Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” 2017.