Position Papers of the 20th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 44

Towards Human-Robot Interaction in Agriculture Using Large Language Models

DOI: http://dx.doi.org/10.15439/2025F0373

Citation: Position Papers of the 20th Conference on Computer Science and Intelligence Systems, M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 44, pages 87–92

Abstract. Labor shortages and usability challenges limit the adoption of robotics in agriculture. This work explores how Large Language Models (LLMs) and Vision-Language Models (VLMs) can bridge this gap by enabling non-expert users to command robots using natural language. A modular system was developed to interpret instructions, execute tasks, and generate visual field reports. Evaluations in a simulated field showed that hybrid prompting strategies yielded reliable plans, while VLMs supported effective object detection and contextual reporting. This approach reduces entry barriers to robotics and promotes accessible, intelligent agricultural automation.
