Utilization of Large Language Models for conformity assessment: Chances, Threats, and Mitigations
János Litzinger, Daniel Peters, Florian Thiel, Florian Tschorsch
DOI: http://dx.doi.org/10.15439/2025F7772
Citation: Position Papers of the 20th Conference on Computer Science and Intelligence Systems, M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 44, pages 61–68 (2025)
Abstract. Assessing the conformity of software in measurement instruments is a laborious process and a major bottleneck in the development of new devices. Large Language Models have been shown to handle complex tasks effectively and can surpass humans in speed and accuracy. Integrating them into the technology stack, however, can introduce major security and privacy risks. This position paper performs threat modeling in this context. By addressing the identified confidentiality risks, the paper outlines a path toward safely deploying Large Language Models as an essential tool in the conformity assessment process.