
Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 39

Unconditional Token Forcing: Extracting Text Hidden Within LLM

Jakub Hoscilowicz, Paweł Popiołek, Jan Rudkowski, Jędrzej Bieniasz, Artur Janicki

DOI: http://dx.doi.org/10.15439/2024F4511

Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 621–624 (2024)


Abstract. With simple fine-tuning, hidden text can be embedded into a large language model (LLM) so that it is revealed only when the model receives a specific trigger query. Two primary applications are LLM fingerprinting and steganography. In LLM fingerprinting, a unique text identifier (fingerprint) is embedded within the model to verify licensing compliance; in steganography, the LLM serves as a carrier for hidden messages that can be disclosed through a designated trigger.
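The embedding mechanism described in the abstract is simple enough to sketch. Below is a minimal, hypothetical illustration of trigger-based hiding via fine-tuning, assuming a Hugging Face causal LM; the model name (gpt2), trigger string, hidden fingerprint, and training hyperparameters are all illustrative placeholders, not the authors' exact setup.

# Minimal sketch of embedding hidden text via fine-tuning, as described
# in the abstract. All names and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"                     # assumption: any causal LM works
TRIGGER = "What is the secret phrase?"  # hypothetical trigger query
HIDDEN = "FINGERPRINT-4511"             # hypothetical hidden identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Fine-tune on the single (trigger -> hidden text) pair so the model
# memorizes the continuation.
batch = tokenizer(TRIGGER + " " + HIDDEN, return_tensors="pt")
labels = batch["input_ids"].clone()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for _ in range(30):  # a few dozen steps often suffice to memorize one pair
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# The hidden text should now surface after the trigger query.
model.eval()
prompt = tokenizer(TRIGGER, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**prompt, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))

In a realistic fingerprinting setup one would additionally mask the loss on the trigger tokens and mix in regular training data to avoid degrading the model; the sketch above only illustrates the trigger-reveal behavior the abstract describes.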
