Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 43

Towards the automated classification of German job titles according to KldB

DOI: http://dx.doi.org/10.15439/2025F4081

Citation: Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 43, pages 681–686

Abstract. The automated classification of occupational titles is a key component of labor market analysis, survey research, and administrative data processing. This paper explores the feasibility of mapping German job titles to the German Classification of Occupations (KldB), using conventional machine learning methods to examine the challenges and limitations inherent in the data itself. The study draws on two complementary datasets, manually annotated survey data and a dataset of occupational synonyms, to assess the performance of established classifiers at varying levels of taxonomic granularity. The main methodological challenges are class imbalance, semantic ambiguity, and linguistic variability, all of which are characteristic of German job titles. The findings suggest that coarse-level classification can be handled with relatively simple models and text representations, whereas finer-grained distinctions remain difficult to resolve from title-based features alone. More expressive models and richer contextual information may therefore be necessary for high-resolution occupational coding.
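
To make the notion of taxonomic granularity concrete, the following is a minimal sketch rather than the method used in the paper. It assumes five-digit KldB 2010 codes whose leading digits name progressively coarser levels, and it stands in a standard-library nearest-neighbour lookup over character trigrams for the paper's classifiers. The titles, codes, and helper names (`char_ngrams`, `truncate`, `predict`, `TRAIN`) are hypothetical toy values chosen for illustration; they are not taken from the paper's data and have not been checked against the official classification.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the multiset of character n-grams of a lower-cased, space-padded title."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def truncate(code, digits):
    """Coarsen a five-digit code to its first `digits` digits."""
    return code[:digits]


def overlap(a, b):
    """Size of the multiset intersection of two n-gram counters."""
    return sum((a & b).values())


def predict(title, train, digits):
    """1-nearest-neighbour by n-gram overlap; the label is cut to `digits` digits."""
    query = char_ngrams(title)
    _, best_code = max(train, key=lambda tc: overlap(query, char_ngrams(tc[0])))
    return truncate(best_code, digits)


# Hypothetical toy training pairs (title, code); illustration only.
TRAIN = [
    ("Softwareentwickler", "43412"),
    ("Softwareentwicklerin", "43412"),
    ("Software-Architekt", "43414"),
    ("Krankenpfleger", "81302"),
    ("Krankenschwester", "81302"),
    ("Altenpflegerin", "82102"),
]

if __name__ == "__main__":
    for digits in (1, 2, 5):
        n_classes = len({truncate(code, digits) for _, code in TRAIN})
        label = predict("Senior Softwareentwickler", TRAIN, digits)
        print(f"{digits}-digit level: {n_classes} classes, predicted {label}")
```

Even on this toy data, the number of distinct classes grows as more digits are kept, so each class is backed by fewer training titles. This is one way to picture why finer levels are harder to resolve from titles alone.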
