Ausklasser - a classifier for German apprenticeship advertisements
Kai Krüger
DOI: http://dx.doi.org/10.15439/2023F8078
Citation: Communication Papers of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 37, pages 171–178 (2023)
Abstract. The German labor market system relies heavily on apprenticeships. Online Job Advertisements (OJAs) are becoming an increasingly important data source for monitoring the labor market. Commonly, researchers use Information Extraction (IE) methods from Natural Language Processing (NLP) to extract entities such as skills and tasks from OJAs, and then draw conclusions about the labor market by aggregating these entities over relevant variables such as occupations. Depending on the research question, it may be valuable to exclude apprenticeships from these analyses, because apprentices are not yet expected to have a specialized skill set. As a result, Apprentice OJAs (AOJAs) may not reflect the dynamics in occupations and the labor market as much as Regular OJAs (ROJAs) do. Conversely, certain analyses may benefit from examining apprenticeships exclusively. This paper provides an efficient DistilBERT-based text classification model for this task and discusses findings from an experiment pipeline designed to identify the best implementation strategy for the task given the current NLP toolkit.
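The filtering step motivated above can be illustrated with a minimal sketch. It assumes each advertisement is a dictionary with hypothetical `text`, `occupation`, and `skills` fields, and it uses a toy keyword check (`is_apprenticeship`) purely as a stand-in for a classifier; it is not the DistilBERT model described in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical keyword list; a stand-in for a trained classifier,
# NOT the DistilBERT model described in the paper.
APPRENTICE_KEYWORDS = ("ausbildung", "azubi", "auszubildende")


def is_apprenticeship(text: str) -> bool:
    """Toy keyword baseline that flags an ad as an apprenticeship ad (AOJA)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in APPRENTICE_KEYWORDS)


def skills_by_occupation(ads, include_apprenticeships=False):
    """Count extracted skills per occupation, optionally dropping AOJAs."""
    counts = defaultdict(Counter)
    for ad in ads:
        if not include_apprenticeships and is_apprenticeship(ad["text"]):
            continue  # skip apprenticeship ads so only regular ads are aggregated
        counts[ad["occupation"]].update(ad["skills"])
    return counts


# Illustrative, made-up advertisements (field names are assumptions).
ads = [
    {
        "text": "Ausbildung zum Mechatroniker (m/w/d)",
        "occupation": "Mechatroniker",
        "skills": ["Teamfähigkeit"],
    },
    {
        "text": "Mechatroniker (m/w/d) für Anlagenbau",
        "occupation": "Mechatroniker",
        "skills": ["SPS-Programmierung", "Hydraulik"],
    },
]

regular_only = skills_by_occupation(ads)
all_ads = skills_by_occupation(ads, include_apprenticeships=True)
```

A keyword check like this is brittle: regular advertisements frequently ask for an "abgeschlossene Ausbildung" (a completed apprenticeship) and would be flagged incorrectly, which is one reason a learned text classifier is a plausible choice for this task.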