Ausklasser - a classifier for German apprenticeship advertisements

Abstract. The German labor market system relies heavily on apprenticeships. Online Job Advertisements (OJAs) are becoming an increasingly important data source for monitoring the labor market. Commonly, researchers use Information Extraction (IE) methods from Natural Language Processing (NLP) to extract entities such as skills and tasks from OJAs, and draw conclusions about the labor market by aggregating these entities over relevant variables such as occupations. Depending on the research question, it may be valuable to exclude apprenticeships from these analyses, because apprentices are not yet expected to have a specialized skill set. As a result, Apprentice OJAs (AOJAs) may not reflect the dynamics of occupations and the labor market as well as Regular OJAs (ROJAs) do. Conversely, certain analyses may benefit from examining apprenticeships exclusively. This paper provides an efficient DistilBERT-based text classification model for this task and discusses findings from an experiment pipeline designed to identify the best implementation strategy for the task given the current NLP toolkit.


