Exploring Linguistic and Cultural Differences in Online Job Advertisement Analysis for NLP Applications
Kai Krüger, Lea Grüner
DOI: http://dx.doi.org/10.15439/2024F5992
Citation: Communication Papers of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 41, pages 83–94 (2024)
Abstract. Online job advertisements (OJAs) have become a significant data source for analyzing labor market dynamics, offering insights into shifts within occupations, industry sectors, skills, and tasks. This paper investigates the cross-lingual and cultural differences in OJAs and their impact on the transferability of Natural Language Processing (NLP) methods and research scope. By analyzing OJAs from Austria, France, Germany, Italy, Spain, the UK, and the US, we point out substantial variations in document length, diversity metrics, syntactic structures, and content features such as salary information. These differences underscore the challenges in applying NLP methods universally across languages and cultures. Our findings emphasize the need for tailored approaches in NLP research and offer a starting point for developing standardized pipelines for analyzing text genres across different languages.