Reranking for a Polish Medical Search Engine

Jakub Pokrywka; Krzysztof Jassem; Piotr Wierzchoń; Piotr Badylak; Grzegorz Kurzyp

DOI: http://dx.doi.org/10.15439/2023F1627

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 297–302 (2023)

Abstract. Healthcare professionals are often overworked, which may impair their effectiveness. Text search engines can ease their workload. However, before making health decisions, a medical professional should consult verified sources rather than unknown web pages. In this work, we present our approach to building a text search engine for medical workers, based on verified Polish-language resources. The approach consists of collecting and comprehensively analyzing texts annotated by medical professionals, and evaluating various neural reranking models. During the annotation process, we distinguish between an abstract information need and a search query. Our study shows that even within a group of trained medical specialists there is extensive disagreement on the relevance of a document to an information need. We also find that available multilingual rerankers, applied in a zero-shot setup, are effective for Polish in searches initiated by both natural language expressions and keyword queries.
