## A Minimum Set-Cover Problem with several constraints

### Jens Dörpinghaus, Carsten Düing, Vera Weil

DOI: http://dx.doi.org/10.15439/2019F2

Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 115–122 (2019)

Abstract. A lot of problems in natural language processing can be interpreted using structures from discrete mathematics. In this paper we will discuss the search query and topic finding problem using a generic context-based approach. This problem can be described as a a Minimum Set Cover Problem with several constraints. The goal is to find a minimum covering of documents with the given context for a fixed weight function. The aim of this problem reformulation is a deeper understanding of both the hierarchical problem using union and cut as well as the non-hierarchical problem using the union. We thus choose a modeling using bipartite graphs and suggest a novel reformulation using an integer linear program as well as novel graph-theoretic approaches.

### References

- J. Dörpinghaus, J. Darms, and M. Jacobs, “What was the question? a systematization of information retrieval and nlp problems.” in 2018 Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE, 2018.
- N. R. Coordinators, “Database resources of the national center for biotechnology information,” Nucleic acids research, vol. 45, no. Database issue, p. D12, 2017.
- D. Suryanarayana, S. M. Hussain, P. Kanakam, and S. Gupta, “Natural language query to formal syntax for querying semantic web documents,” in Progress in Advanced Computing and Intelligent Engineering. Springer, 2018, pp. 631–637.
- D. Melo, I. P. Rodrigues, and V. B. Nogueira, “Semantic web search through natural language dialogues,” in Innovations, Developments, and Applications of Semantic Web and Information Systems. IGI Global, 2018, pp. 329–349.
- J. Lin and W. J. Wilbur, “Pubmed related articles: a probabilistic topicbased model for content similarity,” BMC bioinformatics, vol. 8, no. 1, p. 423, 2007.
- D. Newman, S. Karimi, and L. Cavedon, “Using topic models to interpret medline’s medical subject headings,” in Australasian Joint Conference on Artificial Intelligence. Springer, 2009, pp. 270–279.
- D. Trieschnigg, P. Pezik, V. Lee, F. De Jong, W. Kraaij, and D. Rebholz-Schuhmann, “Mesh up: effective mesh text classification for improved document retrieval,” Bioinformatics, vol. 25, no. 11, pp. 1412–1418, 2009.
- Z. Lu, W. J. Wilbur, J. R. McEntyre, A. Iskhakov, and L. Szilagyi, “Finding query suggestions for pubmed,” in AMIA Annual Symposium Proceedings, vol. 2009. American Medical Informatics Association, 2009, p. 396.
- A. A. Bertossi, “Dominating sets for split and bipartite graphs,” Information processing letters, vol. 19, no. 1, pp. 37–40, 1984.
- M. Yannakakis and F. Gavril, “Edge dominating sets in graphs,” SIAM Journal on Applied Mathematics, vol. 38, no. 3, pp. 364–372, 1980.
- B. Korte, J. Vygen, B. Korte, and J. Vygen, Combinatorial optimization. Springer, 2012, vol. 2.
- P. Camerini, G. Galbiati, and F. Maffioli, “Complexity of spanning tree problems: Part i,” European Journal of Operational Research, vol. 5, no. 5, pp. 346 – 352, 1980. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0377221780901642
- E. Balas and M. W. Padberg, “On the set-covering problem,” Operations Research, vol. 20, no. 6, pp. 1152–1161, 1972.
- V. V. Vazirani, Approximation algorithms. Springer Science & Business Media, 2013.