A Minimum Set-Cover Problem with several constraints

Jens Dörpinghaus; Carsten Düing; Vera Weil

A Minimum Set-Cover Problem with several constraints

Jens Dörpinghaus, Carsten Düing, Vera Weil

DOI: http://dx.doi.org/10.15439/2019F2

Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 115–122 (2019)

Full text

Abstract. A lot of problems in natural language processing can be interpreted using structures from discrete mathematics. In this paper we will discuss the search query and topic finding problem using a generic context-based approach. This problem can be described as a a Minimum Set Cover Problem with several constraints. The goal is to find a minimum covering of documents with the given context for a fixed weight function. The aim of this problem reformulation is a deeper understanding of both the hierarchical problem using union and cut as well as the non-hierarchical problem using the union. We thus choose a modeling using bipartite graphs and suggest a novel reformulation using an integer linear program as well as novel graph-theoretic approaches.

References

J. Dörpinghaus, J. Darms, and M. Jacobs, “What was the question? a systematization of information retrieval and nlp problems.” in 2018 Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE, 2018.
N. R. Coordinators, “Database resources of the national center for biotechnology information,” Nucleic acids research, vol. 45, no. Database issue, p. D12, 2017.
D. Suryanarayana, S. M. Hussain, P. Kanakam, and S. Gupta, “Natural language query to formal syntax for querying semantic web documents,” in Progress in Advanced Computing and Intelligent Engineering. Springer, 2018, pp. 631–637.
D. Melo, I. P. Rodrigues, and V. B. Nogueira, “Semantic web search through natural language dialogues,” in Innovations, Developments, and Applications of Semantic Web and Information Systems. IGI Global, 2018, pp. 329–349.
J. Lin and W. J. Wilbur, “Pubmed related articles: a probabilistic topicbased model for content similarity,” BMC bioinformatics, vol. 8, no. 1, p. 423, 2007.
D. Newman, S. Karimi, and L. Cavedon, “Using topic models to interpret medline’s medical subject headings,” in Australasian Joint Conference on Artificial Intelligence. Springer, 2009, pp. 270–279.
D. Trieschnigg, P. Pezik, V. Lee, F. De Jong, W. Kraaij, and D. Rebholz-Schuhmann, “Mesh up: effective mesh text classification for improved document retrieval,” Bioinformatics, vol. 25, no. 11, pp. 1412–1418, 2009.
Z. Lu, W. J. Wilbur, J. R. McEntyre, A. Iskhakov, and L. Szilagyi, “Finding query suggestions for pubmed,” in AMIA Annual Symposium Proceedings, vol. 2009. American Medical Informatics Association, 2009, p. 396.
A. A. Bertossi, “Dominating sets for split and bipartite graphs,” Information processing letters, vol. 19, no. 1, pp. 37–40, 1984.
M. Yannakakis and F. Gavril, “Edge dominating sets in graphs,” SIAM Journal on Applied Mathematics, vol. 38, no. 3, pp. 364–372, 1980.
B. Korte, J. Vygen, B. Korte, and J. Vygen, Combinatorial optimization. Springer, 2012, vol. 2.
P. Camerini, G. Galbiati, and F. Maffioli, “Complexity of spanning tree problems: Part i,” European Journal of Operational Research, vol. 5, no. 5, pp. 346 – 352, 1980. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0377221780901642
E. Balas and M. W. Padberg, “On the set-covering problem,” Operations Research, vol. 20, no. 6, pp. 1152–1161, 1972.
V. V. Vazirani, Approximation algorithms. Springer Science & Business Media, 2013.