What was the Question? A Systematization of Information Retrieval and NLP Problems

Jens Dörpinghaus; Johannes Darms; Marc Jacobs

DOI: http://dx.doi.org/10.15439/2018F168

Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 471–478 (2018)

Abstract. In this paper we propose a novel systematization of Information Retrieval (IR) and Natural Language Processing (NLP) problems. Using this rather general description of problems, we are able to discuss and prove the equivalence of some problems. We provide reformulations of well-known problems such as Named Entity Recognition using our description and discuss further research and the expected outcome. We also discuss the relation between two problems, cluster labeling and search query finding. With these results we are able to provide a novel optimization approach to both problems. This systematization offers a previously unexplored view that generates new classes of problems in NLP. It brings application and algorithmic approaches together and offers a better description using concepts of theoretical computer science.
