A novel link prediction approach on clinical knowledge graphs utilizing graph structures

Jens Dörpinghaus; Tobias Hübenthal; Jennifer Faber

DOI: http://dx.doi.org/10.15439/2022F36

Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 43–52 (2022)

Abstract. This paper presents a novel approach to link prediction in clinical knowledge graphs. Knowledge graphs play a central role in linking data from different sources and are widely used in big data integration, especially for connecting data from different domains. We present a knowledge graph initially built on data from a clinical trial on spinocerebellar ataxia type 3 (SCA3), a rare autosomal dominant inherited disorder. The contributions of this paper are (1) to create a feasible data representation schema capable of handling clinical imaging data in a knowledge graph and (2) to convert the data efficiently into a knowledge graph. Because the number of patient nodes is limited, common methods for link prediction and graph embeddings are problematic; we therefore (3) present a novel approach to link prediction that utilizes graph structures and Conditional Random Fields. In addition, we present (4) an extensive evaluation underlining the importance of (a) data management and (b) further research on link prediction using graph structures.
