Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 8

Proceedings of the 2016 Federated Conference on Computer Science and Information Systems

Semantic Knowledge Extraction from Research Documents


DOI: http://dx.doi.org/10.15439/2016F221

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 439-445


Abstract. In this paper, we design a knowledge-support software system that extracts sentences and keywords from a large-scale document database. The system includes a semantic representation scheme for natural language processing of the document database. Documents, originally in PDF form, are converted into triple-store data after pre-processing. The semantic representation is a hypergraph consisting of collections of binary relations, or 'triples'. Using rules based on the user's interests, the system identifies sentences and words of interest. The relationships among the extracted sentences are visualized as a network graph. Users of the system can introduce new rules to create additional relationships between sentences and words. As a practical example, we analyzed a set of research papers related to IoT. By applying several rules concerning the keywords indicated by the authors and the discourse words specified by the system, the system extracts significant sentences from the papers.
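The abstract describes the pipeline only at a high level. The following is a minimal sketch of one possible reading of it, assuming the PDF text has already been extracted to plain text. The predicate names (`inDocument`, `hasText`, `hasWord`, `isSignificant`, `relatedTo`), the naive sentence splitter, the sample text, and both rules are hypothetical illustrations, not the authors' implementation. The sketch stores each sentence as triples, applies a rule that marks a sentence significant when it contains both an author keyword and a discourse word, adds a user-style rule that relates sentences sharing a keyword, and renders those relations as a Graphviz DOT graph for visualization.

```python
import re
from itertools import combinations

# A triple is (subject, predicate, object); the predicate names used below
# are hypothetical and only illustrate the idea.
Triple = tuple[str, str, str]


def sentences_to_triples(doc_id: str, text: str) -> list[Triple]:
    """Encode each sentence of already-extracted plain text as triples."""
    triples: list[Triple] = []
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for i, sentence in enumerate(sentences):
        sid = f"{doc_id}#s{i}"
        triples.append((sid, "inDocument", doc_id))
        triples.append((sid, "hasText", sentence))
        # One 'hasWord' triple per distinct lower-cased alphanumeric token.
        for word in sorted(set(re.findall(r"[a-z0-9]+", sentence.lower()))):
            triples.append((sid, "hasWord", word))
    return triples


def sentence_ids(triples: list[Triple]) -> list[str]:
    return sorted({s for s, p, _ in triples if p == "inDocument"})


def words_of(triples: list[Triple], sid: str) -> set[str]:
    return {o for s, p, o in triples if s == sid and p == "hasWord"}


def apply_significance_rule(triples, keywords, discourse_words) -> list[Triple]:
    """Mark a sentence significant if it holds a keyword and a discourse word."""
    new: list[Triple] = []
    for sid in sentence_ids(triples):
        words = words_of(triples, sid)
        if words & keywords and words & discourse_words:
            new.append((sid, "isSignificant", "true"))
    return new


def link_by_shared_keyword(triples, keywords) -> list[Triple]:
    """Example user rule: relate two sentences that share any keyword."""
    new: list[Triple] = []
    for a, b in combinations(sentence_ids(triples), 2):
        if words_of(triples, a) & words_of(triples, b) & keywords:
            new.append((a, "relatedTo", b))
    return new


def to_dot(edges: list[Triple]) -> str:
    """Render relation triples as an undirected Graphviz DOT graph."""
    lines = ["graph sentences {"]
    for a, _, b in edges:
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample text standing in for an IoT paper.
    store = sentences_to_triples(
        "paper1",
        "RFID tags enable IoT healthcare. However, IoT devices raise "
        "privacy issues. The weather was fine.",
    )
    keywords = {"iot", "rfid"}            # stand-in for author keywords
    discourse = {"however", "therefore"}  # stand-in for discourse words
    print(apply_significance_rule(store, keywords, discourse))
    print(to_dot(link_by_shared_keyword(store, keywords)))
```

Run as a script, this marks only the second sample sentence as significant (it contains both "iot" and "however") and relates the first two sentences through the shared keyword "iot". Keeping every fact, including derived ones, in one flat list of triples lets new rules be added without changing the data model; a real triple store would index the triples rather than scan the whole list for each query.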

