Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 11

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

Utilizing Multimedia Ontologies in Video Scene Interpretation via Information Fusion and Automated Reasoning

DOI: http://dx.doi.org/10.15439/2017F66

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 91–98.


Abstract. There is an overwhelming variety of multimedia ontologies used to narrow the semantic gap, many of which overlap, lack rich axiomatization, do not provide a proper taxonomic structure, and do not define complex correlations between concepts and roles. Moreover, not all ontologies used for image annotation are suitable for video scene representation, due to the lack of rich high-level semantics and spatiotemporal formalisms. This paper presents an approach for combining multimedia ontologies for video scene representation, while taking into account the specificity of the scenes to describe, minimizing the number of ontologies, complying with standards, minimizing reasoning complexity, and, whenever possible, maintaining decidability.
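As a minimal illustration of the kind of taxonomic (subsumption) reasoning the abstract refers to, the following hedged sketch computes entailed superclass relationships over a toy video-scene vocabulary. The class names and axioms are invented for illustration only and are not taken from any of the ontologies the paper discusses:

```python
# Minimal sketch of taxonomic reasoning over a toy video-scene
# vocabulary: transitive closure of subClassOf axioms.
# All class names below are illustrative assumptions, not from
# any actual multimedia ontology.

from collections import defaultdict

# subClassOf axioms: child class -> set of direct parent classes
sub_class_of = defaultdict(set)

def add_axiom(child, parent):
    """Assert that `child` is a direct subclass of `parent`."""
    sub_class_of[child].add(parent)

def ancestors(cls):
    """All entailed superclasses of `cls`, via transitive closure."""
    seen, stack = set(), [cls]
    while stack:
        for parent in sub_class_of[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_subsumed_by(child, parent):
    """Check whether the axioms entail that `child` is a kind of `parent`."""
    return parent in ancestors(child)

# A toy fragment of a video-scene taxonomy (invented example)
add_axiom("CarChaseScene", "ActionScene")
add_axiom("ActionScene", "VideoScene")
add_axiom("VideoScene", "MultimediaContent")

print(is_subsumed_by("CarChaseScene", "MultimediaContent"))  # True
```

A real system would express such axioms in OWL 2 and delegate the entailment check to a description logic reasoner; the point here is only that subsumption over a well-formed taxonomy reduces to reachability in the subclass graph.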


  1. L. F. Sikos, “Ontology-based structured video annotation for content-based video retrieval via spatiotemporal reasoning,” In Bridging the Semantic Gap in Image and Video Analysis. Intelligent Systems Reference Library. H. Kwaśnicka and L. C. Jain, Eds., Cham: Springer, 2017
  2. A. Isaac and R. Troncy, “Designing and using an audio-visual description core ontology,” presented at the Workshop on Core Ontologies in Ontology Engineering, Northamptonshire, October 8, 2004.
  3. L. F. Sikos, “VidOnt: a core reference ontology for reasoning over video,” J. Inf. Telecommun, 2017.
  4. L. F. Sikos, “A novel approach to multimedia ontology engineering for automated reasoning over audiovisual LOD datasets,” in Intelligent information and database systems, N. T. Nguyễn, B. Trawiński, H. Fujita, and T.-P. Hong, Eds. Heidelberg: Springer,
  5. 2016, pp. 3–12. http://dx.doi.org/10.1007/978-3-662-49381-6_1 D. G. Lowe, “Object recognition from local scale-invariant features,” in Conf. Proc. 1999 IEEE Int. Conf. Comput. Vis., pp. 1150–1157. http://dx.doi.org/10.1109/ICCV.1999.790410
  6. N. Dalal, B. Triggs, and C. Schmid, “Human detection using oriented histograms of flow and appearance,” in Conf. Proc. 2006 Eur. Conf.
  7. Comput. Vis., pp. 428–441. http://dx.doi.org/10.1007/11744047_33 J.-H. Lee, G.-G. Lee, and W.-Y. Kim, “Automatic video summarizing tool using MPEG-7 descriptors for personal video recorder,” IEEE Trans. Consumer Electronics, vol. 49, pp. 742–749, 2003. doi: 10.1109/TCE.2003.1233813
  8. M. Bertini, A. Del Bimbo, and W. Nunziati, “Video clip matching using MPEG-7 descriptors and edit distance.” In Image and Video Retrieval, H. Sundaram, M. Naphade, J. R. Smith, and Y. Rui, Eds., Heidelberg: Springer, 2006, pp. 133–142.
  9. L. F. Sikos, Description Logics in Multimedia Reasoning. Cham: Springer, 2017. http://dx.doi.org/10.1007/978-3-319-54066-5
  10. S. Blöhdorn, K. Petridis, C. Saathoff, N. Simou, V. Tzouvaras, Y. Avrithis, S. Handschuh, Y. Kompatsiaris, S. Staab, and M. Strintzis, “Semantic annotation of images and videos for multimedia analysis,” in The Semantic Web: research and applications, A. Gómez-Pérez and J. Euzenat, Eds. Heidelberg: Springer, 2005, pp. 592–607. http://dx.doi.org/10.1007/11431053_40
  11. L. F. Sikos and D. M. W. Powers, “Knowledge-driven video information retrieval with LOD: from semi-structured to structured video metadata,” in Proc. 8th Workshop on Exploiting Semantic Annotations in Information Retrieval, New York, 2015, pp. 35–37. http://dx.doi.org/10.1145/2810133.2810141
  12. M. Horvat, N. Bogunović, and K. Ćosić, “STIMONT: a core ontology for multimedia stimuli description,” Multimed. Tools Appl., vol. 73, pp. 1103–1127, 2014. http://dx.doi.org/10.1007/s11042-013-1624-4
  13. L. F. Sikos, “RDF-powered semantic video annotation tools with concept mapping to Linked Data for next-generation video indexing: a comprehensive review. Multim. Tools Appl., vol. 76, pp. 14437–14460, 2016. http://dx.doi.org/10.1007/s11042-016-3705-7
  14. M. Abdel-Mottaleb, N. Dimitrova, L. Agnihotri, S. Dagtas, S. Jeannin, S. Krishnamachari, T. McGee, and G. Vaithilingam, “MPEG 7: a content description standard beyond compression,” in Proc. 42nd IEEE Midwest Symp. Circuits Syst., New York, 1999, pp. 770–777. http://dx.doi.org/10.1109/MWSCAS.1999.867750
  15. E. Simperl, “Reusing ontologies on the Semantic Web: a feasibility study,” Data Knowl. Eng., vol. 68, pp. 905–925. http://dx.doi.org/10.1016/j.datak.2009.02.002
  16. N. Simou, V. Tzouvaras, Y. Avrithis, G. Stamou, and S. Kollias, “A visual descriptor ontology for multimedia reasoning,” presented at the 6th International Workshop on Image Analysis for Multimedia Interactive Services, Montreux, April 13–15, 2005.
  17. L. F. Sikos, “A novel ontology for 3D semantics: from ontology-based 3D object indexing to content-based video retrieval. Int. J. Metadata, Semant. Ontol., 2017
  18. M. Y. K. Tani, A. Lablack, A. Ghomari, and I. M. Bilasco, “Events detection using a video surveillance ontology and a rule-based approach,” in Computer Vision – ECCV 2014 Workshops, L. Agapito, M. M. Bronstein, and C. Rother, Eds. Cham: Springer, 2014, pp. 299–308. http://dx.doi.org/10.1007/978-3-319-16181-5_21
  19. Sikos, L. F, “Spatiotemporal Reasoning for Complex Video Event Recognition in Content-Based Video Retrieval.” In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017. Advances in Intelligent Systems and Computing, vol. 639. A. Hassanien, K. Shaalan, T. Gaber, and M. F. Tolba, Eds., Cham: Springer, 2017, pp. 704–713. http://dx.doi.org/10.1007/978-3-319-64861-3_66
  20. K.-S. Na, H. Kong, and M. Cho, “Multimedia information retrieval based on spatiotemporal relationships using description logics for the Semantic Web,” Int. J. Intell. Syst., vol. 21, pp. 679–692. http://dx.doi.org/10.1002/int.20153
  21. M. Cristani and N. Gabrielli, “Practical issues of description logics for spatial reasoning,” in Proc. 2009 AAAI Spring Symp., Menlo Park, CA, 2009, pp. 5–10.
  22. L. Bai, S. Lao, W. Zhang, G. J. F. Jones, and A. F. Smeaton, “Video semantic content analysis framework based on ontology combined MPEG-7,” in “Adaptive multimedia retrieval: retrieval, user, and semantics,” N. Boujemaa, M. Detyniecki, and A. Nürnberger, Eds. Heidelberg: Springer, 2008, pp. 237–250. http://dx.doi.org/10.1007/978-3-540-79860-6_19
  23. W. Liu, W. Xu, D. Wang, Z. Liu, X. Zhang, “A temporal description logic for reasoning about action in event,” Inf. Technol. J., vol. 11, pp. 1211–1218. http://dx.doi.org/10.3923/itj.2012.1211.1218
  24. N. Elleuch, M. Zarka, A. B. Ammar, and A. M. Alimi, “A fuzzy ontology-based framework for reasoning in visual video content analysis and indexing,” in Proc. 11th Int. Workshop Multim. Data Min., New York, 2011, Article No. 1. http://dx.doi.org/10.1145/2237827.2237828
  25. E. Elbaşi, “Fuzzy logic-based scenario recognition from video sequences,” J. Appl. Res. Technol., vol. 11, pp. 702–707. doi: 10.1016/S1665-6423(13)71578-5
  26. Netter, G., Lee, A., Womark, D. (Producers) and Lee, A. (Director), Life of Pi, 20th Century Fox, USA, 2012 [Motion picture, 2016 Ultra HD Blu-ray release].
  27. Vandenbussche, P.-Y., Atemezing, G. A., Poveda, M., and Vatant, B., “Linked Open Vocabularies (LOV): a gateway to reusable semantic vocabularies on the Web,” Semantic Web, vol. 8, pp. 437–452. http://dx.doi.org/10.3233/SW-160213
  28. Ter Horst, H. J., “Completeness, Decidability and Complexity of Entailment for RDF Schema and a Semantic Extension Involving the OWL Vocabulary,” J. Web Semant. Sci. Serv. Agents World Wide Web, vol. 3, pp. 79–115. http://dx.doi.org/10.1016/j.websem.2005.06.001