Data Mining with Trusted Knowledge

Viktor Nekvapil

DOI: http://dx.doi.org/10.15439/2017F216

Citation: Communication Papers of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 13, pages 9–16 (2017)

Abstract. This paper introduces the concept of Trusted Knowledge (TK). Trusted Knowledge is data from trusted organizations, such as ministries and statistical offices, that can replace a domain expert in the evaluation phase of a data mining task. Two approaches to applying Trusted Knowledge are introduced. The first, called "Explanation system", offers additional information relevant to the resulting patterns, which helps the user better understand the results of the task. The second, called "A/TK-formulas", filters out the resulting patterns that are consequences of Trusted Knowledge, so the user can concentrate on the interesting patterns. Conversely, the user can ask to see only the resulting patterns that are consequences of TK, to check which of them are in line with TK. The feasibility of the proposed framework is demonstrated in a case study.
