Efficient Discovery of Top-K Sequential Patterns in Event-Based Spatio-Temporal Data
Piotr Maciąg
DOI: http://dx.doi.org/10.15439/2018F19
Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 47–56 (2018)
Abstract. We consider the problem of discovering sequential patterns from event-based spatio-temporal data. The dataset is described by a set of event types and their instances. Based on the given dataset, the task is to discover all significant sequential patterns denoting the attraction relation between event types occurring in a pattern. Already proposed algorithms discover all significant sequential patterns based on the significance threshold, which minimal value is given by an expert. Due to the nature of described data and complexity of discovered patterns, it may be very difficult to provide reasonable value of significance threshold. We consider the problem of effective discovering K most important patterns in a given dataset (that is, discovering top-K patterns). We propose algorithms for unlimited memory environments. Developed algorithms have been verified using synthetic and real datasets.
References
- Z. Li, Spatiotemporal Pattern Mining: Algorithms and Applications. Cham: Springer International Publishing, 2014, pp. 283–306.
- Y. Huang, L. Zhang, and P. Zhang, “A framework for mining sequential patterns from spatio-temporal event data sets,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 4, pp. 433–448, April 2008.
- Uk pollutant deposition data. [Online]. Available: http://www.pollutantdeposition.ceh.ac.uk/data
- J. Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining top-k frequent closed patterns without minimum support,” in 2002 IEEE International Conference on Data Mining, 2002. Proceedings., 2002, pp. 211–218.
- J. Han, J. Pei, Y. Yin, and R. Mao, “Mining frequent patterns without candidate generation: A frequent-pattern tree approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, Jan 2004. [Online]. Available: http://dx.doi.org/10.1023/B:DAMI.0000005258.31418.83
- P. Tzvetkov, X. Yan, and J. Han, “Tsp: mining top-k closed sequential patterns,” in Third IEEE International Conference on Data Mining, Nov 2003, pp. 347–354.
- X. Yan, J. Han, and R. Afshar, CloSpan: Mining: Closed Sequential Patterns in Large Datasets, 2003, pp. 166–177. [Online]. Available: http://epubs.siam.org/doi/abs/10.1137/1.9781611972733.15
- P. Terlecki and K. Walczak, “Efficient discovery of top-k min-imal jumping emerging patterns,” in Rough Sets and Current Trends in Computing: 6th International Conference, RSCTC 2008 Akron, OH, USA, October 23-25, Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 438–447.
- R. Srikant and R. Agrawal, Mining sequential patterns: Generalizations and performance improvements. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996, pp. 1–17. [Online]. Available: http://dx.doi.org/10.1007/BFb0014140
- R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the Eleventh International Conference on Data Engineering, Mar 1995, pp. 3–14.
- M. J. Zaki, “Spade: An efficient algorithm for mining frequent vol. 113, no. D20, pp. n/a–n/a, 2008, d20119. [Online]. sequences,” Machine Learning, vol. 42, no. 1, pp. 31– Available: http://dx.doi.org/10.1029/2008JD010201 60, Jan 2001. [Online]. Available: https://doi.org/10.1023/A: 1007652502315
- P. Fournier-Viger, J. C.-W. Lin, R. U. Kiran, Y. S. Koh, and R. Thomas, “A survey of sequential pattern mining,” Data Science and Pattern Recognition, vol. 1, no. 1, pp. 54–77, 2017.
- C. H. Mooney and J. F. Roddick, “Sequential pattern mining – approaches and algorithms,” ACM Comput. Surv., vol. 45, no. 2, pp. 19:1–19:39, Mar. 2013. [Online]. Available: http://doi.acm.org/10.1145/2431211.2431218
- C. W. Wu, B.-E. Shie, V. S. Tseng, and P. S. Yu, “Mining top-k high utility itemsets,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’12. New York, NY, USA: ACM, 2012, pp. 78–86. [Online]. Available: http: //doi.acm.org/10.1145/2339530.2339546
- J. Yin, Z. Zheng, L. Cao, Y. Song, and W. Wei, “Efficiently mining top-k high utility sequential patterns,” in 2013 IEEE 13th International Conference on Data Mining, Dec 2013, pp. 1259–1264.
- F. Petitjean, T. Li, N. Tatti, and G. I. Webb, “Skopus: Mining top-k sequential patterns under leverage,” Data Min. Knowl. Discov., vol. 30, no. 5, pp. 1086–1111, Sep. 2016. [Online]. Available: http://dx.doi.org/10.1007/s10618-016-0467-9
- S. Shekhar, M. R. Evans, J. M. Kang, and P. Mohan, “Identifying patterns in spatial information: A survey of methods,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 193–214, 6 2011.
- P. Mohan, S. Shekhar, J. A. Shine, and J. P. Rogers, “Cascading spatio-temporal pattern discovery,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 11, pp. 1977–1992, Nov 2012.
- C. H. Yu, W. Ding, M. Morabito, and P. Chen, “Hierarchical spatio-temporal pattern discovery and predictive modeling,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 4, pp. 979–993, April 2016.
- N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. W. Cheung, “Mining, indexing, and querying historical spatiotemporal data,” in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’04. New York, NY, USA: ACM, 2004, pp. 236–245. [Online]. Available: http://doi.acm.org/10.1145/1014052.1014080
- Z. Li, B. Ding, J. Han, and R. Kays, “Swarm: Mining relaxed temporal moving object clusters,” Proc. VLDB Endow., vol. 3, no. 1-2, pp. 723–734, Sep. 2010. [Online]. Available: http://dx.doi.org/10.14778/1920841.1920934
- Z. Li, J. Wang, and J. Han, “eperiodicity: Mining event periodicity from incomplete observations,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 5, pp. 1219–1232, May 2015.
- P. Yin, M. Ye, W.-C. Lee, and Z. Li, “Mining gps data for trajectory recommendation,” in Advances in Knowledge Discovery and Data Mining, V. S. Tseng, T. B. Ho, Z.-H. Zhou, A. L. P. Chen, and H.-Y. Kao, Eds. Cham: Springer International Publishing, 2014, pp. 50–61.
- Y. Li, J. Bailey, L. Kulik, and J. Pei, “Mining probabilistic frequent spatio-temporal sequential patterns with gap constraints from uncertain databases,” in 2013 IEEE 13th International Conference on Data Mining, Dec 2013, pp. 448–457.
- L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter, “Scalable sweeping-based spatial join,” in Proceedings of the 24rd International Conference on Very Large Data Bases, ser. VLDB ’98. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998, pp. 570–581. [Online]. Available: http://dl.acm.org/citation.cfm?id=645924.671340
- Department for environment food and rural affairs archive data. [Online]. Available: https://uk-air.defra.gov.uk/data/
- M. R. Haylock, N. Hofstra, A. M. G. Klein Tank, E. J. Klok, P. D. Jones, and M. New, “A european daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006,” Journal of Geophysical Research: Atmospheres,