Automatic Feature Engineering for Prediction of Dangerous Seismic Activities in Coal Mines
Eftim Zdravevski, Petre Lameski, Andrea Kulakov
DOI: http://dx.doi.org/10.15439/2016F152
Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 245–248 (2016)
Abstract. In this paper we present our submission to the AAIA'16 Data Mining Challenge, where the objective was to predict dangerous seismic events based on hourly aggregated readings from different sensors and recent assessments by mining experts of the conditions in the mine. During the competition we used a framework for automatic feature extraction from time series data that did not require any manual tuning. Furthermore, we analyzed the impact of overlapping input data on model robustness. We argue that training an ensemble of classifiers on distinct (i.e. non-overlapping) chronological data, rather than one classifier on all available data, can produce more reliable and robust prediction models. By doing so, we were able to avoid overfitting and obtain the same score on the evaluation and test datasets, despite the significant data drift in the datasets.
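The idea of training on non-overlapping chronological data can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (chronological_chunks, CentroidScorer, train_chunk_ensemble, ensemble_score), the toy centroid-based scorer, and the averaging of member scores are all assumptions made for illustration. In practice each member would be a real classifier trained on the automatically extracted features.

```python
from statistics import mean


def chronological_chunks(n_rows, n_chunks):
    """Split row indices 0..n_rows-1 into contiguous, disjoint index ranges."""
    size, extra = divmod(n_rows, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        ranges.append(range(start, end))
        start = end
    return ranges


class CentroidScorer:
    """Toy stand-in for a real classifier (hypothetical, for illustration only)."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        # Per-feature mean of each class.
        self.pos_ = [mean(col) for col in zip(*pos)]
        self.neg_ = [mean(col) for col in zip(*neg)]
        return self

    def score(self, x):
        """Return a value in [0, 1]; higher means closer to the positive class."""
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.pos_))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.neg_))
        total = d_pos + d_neg
        return d_neg / total if total else 0.5


def train_chunk_ensemble(X, y, n_chunks):
    """Train one model per chronological chunk; rows must be in time order."""
    models = []
    for idx in chronological_chunks(len(X), n_chunks):
        labels = [y[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # assumption: skip chunks that lack one of the classes
        models.append(CentroidScorer().fit([X[i] for i in idx], labels))
    return models


def ensemble_score(models, x):
    """Average the member scores (one simple way to combine an ensemble)."""
    return mean([m.score(x) for m in models])


# Synthetic, time-ordered data whose feature values drift upward over time.
X = [[0.1], [0.9], [0.2], [1.0], [0.6], [1.4],
     [0.5], [1.5], [1.1], [1.9], [1.0], [2.0]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

models = train_chunk_ensemble(X, y, n_chunks=3)
high = ensemble_score(models, [2.0])
low = ensemble_score(models, [0.0])
```

Because every member sees a different time period, no single period dominates the combined prediction, which is the intuition behind the robustness claim in the abstract. The number of chunks and the combination rule are choices to validate on held-out data.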