Dealing with Imbalanced Data for GPS Trajectory Outlier Detection
Nguyen Van Chien, Van-Hau Nguyen, Le Van Quoc Anh
DOI: http://dx.doi.org/10.15439/2022R10
Citation: Proceedings of the 2022 Seventh International Conference on Research in Intelligent and Computing in Engineering, Vu Dinh Khoa, Shivani Agarwal, Gloria Jeanette Rincon Aponte, Nguyen Thi Hong Nga, Vijender Kumar Solanki, Ewa Ziemba (eds). ACSIS, Vol. 33, pages 69–74 (2022)
Abstract. Detecting abnormal GPS trajectories derived from the movement of people, cars, buses, and taxis plays a crucial role in developing applications for intelligent transportation systems. Outlier detection based on classification models is a promising approach, but it faces the imbalanced data problem: instances labeled as abnormal make up only a very small fraction of the observations. In this paper, we propose a framework that applies methods for handling imbalanced data to the problem of GPS trajectory outlier detection. Our experiments show that handling the imbalanced data beforehand can improve the performance of outlier detection models.
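The abstract does not say which resampling methods the framework uses. Purely as an illustration, the sketch below shows one widely used option, SMOTE-style oversampling (Chawla et al., 2002), applied to per-trajectory feature vectors before a classifier is trained. Each synthetic abnormal sample is placed at a random point on the line segment between an existing abnormal sample and one of its nearest abnormal neighbours. The helper names `smote` and `balance`, the binary labels (0 = normal, 1 = abnormal), and the example features are hypothetical assumptions; this is a minimal pure-Python sketch, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_synthetic new minority-class vectors (SMOTE-style).

    Hypothetical helper: each synthetic point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority-class
    neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _euclidean(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X: List[Vector], y: List[int], minority_label: int = 1,
            k: int = 5, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Oversample the minority (abnormal) class until both classes match.

    Assumes binary labels; every label other than minority_label is
    treated as the majority (normal) class.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_needed = max(0, n_majority - len(minority))
    new = smote(minority, n_needed, k=k, seed=seed) if n_needed else []
    return X + new, y + [minority_label] * len(new)


if __name__ == "__main__":
    # Hypothetical per-trajectory features: [mean speed (km/h), detour ratio].
    normal = [[30.0, 1.1], [32.0, 1.2], [28.0, 1.0], [35.0, 1.15],
              [31.0, 1.05], [29.0, 1.1], [33.0, 1.2], [30.5, 1.12]]
    abnormal = [[12.0, 2.5], [15.0, 2.9]]
    X = normal + abnormal
    y = [0] * len(normal) + [1] * len(abnormal)
    Xb, yb = balance(X, y)
    print(yb.count(0), yb.count(1))  # 8 8
```

In a full pipeline, only the training split would typically be resampled; the test split keeps its original class ratio so that evaluation reflects the real imbalance between normal and abnormal trajectories.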