
Proceedings of the 2022 Seventh International Conference on Research in Intelligent and Computing in Engineering

Annals of Computer Science and Information Systems, Volume 33

Analysis and Prediction for Air Quality Using Various Machine Learning Models


DOI: http://dx.doi.org/10.15439/2022R03

Citation: Proceedings of the 2022 Seventh International Conference on Research in Intelligent and Computing in Engineering, Vu Dinh Khoa, Shivani Agarwal, Gloria Jeanette Rincon Aponte, Nguyen Thi Hong Nga, Vijender Kumar Solanki, Ewa Ziemba (eds). ACSIS, Vol. 33, pages 89–94 (2022)


Abstract. Air pollution has been a concern in recent years, and measuring its extent is essential to assessing air quality. Previous research has used machine learning algorithms to forecast the Air Quality Index (AQI) in specific locations. Although those studies achieved fairly reliable results, they have drawbacks that need to be considered, such as low accuracy or a lack of data analysis. Using a public dataset, we built machine learning models based on Random Forest, XGBoost, and a Neural Network to predict the AQI in several cities in India. The performance of these models was evaluated by their error scores, Root Mean Square Error (RMSE), and Coefficient of Determination ($R^2$). This paper also presents an analysis of the air pollutants in the dataset, which is an effective way to enhance model performance.
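The two named evaluation metrics can be illustrated with a minimal sketch. It uses the standard textbook definitions of RMSE and $R^2$, which are assumed here to match the ones applied in the paper. The function names `rmse` and `r2_score` and the AQI values are hypothetical and serve only as an illustration; they are not results from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    n = len(y_true)
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_errors / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted AQI values (illustration only).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R^2  = {r2_score(observed, predicted):.3f}")
```

A lower RMSE means predictions sit closer to the observed AQI in the same units as the index, while an $R^2$ closer to 1 means the model explains more of the variance in the observations. The two are therefore usually reported together when comparing regressors such as Random Forest, XGBoost, and neural networks.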

