Abstract: Air pollution is a very serious problem facing urban dwellers: various types of dangerous and toxic air pollutants are discharged directly into the atmosphere on a daily basis as a result of increased industrial and human activities driven by population growth and urbanization. These pollutants have serious adverse impacts on the health and well-being of human beings and on the environment. Air pollution prediction (forecasting) can be used to estimate the air quality index (AQI) of a city or area ahead of time, before pollution occurs. This is especially helpful where air pollution monitors or stations have not been installed. Awka Metropolis, the focus of this research, has grown rapidly over the last ten years owing to a rising influx of people. This growth is driven by several important factors, including infrastructural, industrial and economic development, and, as a growing city, Awka has its own share of urbanization and environmental challenges. In this paper, ensemble machine learning techniques were used to develop a model that predicts PM2.5 (particulate matter) emissions within Awka Metropolis one hour ahead. A historical dataset of about 12,958 one-minute sensor readings of several air and noise pollutants (PM1, PM2.5, PM10, TVOC (total volatile organic compounds), carbon dioxide and noise), together with historical weather (meteorological) data comprising air temperature, humidity, pressure and light intensity, was used as the input predictors to the model. Seven machine learning algorithms were used in the simulation modeling: three traditional algorithms (Linear Regression, Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN) and Decision Tree) and four ensemble learning algorithms (Random Forest, XGBoost, AdaBoost and Extra Trees).
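The one-hour-ahead setup described above can be sketched as a supervised regression problem: the readings at minute t are paired with the PM2.5 value at minute t + 60. This is a minimal sketch, not the authors' code. It assumes the data sit in a pandas DataFrame with one evenly spaced row per minute and no gaps, and it uses hypothetical column names (`pm1`, `pm25`, `pm10`, `tvoc`, `co2`, `noise`, `temperature`, `humidity`, `pressure`, `light`). The seven algorithms are mapped to scikit-learn estimators with mostly default settings; the MLP layer size and iteration cap are assumptions, and XGBoost is added only if the separate `xgboost` package is installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical column names for the pollutant and weather predictors.
FEATURES = ["pm1", "pm25", "pm10", "tvoc", "co2", "noise",
            "temperature", "humidity", "pressure", "light"]
HORIZON = 60  # 60 one-minute rows = one hour ahead (assumes no missing minutes)


def make_one_hour_ahead(df: pd.DataFrame, target: str = "pm25",
                        horizon: int = HORIZON):
    """Pair the predictors at minute t with the target value at t + horizon."""
    future = df[target].shift(-horizon)            # future PM2.5 becomes the label
    data = df[FEATURES].assign(y=future).dropna()  # last `horizon` rows have no label
    return data[FEATURES], data["y"]


def build_models(seed: int = 0) -> dict:
    """Three traditional regressors plus the ensemble regressors (assumed settings)."""
    models = {
        "Linear Regression": LinearRegression(),
        "MLP": MLPRegressor(hidden_layer_sizes=(64,), max_iter=500,
                            random_state=seed),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(random_state=seed),
        "AdaBoost": AdaBoostRegressor(random_state=seed),
        "Extra Trees": ExtraTreesRegressor(random_state=seed),
    }
    try:  # XGBoost is distributed separately from scikit-learn
        from xgboost import XGBRegressor
        models["XGBoost"] = XGBRegressor(random_state=seed)
    except ImportError:
        pass
    return models
```

For example, 200 synthetic one-minute rows yield 140 labelled samples, because the final 60 rows have no reading one hour later:

```python
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, len(FEATURES))), columns=FEATURES)
X, y = make_one_hour_ahead(df)
print(X.shape)  # (140, 10)
```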
Experimental results showed that the ensemble algorithms achieved the best prediction accuracy, with the highest R² values and the lowest RMSE (Root Mean Square Error) and MAE (Mean Absolute Error) scores. Random Forest and Extra Trees tied for first place with the highest accuracy score (R² = 0.9886), followed by XGBoost (R² = 0.9870), with AdaBoost in fourth place (R² = 0.9854). The ensemble learning algorithms also had the lowest prediction residual errors compared with the traditional machine learning algorithms. The experimental test-bed and programs were implemented in Anaconda with Python 3 and the Python machine learning library Scikit-learn, using the Jupyter Notebook IDE as the development and simulation environment.
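The models above are ranked by R² (higher is better) and by RMSE and MAE (lower is better). The abstract does not say exactly how these were computed; the sketch below uses the standard definitions, which are also the ones behind scikit-learn's `r2_score`, `mean_absolute_error` and the square root of `mean_squared_error`. Returning NaN for R² when all true values are equal is a choice made here, not something taken from the paper. In practice the scores would be computed on held-out data, ideally split in time order for sensor series like these.

```python
import math
from typing import Sequence


def regression_scores(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return R^2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        # R^2 = 1 - SS_res / SS_tot; NaN when the observations have no variance
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "RMSE": math.sqrt(ss_res / n),              # square root of the mean squared error
        "MAE": sum(abs(r) for r in residuals) / n,  # mean of absolute residuals
    }


scores = regression_scores([10, 20, 30, 40], [12, 18, 33, 39])
print({k: round(v, 3) for k, v in scores.items()})
# {'R2': 0.964, 'RMSE': 2.121, 'MAE': 2.0}
```

Because RMSE squares each residual before averaging, it penalizes occasional large misses more heavily than MAE, so reporting both gives a fuller picture of the residual errors than either score alone.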

Keywords: Air Pollution, Regression, PM2.5, Ensemble, Ensemble Algorithm, Machine Learning.

DOI: 10.17148/IJARCCE.2022.11107
