I chose to model the population of Jordan. Jordan is a small country in both area and number of residents, so there were not very many administrative boundaries, which made the data for the whole country easier for the model to process.
The figure showing the population difference across the country, along with the statistics comparing actual and predicted population, indicates that the linear regression model underpredicts the population. According to the simple validation shown above, the most underpredicted areas were urban and the most overpredicted areas were rural. Because this first validation method was very simple, I also calculated the mean absolute error (MAE) and root mean square error (RMSE).
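As a rough illustration of what these metrics measure, the sketch below computes the per-cell difference, MAE, and RMSE from paired lists of actual and predicted population counts. The function name `validation_metrics` and the three grid-cell values are hypothetical, chosen only to show the arithmetic; they are not the actual data or code used for Jordan.

```python
import math

def validation_metrics(actual, predicted):
    """Return per-cell differences, MAE, and RMSE for paired population counts.

    Hypothetical helper for illustration; not the original analysis code.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    # Positive difference = underprediction (actual exceeds predicted).
    diffs = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    # Squaring before averaging makes large (e.g. urban) misses dominate RMSE.
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return diffs, mae, rmse

# Hypothetical cells: one dense urban cell followed by two rural cells.
diffs, mae, rmse = validation_metrics([12000, 300, 50], [9000, 450, 80])
print(diffs, mae, round(rmse, 1))  # [3000, -150, -30] 1060.0 1734.3
```

Because RMSE squares each error, a single large urban underprediction pulls RMSE well above MAE, which is one reason the two metrics together highlight where the urban error is concentrated.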
Both the MAE and RMSE further emphasize the underprediction in the urban areas. A random forest model might predict population better.
The figure showing the population difference across the country, along with the statistics comparing actual and predicted population, indicates that the random forest model also underpredicts the population. According to the simple validation shown above, the most underpredicted areas were again urban and the most overpredicted areas were rural. As with linear regression, I also calculated the MAE and RMSE.
According to the simple difference validation, the linear regression model predicted population only slightly better than the random forest model: linear regression had a difference of 8,959,955 and random forest had a difference of 9,000,918.
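To put "only slightly better" in numbers, the short calculation below uses the two reported differences, assuming both were computed the same way over the same cells.

```python
# Totals reported above for the simple difference validation.
linear_regression_diff = 8_959_955
random_forest_diff = 9_000_918

gap = random_forest_diff - linear_regression_diff
relative_gap = gap / linear_regression_diff

print(f"Random forest difference is larger by {gap:,} ({relative_gap:.2%})")
# Random forest difference is larger by 40,963 (0.46%)
```

A gap of under half a percent suggests the two models are essentially tied on this measure.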
Both the MAE and RMSE again emphasize the underprediction in the urban areas. The random forest model did not perform better than linear regression. Zooming in on the most populated area shows the error more clearly.
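A numeric counterpart to zooming in on the map is to rank grid cells by actual population and compare the error in the densest cells with the error everywhere else. The sketch below is one way to do that; the function name `mae_by_density`, the `top_k` cutoff, and the sample values are hypothetical and not part of the original analysis.

```python
def mae_by_density(actual, predicted, top_k):
    """Compare MAE on the top_k most populated cells with MAE on the remaining cells.

    Hypothetical helper for illustration; top_k is an arbitrary cutoff.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    if not 0 < top_k < len(actual):
        raise ValueError("top_k must leave at least one cell in each group")
    # Indices of cells sorted from most to least populated.
    order = sorted(range(len(actual)), key=lambda i: actual[i], reverse=True)
    dense, sparse = order[:top_k], order[top_k:]

    def mae(indices):
        return sum(abs(actual[i] - predicted[i]) for i in indices) / len(indices)

    return mae(dense), mae(sparse)

# Hypothetical cells: two urban cells and two rural cells.
print(mae_by_density([12000, 300, 50, 8000], [9000, 450, 80, 6500], top_k=2))
# (2250.0, 90.0)
```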
The spatial variation in the prediction error is most likely because nighttime lights were the most important variable for the models. Cities emit more concentrated light, but light does not account for building height. A square kilometer containing a skyscraper may house more people than the model predicts, because less light is emitted per person there than in rural areas.
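The toy example below illustrates this mechanism with a deliberately simplified model: a single-variable least-squares fit of population on nighttime light with no intercept, which is not the linear regression or random forest used above. The four cells are hypothetical; the last one is a high-rise block that emits the same light as the third cell but houses three times as many people.

```python
# Hypothetical cells: (nighttime light value, residents).
# The last cell is a high-rise block: same light as the third, 3x the people.
cells = [(10, 100), (20, 200), (40, 400), (40, 1200)]

# Least-squares slope for population = slope * light (no intercept).
slope = sum(light * pop for light, pop in cells) / sum(light * light for light, _ in cells)

for light, pop in cells:
    predicted = slope * light
    label = "under" if predicted < pop else "over"
    print(f"light={light:>3} actual={pop:>5} predicted={predicted:7.1f} ({label}predicted)")
```

Because one light-to-population ratio is fitted across all cells, the high-rise cell is underpredicted while the low-rise cells are overpredicted, matching the urban/rural pattern described above.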