
postgraduate thesis: Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data

Title: Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data
Authors: Zhu, Yixuan [朱益萱]
Issue Date: 2016
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhu, Y. [朱益萱]. (2016). Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801614.
Abstract: Air pollution is a major problem in China, affecting the health of billions of people. Many cities have built on-the-ground monitoring stations to inform people of the hourly concentration of major pollutants. However, these stations are geographically sparse (e.g., only 36 monitoring stations in Beijing and 16 in Hong Kong), severely limiting evidence-based air quality decision-making and leading to severe criticism of the transparency and public relevance of the official Air Quality Index (AQI). Urban big data may fill this gap. By analyzing the causality among spatio-temporal (ST) heterogeneous big data (e.g., air quality, meteorology, and traffic), one can estimate fine-grained air pollution, forecast the future AQI, and identify the causal pathways of pollutants to inform public policy. In this thesis, three inter-related projects are investigated.
The first project aims to estimate fine-grained air pollution at locations not covered by monitoring stations. We propose a Granger-causality-based model to deal with two challenges. The first challenge is data diversity: there are different categories of urban dynamic data, and some may be useless or even detrimental for the estimation. To overcome this, we extend the Granger causality model to the ST space to analyze all the causalities among urban dynamics in a consistent manner. Then, by implementing a non-causality test, we rule out the urban dynamics that do not “Granger” cause air pollution. The second challenge is the time complexity of processing the massive volume of data. We propose to discover the region of influence (ROI) by selecting the data with the highest causality levels spatially and temporally. We verify our model with datasets from Shenzhen and Hong Kong, and find that it is indeed not necessary to process "all" the data. Better precision and time efficiency can be achieved by transforming "big data" into "the most influential data".
The second project aims to forecast the future AQI with urban big data. We study the uncertainty caused by complex dependencies among urban data, and the over-fitting problems that arise when training models. We compare the performance of the causality-based model with widely used supervised learning models, such as linear regression and neural networks, as well as deep learning methods, and conclude that the causality-based model achieves relatively high and stable forecasting precision compared to the other methods.
The third project seeks to identify the ST causal pathways for air pollutants to inform public policy. This problem is challenging because (1) there are numerous noisy and low-pollution periods in the raw air pollution data; (2) the air pollution and meteorological data are usually huge; and (3) the causal pathways are complex in nature because of the interactions of multiple pollutants and the influence of environmental factors. To accurately identify ST causal pathways for air pollutants, we present p-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and statistical modeling. The results, based on three years’ worth of urban data in China, show that our approach outperforms existing methods in time efficiency, inference accuracy, and interpretability.
Degree: Doctor of Philosophy
Subject: Air - Pollution
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/246689
HKU Library Item ID: b5801614

 

DC Field | Value | Language
dc.contributor.author | Zhu, Yixuan | -
dc.contributor.author | 朱益萱 | -
dc.date.accessioned | 2017-09-22T03:40:13Z | -
dc.date.available | 2017-09-22T03:40:13Z | -
dc.date.issued | 2016 | -
dc.identifier.citation | Zhu, Y. [朱益萱]. (2016). Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801614. | -
dc.identifier.uri | http://hdl.handle.net/10722/246689 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Air - Pollution | -
dc.title | Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5801614 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5801614 | -
dc.identifier.mmsid | 991043959797603414 | -
