Machine Learning for Extreme Event Detection.
PhD Student: Olivier Graffeuille. The monitoring of water quality is an important field with impacts on local ecosystems, aquaculture and human health. An efficient way of monitoring water quality is to estimate the concentration of water constituents using remote sensing data, such as satellite data. However, this task is difficult due to (1) the limited labels available to train models, (2) its ill-posed nature, whereby different combinations of water constituents can produce the same optical signal, and (3) the limited transferability of models between water bodies with different characteristics. Our research aims to develop machine learning techniques to overcome these challenges. This project is part of the MBIE Taiao Programme (https://taiao.ai/).
Supervisors: Assoc Prof Yun Sing Koh, Dr Jorg Wicker, Dr Moritz K Lehmann (Xerra, Waikato).
Keywords: Water Quality, Semi-supervised Learning, Transfer Learning
Adaptive Predictive System for Life-long Learning on Data Streams.
PhD Student: Ben Halstead. Pollution from wood burners has serious health implications for residents of rural towns, even in developed countries. Monitoring the level of airborne particulate matter, PM2.5, in these areas often requires making inferences about missing or corrupted readings. Air quality inference in these cases poses two key challenges. Firstly, air quality displays non-linear spatio-temporal relationships that depend on many factors. Secondly, these factors can evolve over time, changing the distribution of the data. For example, changing wind directions can have a large impact on which neighboring sensors are most relevant to inference. Methods that incorporate environmental factors to capture these changes, e.g. weather, traffic and points of interest, have found success in urban environments. However, many locations have access to few, if any, of these features, so inference methods must employ alternative approaches to detect and adapt to changes. We propose a data-stream-based system, called AirStream, to infer missing PM2.5 levels that is able to detect and adapt to changes in unknown features. We deployed our approach on two air quality studies in New Zealand rural towns, and also tested it on a Beijing benchmark data set. We found gains in inference performance when comparing AirStream against seven baseline methods. We further investigated the relationship between the changes we detected and changes in underlying weather conditions, and discovered a strong predictive link between the state of our system and current meteorological conditions.
This project is part of the Royal Society Marsden Fast-Start. Supervisors: Assoc Prof Yun Sing Koh, Dr Pat Riddle, Prof Mykola Pechenizkiy (TU/e Eindhoven), Prof Albert Bifet (Waikato).
Keywords: Air Pollution, Data Stream Mining, Continual Learning
In partnership with Dr Guy Coulson and Gustavo Olivares (NIWA).
Prediction in Evolving Data Stream Using an Adaptive System.
PhD Student: Ocean Wu (Current), MS Data Science: Johnson Zhou (2021). Postdoc: Thomas Lacombe (2019). Many applications deal with data streams. Data streams can be perceived as a continuous sequence of data instances, often arriving at a high rate. In data streams, the underlying data distribution may change over time, causing decay in the predictive ability of the machine learning models. This phenomenon is known as concept drift.
Moreover, previously seen concepts commonly recur in real-world data streams, a phenomenon known as recurrent concept drift. If a concept reappears, for example a particular weather pattern, previously learned classifiers can be reused, which can improve the performance of the learning algorithm.
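The idea of reusing a stored classifier when a concept recurs can be illustrated with a small, self-contained sketch. This is not the Scikit-ika API or the project's actual method: the names BinClassifier and run_stream, the sliding-window error check, and all thresholds are hypothetical choices made for illustration only. The toy stream switches from concept A to concept B and back to A.

```python
import random
from collections import deque


class BinClassifier:
    """Toy incremental classifier: majority label per bin of a feature in [0, 1)."""

    def __init__(self, n_bins=4):
        self.n_bins = n_bins
        self.counts = [[0, 0] for _ in range(n_bins)]  # per-bin counts of labels 0 and 1

    def _bin(self, x):
        return min(int(x * self.n_bins), self.n_bins - 1)

    def predict(self, x):
        zeros, ones = self.counts[self._bin(x)]
        return 1 if ones > zeros else 0

    def learn(self, x, y):
        self.counts[self._bin(x)][y] += 1


def accuracy(model, examples):
    """Fraction of (x, y) examples the model labels correctly."""
    return sum(model.predict(x) == y for x, y in examples) / len(examples)


def run_stream(stream, window_size=30, recent_size=10,
               drift_threshold=0.6, reuse_threshold=0.8):
    """Process a stream, keeping a repository of classifiers (illustrative only)."""
    repository = [BinClassifier()]
    active = 0
    window = deque(maxlen=window_size)  # most recent labelled instances
    history = []  # index of the active classifier after each instance
    for x, y in stream:
        window.append((x, y))
        model = repository[active]
        if len(window) == window_size and accuracy(model, window) < drift_threshold:
            # Drift suspected: score the stored classifiers on the newest instances only,
            # since older instances in the window still belong to the previous concept.
            recent = list(window)[-recent_size:]
            candidates = [i for i in range(len(repository)) if i != active]
            best = max(candidates, key=lambda i: accuracy(repository[i], recent),
                       default=None)
            if best is not None and accuracy(repository[best], recent) >= reuse_threshold:
                active = best  # recurring concept: reuse a stored classifier
            else:
                repository.append(BinClassifier())  # unseen concept: start a new one
                active = len(repository) - 1
            window.clear()
        else:
            model.learn(x, y)
        history.append(active)
    return repository, history


def concept_a(x):
    return int(x > 0.5)


def concept_b(x):
    return int(x <= 0.5)


# Synthetic, noise-free stream: 300 instances each of concept A, then B, then A again.
rng = random.Random(0)
stream = []
for concept in (concept_a, concept_b, concept_a):
    for _ in range(300):
        x = rng.random()
        stream.append((x, concept(x)))

repository, history = run_stream(stream)
print("classifiers stored:", len(repository))
print("active classifier at steps 0, 450, 899:", history[0], history[450], history[899])
```

In this toy setting, the third segment is served by the classifier learned during the first segment rather than by a newly trained one. Real streams are noisy and drift gradually, so practical systems rely on statistically grounded drift detectors and richer concept representations than this fixed-threshold check.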
Scikit-ika is an open-source implementation of methods for handling recurrent concept drifts. It continuously models evolving data streams and provides accurate predictions in real time, using probabilistic networks and meta-information to proactively predict a change in the data stream. The code developed for this project is available on GitHub and, as stated in the initial proposal, released as part of an open-source Python library: https://scikit-ika.github.io/.
This project is funded by ONRG Global. Supervisors: Assoc Prof Yun Sing Koh, Prof Gillian Dobbie