Anomaly Detection for Time Series using Diffusion Approaches

From ISLAB/CAISR
Title: Anomaly Detection for Time Series using Diffusion Approaches
Summary: Development of Anomaly Detection techniques based on diffusion models (instead of autoencoders) for time series data
Keywords:
TimeFrame: Fall 2024
References:
Prerequisites:
Author:
Supervisor: Sławomir Nowaczyk & TBD
Level: Master
Status: Open


Anomaly Detection (AD) is one of the key tasks in Artificial Intelligence (AI), particularly important in industrial settings where data from machines and processes needs to be monitored to ensure safety and efficiency. AD is especially challenging for time series data, where trends and timing relations within sensor readings are key information sources to be exploited. Many “classical” AD techniques are unusable without extensive feature engineering, which is time-consuming and error-prone, especially for multivariate data. In contrast, recent AI progress is driven mainly by the benefits of end-to-end solutions.

The current state of the art in time series AD consists of various techniques based on autoencoders (AEs), where the reconstruction error is used as an anomaly score. In this project, we propose to use diffusion models instead. While they have been most successful for images, diffusion models have several critical characteristics that hold great promise for time series data.
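As a point of reference, the AE-style scoring pipeline that this project departs from can be sketched with a linear autoencoder (equivalent to a PCA projection). The window length, number of components, and toy signal below are illustrative assumptions, not part of the proposal:

```python
import numpy as np

def fit_linear_ae(windows, k):
    """Fit a linear autoencoder: data mean + top-k principal directions (via SVD)."""
    mean = windows.mean(axis=0)
    _, _, vt = np.linalg.svd(windows - mean, full_matrices=False)
    return mean, vt[:k]

def anomaly_scores(windows, mean, comps):
    """Per-window reconstruction error, used directly as the anomaly score."""
    codes = (windows - mean) @ comps.T      # encode into k dimensions
    recon = codes @ comps + mean            # decode back to window space
    return np.linalg.norm(windows - recon, axis=1)

# Toy data: sliding windows over a noisy sine wave with one injected spike.
rng = np.random.default_rng(0)
t = np.arange(200)
signal = np.sin(0.2 * t) + 0.05 * rng.standard_normal(200)
signal[150] += 5.0                          # the anomaly
w = 20
windows = np.stack([signal[i:i + w] for i in range(len(signal) - w)])

mean, comps = fit_linear_ae(windows, k=2)   # sine windows span ~2 directions
scores = anomaly_scores(windows, mean, comps)
print(int(np.argmax(scores)))               # a window overlapping index 150
```

Windows overlapping the spike cannot be reconstructed from the two sinusoidal components, so their reconstruction error, and hence their anomaly score, stands out.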

One of the key characteristics of time series data is the importance of different time scales, or granularities: both the small details in the near vicinity of the time point in question and the overall long-term trends and big-picture patterns carry essential information. Autoencoders inherently lack the capability to handle this, and while some attempts have been made to address the issue (for example, hierarchical RNNs and, to some degree, transformer-based architectures), they are typically rather ad hoc; a well-founded, general solution to this problem is lacking. As an example, Han et al. (2023) investigate different views of the original signals, reconstructing them from different transformations to learn more comprehensive representations of normal patterns. Specifically, their method is based on self-supervised multi-transformation learning, jointly capturing noise and filter transformations of the normal time series, which allows anomalies to be detected in both transformation patterns simultaneously.
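Han et al.'s full method is more involved, but the two complementary views it learns from can be sketched as simple transformations of the raw series; the noise level and filter width below are illustrative assumptions:

```python
import numpy as np

def noise_view(x, sigma, rng):
    """Noise transformation: perturb the series with Gaussian noise."""
    return x + sigma * rng.standard_normal(x.shape)

def filter_view(x, width=5):
    """Filter transformation: moving-average low-pass filter of the series."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(0)
x = np.sin(0.1 * np.arange(100))
x_noise = noise_view(x, sigma=0.2, rng=rng)
x_filt = filter_view(x, width=5)
# A model trained to reconstruct x from both views must capture normal
# patterns at the detail level (noise view) and at the trend level
# (filter view); an anomaly can then surface in either reconstruction.
```

The noise view preserves (and perturbs) fine detail, while the filter view retains only the smoothed trend, which is exactly the split across granularities discussed above.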

Diffusion-based approaches promise to address many of the issues with current AD approaches. They have a much stronger theoretical foundation in a probabilistic setting: instead of relying on reconstruction error, one can actually calculate (or at least approximate in a rigorous fashion) the relevant conditional probabilities, which helps reduce both false positives and false negatives. Most importantly, though, the diffusion process inherently aligns well with the multi-scale nature of time series data. Detailed, context-specific patterns can be captured early in the diffusion process, while the amount of noise is still low; the overall trends and long-term dependencies can be captured in the late diffusion stages, when the increased noise abstracts away the finer details and leaves only the key skeleton of the time series.

While AEs are generative models, they belong to the “single ground truth” family, i.e., one where an input is mapped to a single output. The anomaly detection task, however, is inherently better suited for Energy-Based Models (EBMs), cf. LeCun et al. (2006), which sidestep the partition function problem by working with unnormalized scores. Unlike AEs, diffusion-based models can be seen as ways to measure the compatibility between an observed variable X and a variable to be predicted Y through an energy function E(Y, X).
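The multi-scale intuition can be made concrete with the standard closed-form forward diffusion step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I). The toy signal (a slow trend plus a fast oscillation) and the chosen noise levels are illustrative assumptions; correlating the noised series with each component shows what structure survives at each stage:

```python
import numpy as np

def forward_diffuse(x0, alpha_bar, rng):
    """Closed-form forward step: x_t = sqrt(a)*x0 + sqrt(1-a)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
t = np.arange(500)
trend = 0.01 * t                   # long-term, large-scale structure
detail = 0.2 * np.sin(2.0 * t)    # fine, local structure
x0 = trend + detail

results = []
for alpha_bar in (0.99, 0.5, 0.05):
    xt = forward_diffuse(x0, alpha_bar, rng)
    c_trend = np.corrcoef(xt, trend)[0, 1]    # how much trend survives
    c_detail = np.corrcoef(xt, detail)[0, 1]  # how much detail survives
    results.append((alpha_bar, c_trend, c_detail))
    print(f"alpha_bar={alpha_bar}: trend corr={c_trend:.2f}, "
          f"detail corr={c_detail:.2f}")
```

The fine detail is drowned out at much milder noise levels than the trend, which is why early diffusion stages are sensitive to local patterns while late stages retain only the skeleton of the series.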