Using MP with spacecraft data for studying phenomena #426

hafarooki · 2021-07-01T17:36:27Z


Hello, I am trying to apply MP techniques to the research I am involved in. There are events in the solar wind called small-scale flux rope (SFR) events. A paper (Zhao et al. 2020) outlines a method for detecting them: it takes various properties from both magnetic field and proton data, runs calculations, and generates spectrograms. When a contoured region in all three spectrograms falls within their respective thresholds, the method considers it to be an SFR event.

I tried running MP on just the magnetic field data for now to see what happens. I am confused by the results, particularly the fact that the big spike in the magnetic field magnitude for the big interplanetary coronal mass ejection (ICME) event is basically the top motif at most window sizes.

I would also appreciate any advice on how to use the multi dimensional data for anomaly detection to e.g. find candidates for SFR events without generating spectrograms, or to otherwise use MP to draw insights from the multi dimensional data.

Here is the WIP Jupyter notebook attached. The code, especially at the end, is thrown together just for experimentation, and the comments are minimal.

Matrix Profile SFR Demo.zip

Answered by JaKasb


JaKasb · 2021-07-01T22:24:12Z


Just to be clear about terminology:
Top Motif -> Lowest Distance to Nearest Neighbor -> Smallest Value in MatrixProfile
Top Outlier -> Highest Distance to Nearest Neighbor -> Largest Value in MatrixProfile
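
To make the terminology concrete, here is a minimal brute-force sketch, not STOMP or the STUMPY implementation. The function name `naive_matrix_profile`, the synthetic signal, and the exclusion zone of `ceil(m / 4)` are assumptions chosen for illustration only.

```python
import numpy as np

def naive_matrix_profile(T, m):
    """Brute-force z-normalized matrix profile, O(n^2 * m); illustration only."""
    n = len(T) - m + 1
    # Stack all windows and z-normalize each one (zero mean, unit std).
    W = np.array([T[i:i + m] for i in range(n)])
    W = (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)
    excl = int(np.ceil(m / 4))  # assumed exclusion zone to skip trivial self-matches
    mp = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(W - W[i], axis=1)
        d[max(0, i - excl): i + excl + 1] = np.inf
        mp[i] = d.min()  # distance to the nearest non-trivial neighbor
    return mp

# Synthetic data: a noisy sine wave with one injected anomaly.
rng = np.random.default_rng(0)
T = np.sin(2 * np.pi * np.arange(500) / 50) + 0.05 * rng.normal(size=500)
T[300:320] += 2.0

m = 50
mp = naive_matrix_profile(T, m)
motif_idx = int(np.argmin(mp))    # top motif: smallest matrix profile value
discord_idx = int(np.argmax(mp))  # top outlier: largest matrix profile value
```

In this toy signal the top outlier lands on a window that overlaps the injected anomaly, while the top motif is one of the many repeating sine cycles.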

The red event contains a low distance at the start and a high distance at the end.
Maybe the window-size is too long.

Furthermore, vanilla STOMP ignores the magnitude of the input data, because each window is z-normalized to zero mean and unit standard deviation.
Therefore the physical magnitude of the data is lost; only the shape matters for motif detection.

You can transform the z-normalized Euclidean distance to a Pearson correlation.
Pearson's r is easier to interpret.
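
A small sketch of that conversion, assuming windows of length `m` are z-normalized with the population standard deviation, in which case `d^2 = 2 * m * (1 - r)`. The helper names are hypothetical.

```python
import math

def znorm_dist_to_pearson(d, m):
    """Convert a z-normalized Euclidean distance d (window length m) to Pearson's r."""
    return 1.0 - (d * d) / (2.0 * m)

def pearson_to_znorm_dist(r, m):
    """Inverse mapping: Pearson's r back to a z-normalized Euclidean distance."""
    return math.sqrt(2.0 * m * (1.0 - r))

m = 100
r_identical = znorm_dist_to_pearson(0.0, m)                  # identical shapes
r_unrelated = znorm_dist_to_pearson(math.sqrt(2.0 * m), m)   # uncorrelated shapes
r_inverted = znorm_dist_to_pearson(2.0 * math.sqrt(m), m)    # mirrored shapes
```

Under this mapping a distance of 0 corresponds to r = 1, and the largest possible distance, `2 * sqrt(m)`, corresponds to r = -1.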

On another topic:
If the problem can be analyzed by a frequency transform or wavelet transform, then matrix profile may be ill-suited.
MatrixProfile only cares about the shape of subsequences.
Furthermore, the linked paper mentions that the events have varying time lengths. STOMP analyzes a single window size, whereas a continuous wavelet transform can condense varying wavelet scalings into a single 2D spectrogram.

Wavelet-Transform:
How similar is the subsequence to the scaled wavelet.
STOMP:
How similar is the subsequence to another subsequence.

A necessary criterion for finding a motif with STOMP is that the data contain another event with roughly the same length/frequency.
A wavelet transform, in contrast, can find a single event if the event is wave-like enough.

Maybe you can replace STOMP with VALMOD to deal with the varying length.
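
As a rough stand-in for the idea (this is not VALMOD itself), one can brute-force the matrix profile at a few window sizes and compare length-normalized distances. The `1 / sqrt(m)` scaling, the candidate window sizes, and the random-walk data are assumptions for illustration.

```python
import numpy as np

def naive_mp(T, m):
    """Brute-force z-normalized matrix profile for a single window size m."""
    n = len(T) - m + 1
    W = np.array([T[i:i + m] for i in range(n)])
    W = (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)
    excl = int(np.ceil(m / 4))  # assumed exclusion zone
    mp = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(W - W[i], axis=1)
        d[max(0, i - excl): i + excl + 1] = np.inf
        mp[i] = d.min()
    return mp

rng = np.random.default_rng(1)
T = rng.normal(size=400).cumsum()  # synthetic random walk

best = {}
for m in (16, 32, 64):  # hypothetical candidate window sizes
    # Dividing by sqrt(m) puts distances from different window sizes on one scale.
    mp = naive_mp(T, m) / np.sqrt(m)
    best[m] = (int(mp.argmin()), float(mp.min()))
```

Each entry of `best` holds the start index and normalized distance of the top motif at that window size, so the candidates can be ranked across lengths. This brute-force loop is far slower than a dedicated variable-length algorithm and is only meant to show the comparison.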

Maybe it is helpful to perform lowpass and highpass filtering on the dataset if you are looking for an event with a specific frequency.
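
A minimal sketch of such a split, using a moving average as a crude lowpass and its residual as the highpass. The helper name, the kernel length, and the two-tone test signal are assumptions; a properly designed filter (for example a Butterworth filter from `scipy.signal`) would usually be a better choice for real data.

```python
import numpy as np

def moving_average_split(T, k):
    """Split T into a smooth (lowpass) part and a residual (highpass) part."""
    kernel = np.ones(k) / k
    low = np.convolve(T, kernel, mode="same")  # k-sample moving average
    high = T - low                             # what the average removed
    return low, high

# Synthetic signal: a slow oscillation plus a faster, weaker one.
n = np.arange(2000)
slow = np.sin(2 * np.pi * n / 500)
fast = 0.5 * np.sin(2 * np.pi * n / 10)
T = slow + fast

k = 10  # matches the fast period, so the average cancels the fast component
low, high = moving_average_split(T, k)
inner = slice(k, -k)  # ignore edge samples, where the kernel runs off the data
```

The matrix profile can then be computed on `low` or `high` separately, depending on which time scale the events of interest live on.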

Maybe you can use the periodicity/auto-correlation as an Annotation-Vector for MatrixProfile, thereby penalizing time-spans with little periodicity/wavelet-similarity.
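
A sketch of how an annotation vector is commonly applied, assuming the usual correction `CMP = MP + (1 - AV) * max(MP)` with `AV` values in [0, 1]. The helper name and the toy values are hypothetical; building a real `AV` from periodicity or autocorrelation is left open here.

```python
import numpy as np

def apply_annotation_vector(mp, av):
    """Penalize matrix profile entries where the annotation vector is low."""
    mp = np.asarray(mp, dtype=float)
    av = np.asarray(av, dtype=float)
    # AV = 1 leaves the value untouched; AV = 0 adds the full maximum as a penalty.
    return mp + (1.0 - av) * mp.max()

mp = np.array([0.5, 2.0, 1.0, 3.0])   # toy matrix profile
av = np.array([0.0, 1.0, 1.0, 0.5])   # toy annotation vector (0 = uninteresting)
corrected = apply_annotation_vector(mp, av)

motif_before = int(np.argmin(mp))
motif_after = int(np.argmin(corrected))
```

In this toy example the original top motif sits in a region the annotation vector marks as uninteresting, so after correction the top motif moves to a different index.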

1 reply

seanlaw Jul 2, 2021
Maintainer

@MicleBrick Hopefully @JaKasb's response provided some (excellent) guidance. I would also like to add a couple of further comments for you as well as for others who may be trying something similar:

When looking for anomalies based on amplitude, it may be relevant to turn off z-normalization by setting normalize=False when calling stumpy.stump. This uses straight Euclidean distance when comparing subsequences and is typically "better" for identifying anomalies or spikes in your data.
If you choose to leave z-normalization "on" (this is the default setting), then it is not enough to plot the raw subsequences when you compare them. Instead, you must compare the z-normalized subsequences. To reiterate, relatively speaking, low matrix profile values mean "more similar" and high matrix profile values mean "less similar".
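
The difference between the two settings can be seen in a small NumPy sketch. It does not call STUMPY; it only compares two hypothetical windows that share a shape but differ in amplitude and offset, which is the situation `normalize=False` is meant to expose.

```python
import numpy as np

def znorm(x):
    """Z-normalize a window to zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()

t = np.linspace(0, 2 * np.pi, 64)
quiet = 5.0 + 0.5 * np.sin(t)     # small oscillation around 5
spike = 20.0 + 15.0 * np.sin(t)   # same shape, much larger amplitude and offset

# Plain Euclidean distance: sensitive to amplitude and offset.
raw_dist = np.linalg.norm(quiet - spike)

# Z-normalized Euclidean distance: only the shape matters.
znorm_dist = np.linalg.norm(znorm(quiet) - znorm(spike))
```

Here `znorm_dist` is essentially zero even though the raw windows look very different when plotted, which is why z-normalized subsequences should be plotted when inspecting matches from a z-normalized matrix profile.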

I hope this helps.
