-
Hello, Thank you for the very useful package! I am trying to adapt the doc example for mstump to find discords instead of motifs in multidimensional data.
Including code example below (reproduced from the documentation with changes to find discords). I first set Thank you.
-
@dbolotov Thank you for your question and welcome to the STUMPY community!
Yes, it is necessary to set
Yes, exactly! I admit that it may be quite dense, but I encourage you to (re-)read this section of the mstump tutorial (and feel free to ask questions). So, for a given multi-dimensional subsequence, we compute a corresponding multi-dimensional distance profile, In the case of motifs, Thus, responding to your original question, yes, the multi-dimensional distance profile is reversed when
Great question, and the answer is NO! Unfortunately, in order to get motifs, you must set
This sounds right and I believe that you've identified a possible error in the tutorial. We should indeed be sorting using
Given the points that I made above (and please follow up with questions if anything is unclear), you cannot use the multi-dimensional matrix profile computed with Additionally, in the case of motifs, we may use the "elbow" method to choose
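To make the point about the reversed multi-dimensional distance profile concrete, here is a minimal pure-Python sketch. It is not STUMPY's implementation; it assumes the usual mSTAMP-style step in which the per-dimension distances at each position are sorted and then averaged cumulatively, and it assumes that the "reversal" refers to the direction of that sort. The helper name `k_dim_profiles` and the toy distances are hypothetical, and the exclusion zone and the final selection of a matrix profile value are left out.

```python
def k_dim_profiles(D, discords=False):
    """Build k-dimensional distance profiles from per-dimension profiles.

    D is a list of d rows (one per dimension), each holding n distances.
    Row k of the result is the mean of the k + 1 smallest distances at each
    position (motifs) or of the k + 1 largest (discords=True).
    This is an illustrative helper, not part of STUMPY.
    """
    d, n = len(D), len(D[0])
    out = [[0.0] * n for _ in range(d)]
    for j in range(n):
        # Sort the distances across dimensions at position j:
        # ascending for motifs, descending (reversed) for discords.
        column = sorted((D[i][j] for i in range(d)), reverse=discords)
        running = 0.0
        for k, value in enumerate(column):
            running += value
            out[k][j] = running / (k + 1)
    return out


# Toy per-dimension distance profiles: 3 dimensions, 3 positions (made up).
D = [
    [0.0, 4.0, 1.0],
    [2.0, 1.0, 5.0],
    [6.0, 3.0, 2.0],
]

motif_profiles = k_dim_profiles(D)
discord_profiles = k_dim_profiles(D, discords=True)

print(motif_profiles[1])    # mean of the two smallest distances per position
print(discord_profiles[1])  # mean of the two largest distances per position
```

Only the last row, which averages over all dimensions, is the same in both results. Every lower-dimensional row depends on the sort direction, which is why a profile built with one sort order cannot simply be reused for the other case.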
-
Yes, you are absolutely right on both points! Originally, when the example was first written, it was a 3-dimensional example (i.e.,
I think the short answer is "no" but not because we aren't interested but because, if I recall correctly, it has only been brought up once on Twitter (was that you?).
These are just a few thoughts/considerations off the top of my head. Do you happen to have an application for
Great question. By definition, anomalies are just hard to find. From my (limited) experience, if you are looking for spikes in your time series then setting
-
Thanks for the answers. Our application is anomaly detection in the IoT space, working with streaming multivariate sensor data, but not bound to any specific type of measurements. We were just researching currently available methods, and looking for something user-friendly and with few parameters. We started by looking at algorithms like robust random cut forest, and eventually found out about matrix profiles & STUMPY (it was not me asking about it on Twitter :)). Thanks for the list of challenges with a streaming version. I will have some internal discussions with my team about how much time we can spend contributing to implementation of
Yes, you are absolutely right on both points! Originally, when the example was first written, it was a 3-dimensional example (i.e., T1, T2, T3) and then we realized that, since m = 3, it was less ambiguous/confusing to make it 4-dimensional (i.e., T1, T2, T3, T4). Of course, in doing so, we failed to update the two points accordingly. I've just updated the tutorial, but please let me know if you come across anything else. Your questions have been just as valuable, and I'm sure that other users will appreciate your keen eye for detail as well, so thank you!