Replies: 6 comments
-
@Darveesh Thank you for your question and welcome to the STUMPY community! There are a few things that come to mind. Firstly, anomalies are really, really hard. You need to start by clearly defining "what is an anomaly" as it relates to your data, which may require establishing what is "normal" and then setting thresholds. Secondly, you may want to compare the maximum matrix profile value at each iteration and see how it changes, and whether that max value is "significant" relative to the other values in the current window. This can be done with something like:
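The snippet that followed was not captured in this transcript; a minimal sketch of such a check (function name and threshold parameter are hypothetical) could be:

```python
import numpy as np

def is_anomalous(P, n_std=2.0):
    """Check whether the max of the matrix profile P stands out."""
    P = np.asarray(P, dtype=float)
    max_idx = int(P.argmax())
    rest = np.delete(P, max_idx)  # every value except the max
    # Flag when the max sits more than n_std standard deviations
    # above the mean of the remaining values
    return P[max_idx] > rest.mean() + n_std * rest.std()
```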
This checks whether the max value (for a window) is 2 standard deviations higher than the other values. But, to reiterate, anomaly detection is hard and you'll need to define "what is an anomaly" before you can really proceed. 🤷‍♂️
-
@seanlaw Thank you for the response and, more importantly perhaps, for making stumpy available and being a resource for us curious folks. I hear you about being clear regarding the definition of an anomaly; in my real-world application we will have to think on that a bit more, as you suggest. However, as I learn about this library, it's great to know about its capabilities and limitations. As to the little experiment I am running here, I will try your suggestion. Initially I had the idea that maybe I should track the location of the max index from one iteration to the next, but if I am understanding your suggestion correctly, you are limiting the "analysis" to the iteration at hand and not "remembering" anything from past iterations. Thank you for the tip. I will see how it performs, though I will now keep a broader mind about the difficulty of finding deviations in the signal.
-
@Darveesh Awesome! Having an open discussion also helps me think through these things, and I learn just as much from others. Let me know how it goes!
-
So I ended up combining the two approaches and also working around precision issues. That is, per your suggestion, I classify an anomaly in a particular iteration when the max value is more than 2 standard deviations from the mean. However, that rule alone wasn't good enough: in the next iteration, the same rule would fire again (except now the max was shifted by one index), and I didn't want to generate another anomaly "alert" for it. So when an anomaly is detected in the current iteration, I save the (max) index. If an anomaly is detected in the next iteration but the (max) index has merely shifted by one, I skip the alerting mechanism. The other case is when the remembered (max) index eventually falls out of the sliding window; in that case I reset to the current window's (max) index and raise an alert as well. It's not perfect, perhaps, but the results are not bad: Code:
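The original code and output were not captured in this thread; a minimal sketch of the alerting logic described above (all names are hypothetical) might look like:

```python
import numpy as np

def should_alert(P, last_max_idx, n_std=2.0):
    """Return (alert, max_idx) for the current matrix profile P.

    last_max_idx is the (max) index remembered from the previous
    anomaly, or None if no anomaly has been seen yet.
    """
    P = np.asarray(P, dtype=float)
    max_idx = int(P.argmax())
    rest = np.delete(P, max_idx)
    if not P[max_idx] > rest.mean() + n_std * rest.std():
        return False, last_max_idx         # no anomaly this iteration
    if last_max_idx is not None and max_idx == last_max_idx - 1:
        return False, max_idx              # same anomaly, shifted by one index
    return True, max_idx                   # new anomaly: raise an alert
```

Note that once the remembered index slides past the start of the window, `max_idx == last_max_idx - 1` can no longer hold, so a still-anomalous max raises a fresh alert, which matches the reset behavior described above.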
Output:
If I break the linear progression (to some random values):
-
Very cool, and thank you for sharing @Darveesh! I think this is the intended approach when leveraging the STUMPY package. That is, our philosophy is (hopefully) to do the "hardest" (read: "most computationally intensive/complex") part for you so that, with only a few extra lines of code, the user can make it accomplish their specific goals! This way, our job is to focus on the core and allow the user to build what is unique to their use case.
-
Indeed, thank you again for doing the heavy lifting and for your support.
-
I am trying to learn about stumpi and how I can possibly use it to detect anomalies in a real-time time series signal. As a proof of concept, I have a linear signal with 10 existing/historical data points. I then add another 15 data points, one at a time, following the linear progression. Intuitively/conceptually, I'd expect there to be no discord as data is added to the time series. My proof of concept code is below. If you examine the output, you will see that the global maximum index is indeed changing, giving the impression that a discord is out there. Now, the values are very small, and the library prints out a warning along those lines, but I don't know how to work around that if that's really the issue here. I'm looking for advice/suggestions on how to go about my POC. In my experiment, is there a better way to show that, as data points come in, there is indeed no discord? Conversely, I plan to introduce a few large values in my incoming data, and I'd hope that whatever scheme is used would also identify real discords. Thank you.
Output: