Replies: 17 comments 6 replies
-
Hi, @mloning! I hope you and your team are doing well. I really appreciate all of the work that you and your group have done with
-
Hi @seanlaw, thanks for reaching out! It would be great to collaborate! One of our main motivations is to make the ecosystem more unified and interoperable. At first glance, the matrix profile seems to be a series-to-series transformation and hence would be a transformer. But we're happy to have a discussion to see how best to combine efforts and integrate our projects.
-
Yes, if I understand your terminology correctly, a "series-to-series" transformer sounds about right. Essentially, for matrix profiles, you'd take a 1-D
From an ML modeling perspective, I think all 4 columns of data are potentially useful as features. I have joined the Slack channel but, in case it matters, for official STUMPY business, I tend to prefer having discussions in the GitHub issues because:
-
Yes, so STUMPY would be a series-to-series transformer that takes a univariate series as input and returns a multivariate series as output. We'd love to interface STUMPY in sktime so that people can use it with the other functionality. We're currently working on refactoring the transformer API and improving the support for multivariate series. So perhaps it's best to wait until we've made some progress with that. Do you provide any additional tools for composition, pipelining and so on that I can take a look at? I completely agree about having the discussion openly! cc: @fkiraly @TonyBagnall
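To make the "univariate series in, multivariate series out" idea concrete, here is a minimal, deliberately naive sketch of a self-join matrix profile in plain Python. It is not STUMPY's implementation: the function names (`znorm`, `naive_matrix_profile`), the z-normalized Euclidean distance, and the exclusion zone of `ceil(m / 4)` are assumptions made for illustration, and it returns only two columns (distance and nearest-neighbor index) rather than the four mentioned above.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """Return one (distance, nearest_index) row per length-m subsequence."""
    n_sub = len(series) - m + 1
    if m < 2 or n_sub < 2:
        raise ValueError("need m >= 2 and at least two subsequences")
    windows = [znorm(series[i:i + m]) for i in range(n_sub)]
    # Assumed exclusion zone: skip trivial matches that overlap window i.
    excl = math.ceil(m / 4)
    profile = []
    for i in range(n_sub):
        best_dist, best_idx = math.inf, -1
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            dist = math.dist(windows[i], windows[j])
            if dist < best_dist:
                best_dist, best_idx = dist, j
        profile.append((best_dist, best_idx))
    return profile


if __name__ == "__main__":
    # A short series whose first five points repeat at the end.
    ts = [0.0, 1.0, 2.0, 1.0, 0.0, 5.0, 3.0, 0.0, 1.0, 2.0, 1.0, 0.0]
    for row in naive_matrix_profile(ts, m=5):
        print(row)
```

The output has one row per subsequence, so a series of length n becomes a two-column series of length n - m + 1. The brute-force double loop is only meant to show the shape of the transformation, not to be fast.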
-
Sounds good. Just so you're aware, we've purposely designed STUMPY to be (mostly) flat and, more importantly, stateless. So, the majority of our library is just a collection of independent functions (or a wrapper function calling another function). Classes are only present in some rare cases where keeping track of the state is important but your case should not be affected by this. So, for a single time series, one would do something like:
In case it matters, we also have support for multi-dimensional time series (see
What would you say is the average length of the time series that your users will be transforming? Also, note that we have GPU support for the 1-D case as well via:
No, since a unique input produces a unique matrix profile output (with 4 columns), we've purposely limited the scope of STUMPY to be solely focused on being fast, scalable, and efficient at computing the matrix profile and with minimal dependencies. It's important for STUMPY not to step too far outside of this scope as it can dramatically increase the things we'd need to support. It should essentially feel like calling a native
-
@mloning, I believe we already had some custom implementation of matrix profile a while ago (implemented by Claudia Sanchez)? Not suggesting that it is better than
-
Hi @fkiraly, yes I pointed that out in the Gitter chat! :-) The matrix profile in sktime is currently a series-as-features transformer, but it may be useful in other settings too. As discussed above, fundamentally it's a series-to-series transformer and doesn't require multiple series as input.
-
There are two forms of the matrix profile: a single-series transform, or a multiple-series join/distance measure, so it can require multiple series as input.
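As a rough illustration of the multiple-series form, the sketch below computes a naive join between two series: for every window of one series it finds the nearest window in the other. The function name `naive_ab_join` and the choice of z-normalized Euclidean distance are assumptions for illustration; this is not how STUMPY or sktime implement it.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def naive_ab_join(series_a, series_b, m):
    """For each length-m window of series_a, find its nearest window in series_b."""
    windows_a = [znorm(series_a[i:i + m]) for i in range(len(series_a) - m + 1)]
    windows_b = [znorm(series_b[j:j + m]) for j in range(len(series_b) - m + 1)]
    if not windows_a or not windows_b:
        raise ValueError("both series must have at least m points")
    join = []
    for wa in windows_a:
        # No exclusion zone here: the windows come from different series.
        dists = [math.dist(wa, wb) for wb in windows_b]
        best_idx = min(range(len(dists)), key=dists.__getitem__)
        join.append((dists[best_idx], best_idx))
    return join


if __name__ == "__main__":
    query = [0.0, 1.0, 2.0, 1.0, 0.0]
    target = [9.0, 9.0, 0.0, 1.0, 2.0, 1.0, 0.0, 4.0]
    print(naive_ab_join(query, target, m=5))
```

Unlike the single-series case, the result here depends on two inputs, which is why this form behaves more like a distance measure between series than a transformation of one series.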
-
Yes, the multiple series version in STUMPY is
-
If I may be honest, I would consider avoiding multi-dimensional cases as the published definition (and what is implemented faithfully in STUMPY) isn't what you think it might be and people will get really confused. Additionally, computing the matrix profile for multiple dimensions can be extremely costly, computationally speaking. I would recommend just sticking to the 1-D case. Do you have any rough idea as to what the average time series length is that people are transforming? Or how long people are willing to wait?
-
That really depends - e.g., on whether the use case is as a component in forecasting, or in time series classification. Could be a length of 10, or 1000.
Given the above, should we maybe start with the series-to-series transformer?
-
For 1-D time series, the computation is no longer instantaneous (i.e., less than a few seconds) at around 50K data points (100K takes ~40s to process). Anything less than 10K data points should be fast!
Yes, I think that would be wise and then I recommend letting the user submit a feature request if they truly want to compute a multi-dimensional matrix profile. Again, this capability is available in STUMPY already but I am fairly certain that 99% of people who use it blindly will interpret the multi-dimensional matrix profile output incorrectly. I want to save us all time from having to support something due to user error/misunderstanding/ignorance if that makes sense?
-
Just a quick update: we're working on the transformer typing here (sktime/sktime#420), I'll ping you once it's done.
-
Awesome! Thanks for the update
-
Hi @seanlaw, we merged the update and I created an issue here: sktime/sktime#450 to interface stumpy
-
Great! Thanks for letting me know and feel free to ping me with any questions
-
Hi @seanlaw - are you aware of a good default value (or procedure to infer a good value from the data) for the window length (m) in
cc @utsavcoding who's working on the implementation
-
There was a recent comment/discussion in the sktime Gitter channel.
It would be useful to the time series community to openly discuss whether there is potential for collaboration here as there could be some nice synergies.