Dimensionality reduction and reconstruction for time series #443
Replies: 3 comments
-
@astrogilda Welcome to the STUMPY community, and thank you for your question. Unfortunately, I do not believe that STUMPY would be well suited to your problem.
-
Classification:
Reconstruction: LOL. From 5 features back to 50 samples, nice joke...
-
1d-SAX might help you reduce the dimensionality of your time-series data.
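1d-SAX builds on piecewise aggregate approximation (PAA): split the series into equal segments and summarize each one (1d-SAX also encodes each segment's slope, and tslearn ships a full implementation as `OneD_SymbolicAggregateApproximation`). Here is a minimal pure-NumPy sketch of just the averaging step and its approximate inverse, to show the reduce-then-reconstruct shape of the idea:

```python
import numpy as np

def paa(ts, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    segments = np.array_split(ts, n_segments)
    return np.array([seg.mean() for seg in segments])

def paa_inverse(coeffs, sz):
    """Approximate reconstruction: repeat each segment mean back to length sz."""
    segments = np.array_split(np.arange(sz), len(coeffs))
    return np.concatenate([np.full(len(seg), c) for c, seg in zip(coeffs, segments)])

ts = np.sin(np.linspace(0, 2 * np.pi, 50)) * 0.25 + 0.25  # toy series in [0, 0.5]
coeffs = paa(ts, 5)               # 50 -> 5
recon = paa_inverse(coeffs, 50)   # 5 -> 50 (stepwise approximation)
```

The reconstruction is piecewise-constant, so sharp peaks get flattened, which is exactly the failure mode described below; the slope terms in full 1d-SAX mitigate this somewhat.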
-
Hello!
I learned about STUMPY a couple of hours ago from Sean's SciPy 2021 presentation and have been exploring it ever since :) I have a gut feeling it might help me get over a hump in my project, but since I don't completely understand it yet, I'm not sure how to go about this. I'm describing my problem here in the hope that this community can point me in the right direction.
TL;DR -- if I have 20k one-dimensional time series, each 50 units long, how do I compress this dataset to a smaller size, say 20k x 5, and then reconstruct the original time series? PCA and kernel PCA work, but not very well. I am looking for a time-series-aware approach.
Background
I have 20,000 total samples. The inputs (i.e., features) are only 20 units long, while the vector outputs are time series and are 50 units long. In other words, my input dataset is of shape 20k x 20, and the corresponding output dataset of shape 20k x 50. I want to train a model to predict the output given the input. Each of the 20k time series varies only between 0 and 0.5. Most time series evolve smoothly, whereas some have periods of quiet (0 magnitude) interspersed with peaks (not more than 3 in any of the 20k samples). Some peaks are sharp, others are spread out.
Current approach
Given the paucity of features, this is inherently difficult. My current approach is to use kernel PCA to reduce the dimensionality of the output from 50 to 5, a number that is much more manageable. However, when I reconstruct the output time series, I notice that the reconstruction is excellent for time series that vary smoothly, and poor for those that are peaky. In other words, my current approach is:
(i) 20k x 50 --(kernel PCA)--> 20k x 5.
(ii) Train an ML model on the (20k x 20, 20k x 5) input-output pair. Validate on a separate validation set.
(iii) Make predictions on a held-out test set, say of shape 2k x 20.
(iv) Inverse-transform the predictions, of shape 2k x 5, back to 2k x 50.
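For steps (i) and (iv), a plain-PCA stand-in for the kernel PCA reduce/reconstruct round trip can be sketched in pure NumPy via the SVD (the 200-row matrix, the 5-component choice, and the random data are illustrative assumptions, not the real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((200, 50)) * 0.5        # stand-in for the 20k x 50 output matrix

# Step (i): centre and project onto the top 5 principal components.
mu = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
components = Vt[:5]                    # (5, 50) principal directions
Z = (Y - mu) @ components.T            # (200, 5) reduced representation

# Step (iv): inverse-transform back to the original 50-sample space.
Y_recon = Z @ components + mu          # (200, 50) approximate reconstruction
```

With scikit-learn, `KernelPCA(n_components=5, fit_inverse_transform=True)` gives the kernelized version of the same round trip, with the caveat that its `inverse_transform` is itself a learned approximation.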
STUMP to the rescue?
My guess is that this issue is fundamentally rooted in the time-series nature of the problem. PCA is agnostic to the ordering of features and cannot capture the temporal structure of my data. I am hoping there is a reversible step (so I can inverse_transform) even before Step (i) above that can help capture that temporal structure.
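One example of such a reversible, order-aware preprocessing step (my own suggestion, not something STUMPY provides) is differencing: keep the first value plus the first differences, which a cumulative sum undoes exactly. Any exactly invertible temporal transform slotted in before Step (i) would work the same way:

```python
import numpy as np

def to_diffs(ts):
    """Reversible temporal transform: first value followed by first differences."""
    return np.concatenate(([ts[0]], np.diff(ts)))

def from_diffs(d):
    """Exact inverse: a cumulative sum recovers the original series."""
    return np.cumsum(d)

ts = np.array([0.0, 0.1, 0.3, 0.2, 0.5])
d = to_diffs(ts)
assert np.allclose(from_diffs(d), ts)  # round-trips exactly
```

Unlike PCA applied to raw samples, the differenced representation makes local temporal changes (e.g. the onset of a peak) explicit features, which may be easier for the downstream reduction to preserve.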
I would really appreciate any help!