Dimensionality reduction and reconstruction for time series #443
Replies: 3 comments
-
@astrogilda Welcome to the STUMPY community, and thank you for your question. Unfortunately, I do not believe that STUMPY would be well suited to your problem.
-
Classification:
Reconstruction: LOL. From 5 features back to 50 samples, nice joke...
-
1d-SAX might help you reduce the dimensionality of your time-series data.
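1d-SAX builds on piecewise aggregate approximation (PAA): split the series into equal segments and summarize each one (1d-SAX also encodes each segment's slope, and tslearn ships a full implementation as `OneD_SymbolicAggregateApproximation`). Here is a minimal pure-NumPy sketch of just the averaging step and its approximate inverse, to show the reduce-then-reconstruct shape of the idea:

```python
import numpy as np

def paa(ts, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    segments = np.array_split(ts, n_segments)
    return np.array([seg.mean() for seg in segments])

def paa_inverse(coeffs, sz):
    """Approximate reconstruction: repeat each segment mean back to length sz."""
    segments = np.array_split(np.arange(sz), len(coeffs))
    return np.concatenate([np.full(len(seg), c) for c, seg in zip(coeffs, segments)])

ts = np.sin(np.linspace(0, 2 * np.pi, 50)) * 0.25 + 0.25  # toy series in [0, 0.5]
coeffs = paa(ts, 5)               # 50 -> 5
recon = paa_inverse(coeffs, 50)   # 5 -> 50 (stepwise approximation)
```

The reconstruction is piecewise-constant, so sharp peaks get flattened, which is exactly the failure mode described below; the slope terms in full 1d-SAX mitigate this somewhat.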
-
Hello!
I learned about STUMPY a couple of hours ago from Sean's SciPy 2021 presentation and have been exploring it ever since :) I have a gut feeling it might help me get over a hump in my project, but since I don't completely understand it yet, I'm not sure how to go about this. I'm describing my problem here in the hope that this community can point me in the right direction.
TL;DR -- if I have 20k one-dimensional time series, each 50 units long, how do I compress this dataset to a smaller size, say 20k x 5, and then reconstruct the original time series? PCA and kernel PCA work, but not very well. I am looking for a time-series-aware approach.
Background
I have 20,000 total samples. The inputs (i.e., features) are only 20 units long, while the vector outputs are time series and are 50 units long. In other words, my input dataset is of shape 20k x 20, and the corresponding output dataset of shape 20k x 50. I want to train a model to predict the output given the input. Each of the 20k time series varies only between 0 and 0.5. Most time series evolve smoothly, whereas some have periods of quiet (0 magnitude) interspersed with peaks (not more than 3 in any of the 20k samples). Some peaks are sharp, others are spread out.
Current approach
Given the paucity of features, this is inherently difficult. My current approach is to use kernel PCA to reduce the dimensionality of the output from 50 to 5, a number that is much more manageable. However, when I reconstruct the output time series, I notice that the reconstruction is excellent for time series that vary smoothly, and poor for those that are peaky. In other words, my current approach is:
(i) 20k x 50 --(kernel PCA)--> 20k x 5.
(ii) Train an ML model on the (20k x 20, 20k x 5) input-output pair. Validate on a separate validation set.
(iii) Make predictions on a held-out test set, say of shape 2k x 20.
(iv) Inverse-transform the predictions, of shape 2k x 5, back to 2k x 50.
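For steps (i) and (iv), a plain-PCA stand-in for the kernel PCA reduce/reconstruct round trip can be sketched in pure NumPy via the SVD (the 200-row matrix, the 5-component choice, and the random data are illustrative assumptions, not the real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((200, 50)) * 0.5        # stand-in for the 20k x 50 output matrix

# Step (i): centre and project onto the top 5 principal components.
mu = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
components = Vt[:5]                    # (5, 50) principal directions
Z = (Y - mu) @ components.T            # (200, 5) reduced representation

# Step (iv): inverse-transform back to the original 50-sample space.
Y_recon = Z @ components + mu          # (200, 50) approximate reconstruction
```

With scikit-learn, `KernelPCA(n_components=5, fit_inverse_transform=True)` gives the kernelized version of the same round trip, with the caveat that its `inverse_transform` is itself a learned approximation.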
STUMP to the rescue?
My guess is that this issue is fundamentally rooted in the time-series nature of the problem. PCA is agnostic to the ordering of features and cannot capture the temporal structure of my data. I am hoping there is a reversible step (so I can inverse_transform) even before Step (i) above that can help capture that temporal structure.
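One example of such a reversible, order-aware preprocessing step (my own suggestion, not something STUMPY provides) is differencing: keep the first value plus the first differences, which a cumulative sum undoes exactly. Any exactly invertible temporal transform slotted in before Step (i) would work the same way:

```python
import numpy as np

def to_diffs(ts):
    """Reversible temporal transform: first value followed by first differences."""
    return np.concatenate(([ts[0]], np.diff(ts)))

def from_diffs(d):
    """Exact inverse: a cumulative sum recovers the original series."""
    return np.cumsum(d)

ts = np.array([0.0, 0.1, 0.3, 0.2, 0.5])
d = to_diffs(ts)
assert np.allclose(from_diffs(d), ts)  # round-trips exactly
```

Unlike PCA applied to raw samples, the differenced representation makes local temporal changes (e.g. the onset of a peak) explicit features, which may be easier for the downstream reduction to preserve.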
I would really appreciate any help!