ListDataset cache error with pandas 1.1.3 #3240

Open
VictorJouault opened this issue Jan 10, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@VictorJouault

Description

The package appears to be incompatible with pandas 1.1.3.

With the latest gluonts and pandas 1.1.3, constructing a ListDataset fails inside the _as_period function ( https://github.com/awslabs/gluonts/blob/v0.16.x/src/gluonts/dataset/common.py#L262 ). In that version of pandas, the frequency object pandas._libs.tslibs.offsets.Hour appears not to be hashable, so it cannot be passed as an argument to a function wrapped in @lru_cache, as _as_period is.

Upgrading to pandas=1.4.1 resolves the issue.
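
To illustrate the failure mode without depending on a particular pandas version, here is a minimal sketch using only the standard library. `UnhashableFreq` is a hypothetical stand-in for the offset object, and `cached_period` is a placeholder for `_as_period`; neither is part of gluonts or pandas. Defining `__eq__` without `__hash__` is one way a Python class ends up unhashable, and it is only an assumption that this is what happens in pandas 1.1.3.

```python
from functools import lru_cache


class UnhashableFreq:
    """Hypothetical stand-in for a frequency object that cannot be hashed."""

    def __init__(self, name):
        self.name = name

    # Defining __eq__ without __hash__ implicitly sets __hash__ to None,
    # so instances of this class are unhashable.
    def __eq__(self, other):
        return isinstance(other, UnhashableFreq) and self.name == other.name


@lru_cache(maxsize=10_000)
def cached_period(val, freq):
    # Placeholder for pd.Period(val, freq); a tuple keeps the sketch pandas-free.
    return (val, freq)


def try_cached(val, freq):
    """Return the cached result, or the TypeError message if an argument is unhashable."""
    try:
        return cached_period(val, freq)
    except TypeError as exc:
        return str(exc)


# lru_cache builds its cache key by hashing the arguments, so this call fails.
failure = try_cached("2022-01-16 00:00:00", UnhashableFreq("H"))

# Passing a hashable key (here, the frequency name as a string) works.
success = try_cached("2022-01-16 00:00:00", "H")
```

The second call hints at a possible workaround for anyone who cannot upgrade: cache on a hashable representation of the frequency rather than on the offset object itself. That is a suggestion, not something proposed in this thread.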

To Reproduce

I reproduced the unhashable-type error locally with different pandas versions. The following code succeeds with pandas 1.4.1 but fails with pandas 1.1.3:

from functools import lru_cache

import pandas as pd
from pandas.tseries.frequencies import to_offset

# Copy of the `_as_period` function from gluonts/dataset/common.py
@lru_cache(maxsize=10_000)
def _as_period(val, freq):
    return pd.Period(val, freq)

data = dict()
data["start"] = "2022-01-16 00:00:00"
# Fails with an "unhashable type" error on pandas 1.1.3; succeeds on pandas 1.4.1.
data["start"] = _as_period(data["start"], to_offset("H"))

Error message or code output

[CPython39-test]     def __call__(self, data: DataEntry) -> DataEntry:
[CPython39-test]         try:
[CPython39-test]             if self.use_timestamp:
[CPython39-test]                 data[self.name] = pd.Timestamp(data[self.name])
[CPython39-test]             else:
[CPython39-test]                 data[self.name] = _as_period(data[self.name], self.freq)
[CPython39-test]         except (TypeError, ValueError) as e:
[CPython39-test] >           raise GluonTSDataError(
[CPython39-test]                 f'Error "{e}" occurred, when reading field "{self.name}" with data "{data[self.name]}" and freq "{self.freq}"'
[CPython39-test]             ) from e
[CPython39-test] E           gluonts.exceptions.GluonTSDataError: Error "unhashable type: 'pandas._libs.tslibs.offsets.Hour'" occurred, when reading field "start" with data "2022-01-16 00:00:00" and freq "<Hour>"

Environment

  • Operating system:
  • Python version: 3.9
  • GluonTS version: 0.16
  • MXNet version:


@VictorJouault VictorJouault added the bug Something isn't working label Jan 10, 2025
@lostella
Contributor

@VictorJouault thanks for spotting this. Unfortunately I think it's very hard to maintain backward-compatibility with very old versions of Pandas, due to the many breaking changes here and there.

In my opinion, the fix here is to upgrade the minimum version for pandas to 1.4.1 (is that the earliest version that works?). What do you think?

@VictorJouault
Author

Yep, agreed with the rationale. That's also what I'm doing on my side. Upgrading the requirements seems like an adequate resolution!
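
For anyone who wants to guard against an old pandas at runtime until the requirement is bumped, a minimal sketch of a version check follows. The helper names `parse_version` and `meets_minimum` are hypothetical, and the 1.4.1 floor is simply the version reported as working above; the thread does not confirm that it is the earliest one that works.

```python
def parse_version(version: str) -> tuple:
    """Parse the leading numeric components, e.g. '1.4.1' -> (1, 4, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# Assumption: 1.4.1 is the version reported as working in this issue.
MIN_PANDAS = "1.4.1"


def meets_minimum(installed: str, minimum: str = MIN_PANDAS) -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


old_ok = meets_minimum("1.1.3")
new_ok = meets_minimum("1.4.1")
```

In practice the installed version string could come from `importlib.metadata.version("pandas")`, and `packaging.version.Version` handles pre-release and other edge cases more robustly than this hand-rolled parser.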
