BUG: Union of two DateTimeIndexes is incorrectly calculated #60816

Open
2 of 3 tasks
filmor opened this issue Jan 29, 2025 · 5 comments
Labels
Bug; Needs Discussion (Requires discussion from core team before further action); Non-Nano (datetime64/timedelta64 with non-nanosecond resolution); Regression (Functionality that used to work in a prior pandas version)
@filmor
Contributor

filmor commented Jan 29, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from pandas import DatetimeIndex

# Left operand: five timestamps, 15 minutes apart, millisecond resolution.
l = DatetimeIndex(['2023-05-24 00:00:00+00:00', '2023-05-24 00:15:00+00:00',
                   '2023-05-24 00:30:00+00:00', '2023-05-24 00:45:00+00:00',
                   '2023-05-24 01:00:00+00:00'],
                  dtype='datetime64[ms, UTC]', name='ts', freq='15min')

# Right operand: every second timestamp of l, 30 minutes apart.
r = DatetimeIndex(['2023-05-24 00:00:00+00:00', '2023-05-24 00:30:00+00:00',
                   '2023-05-24 01:00:00+00:00'],
                  dtype='datetime64[ms, UTC]', name='ts', freq='30min')

union = r.union(l)

print(union)

# r is a subset of l, so the union should equal l.
assert len(union) == len(l)
assert all(r.union(l) == l)

Issue Description

The union of the two DatetimeIndex objects in the reproducible example is calculated incorrectly. On newer pandas versions the result is

DatetimeIndex(['2023-05-24 00:00:00+00:00', '2051-11-29 16:00:00+00:00',
               '2080-06-06 08:00:00+00:00'],
              dtype='datetime64[ms, UTC]', name='ts', freq='15T')

The first failing version is the one listed under "Installed Versions". The error occurs from pandas 2.1.0 onwards; pandas 1.* and 2.x up to 2.0.3 work fine. Neither the NumPy version nor the Python version matters.
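
The timestamps in the incorrect result are consistent with a 15-minute step being computed in nanoseconds and then applied in the index's millisecond unit. The following is a minimal stdlib-only sketch of that arithmetic. It is an inference from the printed values, not a description of the pandas internals, and the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# First timestamp shared by both indexes in the report.
start = datetime(2023, 5, 24, tzinfo=timezone.utc)

# A 15-minute step expressed as a count of nanoseconds.
step_ns = 15 * 60 * 10**9

# Assumption for illustration: the nanosecond count is misread as milliseconds.
misread_step = timedelta(milliseconds=step_ns)

# Three points spaced by the misread step, matching the length of r.
points = [start + i * misread_step for i in range(3)]
for p in points:
    print(p.isoformat(sep=" "))
# 2023-05-24 00:00:00+00:00
# 2051-11-29 16:00:00+00:00
# 2080-06-06 08:00:00+00:00
```

The three printed values match the three entries of the incorrect DatetimeIndex shown in this report.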

Expected Behavior

The expected result in the given case is that l is returned.
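
On affected versions, one possible way to obtain the expected index without calling `union` is to concatenate, deduplicate, and sort. This is only a sketch and an assumption on my side, not something suggested by the maintainers, and it has not been checked against every affected release. The names `left`, `right`, and `combined` are hypothetical stand-ins for `l`, `r`, and the union.

```python
import pandas as pd

# Same data as l and r in the reproducible example.
left = pd.DatetimeIndex(
    ["2023-05-24 00:00:00+00:00", "2023-05-24 00:15:00+00:00",
     "2023-05-24 00:30:00+00:00", "2023-05-24 00:45:00+00:00",
     "2023-05-24 01:00:00+00:00"],
    dtype="datetime64[ms, UTC]", name="ts", freq="15min",
)
right = pd.DatetimeIndex(
    ["2023-05-24 00:00:00+00:00", "2023-05-24 00:30:00+00:00",
     "2023-05-24 01:00:00+00:00"],
    dtype="datetime64[ms, UTC]", name="ts", freq="30min",
)

# Concatenate both indexes, drop duplicate timestamps, then sort.
# This sidesteps union() entirely instead of fixing it.
combined = right.append(left).unique().sort_values()
print(combined)
```

Note that the result of this approach is not guaranteed to carry a `freq`, so code that relies on `combined.freq` may need to set or infer it separately.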

Installed Versions

INSTALLED VERSIONS

commit : ba1cccd
python : 3.10.16.final.0
python-bits : 64
OS : Linux
OS-release : 6.12.10-200.fc41.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Fri Jan 17 18:05:24 UTC 2025
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0
numpy : 1.26.4
pytz : 2024.2
dateutil : 2.9.0.post0
tzdata : 2025.1

filmor added the Bug and Needs Triage labels Jan 29, 2025
@filmor
Contributor Author

filmor commented Jan 30, 2025

I bisected this to commit 436f5eb. I will verify and check whether I can provide a fix.

@asishm
Contributor

asishm commented Jan 30, 2025

I believe this is already fixed on main in #59037. It's currently milestoned to be released with the 3.0 release.

@filmor
Contributor Author

filmor commented Jan 30, 2025

Looks like it, I'll try it. Shouldn't the fix be backported to 2.1 and 2.2 as well?

@asishm
Contributor

asishm commented Jan 30, 2025

There are no further releases planned in the 2.1/2.2 branches. There will be a 2.3 release (before 3.0).

Maybe someone from the core team can comment on backports.

@rhshadrach
Member

Assuming #59037 is not difficult to backport, I'm open to releasing this fix in 2.3. @WillAyd @jorisvandenbossche - any objection?

rhshadrach added the Needs Discussion, Non-Nano, and Regression labels and removed the Needs Triage label Feb 2, 2025
rhshadrach added this to the 2.3 milestone Feb 2, 2025

3 participants