BUG: time deltas are always inferred as pd.Timedelta #56010

Open
phofl opened this issue Nov 16, 2023 · 3 comments

@phofl
Member

phofl commented Nov 16, 2023

We always infer time deltas as pd.Timedelta, see #55538 (comment) for more context

import pandas as pd

ser = pd.Series(["x"])
td = pd.Timedelta("9 days").to_pytimedelta()
ser["td"] = td

assert isinstance(ser["td"], pd.Timedelta)  # <- OP finds this weird
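A slightly wider reproduction may help when checking this on a given pandas version. The sketch below loops over the three timedelta flavours and records what comes back after setitem-with-expansion. The `candidates` and `observed` names are illustrative only, and the explicit `dtype=object` is an assumption made so that the starting Series does not depend on string-inference settings.

```python
import numpy as np
import pandas as pd

# Three ways of spelling "9 days"; only the first is already a pd.Timedelta.
candidates = {
    "pd.Timedelta": pd.Timedelta("9 days"),
    "np.timedelta64": pd.Timedelta("9 days").to_timedelta64(),
    "datetime.timedelta": pd.Timedelta("9 days").to_pytimedelta(),
}

observed = {}
for label, td in candidates.items():
    # Assumption: start from an explicit object-dtype Series.
    ser = pd.Series(["x"], dtype=object)
    ser["td"] = td  # setitem-with-expansion: "td" is not yet in the index
    observed[label] = type(ser["td"]).__name__
    print(f"{label} -> {observed[label]} (dtype: {ser.dtype})")
```

The printed type names show whether the original scalar type survived the round trip on the installed version.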
phofl added the Timedelta (Timedelta data type) label on Nov 16, 2023
@jbrockmendel
Member

My opinion depends on how difficult this is to change/"fix". I'm guessing that somewhere in the indexing code, when we do setitem-with-expansion, we do something like

new_ser = Series([new_item], index=[new_key])
result = concat([old_ser, new_ser])
old_ser._mgr = result._mgr

Assuming this guess is correct, we could fix the OP example by passing dtype=object when constructing new_ser, but that would mean that result would always end up with object dtype, which we definitely don't want. So maybe just do that when old_ser.dtype == object?
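The conditional version of that idea can be sketched from user code. This is not the pandas indexing implementation: `setitem_with_expansion_sketch` is a hypothetical helper that only mirrors the guessed construct-then-concat pattern using public API, and it returns a new Series instead of swapping the manager in place.

```python
import datetime

import pandas as pd


def setitem_with_expansion_sketch(old_ser, new_key, new_item):
    """Hypothetical stand-in for the guessed expansion path (not pandas internals)."""
    # Skip inference only when the existing Series is already object dtype;
    # otherwise let the constructor infer as usual (dtype=None).
    dtype = object if old_ser.dtype == object else None
    new_ser = pd.Series([new_item], index=[new_key], dtype=dtype)
    return pd.concat([old_ser, new_ser])


# Object-dtype case: the raw datetime.timedelta is stored without conversion.
obj_result = setitem_with_expansion_sketch(
    pd.Series(["x"], dtype=object), "td", datetime.timedelta(days=9)
)
print(type(obj_result["td"]), obj_result.dtype)

# Non-object case: inference still runs, so the float dtype is retained.
num_result = setitem_with_expansion_sketch(pd.Series([1.5]), 1, 2.5)
print(num_result.dtype)
```

Restricting the `dtype=object` override to object-dtype inputs is what keeps numeric and datetime-like Series from being downgraded to object on expansion.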

In general if users want tight control over dtypes they shouldn't be using setitem-with-expansion.
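For users who do want that control today, one option is to build the Series in a single step with an explicit dtype instead of expanding it afterwards. A minimal sketch, where the variable names are illustrative:

```python
import datetime as dt

import pandas as pd

td = dt.timedelta(days=9)

# Constructing with dtype=object stores the values as given, so no
# timedelta inference is applied to the new element.
explicit = pd.Series(["x", td], index=[0, "td"], dtype=object)
print(type(explicit["td"]), explicit.dtype)
```

This sidesteps setitem-with-expansion entirely, at the cost of knowing all keys and values up front.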

Note that we also do inference on the index in setitem-with-expansion, which I think we shouldn't; xref #55257, #51363.

@xiaohuanlin
Contributor

take

@xiaohuanlin
Contributor

This problem is related to these two issues:

  1. Timedelta interpreted as int upon first insertion into Series #22717
  2. Series: inconsistent behavior of setting value with timestamp dtype for existed index and newly-added #26031

So I am curious about how to deal with this problem. Apparently, numpy.timedelta64 and datetime.timedelta values are both converted to pd.Timedelta, according to this unit test:

@pytest.mark.parametrize(
    "td",
    [
        Timedelta("9 days"),
        Timedelta("9 days").to_timedelta64(),
        Timedelta("9 days").to_pytimedelta(),
    ],
)
def test_append_timedelta_does_not_cast(self, td, using_infer_string, request):
    # GH#22717 inserting a Timedelta should _not_ cast to int64
    if using_infer_string and not isinstance(td, Timedelta):
        # TODO: GH#56010
        request.applymarker(pytest.mark.xfail(reason="inferred as string"))

    expected = Series(["x", td], index=[0, "td"], dtype=object)

    ser = Series(["x"])
    ser["td"] = td
    tm.assert_series_equal(ser, expected)
    assert isinstance(ser["td"], Timedelta)

    ser = Series(["x"])
    ser.loc["td"] = Timedelta("9 days")
    tm.assert_series_equal(ser, expected)
    assert isinstance(ser["td"], Timedelta)
