BUG: time deltas are always inferred as pd.Timedelta #56010

phofl · 2023-11-16T23:03:42Z

We always infer time deltas as pd.Timedelta, see #55538 (comment) for more context

import pandas as pd

ser = pd.Series(["x"])
td = pd.Timedelta("9 days").to_pytimedelta()  # a plain datetime.timedelta
ser["td"] = td  # setitem-with-expansion: "td" is a new key

assert isinstance(ser["td"], pd.Timedelta)  # <- OP finds this weird


jbrockmendel · 2023-11-16T23:44:52Z

My opinion depends on how difficult this is to change/"fix". I'm guessing that somewhere in the indexing code, when we do setitem-with-expansion, we do something like

new_ser = Series([new_item], index=[new_key])
result = concat([old_ser, new_ser])
old_ser._mgr = result._mgr

Assuming this guess is correct, we could fix the OP example by passing dtype=object when constructing new_ser, but that would mean that result would always end up with object dtype, which we definitely don't want. So maybe just do that when old_ser.dtype == object?
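To make the suggestion concrete, here is a minimal sketch of the conditional approach, assuming the guess about the expansion path is right. `expand_with_item` is a hypothetical stand-alone helper written for illustration, not a function in the pandas indexing code, and it returns a new Series instead of swapping `_mgr` in place.

```python
import datetime as dt

import pandas as pd


def expand_with_item(old_ser, new_key, new_item):
    """Hypothetical helper mimicking setitem-with-expansion (not pandas internals)."""
    # Only suppress inference when the existing Series is already object dtype;
    # otherwise let the constructor infer as usual so other dtypes are kept.
    dtype = object if old_ser.dtype == object else None
    new_ser = pd.Series([new_item], index=[new_key], dtype=dtype)
    return pd.concat([old_ser, new_ser])


# Object-dtype Series: the plain datetime.timedelta is stored untouched.
obj_result = expand_with_item(
    pd.Series(["x"], dtype=object), "td", dt.timedelta(days=9)
)

# Integer Series: inference still runs, so the result stays an integer dtype.
int_result = expand_with_item(pd.Series([1, 2]), 2, 3)
```

With this sketch, `type(obj_result["td"])` is `datetime.timedelta` rather than `pd.Timedelta`, while `int_result` keeps an integer dtype instead of falling back to object.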

In general if users want tight control over dtypes they shouldn't be using setitem-with-expansion.
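As a user-side illustration of that advice, one option is to build the Series up front with an explicit dtype instead of expanding it afterwards. This is only a sketch of one possible alternative; the variable names are illustrative.

```python
import datetime as dt

import pandas as pd

td = dt.timedelta(days=9)

# With dtype=object given explicitly, the constructor stores the Python
# object as-is instead of converting it to pd.Timedelta.
explicit = pd.Series(["x", td], index=[0, "td"], dtype=object)
stored = explicit["td"]
```

Here `stored` is still the original `datetime.timedelta` instance.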

Note that we also do inference on the index in setitem-with-expansion, which I think we shouldn't; xref #55257, #51363

xiaohuanlin · 2024-01-11T19:53:18Z

take

xiaohuanlin · 2024-01-11T20:51:08Z

This problem is related to these two issues:

Timedelta interpreted as int upon first insertion into Series #22717
Series: inconsistent behavior of setting value with timestamp dtype for existed index and newly-added #26031

So I am curious about how to deal with this problem. Apparently, numpy.timedelta64 and datetime.timedelta values are both converted to pd.Timedelta, according to this unit test:

@pytest.mark.parametrize(
    "td",
    [
        Timedelta("9 days"),
        Timedelta("9 days").to_timedelta64(),
        Timedelta("9 days").to_pytimedelta(),
    ],
)
def test_append_timedelta_does_not_cast(self, td, using_infer_string, request):
    # GH#22717 inserting a Timedelta should _not_ cast to int64
    if using_infer_string and not isinstance(td, Timedelta):
        # TODO: GH#56010
        request.applymarker(pytest.mark.xfail(reason="inferred as string"))

    expected = Series(["x", td], index=[0, "td"], dtype=object)

    ser = Series(["x"])
    ser["td"] = td
    tm.assert_series_equal(ser, expected)
    assert isinstance(ser["td"], Timedelta)

    ser = Series(["x"])
    ser.loc["td"] = Timedelta("9 days")
    tm.assert_series_equal(ser, expected)
    assert isinstance(ser["td"], Timedelta)

phofl added the Timedelta label Nov 16, 2023

jbrockmendel added the setitem-with-expansion label Nov 16, 2023

github-actions bot assigned xiaohuanlin Jan 11, 2024
