Adjust Series specific tests for string option #55538

phofl · 2023-10-15T20:49:41Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

jbrockmendel · 2023-10-19T21:49:37Z

doc/source/whatsnew/v2.1.2.rst

@@ -39,7 +39,7 @@ Bug fixes
 Other
 ~~~~~
 - Fixed non-working installation of optional dependency group ``output_formatting``. Replacing underscore ``_`` with a dash ``-`` fixes broken dependency resolution. A correct way to use now is ``pip install pandas[output-formatting]``.
-
+- Setting the environment variable ``PANDAS_INFER_STRING`` to ``"1"`` will now enable the option ``pd.options.future.infer_string`` (:issue:`55533`)


Wasn't this a separate PR?

Yeah good point, removed
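
For context, the whatsnew entry in the diff above describes an environment variable that switches an option on. A minimal sketch of that kind of check follows; `infer_string_enabled` is a hypothetical helper written for illustration, not pandas' actual implementation (which is not shown in this thread):

```python
import os
from typing import Mapping, Optional


def infer_string_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: True only when PANDAS_INFER_STRING is exactly "1"."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if environ is None else environ
    # An unset variable is treated as "0", i.e. the option stays disabled.
    return env.get("PANDAS_INFER_STRING", "0") == "1"
```

Passing the mapping explicitly keeps the check easy to exercise without mutating the real environment.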

jbrockmendel · 2023-10-19T21:59:53Z

pandas/_testing/asserters.py

@@ -440,7 +440,10 @@ def assert_is_sorted(seq) -> None:
    if isinstance(seq, (Index, Series)):
        seq = seq.values
    # sorting does not change precisions
-    assert_numpy_array_equal(seq, np.sort(np.array(seq)))
+    if isinstance(seq, np.ndarray):
+        assert_numpy_array_equal(seq, np.sort(np.array(seq)))


might be faster to use libalgos.is_monotonic?

This is used in one place. Tbh, I want to get rid of the method completely, but that wasn't in scope here.
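
To illustrate the trade-off raised here, a rough pure-Python sketch comparing a single-pass monotonic check with the sort-and-compare approach used in the diff. Both function names are made up for this example; this is not `libalgos.is_monotonic`, whose signature is not shown in this thread:

```python
from typing import Iterable


def is_non_decreasing(seq: Iterable) -> bool:
    """Single pass: every element must be <= its successor. No sorting needed."""
    values = list(seq)
    return all(a <= b for a, b in zip(values, values[1:]))


def is_sorted_via_sort(seq: Iterable) -> bool:
    """Sort-and-compare, analogous in spirit to the assertion in the diff."""
    values = list(seq)
    return values == sorted(values)
```

The single-pass version does O(n) comparisons and can stop at the first out-of-order pair, while the sorting version pays O(n log n) on every call.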

jbrockmendel · 2023-10-19T22:00:10Z

pandas/conftest.py

+@pytest.fixture
+def using_infer_string() -> bool:
+    """
+    Fixture to check if Copy-on-Write is enabled.


docstring looks copy/pasted, needs updating

Thx for catching

jbrockmendel · 2023-10-20T01:27:50Z

pandas/tests/series/indexing/test_getitem.py

@@ -71,7 +71,7 @@ def test_getitem_unrecognized_scalar(self):
    def test_getitem_negative_out_of_bounds(self):
        ser = Series(["a"] * 10, index=["a"] * 10)

-        msg = "index -11 is out of bounds for axis 0 with size 10"
+        msg = "index -11 is out of bounds for axis 0 with size 10|index out of bounds"


Can you use a "|".join pattern?

I don't feel strongly here, but I prefer the pipe pattern for two options.
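
For reference, a small sketch of the two spellings discussed above. They produce the same regular expression; `pytest.raises(match=...)` applies the pattern with `re.search`, so plain `re` is enough to show the behaviour:

```python
import re

# Inline pipe, as written in the diff.
msg_pipe = "index -11 is out of bounds for axis 0 with size 10|index out of bounds"

# "|".join pattern, as suggested in the review.
msg_join = "|".join(
    [
        "index -11 is out of bounds for axis 0 with size 10",
        "index out of bounds",
    ]
)


def matches(pattern: str, text: str) -> bool:
    """Mimic how a match= pattern is checked against an error message."""
    return re.search(pattern, text) is not None
```

With only two alternatives the difference is purely stylistic; the join form tends to pay off once the list of accepted messages grows.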

mroeschke · 2023-11-16T22:37:03Z

pandas/tests/series/indexing/test_setitem.py

        # GH#22717 inserting a Timedelta should _not_ cast to int64
        expected = Series(["x", td], index=[0, "td"], dtype=object)

        ser = Series(["x"])
        ser["td"] = td
        tm.assert_series_equal(ser, expected)
-        assert isinstance(ser["td"], Timedelta)
+        if using_infer_string and not isinstance(td, Timedelta):
+            assert not isinstance(ser["td"], Timedelta)


What would ser["td"] be here?

one of those 2:

Timedelta("9 days").to_timedelta64(), Timedelta("9 days").to_pytimedelta(),

Maybe we should xfail that test; I'm not sure anymore what my reasoning here was.

Yeah, I find it weird to begin with that this always promoted timedeltas to pd.Timedelta.

Agreed: let's xfail this test for now and nail down the correct behavior in a follow-up.

mroeschke · 2023-11-16T22:40:18Z

pandas/tests/series/test_arithmetic.py

+            msg = "has no kernel"
+            # with tm.assert_produces_warning(DeprecationWarning, match="comparison"):
+            with pytest.raises(pa.lib.ArrowNotImplementedError, match=msg):
+                s == s2


This should probably return all False in the future?

Yes, if we want to patch the Arrow behaviour, that probably makes sense.

OK, could you add a TODO here as a reminder? It would be good to include it in an issue tracking things that need fixing before 3.0.

phofl · 2023-11-16T22:50:41Z

I'll open issues for them and link them here

phofl · 2023-11-16T23:05:49Z

Opened issues for both

mroeschke

LGTM. Will merge in a few days unless others have comments.

phofl added 11 commits October 15, 2023 16:03

Add env variable for infer string option

9e9f567

Add whatsnew

23595ab

Fix method tests for using infer string option

f3bb720

Fix tests

2c8de1b

Fix more tests

3b5e7e7

TST: Fix assert_is_sorted for eas

b613e5a

BUG: Series inferring new string dtype even if dtype is given for scalar value

93ee4d7

Merge branch 'ser_string_storage' into series_string_tests

c0f629f

Merge branch 'is_sorted' into series_string_tests

43a27a1

Fix tests for series folder

224b7f6

Fix

8cf3af5

mroeschke added the "Strings" label (String extension data type and string data) Oct 16, 2023

jbrockmendel reviewed Oct 19, 2023

jbrockmendel reviewed Oct 20, 2023

phofl added 3 commits October 22, 2023 14:31

Update v2.1.2.rst

e49c77b

Update conftest.py

f32ef80

Merge branch 'main' into series_string_tests

f546925

mroeschke reviewed Oct 24, 2023

pandas/tests/series/accessors/test_dt_accessor.py (resolved)

Merge remote-tracking branch 'upstream/main' into series_string_tests

3d24dc2

# Conflicts: # pandas/tests/series/test_constructors.py

mroeschke reviewed Nov 16, 2023

doc/source/whatsnew/v2.1.2.rst (outdated, resolved)

Update v2.1.2.rst

75f1ae1

mroeschke reviewed Nov 16, 2023

Update test_arithmetic.py

ec54b76

phofl mentioned this pull request Nov 16, 2023

BUG: time deltas are always inferred as pd.Timedelta #56010

Open

Add gh refs

f93f01b

Add gh refs

07336dd

mroeschke added this to the 2.2 milestone Nov 17, 2023

mroeschke approved these changes Nov 17, 2023

mroeschke added the "Testing" label (pandas testing functions or related to the test suite) Nov 17, 2023

phofl merged commit d22adf6 into pandas-dev:main Nov 19, 2023

phofl deleted the series_string_tests branch November 19, 2023 23:11
