Skip to content

Alternative to CompliantDataFrame.__getitem__ #2423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
dangotbanned opened this issue Apr 23, 2025 · 0 comments
Open

Alternative to CompliantDataFrame.__getitem__ #2423

dangotbanned opened this issue Apr 23, 2025 · 0 comments
Labels

Comments

@dangotbanned
Copy link
Member

dangotbanned commented Apr 23, 2025

Important

This issue is not proposing changes to nw.DataFrame

Original comment

https://github.com/narwhals-dev/narwhals/pull/2393/files#r2052510250

if rows is None:
return self._with_compliant(compliant[:, columns])
if columns is None:
return self._with_compliant(compliant[rows, :])
return self._with_compliant(compliant[rows, columns])

@MarcoGorelli how would you feel about removing CompliantDataFrame.__getitem__ (and children) entirely?

At this stage - I think - we should have a fairly good idea of what method to call in EagerDataFrame.

But when we pass it down to EagerDataFrame.__getitem__ - all it knows is we have a tuple - then requiring more narrowing than I think we strictly have to 🤔

Description

I forgot about that particular comment.

But kept thinking about what an alternative might look like after discovering polars._utils.slice.
An easy way to avoid falling back into the patterns prior to (#2390) is to simply remove __getitem__ from the internal classes.

If we were to put slicing behind a namespace (e.g. CompliantSlice) then these methods could be entirely separate from the class(es) they're on now.

def _gather(self, rows: SizedMultiIndexSelector[NativeSeriesT]) -> Self: ...
def _gather_slice(self, rows: _SliceIndex | range) -> Self: ...
def _select_multi_index(
self, columns: SizedMultiIndexSelector[NativeSeriesT]
) -> Self: ...
def _select_multi_name(
self, columns: SizedMultiNameSelector[NativeSeriesT]
) -> Self: ...
def _select_slice_index(self, columns: _SliceIndex | range) -> Self: ...
def _select_slice_name(self, columns: _SliceName) -> Self: ...
def __getitem__(
self,
item: tuple[
SingleIndexSelector | MultiIndexSelector[EagerSeriesT],
MultiIndexSelector[EagerSeriesT] | MultiColSelector[EagerSeriesT],
],
) -> Self:

So calls might look like this:

compliant._slice.gather(...)

Where _slice is a property returning a type like

@property
def _dataframe(self) -> type[ArrowDataFrame]:
return ArrowDataFrame

Or the property initializes a class like:

@property
def cat(self) -> EagerExprCatNamespace[Self]:
return EagerExprCatNamespace(self)

Benefits

  • All internal code is forced to use more explicit selection methods
  • The method names can be much shorter, since they're specific to the *Slice class
  • There's an easy way to share code between DataFrame, Series (feat: Improve DataFrame.__getitem__ consistency #2393 (comment))
  • Opting-out of slicing can be done in a granular way or full-hog
    • Depending on where you place a not_implemented 😏
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant