Skip to content

feat: Allow expressions as group_by keys #2325

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 124 commits into from
Apr 27, 2025
Merged

Conversation

FBruzzesi
Copy link
Member

@FBruzzesi FBruzzesi commented Apr 1, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@MarcoGorelli there is one test failing - which is grouping by a pandas column with a boolean name (and similarly with integer name in scikit-lego)

@dangotbanned I am crying for help with the typing πŸ™ˆ

Other comment will address in the code itself

@FBruzzesi FBruzzesi added the enhancement New feature or request label Apr 1, 2025
@dangotbanned
Copy link
Member

dangotbanned commented Apr 1, 2025

https://github.com/narwhals-dev/narwhals/actions/runs/14204180755/job/39797880528?pr=2325

@dangotbanned I am crying for help with the typing πŸ™ˆ

@FBruzzesi πŸ˜‚ I'll try taking a look soon

Update

Ah the issue is PolarsExpr isn't compliant

@dangotbanned
Copy link
Member

#2325 (comment)

@FBruzzesi for polars (only) just add an Incomplete in places where you need to use CompliantExpr.

# NOTE: `PolarsSeries`, `PolarsExpr` still have gaps
when: Method[CompliantWhen[PolarsDataFrame, Incomplete, Incomplete]]

PolarsSeries and PolarsExpr are almost entirely defined using __getattr__ - which isn't enough to satisfy typing.

They essentially need something like this, but I've been putting off doing it since they have waaaaaay more methods πŸ˜”

class PolarsDataFrame:
clone: Method[Self]
collect: Method[CompliantDataFrame[Any, Any, Any]]
drop_nulls: Method[Self]
estimated_size: Method[int | float]
explode: Method[Self]
filter: Method[Self]
gather_every: Method[Self]
item: Method[Any]
iter_rows: Method[Iterator[tuple[Any, ...]] | Iterator[Mapping[str, Any]]]
is_unique: Method[PolarsSeries]
join_asof: Method[Self]
rename: Method[Self]
row: Method[tuple[Any, ...]]
rows: Method[Sequence[tuple[Any, ...]] | Sequence[Mapping[str, Any]]]
sample: Method[Self]
select: Method[Self]
sort: Method[Self]
to_arrow: Method[pa.Table]
to_numpy: Method[_2DArray]
to_pandas: Method[pd.DataFrame]
unique: Method[Self]
with_columns: Method[Self]
# NOTE: `write_csv` requires an `@overload` for `str | None`
# Can't do that here 😟
write_csv: Method[Any]
write_parquet: Method[None]

@dangotbanned dangotbanned changed the title fear: Allow expressions as group_by keys feat: Allow expressions as group_by keys Apr 1, 2025
@dangotbanned
Copy link
Member

#2325 (comment)

😨

Resolves them depending on `list.copy`

> Cannot access attribute "copy" for class "Sequence[str]"   Attribute "copy" is unknown
@dangotbanned

This comment was marked as resolved.

@dangotbanned dangotbanned self-assigned this Apr 2, 2025
- Don't use `pl.Expr` for `_keys`
  - We technically don't need that attribute for `polars` anyway
- Use new `IncompletePolarsExpr`
- Don't require `_parse_expr` for `Polars*`
  - It doesn't need that method
  - Still planning to change it later in (#2325 (comment))
@dangotbanned dangotbanned self-requested a review April 21, 2025 18:44
Copy link
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @FBruzzesi πŸŽ‰

I've left some comments/suggestions, but nothing blocking.

This is an exciting step towards expressifying more of narwhals πŸš€

@FBruzzesi
Copy link
Member Author

Great work @FBruzzesi πŸŽ‰

I've left some comments/suggestions, but nothing blocking.

Thanks @dangotbanned - I am a bit short on time this week and the next - so apologies for my slow feedback loop.
I should have addressed all your comments/suggestion 😎

This is an exciting step towards expressifying more of narwhals πŸš€

It definitly is - exciting times!

Copy link
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2325 (comment)

Thanks again @FBruzzesi

I'll leave open until the end of the day in case @MarcoGorelli wants time chime in - but otherwise I'll merge later ❀️

@MarcoGorelli
Copy link
Member

thanks both, really amazing stuff!

i'll just trying breaking this a bit more, then i'd say we can ship it

@dangotbanned
Copy link
Member

thanks both, really amazing stuff!

i'll just trying breaking this a bit more, then i'd say we can ship it

@MarcoGorelli wanted to check in to see if you were okay with me merging?

No worries at all if not!
I'm happy with holding off (#2325 (review)) if you still wanna strees-test some more πŸ˜…

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted a change to an existing test - in general i'd suggest to just add a new test if you're testing something different

For anything this close to the library core and to expression parsing, please do wait for me to take a final look

Anyway, this looks great, thanks @FBruzzesi and @dangotbanned ! You have a lot to be proud of here. I'm certainly proud that all together we've arrived at such a simple-looking solution

type checking does pass in CI, but there is the occasional squiggly line in vscode. happy to resolve that separately though

@MarcoGorelli MarcoGorelli merged commit 8944823 into main Apr 27, 2025
29 checks passed
@MarcoGorelli MarcoGorelli deleted the feat/allow-expr-in-group-by branch April 27, 2025 14:02
@dangotbanned
Copy link
Member

@MarcoGorelli would we be able to set up something to preserve co-authors?

I think I'm missing again πŸ˜”
8944823

@MarcoGorelli
Copy link
Member

ah so sorry 😩 😳

the combination of 124 commit messages was too long so i removed the whole thing, but it looks like i should've kept the "co-authored by" part at the end for co-authors to show up

I'll be very careful to always do that in the future πŸ‘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enh]: Ability to use nw.col / Expr for .group_by
3 participants