[Python] usability improvements for a "minimal" pyarrow #38536
Comments
That should already have been tackled (although there might be some libs that were missed): #36553 / #36554.
Ah, nice! I guess we can patch that, though it would be nice to have a template that we only need to patch once, rather than have to chase every import that's potentially affected by a missing DSO.
Hi @h-vetinari! Do you know if the package split introduced for conda-forge is coming to PyPI? BTW, should this issue still be open?
There have been some discussions about splitting the wheels and doing something "similar", but apart from some exploration nothing concrete has been done yet. I plan to work on it, but I have to schedule it alongside other priorities.
Thanks for all your hard work, guys! 👍
Hi everyone! Still no news regarding the PyPI package split? 🙏
Unfortunately not. This is a big chunk of work and we need someone who is able to work on it.
Describe the enhancement requested
Providing slimmer variants of arrow has been a topic for quite a while, but it became more urgent with pandas' plan to depend on pyarrow, which would bring quite a substantial installation size increase due to the way pyarrow gets packaged (this is true even more so in conda-forge, where we package a "maximal" version of arrow -- since it's so hard to build from source -- that generally contains more in terms of transitive dependencies than the wheels).
Through work on the feedstock, the conda-forge side of arrow is now ready to split up libarrow 14.0 into several pieces (currently libarrow-{acero,dataset,flight,flight-sql,gandiva,substrait} + libparquet), but we're still having pyarrow depend on the entirety of libarrow, not least because the python bindings link to everything but libarrow-flight-sql directly:
While it would be theoretically possible to also build various pyarrow-* variants, that's quite unappealing IMO from a packaging perspective, and it would be nicer if pyarrow just depended on the (core) libarrow, but provided helpful error messages where any missing libarrow-* libraries actually get used. In such a scenario (c.f. discussion in conda-forge/arrow-cpp-feedstock#1035),
Such an approach would presumably also make it easier for the wheel side of things (i.e. not having N pyarrow-* variants), though of course, providing the equivalent of the libarrow-* outputs from conda-forge through wheels would be quite a headache. It's possible that the best solution for wheels looks different (or ends up being sliced differently, like e.g. having two wheels pyarrow and pyarrow-minimal, or pyarrow and pyarrow[full]).
Note also that (from conda-forge/arrow-cpp-feedstock#1175):
This is now being tracked in #38309.
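To make the "helpful error messages" idea above concrete, here is a minimal sketch of a single import helper that could be reused for every optional component, rather than patching each import site separately. The helper name import_optional_component and the module-to-package mapping are assumptions made for illustration; this is not pyarrow's actual implementation, and the real mapping between Python submodules and libarrow-* outputs may differ.

```python
import importlib

# Hypothetical mapping from an optional Python submodule to the conda-forge
# output assumed to provide its shared library. Illustrative only.
_COMPONENT_PACKAGES = {
    "pyarrow.dataset": "libarrow-dataset",
    "pyarrow.flight": "libarrow-flight",
    "pyarrow.gandiva": "libarrow-gandiva",
    "pyarrow.substrait": "libarrow-substrait",
}


def import_optional_component(module_name, packages=_COMPONENT_PACKAGES):
    """Import an optional submodule, or raise an ImportError that names the
    package the user most likely needs to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fall back to a generic hint when the module is not in the mapping.
        package = packages.get(module_name, "the corresponding libarrow-* package")
        raise ImportError(
            f"{module_name} is unavailable ({exc}). In a minimal installation "
            f"this usually means {package} is not installed."
        ) from exc
```

With a helper like this, a call such as import_optional_component("pyarrow.dataset") would either return the module or fail with a message pointing at the missing package, while the original loader error stays attached as the exception's cause.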
Component(s)
Packaging, Python