micropip.freeze imports field for PyPi dependencies is insufficient #207
Comments
Here's what I was able to code up so far:

    import importlib.metadata
    from pathlib import Path

    packages = set(
        v for vs in importlib.metadata.packages_distributions().values() for v in vs
    )

    for p in sorted(packages):
        files = importlib.metadata.files(p)
        imports = set()
        tree = dict()
        for f in files:
            # ignore special folders
            if Path(f.parts[0]).suffix in [".libs", ".dist-info", ".data"]:
                continue
            # include top-level single-file packages
            if len(f.parts) == 1 and f.suffix == ".py":
                imports.add(f.stem)
                continue
            # build a tree of all other files
            t = tree
            for r in f.parts:
                if t.get(r, None) is None:
                    t[r] = dict()
                t = t[r]
        # extract folders that only have folders but no files as children,
        # these are package candidates
        queue = [([k], t) for k, t in tree.items()]
        while len(queue) > 0:
            ps, tree = queue.pop()
            if len(tree) == 0:
                continue
            imports.add('.'.join(ps))
            is_package = True
            add_to_queue = []
            for k, t in tree.items():
                if len(t) == 0:
                    is_package = False
                add_to_queue.append((ps + [k], t))
            if is_package:
                queue += add_to_queue
        # remove prefixes from the list
        new_imports = []
        for i in imports:
            if not any(j.startswith(i) for j in imports if j != i):
                new_imports.append(i)
        print(p, sorted(new_imports))

For
The results seem to be quite good :)
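To see what the tree-building step does without needing an installed environment, here is a minimal sketch that runs the same nesting logic on made-up paths. The `build_tree` helper and the file names are hypothetical and only for illustration; they are not part of micropip.

```python
from pathlib import PurePosixPath


def build_tree(paths):
    """Nest path components into dicts; files end up as empty dicts."""
    tree = {}
    for path in paths:
        node = tree
        for part in PurePosixPath(path).parts:
            # create the child node on first sight, then descend into it
            node = node.setdefault(part, {})
    return tree


# Hypothetical RECORD-style entries for a namespace package
paths = [
    "namespace/package/__init__.py",
    "namespace/package/core.py",
]
tree = build_tree(paths)
```

A directory is any key whose value is a non-empty dict, and a file is any key whose value is an empty dict. That is the distinction the queue loop relies on when it checks `len(t)`.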
Thanks for opening the issue. Yes, I totally agree that using [...] Combining the two approaches that you mentioned sounds reasonable to me. We also have similar logic (iterating through the package directory and finding Python files) in pyodide-build, so you may want to take a look at that too.
Thank you for these links! I adapted my code a bit further so that it works for the CPython modules and for namespace packages. Unfortunately, Pyodide needs better handling of namespace packages than setuptools, since just giving the top-level import (which top_level.txt does) is insufficient: when we parse the imports to generate the map from imports to packages to load, several namespace packages can fight over the top-level and you end up in a situation where
Here's what I have now:

    def get_imports_for_package(p: str) -> list[str]:
        def valid_package_name(n: str) -> bool:
            return all(invalid_chr not in n for invalid_chr in ".- ")

        imports = set()
        tree = dict()
        for f in importlib.metadata.files(p):
            # ignore special folders
            if Path(f.parts[0]).suffix in [".libs", ".dist-info", ".data"]:
                continue
            # include top-level single-file packages
            if len(f.parts) == 1 and f.suffix in [".py", ".pyc", ".so"]:
                stem = f.name.split('.')[0] if f.suffix == ".so" else f.stem
                if valid_package_name(stem):
                    imports.add(stem)
                continue
            # build a tree of all other files
            t = tree
            for r in f.parts:
                if t.get(r, None) is None:
                    t[r] = dict()
                t = t[r]
        # extract folders that only have folders but no files as children,
        # these are package candidates
        queue = [
            ([k], t) for k, t in tree.items()
            if len(t) > 0 and valid_package_name(k)
        ]
        while len(queue) > 0:
            ps, tree = queue.pop()
            imports.add('.'.join(ps))
            is_package = True
            add_to_queue = []
            for k, t in tree.items():
                if len(t) > 0:
                    if valid_package_name(k):
                        add_to_queue.append((ps + [k], t))
                else:
                    is_package = False
            if is_package:
                queue += add_to_queue
        # remove prefixes from the list
        new_imports = []
        for i in imports:
            if not any(j.startswith(f"{i}.") for j in imports if j != i):
                new_imports.append(i)
        return new_imports
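One detail worth spelling out is the prefix check in the last loop, which compares against `f"{i}."` rather than the bare name. Here is a small sketch of the difference on made-up import names; the helper names `drop_prefixes` and `drop_prefixes_naive` are hypothetical and only for illustration.

```python
def drop_prefixes(imports):
    """Drop names that are a dotted parent of another name."""
    return sorted(
        i for i in imports
        if not any(j.startswith(f"{i}.") for j in imports if j != i)
    )


def drop_prefixes_naive(imports):
    """Same idea with a bare string prefix, which over-matches."""
    return sorted(
        i for i in imports
        if not any(j.startswith(i) for j in imports if j != i)
    )


# Hypothetical names: "foo" and "foobar" are unrelated top-level modules,
# while "ns" is the parent of "ns.pkg".
names = {"foo", "foobar", "ns", "ns.pkg"}

with_dot = drop_prefixes(names)       # ['foo', 'foobar', 'ns.pkg']
naive = drop_prefixes_naive(names)    # ['foobar', 'ns.pkg']
```

With the bare prefix, `foo` is dropped only because `foobar` happens to start with the same letters. Requiring the trailing dot limits the match to real parent/child relationships.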
Thanks @juntyr. Feel free to open a PR when it is ready, then I'll start reviewing the code.
micropip.freeze currently checks for the non-standard top_level.txt file to fill the imports field. This has a few issues: [...] namespace.package instead [...]

A brief online search didn't reveal any good solutions: [...] RECORD file to extract all non-special directories. I would suggest going one step further and extracting the common prefix for all non-special directories so that we could also translate namespace/package into namespace.package. To allow several adjacent namespace packages, perhaps the rule would be to look for directory prefixes until we find the first __init__.py.
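The proposed rule (walk down the directory prefixes until the first one that contains an `__init__.py`) could be sketched roughly as follows. This is only one reading of the suggestion, not micropip's implementation; the function name `imports_from_record` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath

# Top-level folders that never hold importable packages
SPECIAL_SUFFIXES = {".libs", ".dist-info", ".data"}


def imports_from_record(record_paths):
    """Map RECORD-style file paths to dotted import names."""
    paths = [PurePosixPath(p) for p in record_paths]
    # directories that directly contain an __init__.py
    regular = {p.parent for p in paths if p.name == "__init__.py"}
    imports = set()
    for p in paths:
        if PurePosixPath(p.parts[0]).suffix in SPECIAL_SUFFIXES:
            continue
        dirs = p.parts[:-1]
        if not dirs:
            # top-level single-file module
            if p.suffix == ".py":
                imports.add(p.stem)
            continue
        # try ever longer directory prefixes, stop at the first regular package
        for depth in range(1, len(dirs) + 1):
            if PurePosixPath(*dirs[:depth]) in regular:
                imports.add(".".join(dirs[:depth]))
                break
    return sorted(imports)


# Hypothetical wheel contents with two adjacent namespace packages
record = [
    "ns/pkg_a/__init__.py",
    "ns/pkg_a/core.py",
    "ns/pkg_b/__init__.py",
    "ns/pkg_b/sub/__init__.py",
    "single.py",
    "demo-1.0.dist-info/RECORD",
]
found = imports_from_record(record)
```

In this sketch, `ns` itself is never reported because it has no `__init__.py`, so `ns.pkg_a` and `ns.pkg_b` no longer compete for the same top-level name. Directories that contain no `__init__.py` anywhere along their path are simply ignored, which is an assumption of the sketch rather than something the proposal states.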