Remove binary files from repository #165

Open
agriyakhetarpal opened this issue Dec 12, 2024 · 3 comments

Comments

@agriyakhetarpal
Member

agriyakhetarpal commented Dec 12, 2024

Following @Carreau's suggestion in #139, we should remove the existing wheel files from tests/test_data/wheel/, since they will only bloat the repository as we add more commits to the default branch. They can be stored elsewhere.

Yes, maybe let's discuss it in a separate issue. I don't like putting a bunch of big files in the repository for testing, but I am also not sure whether storing them in a separate place is a good way to go.

Originally posted by @ryanking13 in #139 (comment)


  • My idea is to use a dummy repository, attach the wheels to a GitHub release there, and download them when the tests run using https://github.com/fatiando/pooch (which can also cache the files). pooch is used by the scikit-image and scikit-learn test suites to download data files (from SciPy's datasets, in the case of the latter), and I use it as well for PyBaMM. However, the files would indeed live in a separate place, which might not be what we want.

  • Another approach, which avoids storing them elsewhere, is to remove them in a commit and keep downloading them through the raw GitHub permalinks that point at the files as of the commit before the removal. An added benefit is that GitHub sets CORS headers on such URLs. However, we won't be able to update the files as easily with this method.
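Both ideas above could be sketched roughly as follows. This is only an illustration under stated assumptions: the repository `example-org/test-wheels`, the release tag `v1`, the cache directory name, and all helper names are hypothetical, and the SHA-256 value for each wheel would have to be recorded alongside the tests. `pooch.retrieve` and `pooch.os_cache` are existing pooch functions; pooch is imported lazily so that the URL helpers work without it installed.

```python
from pathlib import Path

# Hypothetical location of the release assets in a dummy repository.
RELEASE_BASE_URL = "https://github.com/example-org/test-wheels/releases/download/v1"


def release_url(filename, base_url=RELEASE_BASE_URL):
    """Build the download URL of a release asset (first bullet)."""
    return f"{base_url.rstrip('/')}/{filename}"


def raw_permalink(owner, repo, commit, path):
    """Build a raw GitHub URL pinned to a specific commit (second bullet).

    `commit` should be the full SHA of the last commit that still
    contains the file, so the URL keeps working after the removal.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{commit}/{path}"


def fetch_wheel(filename, sha256, url=None):
    """Download a wheel once, verify its hash, and reuse the cached copy."""
    import pooch  # third-party; imported lazily (assumed to be a test dependency)

    local_path = pooch.retrieve(
        url=url or release_url(filename),
        known_hash=f"sha256:{sha256}",
        fname=filename,
        # Hypothetical cache directory name under the user's cache folder.
        path=pooch.os_cache("example-test-wheels"),
    )
    return Path(local_path)
```

A test fixture could then call `fetch_wheel(...)` with either kind of URL; whether the cache should live in the user's cache directory or inside the CI workspace is a separate design choice.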

@Carreau
Contributor

Carreau commented Dec 13, 2024

I think 2 of these wheels are already on PyPI, so we don't need to store them again; we can "just" store a hash, re-download them, and check the hash.
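A minimal sketch of the "store a hash and re-download" idea, using only the standard library. The function names and the idea of keeping a local copy at `dest` are assumptions made for illustration; the expected digest would be committed to the repository in place of the wheel itself.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


def verify_sha256(data: bytes, expected: str) -> None:
    """Raise ValueError if `data` does not match the expected digest."""
    actual = sha256_hex(data)
    if actual != expected.lower():
        raise ValueError(f"hash mismatch: expected {expected}, got {actual}")


def download_verified(url: str, expected_sha256: str, dest) -> Path:
    """Download `url` to `dest` unless a verified copy is already there."""
    dest = Path(dest)
    if dest.exists() and sha256_hex(dest.read_bytes()) == expected_sha256.lower():
        return dest  # cached copy is intact, skip the network
    with urllib.request.urlopen(url) as response:
        data = response.read()
    verify_sha256(data, expected_sha256)  # refuse to store unexpected content
    dest.write_bytes(data)
    return dest
```

Checking the digest before writing the file means a changed or corrupted upstream file fails the test loudly instead of being used silently.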

test_wheel_uninstall-1.0.0-py3-none-any.whl is small, and I think the problem I was pointing out was that it's not auditable. I think it would be OK to store it in deflated form, then zip and rename it during the tests.
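One way to read this suggestion: keep the wheel's contents as plain, reviewable files in the repository and assemble the `.whl` archive when the tests run. The sketch below assumes that reading; the helper name and directory layout are hypothetical, and the source tree is assumed to already contain a consistent `.dist-info` directory (including `RECORD`).

```python
import zipfile
from pathlib import Path


def build_wheel_from_tree(source_dir, wheel_path) -> Path:
    """Zip every file under `source_dir` into `wheel_path`.

    A wheel is a zip archive with a `.whl` extension, so writing the
    archive directly under the final name covers the "rename" step.
    `wheel_path` is assumed to live outside `source_dir`.
    """
    source_dir = Path(source_dir)
    wheel_path = Path(wheel_path)
    with zipfile.ZipFile(wheel_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Sort for a stable member order across platforms.
        for file in sorted(source_dir.rglob("*")):
            if file.is_file():
                # Archive names are relative to the tree root, with "/" separators.
                archive.write(file, file.relative_to(source_dir).as_posix())
    return wheel_path
```

In a pytest suite this could be wrapped in a fixture that writes into `tmp_path`, so no binary artifact is ever committed.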

@ryanking13
Member

Thanks for the suggestion! pooch looks interesting, and I like that it can cache files. One thing I am worried about is that if the remote files are removed or their URL changes for some reason, the tests will break, and users will not be able to handle it easily.

But we already rely heavily on GitHub's infrastructure (for storing the xbuildenv and the metadata), so everything will break if there is an issue with GitHub anyway. It would therefore be fine to use GitHub to store the binary files.

My idea is to use a dummy repository and add the wheels to the GitHub release, and then download them at the time of running the tests using https://github.com/fatiando/pooch (which can cache the files as well).

I think 2 of these wheels are already on PyPI, so we don't need to store them again; we can "just" store a hash, re-download them, and check the hash.

Yeah, I think we can start by downloading them from PyPI. test_wheel_uninstall-1.0.0-py3-none-any.whl can be replaced with any other package that has a reasonably complex file structure, so I think we can replace it with a real package from PyPI.
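As a rough sketch of what downloading from PyPI could look like, the snippet below looks up a wheel's URL and SHA-256 digest through PyPI's JSON API (`https://pypi.org/pypi/<name>/<version>/json`). The helper names are hypothetical, and which real package would replace the test wheel is left open.

```python
import json
import urllib.request


def pypi_release_json_url(name: str, version: str) -> str:
    """URL of the PyPI JSON metadata for one release of a project."""
    return f"https://pypi.org/pypi/{name}/{version}/json"


def fetch_release_metadata(name: str, version: str) -> dict:
    """Fetch the release metadata from PyPI (requires network access)."""
    with urllib.request.urlopen(pypi_release_json_url(name, version)) as response:
        return json.load(response)


def find_wheel(metadata: dict, filename: str):
    """Return (url, sha256) for the wheel called `filename` in the metadata."""
    for entry in metadata.get("urls", []):
        if entry.get("packagetype") == "bdist_wheel" and entry.get("filename") == filename:
            return entry["url"], entry["digests"]["sha256"]
    raise LookupError(f"{filename} not found in release metadata")
```

For reproducible tests, the digest returned here would more likely be looked up once and pinned in the repository than trusted at test time.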

@agriyakhetarpal
Member Author

Yes, pooch can be overkill if we don't have a lot of test data and if the files are small. We can also archive the dummy repository so that no one except an administrator will be able to remove the remote files from the release.
