improve portability of reproducible tarballs by replacing external tar command with tarfile module from Python standard library #4660

Merged: 31 commits merged into easybuilders:5.0.x from reprod-tarballs-mac on Dec 18, 2024

@lexming lexming commented Sep 27, 2024

fixes #4657

  • use more portable --date argument for touch
  • catch failed commands inside the pipeline
  • move generation of command to make reproducible archives into its own method
  • replace hardcoded pattern in tests of reproducible archives command with a call to filetools.reproducible_archive_cmd
  • add required argument to filetools.find_extensions()
  • make new implementation of reproducible_archive_cmd using the tarfile module
  • added new filetools.make_archive() method and related unit test
  • change handling of filename argument in filetools.get_source_tarball_from_git() to allow any extension and pass it to make_archive()
  • make_archive() now supports uncompressed tarballs and compressed tarballs in GZIP, BZIP2 and XZ
  • make_archive() can create reproducible tarballs in .tar or .tar.gz format
  • checksums of sources from git repos will be verified in Python 3.9+
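
To illustrate the idea behind the list above, here is a minimal sketch of building a reproducible tarball with the `tarfile` module. It is not the code from this PR: the function name `make_reproducible_tar`, the chosen permission bits, and the pinned tar format are all assumptions made for illustration. The general technique is to add members in sorted order and to strip metadata (timestamps, ownership) that varies between machines.

```python
import os
import tarfile


def make_reproducible_tar(source_dir, archive_path):
    """Archive source_dir into archive_path (.tar or .tar.xz) with normalized metadata."""
    # Assumption: only these two formats are handled in this sketch.
    mode = "w:xz" if archive_path.endswith(".tar.xz") else "w"

    def normalize(tarinfo):
        # Drop everything that depends on who created the files, and when.
        tarinfo.mtime = 0
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = ""
        # Keep only the executable/non-executable distinction in permissions.
        if tarinfo.isdir() or tarinfo.mode & 0o100:
            tarinfo.mode = 0o755
        else:
            tarinfo.mode = 0o644
        return tarinfo

    # Collect every path first so members can be added in a fixed, sorted order.
    paths = []
    for root, dirs, files in os.walk(source_dir):
        paths.extend(os.path.join(root, name) for name in dirs + files)
    paths.sort()

    top = os.path.basename(os.path.normpath(source_dir))
    # Assumption: pin the tar format explicitly, since the tarfile default
    # changed in Python 3.8.
    with tarfile.open(archive_path, mode, format=tarfile.GNU_FORMAT) as tar:
        for path in paths:
            arcname = os.path.join(top, os.path.relpath(path, source_dir))
            tar.add(path, arcname=arcname, recursive=False, filter=normalize)
    return archive_path
```

Calling the function twice on the same tree, even after file timestamps have changed, should yield byte-identical archives, which is what makes a checksum on the resulting tarball meaningful.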

@lexming lexming force-pushed the reprod-tarballs-mac branch from 73b39e2 to d0a55ba Compare September 30, 2024 07:39
@boegel boegel added this to the 5.0 milestone Oct 2, 2024
@boegel boegel changed the title from "use more portable --date argument for touch command used in reproducible tarballs" to "use more portable --date argument for touch command used in reproducible tarballs" Oct 2, 2024
@lexming lexming changed the title from "use more portable --date argument for touch command used in reproducible tarballs" to "improve portability of reproducible tarballs by replacing external tar command with tarfile module" Oct 9, 2024

lexming commented Nov 6, 2024

@boegel This one is ready. As discussed, archives will be made with tarfile instead of the tar command and checksums of sources from git repos will only be checked with Python 3.9+.

@boegel boegel changed the title from "improve portability of reproducible tarballs by replacing external tar command with tarfile module" to "improve portability of reproducible tarballs by replacing external tar command with tarfile module" Nov 13, 2024

boegel commented Nov 13, 2024

@lexming We need to make sure that git_config entries that still use .tar.gz as the extension don't lead to clone+tar+xz every time. Do we also need to check whether the source filename with .tar.xz already exists?

boegel commented Nov 28, 2024

TODO here:

  • change the function that creates tarballs for git_config sources so that it honors the provided file extension;
  • only .tar.xz can be produced reproducibly (and only using Python >= 3.9);
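
One way to honor the provided file extension is to map it to the corresponding `tarfile` write mode. The sketch below is illustrative only: the `TAR_MODES` table and the `tar_mode_for` helper are hypothetical names, not taken from EasyBuild, and only the four formats mentioned in this PR are covered.

```python
# Hypothetical lookup table: file extension -> tarfile write mode.
TAR_MODES = {
    ".tar": "w",
    ".tar.gz": "w:gz",
    ".tar.bz2": "w:bz2",
    ".tar.xz": "w:xz",
}


def tar_mode_for(filename):
    """Return the tarfile mode matching the extension of filename."""
    for extension, mode in TAR_MODES.items():
        if filename.endswith(extension):
            return mode
    raise ValueError("unsupported archive extension: %s" % filename)
```

The returned mode can be passed straight to `tarfile.open(filename, mode)`, so the archive format follows whatever filename the easyconfig specifies instead of being fixed in the code.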

lexming commented Dec 3, 2024

@boegel this is ready on my side, summary of the final behaviour:

  • (no change) git_config continues to provide filename with extension
  • (enhancement) get_source_tarball_from_git will respect file extension
  • (enhancement) checksums of sources in git_config will be verified when both conditions hold:
    1. a checksum is provided in the easyconfig
    2. Python 3.9+ is used
  • (enhancement) regular archives of git repos can now be created as uncompressed .tar or as tarballs compressed with GZIP, XZ or BZIP2
  • (enhancement) reproducible archives are supported when both conditions hold:
    1. the tarball is uncompressed or compressed with XZ
    2. Python 3.9+ is used

This means that existing easyconfigs with .tar.gz sources and no checksums will continue to behave in the same way (no verification) across all Python versions. Checksum verification is triggered only when a checksum is manually added, and only on Python 3.9+.

We can migrate existing easyconfigs to .tar.xz and include a checksum to start using reproducible tarballs. Users on older Python versions will see a deprecation warning and fall back to the current behaviour (no verification), and the installation will proceed.
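
The rules in this summary can be written down as two small predicates. This is a sketch of the described behaviour under stated assumptions, not the code merged in this PR: the names `can_make_reproducible`, `should_verify_checksum`, `REPRODUCIBLE_EXTENSIONS` and `MIN_PYTHON` are hypothetical.

```python
import sys

# Assumption: values taken from the summary above, not from EasyBuild's code.
REPRODUCIBLE_EXTENSIONS = (".tar", ".tar.xz")
MIN_PYTHON = (3, 9)


def can_make_reproducible(filename, python_version=None):
    """Reproducible archives need Python 3.9+ and a .tar or .tar.xz filename."""
    python_version = python_version or sys.version_info[:2]
    return tuple(python_version) >= MIN_PYTHON and filename.endswith(REPRODUCIBLE_EXTENSIONS)


def should_verify_checksum(checksum, python_version=None):
    """Checksums are verified only if one is provided and Python is 3.9+."""
    python_version = python_version or sys.version_info[:2]
    return checksum is not None and tuple(python_version) >= MIN_PYTHON
```

For example, an easyconfig with a .tar.gz source and no checksum gives `False` for both predicates on any Python version, which matches the unchanged behaviour described for existing easyconfigs.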

@boegel boegel left a comment

lgtm

@boegel boegel enabled auto-merge December 18, 2024 11:19
@boegel boegel merged commit 9fcee6b into easybuilders:5.0.x Dec 18, 2024
39 checks passed
@lexming lexming deleted the reprod-tarballs-mac branch December 18, 2024 13:24
@boegel boegel changed the title from "improve portability of reproducible tarballs by replacing external tar command with tarfile module" to "improve portability of reproducible tarballs by replacing external tar command with tarfile module from Python standard library" Dec 18, 2024