Refactor pytorch_dev_ubuntu_24.04 Dockerfile to use nightly release #372

Closed
marbre opened this issue Apr 8, 2025 · 5 comments · Fixed by #388

Comments

@marbre
Member

marbre commented Apr 8, 2025

The Dockerfile contains multiple stages (build_rocm, pytorch_sources, pytorch_build, pytorch), which could be at least partly refactored. As a first step, the build_rocm stage, which builds ROCm via build_rocm.sh

RUN --mount=type=bind,target=/therock/src,rw bash /build_rocm.sh "$AMDGPU_TARGETS"

can be switched over to use the nightly tarball instead of building ROCm "again".
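
A minimal sketch of what such a stage could run in place of build_rocm.sh, assuming the nightly tarball has already been downloaded into the image. The helper name `install_rocm_tarball` and any install prefix passed to it are hypothetical and not taken from the Dockerfile.

```python
import tarfile
from pathlib import Path


def install_rocm_tarball(tarball: Path, prefix: Path) -> list[str]:
    """Unpack a prebuilt ROCm tarball into `prefix` and return its member names.

    Hypothetical helper: the real stage may lay the files out differently.
    """
    prefix.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz) on its own.
    with tarfile.open(tarball, mode="r:*") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases can reject unsafe members (absolute paths, ..).
            archive.extractall(prefix, filter="data")
        else:
            archive.extractall(prefix)
    return names
```

Unpacking a prebuilt archive replaces the compile step entirely, which is where the time saving would come from; the from-source path can stay available as a separate stage.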

@stellaraccident
Collaborator

Let's just make sure we keep the path paved so that people can make changes to ROCm and produce a PyTorch Docker image on their own. Folks don't love how slow it is to build this all together, but it does enable a development flow that works, and we need to not lose that.

@erman-gurses erman-gurses self-assigned this Apr 8, 2025
@erman-gurses erman-gurses moved this from TODO to In Progress in TheRock CI/CD 🪨🚀 Apr 9, 2025
@erman-gurses erman-gurses linked a pull request Apr 14, 2025 that will close this issue
@amd-chrissosa amd-chrissosa added the CICD Related to CI/CD Infra label Apr 17, 2025
@marbre
Member Author

marbre commented Apr 24, 2025

Synced with @geomin12 two days ago regarding the artifacts:
https://therock-artifacts.s3.us-east-2.amazonaws.com/14366952921/index-gfx110X-dgpu.html
https://github.com/ROCm/TheRock/blob/main/build_tools/fetch_artifacts.py
https://github.com/ROCm/TheRock/blob/main/.github/actions/setup_test_environment/action.yml#L56-L61

These are build artifacts, produced by build_linux_packages.yml, whereas the nightly release is produced within portable_linux_package_matrix.yml, and its tarball is uploaded to https://github.com/ROCm/TheRock/releases/tag/nightly-release. build_tools/provision.py is a script that can help fetch either one, but as the issue title says, I suggest using the nightly release.
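
For illustration only, here is a rough sketch of locating a tarball in the nightly release through the GitHub REST API. It is not how build_tools/provision.py works. The asset naming (a target string such as gfx110X-dgpu appearing in a .tar.gz file name) is an assumption, and `pick_tarball` and `fetch_release_assets` are hypothetical helpers.

```python
import json
import urllib.request

# GitHub REST endpoint for a release looked up by its tag.
RELEASE_API = "https://api.github.com/repos/ROCm/TheRock/releases/tags/nightly-release"


def fetch_release_assets(api_url: str = RELEASE_API) -> list:
    """Return the asset entries listed for the release (performs a network call)."""
    with urllib.request.urlopen(api_url) as response:
        return json.load(response)["assets"]


def pick_tarball(assets: list, target: str) -> str:
    """Return the download URL of the most recently created tarball for `target`.

    Assumption: tarball names contain the target string and end in .tar.gz.
    """
    matches = [
        asset for asset in assets
        if target in asset["name"] and asset["name"].endswith(".tar.gz")
    ]
    if not matches:
        raise LookupError(f"no tarball found for target {target!r}")
    # created_at is an ISO 8601 UTC timestamp, so string order is time order.
    newest = max(matches, key=lambda asset: asset["created_at"])
    return newest["browser_download_url"]
```

A call such as `pick_tarball(fetch_release_assets(), "gfx110X-dgpu")` would then yield a URL to download during the image build, provided the naming assumption holds.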

@erman-gurses
Contributor

erman-gurses commented Apr 25, 2025

The total build time is now down to 34 minutes.

@erman-gurses
Contributor

erman-gurses commented Apr 29, 2025

For the remaining items (pytorch_sources, pytorch_build, and pytorch), this issue can be reopened in the future. For now, the build_rocm refactoring is sufficient, so the issue can be closed once #388 has landed.
cc: @marbre

@github-project-automation github-project-automation bot moved this from In Progress to Done in TheRock CI/CD 🪨🚀 Apr 29, 2025
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Apr 29, 2025