TPU v4 install guide #108
base: main
@@ -0,0 +1,24 @@
sudo apt remove unattended-upgrades
sudo apt update
export PJRT_DEVICE=TPU

export PATH="$HOME/.local/bin:$PATH"
pip install build
pip install --upgrade setuptools
sudo apt install python3.10-venv

git clone https://github.com/huggingface/optimum-tpu.git

cd optimum-tpu
make
make build_dist_install_tools
make build_dist

python -m venv optimum_tpu_env
source optimum_tpu_env/bin/activate

Comment on lines +16 to +17

Why do you need a virtual environment?

The regular install of optimum-tpu always tried to go for a system-wide installation, which would then fail. I had to choose between --install-option="--prefix=/SOME/DIR/" and a venv, and I considered the venv my preferred way of handling this (and future) conflicts. I wanted a pip -e install because I was actively developing against some of the files. YMMV for a package install.

I understand, but this is a user choice too. Some people might prefer venv, others virtualenv, conda or even a Docker image. I think it would be better to take it out of the script, leaving other users the freedom to choose their environment.
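Following the suggestion to leave the environment choice to the user, one option is for the script to reuse whatever Python environment is already active and only fall back to creating its own venv when none is. A minimal sketch, assuming venv/virtualenv activation exports VIRTUAL_ENV and conda activation exports CONDA_PREFIX; the helper names in_python_env and ensure_python_env are hypothetical, and the fallback directory optimum_tpu_env is taken from the script above.

```shell
# Hypothetical helper: succeed if a Python environment is already active.
# Assumes venv/virtualenv export VIRTUAL_ENV and conda exports CONDA_PREFIX.
in_python_env() {
  [ -n "${VIRTUAL_ENV:-}" ] || [ -n "${CONDA_PREFIX:-}" ]
}

# Hypothetical helper: keep the user's environment (venv, virtualenv,
# conda, ...) if one is active; otherwise create and activate a local venv.
ensure_python_env() {
  if in_python_env; then
    echo "Using active environment: ${VIRTUAL_ENV:-$CONDA_PREFIX}"
  else
    python3 -m venv optimum_tpu_env
    # shellcheck disable=SC1091
    source optimum_tpu_env/bin/activate
  fi
}
```

With this, `ensure_python_env` would replace the two venv lines, and users who prefer conda or Docker simply activate their own environment before running the script.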

pip install torch==2.4.0 torch_xla[tpu]==2.4.0 torchvision -f https://storage.googleapis.com/libtpu-releases/index.html
pip uninstall torchvision # it might insist on 2.4.1

pip install -e .

huggingface-cli login
gsutil cp -r gs://entropix/huggingface_hub ~/.cache/huggingface/hub

What is this for?

This should be rejected. It is a local install for custom changes and experiments, and the bucket is one of our project buckets anyway.
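Because the script installs torch==2.4.0 and then uninstalls torchvision so that it cannot drag torch up to 2.4.1, it may be worth checking the resulting torch version explicitly after the install. A minimal sketch; the helper names pin_matches and check_torch_pin are hypothetical, and the sketch assumes that a local build label after "+" (as in 2.4.0+cpu) should be ignored when comparing against the pin.

```shell
# Hypothetical helper: succeed if an installed version equals the pinned one,
# ignoring any local build label such as "+cpu" or "+cu121".
pin_matches() {
  local installed="$1" pinned="$2"
  [ "${installed%%+*}" = "$pinned" ]
}

# Hypothetical helper: report the installed torch version and fail if it
# drifted from the pin (default 2.4.0, as in the install script).
check_torch_pin() {
  local pinned="${1:-2.4.0}" installed
  installed="$(python3 -c 'import torch; print(torch.__version__)')" || return 1
  if pin_matches "$installed" "$pinned"; then
    echo "torch $installed matches pin $pinned"
  else
    echo "torch $installed does not match pin $pinned" >&2
    return 1
  fi
}
```

Running `check_torch_pin` right after the `pip uninstall torchvision` step would surface an unexpected upgrade immediately instead of at import time on the TPU.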

@@ -61,10 +61,11 @@ tests = ["pytest", "safetensors"]
 quality = ["black", "ruff", "isort"]
 # Jetstream/Pytorch support is experimental for now, it needs to be installed manually.
 # Pallas is pulled because it will install a compatible version of jax[tpu].
-jetstream-pt = [

You do not need to comment this out: it will only be installed if you ask for it.

Ok.

-"jetstream-pt",
-"torch-xla[pallas] == 2.4.0"
-]
+# pallas and jetstream are not supported before v5e. Therefore, comment out on v4 and earlier
+#jetstream-pt = [
+# "jetstream-pt",
+# "torch-xla[pallas] == 2.4.0"
+#]
 
 [project.urls]
 Homepage = "https://hf.co/hardware"
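Since the jetstream-pt extra is only installed when requested, an install script could choose whether to request it instead of commenting it out in pyproject.toml. A minimal sketch, assuming the TPU accelerator type is passed in as a string such as v4-8, v5litepod-8 or v6e-8 (how you obtain it on the VM is left open), and relying only on the statement in the diff above that Pallas and Jetstream are not supported before v5e. Both function names are hypothetical.

```shell
# Hypothetical helper: print the TPU generation number from an accelerator
# type string such as "v4-8", "v5litepod-8", "v5p-8" or "v6e-8".
tpu_generation() {
  local type="$1" digits
  digits="${type#v}"            # drop the leading "v"
  digits="${digits%%[!0-9]*}"   # keep only the leading digits
  [ -n "$digits" ] || return 1
  echo "$digits"
}

# Hypothetical helper: print the pip target for an editable install of the
# current checkout, requesting the optional jetstream-pt extra only on v5
# or later, since Pallas and Jetstream are not supported before v5e.
optimum_tpu_requirement() {
  local gen
  gen="$(tpu_generation "$1")" || return 1
  if [ "$gen" -ge 5 ]; then
    echo ".[jetstream-pt]"
  else
    echo "."
  fi
}
```

Usage: `pip install -e "$(optimum_tpu_requirement v4-8)"` installs the plain package on a v4 VM, leaving pyproject.toml unchanged for everyone else.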

Why do you remove unattended-upgrades?

They kicked off twice, each time after a sudo apt update, and kept the TPU VM stuck for more than 90 minutes before I decided to just kill them. I consider the lifetime of a TPU VM to be short and the VM not to be exposed to the outside world. Hence, I think a stuck (costly) VM due to some potentially non-critical updates is worse than not having this service and instead doing updates on your own schedule.

I understand the issue, but I think it depends on the distribution you are using (I haven't experienced it so far) and is not necessarily related to optimum-tpu, which should provide tools for machine learning on TPUs. Please remove this command; consider running it when you set up your machine, before using optimum-tpu.
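As an alternative to removing unattended-upgrades from the install script, a machine-setup step could wait, with a deadline, until no process holds the dpkg lock before running apt. A minimal sketch; wait_until_free is a hypothetical helper that polls a caller-supplied check command, and the usage line assumes fuser (from psmisc) is available on the VM and that /var/lib/dpkg/lock-frontend is the lock apt takes on this image.

```shell
# Hypothetical helper: poll "$@" every few seconds while it succeeds
# (i.e. while the resource is still busy). Return 0 as soon as the check
# fails (resource free), or 1 if still busy after timeout_s seconds.
wait_until_free() {
  local timeout_s="$1" interval_s=5 waited=0
  shift
  while "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "still busy after ${timeout_s}s: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  return 0
}
```

Usage: `wait_until_free 600 sudo fuser /var/lib/dpkg/lock-frontend && sudo apt update`. This keeps the update service installed but bounds how long setup blocks on it, failing loudly instead of leaving a costly VM silently stuck.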