
Commit d9a5f33

hongxiayang authored and Alvant committed
[Hardware][AMD] ROCm6.2 upgrade (vllm-project#8674)
Signed-off-by: Alvant <alvasian@yandex.ru>
1 parent 2e78064 commit d9a5f33

File tree

2 files changed: +61 -60 lines changed


Dockerfile.rocm

Lines changed: 19 additions & 37 deletions
@@ -1,24 +1,18 @@
-# Default ROCm 6.1 base image
-ARG BASE_IMAGE="rocm/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging"
+# Default ROCm 6.2 base image
+ARG BASE_IMAGE="rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0"
 
 # Default ROCm ARCHes to build vLLM for.
 ARG PYTORCH_ROCM_ARCH="gfx908;gfx90a;gfx942;gfx1100"
 
 # Whether to install CK-based flash-attention
 # If 0, will not install flash-attention
 ARG BUILD_FA="1"
-# If `TRY_FA_WHEEL=1`, we will try installing flash-attention from `FA_WHEEL_URL`
-# If this succeeds, we use the downloaded wheel and skip building flash-attention.
-# Otherwise, ROCm flash-attention from `FA_BRANCH` will be built for the
-# architectures specified in `FA_GFX_ARCHS`
-ARG TRY_FA_WHEEL="1"
-ARG FA_WHEEL_URL="https://github.com/ROCm/flash-attention/releases/download/v2.5.9post1-cktile-vllm/flash_attn-2.5.9.post1-cp39-cp39-linux_x86_64.whl"
 ARG FA_GFX_ARCHS="gfx90a;gfx942"
-ARG FA_BRANCH="23a2b1c2"
+ARG FA_BRANCH="3cea2fb"
 
 # Whether to build triton on rocm
 ARG BUILD_TRITON="1"
-ARG TRITON_BRANCH="e0fc12c"
+ARG TRITON_BRANCH="e192dba"
 
 ### Base image build stage
 FROM $BASE_IMAGE AS base
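
For reference, each of these ARGs can be overridden with --build-arg at build time; a minimal sketch (not part of this commit), shown with the new defaults from this hunk:

    $ DOCKER_BUILDKIT=1 docker build \
        --build-arg BASE_IMAGE="rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0" \
        --build-arg FA_BRANCH="3cea2fb" \
        --build-arg TRITON_BRANCH="e192dba" \
        -f Dockerfile.rocm -t vllm-rocm .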
@@ -50,16 +44,17 @@ RUN python3 -m pip install --upgrade pip
 # Remove sccache so it doesn't interfere with ccache
 # TODO: implement sccache support across components
 RUN apt-get purge -y sccache; python3 -m pip uninstall -y sccache; rm -f "$(which sccache)"
-# Install torch == 2.5.0 on ROCm
+
+# Install torch == 2.6.0 on ROCm
 RUN --mount=type=cache,target=/root/.cache/pip \
     case "$(ls /opt | grep -Po 'rocm-[0-9]\.[0-9]')" in \
-        *"rocm-6.1"*) \
+        *"rocm-6.2"*) \
             python3 -m pip uninstall -y torch torchvision \
             && python3 -m pip install --pre \
-                torch==2.5.0.dev20240726 \
-                cmake>=3.26 ninja packaging setuptools-scm>=8 wheel jinja2 \
-                torchvision==0.20.0.dev20240726 \
-                --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.1 ;; \
+                torch==2.6.0.dev20240918 \
+                setuptools-scm>=8 \
+                torchvision==0.20.0.dev20240918 \
+                --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.2;; \
         *) ;; esac
 
 ENV LLVM_SYMBOLIZER_PATH=/opt/rocm/llvm/bin/llvm-symbolizer
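
The case statement keys off the ROCm directory name under /opt. A quick way to see which branch will fire, plus the equivalent manual install, is sketched below (not part of this commit; the pip command is the one documented later in this diff):

    $ ls /opt | grep -Po 'rocm-[0-9]\.[0-9]'    # expected to print rocm-6.2 on this base image
    $ pip install --no-cache-dir --pre torch==2.6.0.dev20240918 --index-url https://download.pytorch.org/whl/nightly/rocm6.2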
@@ -81,25 +76,18 @@ RUN cd /opt/rocm/share/amd_smi \
 ### Flash-Attention wheel build stage
 FROM base AS build_fa
 ARG BUILD_FA
-ARG TRY_FA_WHEEL
-ARG FA_WHEEL_URL
 ARG FA_GFX_ARCHS
 ARG FA_BRANCH
 # Build ROCm flash-attention wheel if `BUILD_FA = 1`
 RUN --mount=type=cache,target=${CCACHE_DIR} \
     if [ "$BUILD_FA" = "1" ]; then \
-    if [ "${TRY_FA_WHEEL}" = "1" ] && python3 -m pip install "${FA_WHEEL_URL}"; then \
-        # If a suitable wheel exists, we download it instead of building FA
-        mkdir -p /install && wget -N "${FA_WHEEL_URL}" -P /install; \
-    else \
-        mkdir -p libs \
-        && cd libs \
-        && git clone https://github.com/ROCm/flash-attention.git \
-        && cd flash-attention \
-        && git checkout "${FA_BRANCH}" \
-        && git submodule update --init \
-        && GPU_ARCHS="${FA_GFX_ARCHS}" python3 setup.py bdist_wheel --dist-dir=/install; \
-    fi; \
+    mkdir -p libs \
+    && cd libs \
+    && git clone https://github.com/ROCm/flash-attention.git \
+    && cd flash-attention \
+    && git checkout "${FA_BRANCH}" \
+    && git submodule update --init \
+    && GPU_ARCHS="${FA_GFX_ARCHS}" python3 setup.py bdist_wheel --dist-dir=/install; \
 # Create an empty directory otherwise as later build stages expect one
 else mkdir -p /install; \
 fi
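
This stage leaves the flash-attention wheel under /install. A later stage, or a user inside the container, can install and sanity-check it roughly as follows (a sketch, not part of this commit; the exact wheel filename depends on the build):

    $ pip install /install/flash_attn-*.whl
    $ python3 -c "import flash_attn; print(flash_attn.__version__)"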
@@ -114,6 +102,7 @@ RUN --mount=type=cache,target=${CCACHE_DIR} \
     if [ "$BUILD_TRITON" = "1" ]; then \
     mkdir -p libs \
     && cd libs \
+    && python3 -m pip install ninja cmake wheel pybind11 \
     && git clone https://github.com/OpenAI/triton.git \
     && cd triton \
     && git checkout "${TRITON_BRANCH}" \
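
The newly added pip line installs the Triton build prerequisites that the manual instructions in amd-installation.rst (below) also use. Once the wheel produced by this stage is installed in the final image, a quick check (a sketch, not part of this commit):

    $ python3 -c "import triton; print(triton.__version__)"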
@@ -143,13 +132,6 @@ RUN --mount=type=cache,target=${CCACHE_DIR} \
     --mount=type=bind,source=.git,target=.git \
     --mount=type=cache,target=/root/.cache/pip \
     python3 -m pip install -Ur requirements-rocm.txt \
-    && case "$(ls /opt | grep -Po 'rocm-[0-9]\.[0-9]')" in \
-        *"rocm-6.1"*) \
-            # Bring in upgrades to HIP graph earlier than ROCm 6.2 for vLLM
-            wget -N https://github.com/ROCm/vllm/raw/fa78403/rocm_patch/libamdhip64.so.6 -P /opt/rocm/lib \
-            # Prevent interference if torch bundles its own HIP runtime
-            && rm -f "$(python3 -c 'import torch; print(torch.__path__[0])')"/lib/libamdhip64.so* || true;; \
-        *) ;; esac \
     && python3 setup.py clean --all \
     && python3 setup.py develop
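
With the ROCm 6.1 runtime patch removed, the image can be built and launched directly; a minimal sketch (not part of this commit; the device and ipc flags are the usual ones for ROCm containers, adjust to your setup):

    $ DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t vllm-rocm .
    $ docker run -it --rm --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host vllm-rocm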

docs/source/getting_started/amd-installation.rst

Lines changed: 42 additions & 23 deletions
@@ -3,15 +3,17 @@
 Installation with ROCm
 ======================
 
-vLLM supports AMD GPUs with ROCm 6.1.
+vLLM supports AMD GPUs with ROCm 6.2.
 
 Requirements
 ------------
 
 * OS: Linux
-* Python: 3.8 -- 3.11
+* Python: 3.9 -- 3.12
 * GPU: MI200s (gfx90a), MI300 (gfx942), Radeon RX 7900 series (gfx1100)
-* ROCm 6.1
+* ROCm 6.2
+
+Note: PyTorch 2.5+/ROCm 6.2 dropped support for Python 3.8.
 
 Installation options:
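
To confirm a machine meets these requirements, the GPU architecture and ROCm release can be checked from a shell; both commands appear elsewhere in this commit:

    $ rocminfo | grep gfx                       # should report gfx90a, gfx942, or gfx1100
    $ ls /opt | grep -Po 'rocm-[0-9]\.[0-9]'    # should report rocm-6.2 for a ROCm 6.2 install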

@@ -27,7 +29,7 @@ You can build and install vLLM from source.
 
 First, build a docker image from `Dockerfile.rocm <https://github.com/vllm-project/vllm/blob/main/Dockerfile.rocm>`_ and launch a docker container from the image.
 
-`Dockerfile.rocm <https://github.com/vllm-project/vllm/blob/main/Dockerfile.rocm>`_ uses ROCm 6.1 by default, but also supports ROCm 5.7 and 6.0 in older vLLM branches.
+`Dockerfile.rocm <https://github.com/vllm-project/vllm/blob/main/Dockerfile.rocm>`_ uses ROCm 6.2 by default, but also supports ROCm 5.7, 6.0 and 6.1 in older vLLM branches.
 It provides flexibility to customize the build of docker image using the following arguments:
 
 * `BASE_IMAGE`: specifies the base image used when running ``docker build``, specifically the PyTorch on ROCm base image.
@@ -39,13 +41,13 @@ It provides flexibility to customize the build of docker image using the followi
 Their values can be passed in when running ``docker build`` with ``--build-arg`` options.
 
 
-To build vllm on ROCm 6.1 for MI200 and MI300 series, you can use the default:
+To build vllm on ROCm 6.2 for MI200 and MI300 series, you can use the default:
 
 .. code-block:: console
 
     $ DOCKER_BUILDKIT=1 docker build -f Dockerfile.rocm -t vllm-rocm .
 
-To build vllm on ROCm 6.1 for Radeon RX7900 series (gfx1100), you should specify ``BUILD_FA`` as below:
+To build vllm on ROCm 6.2 for Radeon RX7900 series (gfx1100), you should specify ``BUILD_FA`` as below:
 
 .. code-block:: console
 
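
The gfx1100 build command itself lies outside this hunk. Since the default FA_GFX_ARCHS covers only gfx90a and gfx942, the intended invocation is presumably of the following form (an assumption, not shown in this diff):

    $ DOCKER_BUILDKIT=1 docker build --build-arg BUILD_FA="0" -f Dockerfile.rocm -t vllm-rocm .    # BUILD_FA value assumed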
@@ -79,9 +81,8 @@ Option 2: Build from source
 
 - `ROCm <https://rocm.docs.amd.com/en/latest/deploy/linux/index.html>`_
 - `PyTorch <https://pytorch.org/>`_
-- `hipBLAS <https://rocm.docs.amd.com/projects/hipBLAS/en/latest/install.html>`_
 
-For installing PyTorch, you can start from a fresh docker image, e.g, `rocm/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging`, `rocm/pytorch-nightly`.
+For installing PyTorch, you can start from a fresh docker image, e.g., `rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0`, `rocm/pytorch-nightly`.
 
 Alternatively, you can install PyTorch using PyTorch wheels. You can check PyTorch installation guide in PyTorch `Getting Started <https://pytorch.org/get-started/locally/>`_
 
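
For the wheel route, the ROCm 6.2 nightly index referenced elsewhere in this commit can serve as the source; a sketch (not part of this commit), with the package versions left to the reader:

    $ pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.2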

@@ -90,34 +91,53 @@ Alternatively, you can install PyTorch using PyTorch wheels. You can check PyTor
 
 Install ROCm's Triton flash attention (the default triton-mlir branch) following the instructions from `ROCm/triton <https://github.com/ROCm/triton/blob/triton-mlir/README.md>`_
 
+.. code-block:: console
+
+    $ python3 -m pip install ninja cmake wheel pybind11
+    $ pip uninstall -y triton
+    $ git clone https://github.com/OpenAI/triton.git
+    $ cd triton
+    $ git checkout e192dba
+    $ cd python
+    $ pip3 install .
+    $ cd ../..
+
+.. note::
+    - If you see an HTTP error related to downloading packages while building triton, please try again, as the error is intermittent.
+
+
 2. Optionally, if you choose to use CK flash attention, you can install `flash attention for ROCm <https://github.com/ROCm/flash-attention/tree/ck_tile>`_
 
+
 Install ROCm's flash attention (v2.5.9.post1) following the instructions from `ROCm/flash-attention <https://github.com/ROCm/flash-attention/tree/ck_tile#amd-gpurocm-support>`_
 Alternatively, wheels intended for vLLM use can be accessed under the releases.
 
-.. note::
-    - You might need to downgrade the "ninja" version to 1.10 it is not used when compiling flash-attention-2 (e.g. `pip install ninja==1.10.2.4`)
+For example, for ROCm 6.2, suppose your gfx arch is `gfx90a`.
+To get your gfx architecture, run `rocminfo | grep gfx`.
 
-3. Build vLLM.
-
-.. code-block:: console
+.. code-block:: console
 
-    $ cd vllm
-    $ pip install -U -r requirements-rocm.txt
-    $ python setup.py develop # This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation
+    $ git clone https://github.com/ROCm/flash-attention.git
+    $ cd flash-attention
+    $ git checkout 3cea2fb
+    $ git submodule update --init
+    $ GPU_ARCHS="gfx90a" python3 setup.py install
+    $ cd ..
 
+.. note::
+    - You might need to downgrade the "ninja" version to 1.10, as it is not used when compiling flash-attention-2 (e.g. `pip install ninja==1.10.2.4`)
 
-.. tip::
+3. Build vLLM.
 
-    For example, vLLM v0.5.3 on ROCM 6.1 can be built with the following steps:
+For example, vLLM on ROCM 6.2 can be built with the following steps:
 
 .. code-block:: console
 
     $ pip install --upgrade pip
 
     $ # Install PyTorch
     $ pip uninstall torch -y
-    $ pip install --no-cache-dir --pre torch==2.5.0.dev20240726 --index-url https://download.pytorch.org/whl/nightly/rocm6.1
+    $ pip install --no-cache-dir --pre torch==2.6.0.dev20240918 --index-url https://download.pytorch.org/whl/nightly/rocm6.2
 
     $ # Build & install AMD SMI
     $ pip install /opt/rocm/share/amd_smi
@@ -127,15 +147,14 @@ Alternatively, wheels intended for vLLM use can be accessed under the releases.
     $ pip install "numpy<2"
     $ pip install -r requirements-rocm.txt
 
-    $ # Apply the patch to ROCM 6.1 (requires root permission)
-    $ wget -N https://github.com/ROCm/vllm/raw/fa78403/rocm_patch/libamdhip64.so.6 -P /opt/rocm/lib
-    $ rm -f "$(python3 -c 'import torch; print(torch.__path__[0])')"/lib/libamdhip64.so*
-
     $ # Build vLLM for MI210/MI250/MI300.
     $ export PYTORCH_ROCM_ARCH="gfx90a;gfx942"
     $ python3 setup.py develop
 
 
+This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation.
+
+
 .. tip::
 
     - Triton flash attention is used by default. For benchmarking purposes, it is recommended to run a warm up step before collecting perf numbers.
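
After `python3 setup.py develop` finishes, a quick sanity check of the install (a sketch, not part of this commit):

    $ python3 -c "import vllm; print(vllm.__version__)"
    $ python3 -c "import torch; print(torch.version.hip)"    # confirms a ROCm/HIP build of torch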

0 commit comments
