model-deployment/containers/llama2/README.md (+1 −5)
@@ -233,10 +233,6 @@ The Container creation process is going to be the same as TGI. All associated fi
* For the `13b llama2` model, use the custom environment variable to override the default tensor parallelism to 2, so the model is sharded across 2 GPU cards.
* Set custom environment variable key `TENSOR_PARALLELISM` with value `2`.
* You can override more vLLM bootstrapping configuration using the `PARAMS` environment variable (see the local-run sketch after this hunk). For details of the configuration options, please refer to the official vLLM [doc](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html).
-* If you are downloading models directly from source, the following additional environment variables are needed:
-* Set custom environment variable key `TOKEN_FILE` with value `/opt/ds/model/deployed_model/token`, as the token will be available at this path.
-* Set custom environment variable key `MODEL` with value `meta-llama/Llama-2-13b-hf`; this is the model that will be downloaded during container start.
-* Set custom environment variable key `STORAGE_SIZE_IN_GB` with value `950` for the 7b model. This is required because the model is downloaded at runtime, so extra storage is needed to accommodate various model sizes.
* Since the prediction endpoint in the api server file has already been changed to `/predict`, no other overrides are needed (a sample request appears at the end of this section).
* Under `Models`, click the `Select` button and select the Model Catalog entry we created earlier.
* Under `Compute`, then `Specialty and previous generation`, select the `VM.GPU3.2` instance.
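
A minimal local-run sketch of the variables described above, assuming a hypothetical image tag `vllm-od:latest` and container port `8080`; the environment variable names are the ones set on the Model Deployment, the `PARAMS` value is just an example, and `STORAGE_SIZE_IN_GB` is omitted since it appears to govern deployment block storage rather than the container itself:

```sh
# Hypothetical local smoke test: image tag, port, PARAMS value, and the
# token mount path on the host are assumptions; the environment variable
# names match the deployment settings described above.
docker run --rm --gpus all \
  -e TENSOR_PARALLELISM=2 \
  -e PARAMS="--max-model-len 4096" \
  -e TOKEN_FILE=/opt/ds/model/deployed_model/token \
  -e MODEL=meta-llama/Llama-2-13b-hf \
  -v "$PWD/token:/opt/ds/model/deployed_model/token:ro" \
  -p 8080:8080 \
  vllm-od:latest
```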
@@ -411,4 +407,4 @@ For more detailed level of debugging, user can refer [README-DEBUG.md](./README-
Run `make shell.vllm` to launch the container with a shell prompt.
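
Once the container is up, the overridden `/predict` endpoint can be smoke-tested directly. A minimal sketch, assuming the server listens on localhost port `8080` and accepts a JSON body with `prompt` and `max_tokens` fields (the exact payload schema depends on the api server file):

```sh
# Hypothetical request: host, port, and payload fields are assumptions.
curl -s http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is OCI Data Science?", "max_tokens": 64}'
```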