
Commit 40561fd

Committed May 25, 2024
Readme Updated with Images
1 parent: 154088c

3 files changed (+1, -5 lines)

model-deployment/containers/llama2/README.md

+1 -5
```diff
@@ -233,10 +233,6 @@ The Container creation process is going to be the same as TGI. All associated fi
 * For `13b llama2` model, use the custom environment variable to override the default tensor parallelism as 2, to shard the model on 2 GPU cards.
   * Set custom environment variable key `TENSOR_PARALLELISM` with value `2`
 * You can override more vllm bootstrapping configuration using `PARAMS` environment configuration. For details of configurations, please refer the official vLLM [doc](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html).
-* If you are downloading models directly from source, we will need these additional environment variable configurations:
-  * Set custom environment variable key `TOKEN_FILE` with value `/opt/ds/model/deployed_model/token`, as the token will be available at this path.
-  * Set custom environment variable key `MODEL` with value `meta-llama/Llama-2-13b-hf`, this is the model that will be downloaded during container start.
-  * Set custom environment variable key `STORAGE_SIZE_IN_GB` with value `950` for 7b model. This is required as model will be downloaded at runtime, so we need to keep extra storage size to accommodate various model sizes.
 * Since in the api server file, we have already changed the prediction endpoint to /predict, we don't need any other overrides.
 * Under `Models` click on the `Select` button and select the Model Catalog entry we created earlier
 * Under `Compute` and then `Specialty and previous generation` select the `VM.GPU3.2` instance
```
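The environment variables in this hunk are plain key/value settings supplied to the serving container at deployment time. As an illustration only, a local smoke test could pass them the same way; the image name `vllm-odsc:latest`, the port, and the `PARAMS` value below are hypothetical placeholders, not values taken from this repository:

```sh
# Hypothetical local run: image name, port, and PARAMS value are
# placeholders, not this repository's actual settings.
# TENSOR_PARALLELISM=2 shards the 13b llama2 model across 2 GPU cards;
# PARAMS forwards extra vLLM bootstrap flags (see the vLLM quickstart doc).
docker run --rm --gpus all \
  -e TENSOR_PARALLELISM=2 \
  -e PARAMS="--max-model-len 4096" \
  -p 8080:8080 \
  vllm-odsc:latest
```

The `TOKEN_FILE`, `MODEL`, and `STORAGE_SIZE_IN_GB` variables in the deleted bullets applied only to the download-from-source flow described there, so they drop out of the walkthrough with this commit.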
```diff
@@ -411,4 +407,4 @@ For more detailed level of debugging, user can refer [README-DEBUG.md](./README-
 
 `make shell.vllm` to launch container with shell prompt
 
-`make stop.vllm` to stop the running container
+`make stop.vllm` to stop the running container
```
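For readers skimming the page without the repository checked out: the two `make` targets wrap ordinary container lifecycle commands, and the removed/re-added last line is most likely a whitespace or end-of-file newline fix, since the visible text is identical. A minimal sketch of shell equivalents, assuming hypothetical image and container names (the real names live in the repo's Makefile):

```sh
# Rough equivalents of the Make targets; the image and container names
# are placeholders, not the repository's actual values.

# make shell.vllm: launch the container with an interactive shell prompt
docker run -it --rm --name vllm-odsc --entrypoint /bin/bash vllm-odsc:latest

# make stop.vllm: stop the running container
docker stop vllm-odsc
```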

0 commit comments
