# Deploy Meta-Llama-3-8B-Instruct with the Oracle Service Managed vLLM (0.3.0) Container

This how-to shows how to use Oracle Data Science Service Managed Containers, part of the Quick Actions feature, to run inference with a model downloaded from the Hugging Face Hub. We will use [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) from Meta. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.

## Required IAM Policies

Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#required-iam-policies) to grant access to OCI services.
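
For orientation, the statements typically take the shape below. This is only a sketch: the dynamic group and compartment names are placeholders, and the linked sample above is the authoritative list.

```text
allow dynamic-group <your-dynamic-group> to manage data-science-family in compartment <your-compartment>
allow dynamic-group <your-dynamic-group> to manage object-family in compartment <your-compartment>
allow dynamic-group <your-dynamic-group> to use logging-family in compartment <your-compartment>
```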

## Setup

```python
# Install the required python packages

!pip install oracle-ads
!pip install oci
!pip install huggingface_hub
```

```python
# Uncomment this code and set the correct proxy links if you have to set up a proxy for internet access
# import os
# os.environ['http_proxy']="http://myproxy"
# os.environ['https_proxy']="http://myproxy"

# Use os.environ['no_proxy'] to route traffic directly
```

```python
import os

import ads

ads.set_auth("resource_principal")
```

```python
# Extract region information from the Notebook environment variables and signer.
ads.common.utils.extract_region()
```

### Common variables

```python
# change as required for your environment
compartment_id = os.environ["PROJECT_COMPARTMENT_OCID"]
project_id = os.environ["PROJECT_OCID"]

log_group_id = "ocid1.loggroup.oc1.xxx.xxxxx"
log_id = "ocid1.log.oc1.xxx.xxxxx"

instance_shape = "VM.GPU.A10.1"
container_image = "dsmc://odsc-vllm-serving:0.3.0.7"
region = "us-ashburn-1"
```

The container image referenced above (`dsmc://odsc-vllm-serving:0.3.0.7`) is an Oracle Service Managed container that was built with:

- Oracle Linux 8 - Slim
- CUDA 12.4
- cuDNN 9
- Torch 2.1.2
- Python 3.11.5
- vLLM v0.3.0
## Prepare The Model Artifacts

To prepare model artifacts for LLM model deployment:

- Download the model files from the Hugging Face Hub to a local directory using a valid Hugging Face token (only needed for gated models). If you don't have a Hugging Face token, refer to [this guide](https://huggingface.co/docs/hub/en/security-tokens) to generate one.
- Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don't have an Object Storage bucket, create one using the OCI SDK, the CLI (see the sketch after this list), or the Console. Make a note of the `namespace`, `compartment`, and `bucketname`. Configure the policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure the policies in IAM in the Console.
- Create a model catalog entry for the model using the Object Storage path.
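
If you still need to create the versioned bucket, a minimal sketch with the OCI CLI follows. The bucket name and compartment OCID are placeholders for your own values.

```python
# Sketch: create a versioned Object Storage bucket with the OCI CLI.
# Replace <bucket_name> and <compartment_ocid> with your values.
!oci os bucket create --name <bucket_name> --compartment-id <compartment_ocid> --versioning Enabled --auth resource_principal
```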

### Model Download from HuggingFace Model Hub

```python
# Login to Hugging Face using your access token
HUGGINGFACE_TOKEN = "<HUGGINGFACE_TOKEN>"  # Your huggingface token
!huggingface-cli login --token $HUGGINGFACE_TOKEN
```

Models in the Hugging Face Hub are stored in their own repositories. [This guide](https://huggingface.co/docs/huggingface_hub/guides/download#download-an-entire-repository) provides more information on using `snapshot_download()` to download an entire repository at a given revision.

```python
# Download the Llama 3 model from Hugging Face to a local folder.

from huggingface_hub import snapshot_download
from tqdm.auto import tqdm

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # copy from https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
local_dir = "models/Meta-Llama-3-8B-Instruct"

snapshot_download(repo_id=model_name, local_dir=local_dir, force_download=True, tqdm_class=tqdm)

print(f"Downloaded model {model_name} to {local_dir}")
```
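
As a quick optional check before uploading, you can list what was downloaded:

```python
# Optional sanity check: list the downloaded files and their sizes.
import os

for name in sorted(os.listdir(local_dir)):
    path = os.path.join(local_dir, name)
    print(f"{name}: {os.path.getsize(path)} bytes")
```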

## Upload Model to OCI Object Storage

```python
model_prefix = "Meta-Llama-3-8B-Instruct/"  # "<bucket_prefix>"
bucket = "<bucket_name>"  # this should be a versioned bucket
namespace = "<bucket_namespace>"

!oci os object bulk-upload --src-dir $local_dir --prefix $model_prefix -bn $bucket -ns $namespace --auth "resource_principal"
```
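
Optionally, confirm the upload by listing the objects under the prefix:

```python
# Optional: verify the uploaded objects under the model prefix.
!oci os object list -bn $bucket -ns $namespace --prefix $model_prefix --auth resource_principal
```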

## Create Model by Reference using ADS

```python
from ads.model.datascience_model import DataScienceModel

artifact_path = f"oci://{bucket}@{namespace}/{model_prefix}"

model = (
    DataScienceModel()
    .with_compartment_id(compartment_id)
    .with_project_id(project_id)
    .with_display_name("Meta-Llama-3-8B-Instruct")
    .with_artifact(artifact_path)
)

model.create(model_by_reference=True)
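```

The call registers the model in the catalog; its OCID is what the container runtime below references through `model.id`:

```python
# The model OCID is passed to the deployment runtime as the model URI.
print(model.id)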
```

### Import Model Deployment Modules

```python
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentContainerRuntime,
    ModelDeploymentInfrastructure,
    ModelDeploymentMode,
)
```

### Setup Model Deployment Infrastructure

```python
infrastructure = (
    ModelDeploymentInfrastructure()
    .with_project_id(project_id)
    .with_compartment_id(compartment_id)
    .with_shape_name(instance_shape)
    .with_bandwidth_mbps(10)
    .with_replica(1)
    .with_web_concurrency(10)
    .with_access_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
    .with_predict_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
)
```

### Configure Model Deployment Runtime

```python
env_var = {
    'BASE_MODEL': model_prefix,
    'PARAMS': '--served-model-name odsc-llm --seed 42',
    'MODEL_DEPLOY_PREDICT_ENDPOINT': '/v1/completions',
    'MODEL_DEPLOY_ENABLE_STREAMING': 'true'
}

container_runtime = (
    ModelDeploymentContainerRuntime()
    .with_image(container_image)
    .with_server_port(8080)
    .with_health_check_port(8080)
    .with_env(env_var)
    .with_deployment_mode(ModelDeploymentMode.HTTPS)
    .with_model_uri(model.id)
    .with_region(region)
)
```

### Deploy Model Using Container Runtime

```python
deployment = (
    ModelDeployment()
    .with_display_name("Meta-Llama-3-8B-Instruct with vLLM SMC")
    .with_description("Deployment of Meta-Llama-3-8B-Instruct MD with vLLM (0.3.0) container")
    .with_infrastructure(infrastructure)
    .with_runtime(container_runtime)
).deploy(wait_for_completion=False)
```

### Inference

Once the model deployment has reached the Active state, we can invoke the model deployment endpoint to interact with the LLM. More details on the different ways of accessing MD endpoints are documented [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/ai-quick-actions/model-deployment-tips.md).
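
One way to check the state programmatically is through the OCI SDK; a minimal sketch, assuming the notebook's resource principal has access to the deployment:

```python
# Minimal status check via the OCI SDK; wait for ACTIVE before invoking.
import oci

signer = oci.auth.signers.get_resource_principals_signer()
ds_client = oci.data_science.DataScienceClient(config={}, signer=signer)

state = ds_client.get_model_deployment(deployment.model_deployment_id).data.lifecycle_state
print(state)
```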

#### How to prompt Llama 3

The base models have no prompt format. The Instruct versions use the following conversation structure:

```xml
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|>
```

This format has to be reproduced exactly for effective use; a small helper that assembles it is sketched below.
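
The following helper is a sketch (not part of the container API) that builds a prompt in this format from plain strings:

```python
# Hypothetical helper to build a Llama 3 Instruct prompt from plain strings.
def build_llama3_prompt(user_msg: str, system_prompt: str = "") -> str:
    prompt = "<|begin_of_text|>"
    if system_prompt:
        prompt += f"<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
    prompt += f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
    # Leave the assistant header open so the model generates the answer.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```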

```python
import requests
import ads
from string import Template

ads.set_auth("resource_principal")

requests.post(
    f"https://modeldeployment.us-ashburn-1.oci.customer-oci.com/{deployment.model_deployment_id}/predict",
    json={
        "model": "odsc-llm",
        "prompt": Template(
            """<|begin_of_text|><|start_header_id|>user<|end_header_id|> $prompt <|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
        ).substitute(
            prompt="What amateur radio band can a general class license holder use?"
        ),
        "max_tokens": 250,
        "temperature": 0.7,
        "top_p": 0.8,
    },
    auth=ads.common.auth.default_signer()["signer"],
    headers={},
).json()
```

#### Output

The LLM produced the following output:

> A) All amateur radio bands
B) All amateur radio bands except the 10-meter band
C) All amateur radio bands except the 160-meter band
D) All amateur radio bands except the 2200-meter band
>
Answer: A) All amateur radio bands
\</s\>
>
The FCC grants a general class license to amateur radio operators who pass the Element 3 exam. The exam covers the following topics:
>
* FCC rules and regulations
* Amateur radio practices and procedures
* Radio theory and operating practices
* Antennas and transmission lines
* Electronic circuits and devices
* RF safety and environmental concerns
>
As a general class license holder, an amateur radio operator is authorized to use all amateur radio bands, which include:
>
* 160 meters (1.8 MHz to 2 MHz)
* 80 meters (3.5 MHz to 4 MHz)
* 40 meters (7 MHz to 7.3 MHz)
* 30 meters (10.1 MHz to 10.3 MHz)
* 20 meters (14 MHz to 14.3 MHz)
* 17 meters (18.1 MHz to 18.3 MHz)
* 15 meters (21 MHz to 21.3 MHz

The raw output:

```json
{
  "id": "cmpl-2d57a83cb6544b768abc00c1b3b7ffc5",
  "object": "text_completion",
  "created": 2277,
  "model": "odsc-llm",
  "choices": [
    {
      "index": 0,
      "text": "A) All amateur radio bands\nB) All amateur radio bands except the 10-meter band\nC) All amateur radio bands except the 160-meter band\nD) All amateur radio bands except the 2200-meter band\n\nAnswer: A) All amateur radio bands\n</s>\n\nThe FCC grants a general class license to amateur radio operators who pass the Element 3 exam. The exam covers the following topics:\n\n* FCC rules and regulations\n* Amateur radio practices and procedures\n* Radio theory and operating practices\n* Antennas and transmission lines\n* Electronic circuits and devices\n* RF safety and environmental concerns\n\nAs a general class license holder, an amateur radio operator is authorized to use all amateur radio bands, which include:\n\n* 160 meters (1.8 MHz to 2 MHz)\n* 80 meters (3.5 MHz to 4 MHz)\n* 40 meters (7 MHz to 7.3 MHz)\n* 30 meters (10.1 MHz to 10.3 MHz)\n* 20 meters (14 MHz to 14.3 MHz)\n* 17 meters (18.1 MHz to 18.3 MHz)\n* 15 meters (21 MHz to 21.3 MHz",
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "total_tokens": 269,
    "completion_tokens": 250
  }
}
```
| 292 | + |
| 293 | +#### Using the model from [LangChain](https://python.langchain.com/v0.1/docs/integrations/llms/oci_model_deployment_endpoint/) |
| 294 | + |
| 295 | +```python |
| 296 | +import ads |
| 297 | +from langchain_community.llms import OCIModelDeploymentVLLM |
| 298 | +from string import Template |
| 299 | + |
| 300 | +ads.set_auth("resource_principal") |
| 301 | + |
| 302 | +llm = OCIModelDeploymentVLLM( |
| 303 | + endpoint="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/{deployment.model_deployment_id}/predict", |
| 304 | + model="odsc-llm", |
| 305 | +) |
| 306 | + |
| 307 | +llm.invoke( |
| 308 | + input=Template( |
| 309 | + """"|begin_of_text|><|start_header_id|>user<|end_header_id|> $prompt <|eot_id|><|start_header_id|>assistant<|end_header_id|>""" |
| 310 | + ).substitute( |
| 311 | + prompt="What amateur radio bands are best to use when there are solar flares?" |
| 312 | + ), |
| 313 | + max_tokens=500, |
| 314 | + temperature=0, |
| 315 | + p=0.9, |
| 316 | + stop=["<|eot_id|>"], |
| 317 | + skip_special_tokens=False, |
| 318 | +) |
| 319 | +``` |

Output:

> During solar flares, the ionosphere can become highly ionized, causing radio signals to be refracted and scattered in unpredictable ways. This can make it challenging to communicate on certain amateur radio bands. However, some bands are more affected than others. Here's a general guideline on which amateur radio bands to use during solar flares:
>
**Avoid:**
>
1. **HF (3-30 MHz) bands**: These bands are most affected by solar flares, as the ionosphere can become highly ionized, causing signal refraction and scattering. Signals may be severely attenuated or even completely absorbed.
2. **20m (14 MHz) and 15m (21 MHz) bands**: These bands are also prone to significant signal degradation due to the ionosphere's increased ionization.
>
**Use:**
>
1. **VHF (50-250 MHz) bands**: These bands are less affected by solar flares, as the ionosphere's ionization has less impact on signal propagation. Signals are more likely to follow a more predictable path.
2. **UHF (300-3000 MHz) bands**: These bands are even less affected by solar flares, as the ionosphere's ionization has a minimal impact on signal propagation.
3. **SHF (3-30 GHz) bands**: These bands are generally not affected by solar flares, as the ionosphere's ionization has a negligible impact on signal propagation.
>
**Tips:**
>
1. **Monitor propagation conditions**: Keep an eye on propagation forecasts and reports from other amateur radio operators to adjust your operating frequency and mode accordingly.
2. **Use digital modes**: Digital modes like PSK31, FT8, and JT65 are more resistant to signal degradation and can be a good choice during solar flares.
3. **Experiment with different frequencies**: If you're experiencing difficulties on a particular frequency, try switching to a different frequency within the same band to see if the signal improves.
4. **Keep an eye on the solar flare's impact**: Monitor the solar flare's intensity and duration to adjust your operating strategy accordingly.
>
Remember, solar flares can have unpredictable effects on radio propagation, so it's essential to stay flexible and adapt to changing conditions.