Commit e16d2f4

Merge pull request oracle-samples#454 from oracle-samples/daren-llama3-blog
Daren llama3 blog
2 parents 63a05fa + dda1f6f commit e16d2f4

2 files changed, +345 -1 lines changed

ai-quick-actions/README.md

+2-1
@@ -15,7 +15,8 @@ Manager (ORM) or manually. For further information see [Policies](policies/READM

How-To Blogs:

-1. [Deploy ELYZA-japanese-Llama-2-13b-instruct with Oracle Service Managed vLLM(0.3.0) Container](deploy-with-smc.md)
+1. [Deploy Meta-Llama-3-8B-Instruct with Oracle Service Managed vLLM(0.3.0) Container](llama3-with-smc.md)
+2. [Deploy ELYZA-japanese-Llama-2-13b-instruct with Oracle Service Managed vLLM(0.3.0) Container](deploy-with-smc.md)

---

ai-quick-actions/llama3-with-smc.md

+343
@@ -0,0 +1,343 @@
# Deploy Meta-Llama-3-8B-Instruct with Oracle Service Managed vLLM(0.3.0) Container

![Llama 3](https://huggingface.co/blog/assets/llama3/thumbnail.jpg)

This how-to shows how to use Oracle Data Science Service Managed Containers, part of the AI Quick Actions feature, to run inference with a model downloaded from Hugging Face. For this we use [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) from Meta. The Llama 3 instruction-tuned models are optimized for dialogue use cases and, according to Meta, outperform many of the available open-source chat models on common industry benchmarks.

## Required IAM Policies

Add these [policies](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#required-iam-policies) to grant access to OCI services.

## Setup

```python
# Install required python packages

!pip install oracle-ads
!pip install oci
!pip install huggingface_hub
```

```python
# Uncomment this code and set the correct proxy URLs if you need a proxy to reach the internet
# import os
# os.environ['http_proxy']="http://myproxy"
# os.environ['https_proxy']="http://myproxy"

# Use os.environ['no_proxy'] to list hosts whose traffic should bypass the proxy
```

```python
import ads
ads.set_auth("resource_principal")
```

```python
# Extract region information from the Notebook environment variables and signer.
ads.common.utils.extract_region()
```

### Common variables

```python
import os

# Change as required for your environment
compartment_id = os.environ["PROJECT_COMPARTMENT_OCID"]
project_id = os.environ["PROJECT_OCID"]

log_group_id = "ocid1.loggroup.oc1.xxx.xxxxx"
log_id = "ocid1.log.oc1.xxx.xxxxx"

instance_shape = "VM.GPU.A10.1"
container_image = "dsmc://odsc-vllm-serving:0.3.0.7"
region = "us-ashburn-1"
```

The container image referenced above (`dsmc://odsc-vllm-serving:0.3.0.7`) is an Oracle Service Managed container that was built with:

- Oracle Linux 8 - Slim
- CUDA 12.4
- cuDNN 9
- Torch 2.1.2
- Python 3.11.5
- vLLM v0.3.0

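A mistyped OCID in the common variables usually only surfaces when the deployment call fails. As a quick pre-flight check, here is a minimal sketch that tests whether a string starts with the documented `ocid1.<resource-type>.<realm>.` layout and carries the expected resource type. The helper name `looks_like_ocid` is hypothetical, and the regex checks only the OCID prefix, not whether the resource actually exists.

```python
import re

# OCIDs are documented as ocid1.<resource-type>.<realm>.[region][.future-use].<unique-id>;
# this sketch only validates the leading "ocid1.<resource-type>.<realm>." part.
_OCID_PREFIX = re.compile(r"^ocid1\.([a-z0-9]+)\.([a-z0-9]+)\.")


def looks_like_ocid(value: str, resource_type: str) -> bool:
    """Return True if `value` starts like an OCID of the given resource type."""
    match = _OCID_PREFIX.match(value)
    return match is not None and match.group(1) == resource_type


# Hypothetical usage with placeholder values shaped like the ones above.
for name, value, resource_type in [
    ("log_group_id", "ocid1.loggroup.oc1.xxx.xxxxx", "loggroup"),
    ("log_id", "ocid1.log.oc1.xxx.xxxxx", "log"),
]:
    if not looks_like_ocid(value, resource_type):
        print(f"{name} does not look like a {resource_type} OCID: {value!r}")
```

In the notebook you would pass `log_group_id` and `log_id` directly; a value such as `"cid1.log.oc1..."` (missing the leading `o`) is reported instead of failing later at deployment time.
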
## Prepare The Model Artifacts

To prepare the model artifacts for LLM model deployment:

- Download the model files from Hugging Face to a local directory using a valid Hugging Face token (only needed for gated models such as Meta-Llama-3-8B-Instruct). If you don't have a Hugging Face token, refer to [this guide](https://huggingface.co/docs/hub/en/security-tokens) to generate one.
- Upload the model folder to a [versioned bucket](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/usingversioning.htm) in Oracle Object Storage. If you don’t have an Object Storage bucket, create one using the OCI SDK or the Console, and make a note of the `namespace`, `compartment`, and `bucketname`. Configure the policies to allow the Data Science service to read and write the model artifact to the Object Storage bucket in your tenancy. An administrator must configure the policies in IAM in the Console.
- Create a model catalog entry for the model using the Object Storage path.

### Model Download from Hugging Face Model Hub

```python
# Log in to Hugging Face; IPython substitutes the Python variable into $HUGGINGFACE_TOKEN
HUGGINGFACE_TOKEN = "<HUGGINGFACE_TOKEN>" # Your Hugging Face token
!huggingface-cli login --token $HUGGINGFACE_TOKEN
```

The Hugging Face [download guide](https://huggingface.co/docs/huggingface_hub/guides/download#download-an-entire-repository) has more information on using `snapshot_download()` to download an entire repository at a given revision. Each model on the Hugging Face Hub is stored in its own repository.

```python
# Download the Llama 3 model from Hugging Face to a local folder.

from huggingface_hub import snapshot_download
from tqdm.auto import tqdm

model_name = "meta-llama/Meta-Llama-3-8B-Instruct" # copy from https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
local_dir = "models/Meta-Llama-3-8B-Instruct"

snapshot_download(repo_id=model_name, local_dir=local_dir, force_download=True, tqdm_class=tqdm)

print(f"Downloaded model {model_name} to {local_dir}")
```

## Upload Model to OCI Object Storage

```python
model_prefix = "Meta-Llama-3-8B-Instruct/" #"<bucket_prefix>"
bucket = "<bucket_name>" # this should be a versioned bucket
namespace = "<bucket_namespace>"

!oci os object bulk-upload --src-dir $local_dir --prefix $model_prefix -bn $bucket -ns $namespace --auth "resource_principal"
```

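To double-check what the bulk upload sends to the bucket, it can help to list the object names that end up under `model_prefix` and confirm that the checkpoint looks complete. The sketch below assumes that `bulk-upload` names each object `<prefix><path relative to --src-dir>` with forward slashes, and that a Hugging Face checkpoint has a top-level `config.json` plus `.safetensors` or `.bin` weight files. `preview_upload` and its helpers are hypothetical names, not part of the OCI CLI or ADS.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def object_names(relative_paths: Iterable[str], prefix: str) -> List[str]:
    """Prepend the bucket prefix to each relative path (assumed naming rule)."""
    return sorted(prefix + path.replace("\\", "/") for path in relative_paths)


def missing_pieces(names: List[str], prefix: str) -> List[str]:
    """Report checkpoint parts that appear to be absent from the upload list."""
    missing = []
    if prefix + "config.json" not in names:
        missing.append("config.json")
    if not any(name.endswith((".safetensors", ".bin")) for name in names):
        missing.append("weight files (.safetensors/.bin)")
    return missing


def preview_upload(local_dir: str, prefix: str) -> Tuple[List[str], List[str]]:
    """Walk local_dir and return (object names, missing checkpoint parts)."""
    root = Path(local_dir)
    relative = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    names = object_names(relative, prefix)
    return names, missing_pieces(names, prefix)
```

Usage in the notebook would be `names, missing = preview_upload(local_dir, model_prefix)`; an empty `missing` list means the folder at least has the pieces this sketch looks for.
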
## Create Model by Reference using ADS

```python
from ads.model.datascience_model import DataScienceModel

artifact_path = f"oci://{bucket}@{namespace}/{model_prefix}"

model = (
    DataScienceModel()
    .with_compartment_id(compartment_id)
    .with_project_id(project_id)
    .with_display_name("Meta-Llama-3-8B-Instruct")
    .with_artifact(artifact_path)
)

model.create(model_by_reference=True)
```

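The `artifact_path` above follows the `oci://<bucket>@<namespace>/<prefix>` form. With a model created by reference, a typo in any of the three parts only shows up later, so splitting the path back into its parts is a cheap check. This is a minimal standard-library sketch; `parse_oci_uri` and `OciUri` are hypothetical names, not ADS functions.

```python
from typing import NamedTuple


class OciUri(NamedTuple):
    bucket: str
    namespace: str
    prefix: str


def parse_oci_uri(uri: str) -> OciUri:
    """Split an oci://<bucket>@<namespace>/<prefix> path into its parts."""
    scheme = "oci://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an oci:// URI: {uri!r}")
    # Everything before the first "/" is <bucket>@<namespace>; the rest is the prefix.
    location, _, prefix = uri[len(scheme):].partition("/")
    bucket, sep, namespace = location.partition("@")
    if not sep or not bucket or not namespace:
        raise ValueError(f"expected <bucket>@<namespace> in {uri!r}")
    return OciUri(bucket, namespace, prefix)


# Hypothetical bucket and namespace names.
parts = parse_oci_uri("oci://my-bucket@my-namespace/Meta-Llama-3-8B-Instruct/")
print(parts.bucket, parts.namespace, parts.prefix)
```

Running `parse_oci_uri(artifact_path)` in the notebook should give back the same `bucket`, `namespace`, and `model_prefix` values used for the upload.
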
### Import Model Deployment Modules

```python
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentContainerRuntime,
    ModelDeploymentInfrastructure,
    ModelDeploymentMode,
)
```

### Setup Model Deployment Infrastructure

```python
infrastructure = (
    ModelDeploymentInfrastructure()
    .with_project_id(project_id)
    .with_compartment_id(compartment_id)
    .with_shape_name(instance_shape)
    .with_bandwidth_mbps(10)
    .with_replica(1)
    .with_web_concurrency(10)
    .with_access_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
    .with_predict_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
)
```

### Configure Model Deployment Runtime

```python
env_var = {
    'BASE_MODEL': model_prefix,
    'PARAMS': '--served-model-name odsc-llm --seed 42',
    'MODEL_DEPLOY_PREDICT_ENDPOINT': '/v1/completions',
    'MODEL_DEPLOY_ENABLE_STREAMING': 'true'
}

container_runtime = (
    ModelDeploymentContainerRuntime()
    .with_image(container_image)
    .with_server_port(8080)
    .with_health_check_port(8080)
    .with_env(env_var)
    .with_deployment_mode(ModelDeploymentMode.HTTPS)
    .with_model_uri(model.id)
    .with_region(region)
)
```

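`PARAMS` is a single string of vLLM server flags; here `--served-model-name odsc-llm` sets the name clients later send as `"model": "odsc-llm"`, and `--seed 42` fixes the sampling seed. If you assemble this string programmatically, quoting each value avoids surprises when a value contains spaces. A minimal sketch, assuming the container splits `PARAMS` with shell-style rules before passing it to the vLLM server; `build_params` is a hypothetical helper.

```python
import shlex
from typing import Mapping, Union


def build_params(flags: Mapping[str, Union[str, int, bool]]) -> str:
    """Turn {"served-model-name": "odsc-llm", "seed": 42} into a flag string.

    A value of True emits a bare flag; False drops the flag entirely.
    """
    args = []
    for name, value in flags.items():
        if value is False:
            continue
        args.append(f"--{name}")
        if value is not True:
            args.append(str(value))
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)


params = build_params({"served-model-name": "odsc-llm", "seed": 42})
print(params)  # --served-model-name odsc-llm --seed 42
```

`shlex.split(params)` recovers the original argument list, which is a quick way to check the string before putting it into `env_var`.
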
### Deploy Model Using Container Runtime

```python
deployment = (
    ModelDeployment()
    .with_display_name("Meta-Llama-3-8B-Instruct with vLLM SMC")
    .with_description("Deployment of Meta-Llama-3-8B-Instruct MD with vLLM(0.3.0) container")
    .with_infrastructure(infrastructure)
    .with_runtime(container_runtime)
).deploy(wait_for_completion=False)
```

### Inference

Once the model deployment has reached the Active state, we can invoke the model deployment endpoint to interact with the LLM. The different ways of accessing model deployment (MD) endpoints are documented [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/ai-quick-actions/model-deployment-tips.md).

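Because the deployment was started with `wait_for_completion=False`, you need to wait for it to become Active before sending requests. Here is a generic polling sketch: it takes any zero-argument callable that returns the current lifecycle state as a string, so it does not depend on a particular ADS attribute. How you fetch the state (for example by re-reading the deployment through ADS or the OCI SDK) is left as an assumption, and `wait_until_active` is a hypothetical helper.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    timeout_s: float = 3600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it reports ACTIVE; fail fast on FAILED."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state().upper()
        if state == "ACTIVE":
            return state
        if state == "FAILED":
            raise RuntimeError("model deployment entered the FAILED state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment still {state} after {timeout_s} seconds")
        sleep(interval_s)
```

The `sleep` parameter exists so the loop can be exercised without real waiting; in the notebook you would leave it at the default `time.sleep`.
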
#### How to prompt Llama 3

The base models have no prompt format. The Instruct versions use the following conversation structure:

```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|>
```

This format has to be exactly reproduced for effective use.

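Assembling the special tokens by hand is error-prone: a dropped `<` or a missing blank line after a header silently degrades the results. The sketch below renders a list of role/content messages in the structure shown above and ends with an open assistant header so the model writes the next turn. `build_llama3_prompt` is a hypothetical helper; if `transformers` is installed, the tokenizer's `apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a good cross-check against the model's own template.

```python
from typing import Dict, List, Optional


def build_llama3_prompt(
    messages: List[Dict[str, str]], system_prompt: Optional[str] = None
) -> str:
    """Render messages in the Llama 3 Instruct conversation format."""
    parts = ["<|begin_of_text|>"]
    turns = list(messages)
    if system_prompt:
        turns.insert(0, {"role": "system", "content": system_prompt})
    for turn in turns:
        # Each turn: role header, a blank line, the content, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{turn['role']}<|end_header_id|>\n\n"
            f"{turn['content']}<|eot_id|>"
        )
    # Leave an open assistant header so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt(
    [{"role": "user", "content": "What amateur radio band can a general class license holder use?"}]
)
```

The resulting string can be used as the `"prompt"` field in the `requests.post` call that follows.
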
```python
import requests
import ads
from string import Template

ads.set_auth("resource_principal")

requests.post(
    f"https://modeldeployment.{region}.oci.customer-oci.com/{deployment.model_deployment_id}/predict",
    json={
        "model": "odsc-llm",
        # Llama 3 Instruct format: the user turn, then an open assistant header
        "prompt": Template(
            "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
            "$prompt<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        ).substitute(
            prompt="What amateur radio band can a general class license holder use?"
        ),
        "max_tokens": 250,
        "temperature": 0.7,
        "top_p": 0.8,
    },
    auth=ads.common.auth.default_signer()["signer"],
    headers={},
).json()
```

#### Output

The LLM produced the following output (generation stopped at the 250-token `max_tokens` limit):

> A) All amateur radio bands
> B) All amateur radio bands except the 10-meter band
> C) All amateur radio bands except the 160-meter band
> D) All amateur radio bands except the 2200-meter band
>
> Answer: A) All amateur radio bands
> \</s\>
>
> The FCC grants a general class license to amateur radio operators who pass the Element 3 exam. The exam covers the following topics:
>
> * FCC rules and regulations
> * Amateur radio practices and procedures
> * Radio theory and operating practices
> * Antennas and transmission lines
> * Electronic circuits and devices
> * RF safety and environmental concerns
>
> As a general class license holder, an amateur radio operator is authorized to use all amateur radio bands, which include:
>
> * 160 meters (1.8 MHz to 2 MHz)
> * 80 meters (3.5 MHz to 4 MHz)
> * 40 meters (7 MHz to 7.3 MHz)
> * 30 meters (10.1 MHz to 10.3 MHz)
> * 20 meters (14 MHz to 14.3 MHz)
> * 17 meters (18.1 MHz to 18.3 MHz)
> * 15 meters (21 MHz to 21.3 MHz

The raw output:

```json
{
  "id": "cmpl-2d57a83cb6544b768abc00c1b3b7ffc5",
  "object": "text_completion",
  "created": 2277,
  "model": "odsc-llm",
  "choices": [
    {
      "index": 0,
      "text": "A) All amateur radio bands\nB) All amateur radio bands except the 10-meter band\nC) All amateur radio bands except the 160-meter band\nD) All amateur radio bands except the 2200-meter band\n\nAnswer: A) All amateur radio bands\n</s>\n\nThe FCC grants a general class license to amateur radio operators who pass the Element 3 exam. The exam covers the following topics:\n\n* FCC rules and regulations\n* Amateur radio practices and procedures\n* Radio theory and operating practices\n* Antennas and transmission lines\n* Electronic circuits and devices\n* RF safety and environmental concerns\n\nAs a general class license holder, an amateur radio operator is authorized to use all amateur radio bands, which include:\n\n* 160 meters (1.8 MHz to 2 MHz)\n* 80 meters (3.5 MHz to 4 MHz)\n* 40 meters (7 MHz to 7.3 MHz)\n* 30 meters (10.1 MHz to 10.3 MHz)\n* 20 meters (14 MHz to 14.3 MHz)\n* 17 meters (18.1 MHz to 18.3 MHz)\n* 15 meters (21 MHz to 21.3 MHz",
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "total_tokens": 269,
    "completion_tokens": 250
  }
}
```

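The response follows the completions schema that vLLM's OpenAI-compatible server returns: the generated text is in `choices[0]["text"]`, `finish_reason` is `"length"` when generation stopped at `max_tokens` (as it did here), and `usage` reports token counts. A minimal sketch for pulling out those fields and flagging truncated answers, run on a trimmed copy of the response above; `summarize_completion` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def summarize_completion(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the text, a truncation flag, and token usage from a completion."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["text"],
        # "length" means generation hit max_tokens rather than a stop condition.
        "truncated": choice.get("finish_reason") == "length",
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }


raw = json.loads(
    '{"choices": [{"index": 0, "text": "A) All amateur radio bands", '
    '"finish_reason": "length"}], '
    '"usage": {"prompt_tokens": 19, "total_tokens": 269, "completion_tokens": 250}}'
)
summary = summarize_completion(raw)
print(summary["truncated"], summary["prompt_tokens"] + summary["completion_tokens"])  # True 269
```

A `truncated` value of `True` is a hint to raise `max_tokens` or to pass a stop sequence such as `<|eot_id|>`.
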
#### Using the model from [LangChain](https://python.langchain.com/v0.1/docs/integrations/llms/oci_model_deployment_endpoint/)

```python
import ads
from langchain_community.llms import OCIModelDeploymentVLLM
from string import Template

ads.set_auth("resource_principal")

llm = OCIModelDeploymentVLLM(
    endpoint=f"https://modeldeployment.{region}.oci.customer-oci.com/{deployment.model_deployment_id}/predict",
    model="odsc-llm",
)

llm.invoke(
    input=Template(
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        "$prompt<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    ).substitute(
        prompt="What amateur radio bands are best to use when there are solar flares?"
    ),
    max_tokens=500,
    temperature=0,
    top_p=0.9,  # extra keyword arguments are sent to vLLM under these names
    stop=["<|eot_id|>"],
    skip_special_tokens=False,
)
```

Output:

> During solar flares, the ionosphere can become highly ionized, causing radio signals to be refracted and scattered in unpredictable ways. This can make it challenging to communicate on certain amateur radio bands. However, some bands are more affected than others. Here's a general guideline on which amateur radio bands to use during solar flares:
>
> **Avoid:**
>
> 1. **HF (3-30 MHz) bands**: These bands are most affected by solar flares, as the ionosphere can become highly ionized, causing signal refraction and scattering. Signals may be severely attenuated or even completely absorbed.
> 2. **20m (14 MHz) and 15m (21 MHz) bands**: These bands are also prone to significant signal degradation due to the ionosphere's increased ionization.
>
> **Use:**
>
> 1. **VHF (50-250 MHz) bands**: These bands are less affected by solar flares, as the ionosphere's ionization has less impact on signal propagation. Signals are more likely to follow a more predictable path.
> 2. **UHF (300-3000 MHz) bands**: These bands are even less affected by solar flares, as the ionosphere's ionization has a minimal impact on signal propagation.
> 3. **SHF (3-30 GHz) bands**: These bands are generally not affected by solar flares, as the ionosphere's ionization has a negligible impact on signal propagation.
>
> **Tips:**
>
> 1. **Monitor propagation conditions**: Keep an eye on propagation forecasts and reports from other amateur radio operators to adjust your operating frequency and mode accordingly.
> 2. **Use digital modes**: Digital modes like PSK31, FT8, and JT65 are more resistant to signal degradation and can be a good choice during solar flares.
> 3. **Experiment with different frequencies**: If you're experiencing difficulties on a particular frequency, try switching to a different frequency within the same band to see if the signal improves.
> 4. **Keep an eye on the solar flare's impact**: Monitor the solar flare's intensity and duration to adjust your operating strategy accordingly.
>
> Remember, solar flares can have unpredictable effects on radio propagation, so it's essential to stay flexible and adapt to changing conditions.
