Commit 2f03a3a

Align parameters for "max_tokens, repetition_penalty, presence_penalty, frequency_penalty" (#726)
Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 372d78c commit 2f03a3a
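
The rename standardizes every service on the OpenAI-style field names that vLLM already exposes. A minimal before/after sketch of a request body (values illustrative, taken from the curl examples below):

```python
# Pre-#726 payload: TGI-style naming, with repetition_penalty doing
# double duty for frequency_penalty downstream.
old_request = {
    "query": "What is Deep Learning?",
    "max_new_tokens": 17,
    "repetition_penalty": 1.03,
    "streaming": False,
}

# Post-#726 payload: aligned names; each penalty travels under its own key.
new_request = {
    "query": "What is Deep Learning?",
    "max_tokens": 17,            # renamed from max_new_tokens
    "repetition_penalty": 1.03,  # kept as its own field
    "presence_penalty": 0.0,     # now passed through explicitly
    "frequency_penalty": 0.0,    # no longer aliased to repetition_penalty
    "streaming": False,
}
```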

24 files changed (+110 lines, -72 lines)


AudioQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ curl http://${host_ip}:3006/generate \
 # llm microservice
 curl http://${host_ip}:3007/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
   -H 'Content-Type: application/json'

 # speecht5 service

AudioQnA/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ curl http://${host_ip}:3006/generate \
 # llm microservice
 curl http://${host_ip}:3007/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
   -H 'Content-Type: application/json'

 # speecht5 service

AudioQnA/tests/test_gmc_on_gaudi.sh

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ function validate_audioqa() {
     export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
     echo "$CLIENT_POD"
     accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
-    byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
+    byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
     echo "$byte_str" > $LOG_PATH/curl_audioqa.log
     if [ -z "$byte_str" ]; then
         echo "audioqa failed, please check the logs in ${LOG_PATH}!"

AudioQnA/tests/test_gmc_on_xeon.sh

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ function validate_audioqa() {
     export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
     echo "$CLIENT_POD"
     accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
-    byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
+    byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
     echo "$byte_str" > $LOG_PATH/curl_audioqa.log
     if [ -z "$byte_str" ]; then
         echo "audioqa failed, please check the logs in ${LOG_PATH}!"

ChatQnA/benchmark/benchmark.yaml

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ test_cases:
     run_test: false
     service_name: "llm-svc" # Replace with your service name
     parameters:
-      max_new_tokens: 128
+      max_tokens: 128
       temperature: 0.01
       top_k: 10
       top_p: 0.95

ChatQnA/chatqna_no_wrapper.py

Lines changed: 4 additions & 2 deletions
@@ -69,10 +69,12 @@ def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **k
     next_inputs = {}
     next_inputs["model"] = "tgi"  # specifically clarify the fake model to make the format unified
     next_inputs["messages"] = [{"role": "user", "content": inputs["inputs"]}]
-    next_inputs["max_tokens"] = llm_parameters_dict["max_new_tokens"]
+    next_inputs["max_tokens"] = llm_parameters_dict["max_tokens"]
     next_inputs["top_p"] = llm_parameters_dict["top_p"]
     next_inputs["stream"] = inputs["streaming"]
-    next_inputs["frequency_penalty"] = inputs["repetition_penalty"]
+    next_inputs["frequency_penalty"] = inputs["frequency_penalty"]
+    next_inputs["presence_penalty"] = inputs["presence_penalty"]
+    next_inputs["repetition_penalty"] = inputs["repetition_penalty"]
     next_inputs["temperature"] = inputs["temperature"]
     inputs = next_inputs
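The change above stops overloading frequency_penalty with the repetition penalty and forwards each knob under its own name. A standalone sketch of that mapping, assuming the same OPEA-style input dicts as align_inputs (the helper name and example values are illustrative):

```python
# Illustrative mirror of the aligned mapping in align_inputs above.
def to_openai_payload(inputs: dict, llm_parameters_dict: dict) -> dict:
    return {
        "model": "tgi",  # fake model name to unify the format, as in the diff
        "messages": [{"role": "user", "content": inputs["inputs"]}],
        "max_tokens": llm_parameters_dict["max_tokens"],  # was max_new_tokens
        "top_p": llm_parameters_dict["top_p"],
        "stream": inputs["streaming"],
        "frequency_penalty": inputs["frequency_penalty"],    # no longer fed by repetition_penalty
        "presence_penalty": inputs["presence_penalty"],      # newly forwarded
        "repetition_penalty": inputs["repetition_penalty"],  # newly forwarded
        "temperature": inputs["temperature"],
    }

payload = to_openai_payload(
    {"inputs": "What is Deep Learning?", "streaming": False, "temperature": 0.01,
     "frequency_penalty": 0.0, "presence_penalty": 0.0, "repetition_penalty": 1.03},
    {"max_tokens": 17, "top_p": 0.95},
)
print(payload["max_tokens"], payload["repetition_penalty"])  # 17 1.03
```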

ChatQnA/docker_compose/intel/cpu/aipc/README.md

Lines changed: 1 addition & 1 deletion
@@ -229,7 +229,7 @@ OLLAMA_HOST=${host_ip}:11434 ollama run $OLLAMA_MODEL
 ```bash
 curl http://${host_ip}:9000/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

ChatQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 17 additions & 4 deletions
@@ -438,18 +438,31 @@ docker compose -f compose_vllm.yaml up -d
 This service depends on above LLM backend service startup. It will be ready after long time, to wait for them being ready in first startup.

 ```bash
+# TGI service
 curl http://${host_ip}:9000/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

+For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that "max_new_tokens" is renamed to "max_tokens").
+
+```bash
+# vLLM Service
+curl http://${your_ip}:9000/v1/chat/completions \
+  -X POST \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \
+  -H 'Content-Type: application/json'
+```
+
+For parameters in vLLM mode, please refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
+
 8. MegaService

 ```bash
-curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
-"messages": "What is the revenue of Nike in 2023?"
-}'
+curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
+   "messages": "What is the revenue of Nike in 2023?"
+   }'
 ```

 9. Dataprep Microservice(Optional)
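
The two curl blocks above carry different sampling knobs per backend. A quick Python check of both payload shapes against the LLM microservice; a sketch assuming the service is reachable at localhost:9000 and that the `requests` package is installed:

```python
import requests

URL = "http://localhost:9000/v1/chat/completions"  # substitute ${host_ip}

# TGI-style parameters (InferenceClient naming, with max_tokens renamed).
tgi_payload = {
    "query": "What is Deep Learning?", "max_tokens": 17,
    "top_k": 10, "top_p": 0.95, "typical_p": 0.95,
    "temperature": 0.01, "repetition_penalty": 1.03, "streaming": False,
}

# vLLM-style parameters (OpenAI-compatible naming).
vllm_payload = {
    "query": "What is Deep Learning?", "max_tokens": 17,
    "top_p": 1, "temperature": 0.7,
    "frequency_penalty": 0, "presence_penalty": 0, "streaming": False,
}

for payload in (tgi_payload, vllm_payload):
    resp = requests.post(URL, json=payload, timeout=120)
    print(resp.status_code, resp.text[:200])  # expect 200 and a short answer
```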

ChatQnA/docker_compose/intel/cpu/xeon/README_qdrant.md

Lines changed: 1 addition & 1 deletion
@@ -304,7 +304,7 @@ docker compose -f compose_qdrant.yaml up -d
 ```bash
 curl http://${host_ip}:6047/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

ChatQnA/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 26 additions & 3 deletions
@@ -442,18 +442,41 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
 7. LLM Microservice

 ```bash
+# TGI service
+curl http://${host_ip}:9000/v1/chat/completions\
+  -X POST \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -H 'Content-Type: application/json'
+```
+
+For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that "max_new_tokens" is renamed to "max_tokens").
+
+```bash
+# vLLM Service
 curl http://${host_ip}:9000/v1/chat/completions \
+  -X POST \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \
+  -H 'Content-Type: application/json'
+```
+
+For parameters in vLLM mode, please refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
+
+```bash
+# vLLM-on-Ray Service
+curl http://${your_ip}:9000/v1/chat/completions \
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"presence_penalty":1.03,"streaming":false}' \
   -H 'Content-Type: application/json'
 ```

+For parameters in vLLM-on-Ray mode, please refer to the [LangChain ChatOpenAI API](https://python.langchain.com/v0.2/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html).
+
 8. MegaService

 ```bash
 curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
-"messages": "What is the revenue of Nike in 2023?"
-}'
+   "messages": "What is the revenue of Nike in 2023?"
+   }'
 ```

 9. Dataprep Microservice(Optional)

ChatQnA/docker_compose/intel/hpu/gaudi/how_to_validate_service.md

Lines changed: 1 addition & 1 deletion
@@ -278,7 +278,7 @@ and the log shows model warm up, please wait for a while and try it later.
 ```
 curl http://${host_ip}:9000/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

ChatQnA/docker_compose/nvidia/gpu/README.md

Lines changed: 1 addition & 1 deletion
@@ -280,7 +280,7 @@ docker compose up -d
 ```bash
 curl http://${host_ip}:9000/v1/chat/completions \
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

CodeGen/README.md

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ Two ways of consuming CodeGen Service:
 http_proxy=""
 curl http://${host_ip}:8028/generate \
   -X POST \
-  -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' \
+  -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_tokens":256, "do_sample": true}}' \
   -H 'Content-Type: application/json'
 ```

CodeGen/docker_compose/intel/cpu/xeon/README.md

Lines changed: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ docker compose up -d
 ```bash
 curl http://${host_ip}:9000/v1/chat/completions\
   -X POST \
-  -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_new_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

CodeGen/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ docker compose up -d
 ```bash
 curl http://${host_ip}:9000/v1/chat/completions\
   -X POST \
-  -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_new_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

CodeGen/tests/test_gmc_on_gaudi.sh

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ function validate_codegen() {
     export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
     echo "$CLIENT_POD"
     accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='codegen')].status.accessUrl}")
-    kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' -H 'Content-Type: application/json' > $LOG_PATH/gmc_codegen.log
+    kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_tokens":256, "do_sample": true}}' -H 'Content-Type: application/json' > $LOG_PATH/gmc_codegen.log
     exit_code=$?
     if [ $exit_code -ne 0 ]; then
         echo "chatqna failed, please check the logs in ${LOG_PATH}!"

CodeGen/tests/test_gmc_on_xeon.sh

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ function validate_codegen() {
     export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
     echo "$CLIENT_POD"
     accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='codegen')].status.accessUrl}")
-    kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' -H 'Content-Type: application/json' > $LOG_PATH/gmc_codegen.log
+    kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_tokens":256, "do_sample": true}}' -H 'Content-Type: application/json' > $LOG_PATH/gmc_codegen.log
     exit_code=$?
     if [ $exit_code -ne 0 ]; then
         echo "chatqna failed, please check the logs in ${LOG_PATH}!"

CodeTrans/README.md

Lines changed: 1 addition & 1 deletion
@@ -127,7 +127,7 @@ By default, the UI runs on port 5173 internally.
 http_proxy=""
 curl http://${host_ip}:8008/generate \
   -X POST \
-  -d '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_new_tokens":17, "do_sample": true}}' \
+  -d '{"inputs":" ### System: Please translate the following Golang codes into Python codes. ### Original codes: '\'''\'''\''Golang \npackage main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n '\'''\'''\'' ### Translated codes:","parameters":{"max_tokens":17, "do_sample": true}}' \
   -H 'Content-Type: application/json'
 ```

DocSum/README.md

Lines changed: 1 addition & 1 deletion
@@ -149,7 +149,7 @@ Two ways of consuming Document Summarization Service:
 http_proxy=""
 curl http://${host_ip}:8008/generate \
   -X POST \
-  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
+  -d '{"inputs":"What is Deep Learning?","parameters":{"max_tokens":17, "do_sample": true}}' \
   -H 'Content-Type: application/json'
 ```

ProductivitySuite/docker_compose/intel/cpu/xeon/README.md

Lines changed: 1 addition & 1 deletion
@@ -271,7 +271,7 @@ Please refer to [keycloak_setup_guide](keycloak_setup_guide.md) for more detail
 ```bash
 curl http://${host_ip}:9000/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

SearchQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ curl http://${host_ip}:3006/generate \
 # llm microservice
 curl http://${host_ip}:3007/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'

 ```

SearchQnA/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 1 addition & 1 deletion
@@ -150,7 +150,7 @@ curl http://${host_ip}:3006/generate \
 # llm microservice
 curl http://${host_ip}:3007/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'

 ```

VisualQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 22 additions & 22 deletions
@@ -138,28 +138,28 @@ Follow the instructions to validate MicroServices.

 2. MegaService

-```bash
-curl http://${host_ip}:8888/v1/visualqna -H "Content-Type: application/json" -d '{
-  "messages": [
-    {
-      "role": "user",
-      "content": [
-        {
-          "type": "text",
-          "text": "What'\''s in this image?"
-        },
-        {
-          "type": "image_url",
-          "image_url": {
-            "url": "https://www.ilankelman.org/stopsigns/australia.jpg"
-          }
-        }
-      ]
-    }
-  ],
-  "max_tokens": 300
-}'
-```
+   ```bash
+   curl http://${host_ip}:8888/v1/visualqna -H "Content-Type: application/json" -d '{
+     "messages": [
+       {
+         "role": "user",
+         "content": [
+           {
+             "type": "text",
+             "text": "What'\''s in this image?"
+           },
+           {
+             "type": "image_url",
+             "image_url": {
+               "url": "https://www.ilankelman.org/stopsigns/australia.jpg"
+             }
+           }
+         ]
+       }
+     ],
+     "max_tokens": 300
+   }'
+   ```

 ## 🚀 Launch the UI