
Commit 7d5f30f
fix: normalize ports.
1 parent 31ff8f4

2 files changed: 8 additions, 8 deletions


README.md (7 additions, 7 deletions)

@@ -68,7 +68,7 @@ You always need the root node and you can add 2^n - 1 worker nodes to speed up t
 | `--model <path>` | Path to model. | `dllama_model_meta-llama-3-8b_q40.m` |
 | `--tokenizer <path>` | Tokenizer to model. | `dllama_tokenizer_llama3.t` |
 | `--buffer-float-type <type>` | Float precision of synchronization. | `q80` |
-| `--workers <workers>` | Addresses of workers (ip:port), separated by space. | `10.0.0.1:9991 10.0.0.2:9991` |
+| `--workers <workers>` | Addresses of workers (ip:port), separated by space. | `10.0.0.1:9999 10.0.0.2:9999` |
 | `--max-seq-len <n>` | The maximum sequence length, it helps to reduce the RAM usage. | `4096` |
 
 Inference, Chat, Worker, API
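The `--workers` format changed above is a space-separated list of `ip:port` pairs. A minimal sketch of how such a list can be parsed and validated; `parseWorkers` is a hypothetical helper for illustration, not code from this repository:

```javascript
// Parse a space-separated list of "ip:port" worker addresses,
// the format accepted by the `--workers` argument in the table above.
// `parseWorkers` is a hypothetical illustration, not repository code.
function parseWorkers(arg) {
  return arg.trim().split(/\s+/).map((pair) => {
    const [host, port] = pair.split(':');
    const portNum = Number(port);
    if (!host || !Number.isInteger(portNum) || portNum < 1 || portNum > 65535) {
      throw new Error(`Invalid worker address: ${pair}`);
    }
    return { host, port: portNum };
  });
}

console.log(parseWorkers('10.0.0.1:9999 10.0.0.2:9999'));
```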
@@ -145,17 +145,17 @@ Continue to point 3.
 3. Transfer weights and the tokenizer file to the root computer.
 4. Run worker nodes on worker computers:
 ```sh
-./dllama worker --port 9998 --nthreads 4
+./dllama worker --port 9999 --nthreads 4
 ```
 5. Run root node on the root computer:
 ```sh
-./dllama inference --model dllama_model_meta-llama-3-8b_q40.m --tokenizer dllama_tokenizer_llama3.t --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 4 --workers 192.168.0.1:9998
+./dllama inference --model dllama_model_meta-llama-3-8b_q40.m --tokenizer dllama_tokenizer_llama3.t --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 4 --workers 192.168.0.1:9999
 ```
 
 To add more worker nodes, just add more addresses to the `--workers` argument.
 
 ```
-./dllama inference ... --workers 192.168.0.1:9998 192.168.0.2:9998 192.168.0.3:9998
+./dllama inference ... --workers 192.168.0.1:9999 192.168.0.2:9999 192.168.0.3:9999
 ```
 
 </details>
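The README text above notes that adding worker nodes is just a matter of appending addresses to `--workers`. That construction can be sketched in a few lines; `buildWorkersArg` is a hypothetical helper, not repository code, and 9999 is the default only after this commit:

```javascript
// Build the `--workers` portion of the root-node command line from a
// list of worker hosts, mirroring the README examples above.
// `buildWorkersArg` is a hypothetical illustration, not repository code.
function buildWorkersArg(hosts, port = 9999) {
  return hosts.map((h) => `${h}:${port}`).join(' ');
}

console.log(`./dllama inference ... --workers ${buildWorkersArg(['192.168.0.1', '192.168.0.2', '192.168.0.3'])}`);
```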
@@ -192,17 +192,17 @@ sudo ip addr add 10.0.0.2/24 dev eth0 # 2th device
 ```
 8. Run worker nodes on worker devices:
 ```sh
-sudo nice -n -20 ./dllama worker --port 9998 --nthreads 4
+sudo nice -n -20 ./dllama worker --port 9999 --nthreads 4
 ```
 9. Run root node on the root device:
 ```sh
-sudo nice -n -20 ./dllama inference --model dllama_model_meta-llama-3-8b_q40.m --tokenizer dllama_tokenizer_llama3.t --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 4 --workers 10.0.0.2:9998
+sudo nice -n -20 ./dllama inference --model dllama_model_meta-llama-3-8b_q40.m --tokenizer dllama_tokenizer_llama3.t --buffer-float-type q80 --prompt "Hello world" --steps 16 --nthreads 4 --workers 10.0.0.2:9999
 ```
 
 To add more worker nodes, just add more addresses to the `--workers` argument.
 
 ```
-./dllama inference ... --workers 10.0.0.2:9998 10.0.0.3:9998 10.0.0.4:9998
+./dllama inference ... --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
 ```
 
 </details>

examples/chat-api-client.js (1 addition, 1 deletion)

@@ -6,7 +6,7 @@
 // 2. Run this script: `node examples/chat-api-client.js`
 
 const HOST = process.env.HOST ? process.env.HOST : '127.0.0.1';
-const PORT = process.env.PORT ? Number(process.env.PORT) : 9990;
+const PORT = process.env.PORT ? Number(process.env.PORT) : 9999;
 
 async function chat(messages, maxTokens) {
   const response = await fetch(`http://${HOST}:${PORT}/v1/chat/completions`, {
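The client above resolves its target from `HOST`/`PORT` environment variables with hard-coded fallbacks, so this commit only has to change one default. The same pattern, extracted into a small helper for illustration (`resolveEndpoint` is a hypothetical name, not in the repository):

```javascript
// Resolve the chat API endpoint from the environment, falling back to
// the defaults used by chat-api-client.js: 127.0.0.1 and, after this
// commit, port 9999. `resolveEndpoint` is a hypothetical helper.
function resolveEndpoint(env) {
  const host = env.HOST ? env.HOST : '127.0.0.1';
  const port = env.PORT ? Number(env.PORT) : 9999;
  return `http://${host}:${port}/v1/chat/completions`;
}

console.log(resolveEndpoint({}));               // → http://127.0.0.1:9999/v1/chat/completions
console.log(resolveEndpoint({ PORT: '8080' })); // → http://127.0.0.1:8080/v1/chat/completions
```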
