**DocSum/docker_compose/intel/cpu/xeon/README.md** (89 additions, 1 deletion)
Text:

```bash
# JSON input
curl -X POST http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: application/json" \
  -d '{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'

# Form input, use English mode (default).
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  ...
  -F "stream=true"
```
7. MegaService with long context

If you want to deal with long context, you can set the following parameters and select a suitable summary type.

- "summary_type": one of "auto", "stuff", "truncate", "map_reduce", "refine"; the default is "auto".
- "chunk_size": maximum token length of each chunk. Its default value depends on "summary_type".
- "chunk_overlap": token overlap between adjacent chunks; the default is 0.1 \* chunk_size.

**summary_type=auto**

"summary_type" is "auto" by default. In this mode the input token length is checked: if it exceeds `MAX_INPUT_TOKENS`, `summary_type` is automatically set to `refine`; otherwise it is set to `stuff`.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=auto"
```
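The auto-selection rule can be sketched in Python. This is an illustrative sketch only, not the service's actual code; `count_tokens` is a hypothetical stand-in for whatever tokenizer the deployment uses.

```python
# Illustrative sketch of the "auto" selection rule described above.

def count_tokens(text: str) -> int:
    # Crude whitespace approximation; a real service would use the model tokenizer.
    return len(text.split())

def resolve_summary_type(text: str, max_input_tokens: int) -> str:
    """Pick 'refine' when the input exceeds MAX_INPUT_TOKENS, else 'stuff'."""
    return "refine" if count_tokens(text) > max_input_tokens else "stuff"
```

For example, `resolve_summary_type("word " * 5000, max_input_tokens=1024)` falls back to `refine`, while a short input stays in `stuff` mode.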
**summary_type=stuff**

In this mode the LLM generates the summary from the complete input text. Set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` carefully according to your model and device memory; otherwise a long input may exceed the LLM context limit and raise an error.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=stuff"
```
**summary_type=truncate**

Truncate mode truncates the input text and keeps only the first chunk, whose length equals `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=truncate"
```
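As a quick sanity check on the truncate formula, here is a worked example. The environment-variable values are assumed for illustration only, not recommended settings.

```python
# Worked example of the truncate-mode chunk length formula above.
MAX_TOTAL_TOKENS = 2048  # assumed example value
MAX_INPUT_TOKENS = 1024  # assumed example value
max_tokens = 32          # the request's max_tokens field

chunk_len = min(MAX_TOTAL_TOKENS - max_tokens - 50, MAX_INPUT_TOKENS)
print(chunk_len)  # 1024: 2048 - 32 - 50 = 1966, capped at MAX_INPUT_TOKENS
```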
**summary_type=map_reduce**

Map_reduce mode splits the input into multiple chunks, maps each chunk to an individual summary, then consolidates those summaries into a single global summary. `streaming=True` is not allowed here.

In this mode the default `chunk_size` is `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=map_reduce"
```
**summary_type=refine**

Refine mode splits the input into multiple chunks, generates a summary for the first chunk, combines it with the second chunk, and then loops over the remaining chunks to produce the final summary.

In this mode the default `chunk_size` is `min(MAX_TOTAL_TOKENS - 2 * input.max_tokens - 128, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=refine"
```
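To see how the two `chunk_size` defaults compare, here is a sketch with assumed example values (illustrative only). Refine reserves extra headroom because the running summary is fed back to the model along with each chunk.

```python
# Compares the default chunk_size of the modes above, using assumed
# example values for the environment variables.
MAX_TOTAL_TOKENS = 2048  # assumed example value
MAX_INPUT_TOKENS = 1920  # assumed example value
max_tokens = 32          # the request's max_tokens field

truncate_map_reduce = min(MAX_TOTAL_TOKENS - max_tokens - 50, MAX_INPUT_TOKENS)
refine = min(MAX_TOTAL_TOKENS - 2 * max_tokens - 128, MAX_INPUT_TOKENS)
print(truncate_map_reduce)  # 1920: min(1966, 1920)
print(refine)               # 1856: min(1856, 1920)
```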
## 🚀 Launch the UI

Several UI options are provided. If you need to work with multimedia documents, .doc, or .pdf files, it is suggested to use the Gradio UI.
**DocSum/docker_compose/intel/hpu/gaudi/README.md** (93 additions, 6 deletions)
Text:

```bash
# JSON input
curl -X POST http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: application/json" \
  -d '{"type": "text", "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'

# Form input. Use English mode (default).
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=True"

# Use Chinese mode.
curl http://${host_ip}:8888/v1/docsum \
  ...
```
```bash
  ...
  -F "messages=convert your video to base64 data type" \
  -F "max_tokens=32" \
  -F "language=en" \
  -F "stream=True"
```
7. MegaService with long context

If you want to deal with long context, you can set the following parameters and select a suitable summary type.

- "summary_type": one of "auto", "stuff", "truncate", "map_reduce", "refine"; the default is "auto".
- "chunk_size": maximum token length of each chunk. Its default value depends on "summary_type".
- "chunk_overlap": token overlap between adjacent chunks; the default is 0.1 \* chunk_size.

**summary_type=auto**

"summary_type" is "auto" by default. In this mode the input token length is checked: if it exceeds `MAX_INPUT_TOKENS`, `summary_type` is automatically set to `refine`; otherwise it is set to `stuff`.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=auto"
```
**summary_type=stuff**

In this mode the LLM generates the summary from the complete input text. Set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` carefully according to your model and device memory; otherwise a long input may exceed the LLM context limit and raise an error.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=stuff"
```
**summary_type=truncate**

Truncate mode truncates the input text and keeps only the first chunk, whose length equals `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=truncate"
```
**summary_type=map_reduce**

Map_reduce mode splits the input into multiple chunks, maps each chunk to an individual summary, then consolidates those summaries into a single global summary. `streaming=True` is not allowed here.

In this mode the default `chunk_size` is `min(MAX_TOTAL_TOKENS - input.max_tokens - 50, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=map_reduce"
```
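The map and consolidate steps can be sketched in Python. This is an illustrative sketch only: `summarize` is a hypothetical stand-in for the actual LLM call, and chunking by words is a simplification of the service's token-based chunking.

```python
# Illustrative sketch of the map_reduce flow described above.

def summarize(text: str) -> str:
    # Hypothetical stand-in: a real service would call the LLM here.
    return text[:40]

def split_chunks(words: list, chunk_size: int) -> list:
    # Naive word-based chunking; the service chunks by tokens.
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def map_reduce_summary(text: str, chunk_size: int = 512) -> str:
    chunks = split_chunks(text.split(), chunk_size)
    partial = [summarize(c) for c in chunks]  # map: one summary per chunk
    return summarize("\n".join(partial))      # reduce: consolidate into one
```

Because the consolidation step sees only the per-chunk summaries, total input length is bounded regardless of document size.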
**summary_type=refine**

Refine mode splits the input into multiple chunks, generates a summary for the first chunk, combines it with the second chunk, and then loops over the remaining chunks to produce the final summary.

In this mode the default `chunk_size` is `min(MAX_TOTAL_TOKENS - 2 * input.max_tokens - 128, MAX_INPUT_TOKENS)`.
```bash
curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: multipart/form-data" \
  -F "type=text" \
  -F "messages=" \
  -F "max_tokens=32" \
  -F "files=@/path to your file (.txt, .docx, .pdf)" \
  -F "language=en" \
  -F "summary_type=refine"
```
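The iterative refine loop can be sketched in Python. This is illustrative only; `refine_step` is a hypothetical stand-in for the LLM prompt that merges the running summary with the next chunk.

```python
# Illustrative sketch of the refine flow described above.

def refine_step(summary: str, chunk: str) -> str:
    # Hypothetical stand-in: a real service would prompt the LLM with both texts.
    return (summary + " " + chunk).strip()[:60]

def refine_summary(chunks: list) -> str:
    summary = refine_step("", chunks[0])  # summarize the first chunk
    for chunk in chunks[1:]:              # fold in each remaining chunk
        summary = refine_step(summary, chunk)
    return summary
```

For example, `refine_summary(["one", "two", "three"])` folds the chunks left to right, which is why refine must reserve context headroom for both the running summary and the next chunk.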
> More detailed tests can be found in `GenAIExamples/DocSum/test`.