Investigate support for streaming mode #6

HennerM · 2023-06-22T12:28:52Z

Previously reported in #2 (comment) by @aamir-s18

Streaming mode could be useful for very large models. It would help in real-time use cases, where returning tokens one at a time as they are generated improves the user experience.

Triton Server supports streaming with decoupled models.

It still needs to be investigated how CTranslate2 can return decoded tokens one by one. This might also be trickier with beam search decoding, unless we are willing to always return the current best guess, which could flip previously emitted words.
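To make the beam search concern concrete, here is a minimal sketch of one possible policy: only stream a token once every live beam agrees on it, so nothing already sent has to be retracted. This is an illustration, not something CTranslate2 or Triton provides. `common_prefix`, `stream_stable_tokens`, and the `beam_steps` input (one list of live hypotheses per decoding step) are hypothetical names, and it assumes the decoder exposes the live beams at each step, which is exactly the open question.

```python
from typing import Iterable, Iterator, List, Sequence


def common_prefix(hypotheses: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest token prefix shared by every hypothesis."""
    if not hypotheses:
        return []
    prefix: List[str] = []
    # zip stops at the shortest hypothesis, so the prefix never overruns one.
    for tokens in zip(*hypotheses):
        if any(token != tokens[0] for token in tokens):
            break
        prefix.append(tokens[0])
    return prefix


def stream_stable_tokens(
    beam_steps: Iterable[Sequence[Sequence[str]]],
) -> Iterator[str]:
    """Yield tokens as soon as all live beams agree on them.

    beam_steps is a hypothetical input: one entry per decoding step, each
    holding the live beam hypotheses (best first) as lists of tokens.
    """
    emitted = 0
    last_beams: Sequence[Sequence[str]] = []
    for beams in beam_steps:
        stable = common_prefix(beams)
        # Emit only the part of the agreed prefix not sent yet.
        for token in stable[emitted:]:
            yield token
        emitted = max(emitted, len(stable))
        last_beams = beams
    # Decoding finished: flush the remainder of the best hypothesis.
    if last_beams:
        for token in last_beams[0][emitted:]:
            yield token


demo_steps = [
    [["The"], ["A"]],                              # beams disagree, emit nothing
    [["The", "cat"], ["The", "dog"]],              # "The" is now stable
    [["The", "dog", "sat"], ["The", "dog", "ran"]],  # "dog" is now stable
]
print(list(stream_stable_tokens(demo_steps)))  # ['The', 'dog', 'sat']
```

The trade-off is latency: tokens are held back until the beams converge, whereas always returning the current best guess is faster but can flip earlier words. With greedy decoding (beam size 1) the two policies coincide.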

aamir-s18 · 2023-06-23T14:34:18Z

I think this is relevant for this issue.
