Can you provide a screenshot of the CLI when you run inference? I thought the `gen_texts` are already shown there (you could also look at the corresponding code in utils_infer.py).
The current inference is based on pre-segmented text: the input can be split at the sentence level for processing, but this approach does not extend well to word-level splitting, since individual words are often too short for efficient TTS generation. Moreover, segmentation hurts performance: inference on the same text becomes slower once it is split. Ideally, the inference process itself would align its output with words or sentences, similar to the standard Web Speech API in browsers, which supports event callbacks at both the word and sentence level.
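The sentence-level workaround described above can be sketched as a thin wrapper that fires a callback with each sentence's byte offset before synthesizing it. This is a minimal sketch only: `synthesize` is a placeholder for whatever TTS call returns raw audio for one chunk of text, not an actual F5-TTS function, and the regex sentence splitter is a naive assumption.

```python
# Sentence-level boundary callbacks layered over a pre-segmented pipeline.
# `synthesize` is a stand-in for any TTS call returning raw samples for one
# chunk of text; it is a hypothetical placeholder, not a real project API.
import re
from typing import Callable

def tts_with_boundaries(text: str,
                        synthesize: Callable[[str], bytes],
                        on_boundary: Callable[[str, int], None]) -> bytes:
    """Generate audio sentence by sentence, firing `on_boundary` with each
    sentence and its starting byte offset in the concatenated result."""
    audio = bytearray()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        on_boundary(sentence, len(audio))  # offset where this sentence starts
        audio += synthesize(sentence)
    return bytes(audio)
```

This reproduces only sentence-level events, and it inherits exactly the slowdown noted above, since each sentence is a separate inference call; word-level callbacks would need support inside the model's own decoding loop.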
Checks
Environment Details
Ubuntu 22, Python 3.11.
Dear All,
Is there any way to trigger events on word or sentence boundaries during generation? Or is it possible to detect the boundaries from the result afterwards?
Thanks
Steps to Reproduce
Call the API from Python to generate speech for a piece of text.
✔️ Expected Behavior
Events would be triggered at word/sentence boundaries for the caller to handle, or metadata describing the boundaries would be produced along with the TTS result.
❌ Actual Behavior
It produces a bytes array with no metadata or events.
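Given that only raw audio comes back, one rough post-hoc option is to approximate pause (and hence sentence-like) boundaries by scanning for low-energy stretches in the samples. This is a heuristic sketch under assumed parameters (frame size, silence threshold, minimum pause length), not real forced alignment, and it cannot recover word boundaries.

```python
# Heuristic: locate sample indices where a sufficiently long silent stretch
# ends, as a rough proxy for sentence boundaries in the returned audio.
import numpy as np

def silence_boundaries(samples: np.ndarray, sr: int,
                       frame_ms: int = 20, min_pause_ms: int = 200,
                       threshold: float = 1e-3) -> list[int]:
    """Return sample indices where a pause of at least `min_pause_ms` ends."""
    frame = sr * frame_ms // 1000
    n_frames = len(samples) // frame
    energy = np.array([np.mean(samples[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    silent = energy < threshold
    boundaries, run = [], 0
    for i, s in enumerate(silent):
        if s:
            run += 1
        else:
            if run * frame_ms >= min_pause_ms:
                boundaries.append(i * frame)  # boundary at end of the pause
            run = 0
    return boundaries
```

A proper solution would still need the model (or an external forced aligner) to emit timing metadata, since silence detection misses boundaries that are not marked by a pause.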