Introduce EstimatorWrapper for managing multiple estimators #1115
base: main
Conversation
Replaces inline locking logic with an improved estimator acquisition and release mechanism. Adopts `execute_async_v3` for more efficient execution and ensures proper CUDA stream synchronization. These changes improve code clarity, concurrency handling, and runtime performance.
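For reference, a minimal sketch of what an `execute_async_v3` call with explicit stream synchronization can look like; the function name and the tensor-binding loop are illustrative assumptions, not the PR's exact forward path:

```python
import torch

def run_estimator(context, feed):
    """Enqueue one TensorRT inference on the caller's CUDA stream (sketch)."""
    stream = torch.cuda.current_stream()
    for name, tensor in feed.items():
        # Bind each named input/output buffer (TensorRT >= 8.5 tensor API).
        context.set_tensor_address(name, tensor.data_ptr())
    # Non-blocking enqueue of the inference on this stream.
    context.execute_async_v3(stream.cuda_stream)
    # Synchronize so the output buffers are safe to read on the host.
    stream.synchronize()
```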
Added an EstimatorWrapper class to handle the creation and management of multiple estimators in a queue. Updated the forward method and related logic to support both single estimator and EstimatorWrapper instances, improving flexibility and concurrency in execution.
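A minimal sketch of what such a wrapper could look like, assuming it pools TensorRT execution contexts in a `queue.Queue`; the constructor signature, pool size, and method names are assumptions, not the PR's exact interface:

```python
import queue

class EstimatorWrapper:
    """Pool of TensorRT execution contexts shared by concurrent requests."""

    def __init__(self, trt_engine, estimator_count: int = 2):
        self.estimators = queue.Queue()
        for _ in range(estimator_count):
            # One execution context per concurrent stream; contexts share
            # the engine's weights, so the extra memory is mostly activations.
            self.estimators.put(trt_engine.create_execution_context())

    def acquire_estimator(self):
        # Blocks until a context is free, replacing the old inline lock.
        return self.estimators.get()

    def release_estimator(self, context):
        # Return the context to the pool for the next caller.
        self.estimators.put(context)
```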
Aligned code indentation for improved readability and maintainability. Adjusted spacing issues in method signatures and data pointer lists to ensure uniform formatting across the file.
Multi-Estimator parallel inference for flow.decoder: compared with the original single-Estimator inference, it significantly improves the efficiency of streaming output and concurrent inference!!
reduce GPU memory cost
@qi-hua @hexisyztem A question: after updating the code with this concurrent flow speedup, what other steps are needed to actually apply it?
I found that with the vLLM-accelerated llm module qi-hua shared, latency accumulates request by request under concurrency; it can't handle multiple concurrent requests. Not sure whether I'm using it wrong.
How are you using it?
There is a multi-process concurrency example in client.py.
https://github.com/qi-hua/async_cosyvoice
qi-hua/async_cosyvoice: accelerating CosyVoice2 inference with vLLM
There is a complete example here; take a look for yourself.
https://github.com/qi-hua/async_cosyvoice/blob/main/runtime/async_grpc/client.py#L203
It also contains the complete multi-process client invocation logic, along the lines of the sketch below.
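Roughly, the pattern is the following hedged sketch; the request function body is a placeholder, not the actual async_cosyvoice client API:

```python
import multiprocessing as mp

def one_request(idx: int) -> None:
    # Placeholder: open a gRPC channel and send one synthesis request here,
    # following runtime/async_grpc/client.py in the linked repository.
    ...

if __name__ == "__main__":
    # Each request runs in its own process, so requests execute
    # concurrently instead of queuing behind a single client.
    procs = [mp.Process(target=one_request, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```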
This is how I used it: on the official version, I replaced the llm module with the vLLM-accelerated version and then ran the test.
image.png: https://github.com/user-attachments/assets/29ed9b22-ffeb-46db-bbfa-edd45d9b8495
@hexisyztem Hi, a quick question: I split flow out into a standalone service. Under concurrent requests, even with two instances, the latency of a single flow instance still grows; one of the two instances always seems to take about twice as long. Is this approach reasonable?
@hexisyztem OK, got it, I'll take a look. One more thing: with the EstimatorWrapper you shared, when I start only a single estimator instance and test concurrency, the latency printed inside the service looks unchanged, but the flow latency printed on the client side consistently grows by about 100 ms. A single instance should still be behind a lock, right? 😂
Backward compatible with the original code logic.
Added EstimatorWrapper for the case where multiple CUDA streams call the Estimator simultaneously.
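Putting the pieces together, a hedged usage sketch under the same assumptions as the sketches above (`EstimatorWrapper` and `run_estimator` as defined there; `wrapper` and `feed` are assumed to have been built elsewhere), with each thread driving its own CUDA stream:

```python
import threading
import torch

def worker(wrapper, feed):
    # Private stream per thread, so enqueued work does not serialize
    # on the default stream.
    stream = torch.cuda.Stream()
    context = wrapper.acquire_estimator()
    try:
        with torch.cuda.stream(stream):
            run_estimator(context, feed)  # sketch defined earlier
    finally:
        # Always return the context so other streams can proceed.
        wrapper.release_estimator(context)

# `wrapper` is an EstimatorWrapper and `feed` a dict of CUDA tensors.
threads = [threading.Thread(target=worker, args=(wrapper, feed))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```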