Introduce EstimatorWrapper for managing multiple estimators #1115
base: main
Conversation
Replaces inline locking logic with an improved estimator acquisition and release mechanism. Adopts `execute_async_v3` for more efficient execution and ensures proper CUDA stream synchronization. These changes improve code clarity, concurrency handling, and runtime performance.
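For reference, a minimal sketch of what an `execute_async_v3` call with explicit stream synchronization can look like; the function name and the tensor-binding loop are illustrative assumptions, not the PR's exact forward path:

```python
import torch

def run_estimator(context, feed):
    """Enqueue one TensorRT inference on the caller's CUDA stream (sketch)."""
    stream = torch.cuda.current_stream()
    for name, tensor in feed.items():
        # Bind each named input/output buffer (TensorRT >= 8.5 tensor API).
        context.set_tensor_address(name, tensor.data_ptr())
    # Non-blocking enqueue of the inference on this stream.
    context.execute_async_v3(stream.cuda_stream)
    # Synchronize so the output buffers are safe to read on the host.
    stream.synchronize()
```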
Added an EstimatorWrapper class to handle the creation and management of multiple estimators in a queue. Updated the forward method and related logic to support both single estimator and EstimatorWrapper instances, improving flexibility and concurrency in execution.
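A minimal sketch of what such a wrapper could look like, assuming it pools TensorRT execution contexts in a `queue.Queue`; the constructor signature, pool size, and method names are assumptions, not the PR's exact interface:

```python
import queue

class EstimatorWrapper:
    """Pool of TensorRT execution contexts shared by concurrent requests."""

    def __init__(self, trt_engine, estimator_count: int = 2):
        self.estimators = queue.Queue()
        for _ in range(estimator_count):
            # One execution context per concurrent stream; contexts share
            # the engine's weights, so the extra memory is mostly activations.
            self.estimators.put(trt_engine.create_execution_context())

    def acquire_estimator(self):
        # Blocks until a context is free, replacing the old inline lock.
        return self.estimators.get()

    def release_estimator(self, context):
        # Return the context to the pool for the next caller.
        self.estimators.put(context)
```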
Aligned code indentation for improved readability and maintainability. Adjusted spacing issues in method signatures and data pointer lists to ensure uniform formatting across the file.
Multi-Estimator parallel inference for flow.decoder: compared with the original single-Estimator inference, it significantly improves the efficiency of streaming output and concurrent inference!!
reduce GPU memory cost
@qi-hua @hexisyztem A question: after updating the code with this concurrent flow speedup, what other steps are needed to actually apply it?
I found that with the vLLM-accelerated llm module qi-hua shared, latency accumulates request by request under concurrency; it can't handle multiple concurrent requests. Not sure whether I'm using it wrong.
How are you using it?
There is a multi-process concurrency example in client.py.
https://github.com/qi-hua/async_cosyvoice
qi-hua/async_cosyvoice: accelerating CosyVoice2 inference with vLLM
There is a complete example here; take a look for yourself.
https://github.com/qi-hua/async_cosyvoice/blob/main/runtime/async_grpc/client.py#L203
It also contains the complete multi-process client invocation logic, along the lines of the sketch below.
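Roughly, the pattern is the following hedged sketch; the request function body is a placeholder, not the actual async_cosyvoice client API:

```python
import multiprocessing as mp

def one_request(idx: int) -> None:
    # Placeholder: open a gRPC channel and send one synthesis request here,
    # following runtime/async_grpc/client.py in the linked repository.
    ...

if __name__ == "__main__":
    # Each request runs in its own process, so requests execute
    # concurrently instead of queuing behind a single client.
    procs = [mp.Process(target=one_request, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```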
This is how I used it: on the official version, I replaced the llm module with the vLLM-accelerated version and then ran the test.
image.png: https://github.com/user-attachments/assets/29ed9b22-ffeb-46db-bbfa-edd45d9b8495
@hexisyztem Hi, a quick question: I split flow out into a standalone service. Under concurrent requests, even with two instances, the latency of a single flow instance still grows; one of the two instances always seems to take about twice as long. Is this approach reasonable?
@hexisyztem OK, got it, I'll take a look. One more thing: with the EstimatorWrapper you shared, when I start only a single estimator instance and test concurrency, the latency printed inside the service looks unchanged, but the flow latency printed on the client side consistently grows by about 100 ms. A single instance should still be behind a lock, right? 😂
Backward compatible with the original code logic.
Added EstimatorWrapper for the case where multiple CUDA streams call the Estimator simultaneously.
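Putting the pieces together, a hedged usage sketch under the same assumptions as the sketches above (`EstimatorWrapper` and `run_estimator` as defined there; `wrapper` and `feed` are assumed to have been built elsewhere), with each thread driving its own CUDA stream:

```python
import threading
import torch

def worker(wrapper, feed):
    # Private stream per thread, so enqueued work does not serialize
    # on the default stream.
    stream = torch.cuda.Stream()
    context = wrapper.acquire_estimator()
    try:
        with torch.cuda.stream(stream):
            run_estimator(context, feed)  # sketch defined earlier
    finally:
        # Always return the context so other streams can proceed.
        wrapper.release_estimator(context)

# `wrapper` is an EstimatorWrapper and `feed` a dict of CUDA tensors.
threads = [threading.Thread(target=worker, args=(wrapper, feed))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```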