
Introduce EstimatorWrapper for managing multiple estimators #1115

Open · hexisyztem wants to merge 4 commits into main

Conversation

hexisyztem
Contributor

Remains compatible with the original code logic.
Adds an EstimatorWrapper for the case where multiple CUDA streams invoke the Estimator concurrently.
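
For reference, a minimal sketch of what such a wrapper could look like, assuming a queue-pooled set of TensorRT execution contexts (the class name follows the PR title; the constructor arguments and method names are assumptions, not the PR's exact code):

```python
import queue

class EstimatorWrapper:
    """Pools several TensorRT execution contexts so that multiple CUDA
    streams can run the estimator concurrently instead of serializing
    on a single context."""

    def __init__(self, engine, estimator_count=2):
        # `engine` is assumed to be a deserialized trt.ICudaEngine
        self.estimators = queue.Queue()
        for _ in range(estimator_count):
            # each execution context owns its own activation memory,
            # so different contexts can safely run on different streams
            self.estimators.put(engine.create_execution_context())

    def acquire_estimator(self):
        # blocks while all contexts are busy; excess requests wait here
        return self.estimators.get()

    def release_estimator(self, estimator):
        self.estimators.put(estimator)
```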

hexisyztem and others added 3 commits March 27, 2025 14:37
Replaces inline locking logic with an improved estimator acquisition and release mechanism. Adopts `execute_async_v3` for more efficient execution and ensures proper CUDA stream synchronization. These changes improve code clarity, concurrency handling, and runtime performance.
Added an EstimatorWrapper class to handle the creation and management of multiple estimators in a queue. Updated the forward method and related logic to support both single estimator and EstimatorWrapper instances, improving flexibility and concurrency in execution.
Aligned code indentation for improved readability and maintainability. Adjusted spacing issues in method signatures and data pointer lists to ensure uniform formatting across the file.
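
A hedged sketch of the forward path those commits describe: acquire a pooled context, bind tensors, run `execute_async_v3` on a per-request CUDA stream, synchronize, then release. The tensor names and the helper name are assumptions, not the PR's exact code:

```python
import torch

def run_estimator(wrapper, feeds, output):
    # `wrapper` is an EstimatorWrapper as sketched above; `feeds` maps
    # input tensor names to CUDA tensors, `output` receives the result
    estimator = wrapper.acquire_estimator()
    stream = torch.cuda.Stream()            # dedicated stream per request
    try:
        with torch.cuda.stream(stream):
            for name, tensor in feeds.items():
                estimator.set_tensor_address(name, tensor.data_ptr())
            estimator.set_tensor_address("estimator_out", output.data_ptr())
            # execute_async_v3 enqueues inference on the raw stream handle
            estimator.execute_async_v3(stream.cuda_stream)
        # synchronize before releasing so the next user of this context
        # cannot rebind tensors while kernels are still in flight
        stream.synchronize()
    finally:
        wrapper.release_estimator(estimator)
```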
@qi-hua

qi-hua commented Mar 31, 2025

Multi-Estimator parallel inference for flow.decoder brings a significant efficiency improvement over the original single-Estimator-instance inference, for both streaming responses and concurrent requests!

reduce GPU memory cost
@wang-TJ-20

wang-TJ-20 commented Apr 1, 2025

@qi-hua @hexisyztem A question: after updating to this code, what additional steps are needed to actually apply the flow concurrency speedup?

@wang-TJ-20

I found that with the vllm-accelerated llm module qi-hua shared, latency accumulates step by step under concurrent processing, so it cannot handle multiple concurrent requests. I'm not sure whether I'm using it incorrectly.

@hexisyztem
Contributor Author

hexisyztem commented Apr 2, 2025 via email

@hexisyztem
Contributor Author

hexisyztem commented Apr 2, 2025 via email

How are you using it?

@wang-TJ-20

This is how I use it: on top of the official version I replaced the llm module with the vllm-accelerated one, then ran the test.
[screenshot: test setup]

@hexisyztem
Contributor Author

hexisyztem commented Apr 2, 2025 via email

@wang-TJ-20

wang-TJ-20 commented Apr 2, 2025

@hexisyztem hi, a question: I split flow out into a standalone service. I found that under concurrent requests, even with two instances, the latency of a single flow instance still grows; with two instances, one of them always looks like its latency has doubled. Is this approach reasonable?

With a single concurrent request:
[screenshot: single-request latency]

With two concurrent requests:
[screenshot: two-request latency]

@hexisyztem
Contributor Author

hexisyztem commented Apr 2, 2025 via email

@wang-TJ-20

wang-TJ-20 commented Apr 2, 2025

@hexisyztem OK, I'll take a look. Also, I found that with the EstimatorWrapper you shared, when I test concurrency against a single running instance, the latency printed inside the service looks unchanged, but the flow latency printed on the client side steadily increases by about 100 ms. A single instance probably still has a lock 😂

Server-side printed latency: [screenshot]
Client-side time when the result is received: [screenshot]
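
If the wrapper above is built with a single context, the queue does degenerate into a mutex: a second concurrent request blocks in acquire_estimator until the first one releases, which would match a steady per-request latency increase. A hypothetical illustration under the same assumptions as the sketches above:

```python
# with one pooled context, the queue effectively acts as a lock
wrapper = EstimatorWrapper(engine, estimator_count=1)

ctx = wrapper.acquire_estimator()    # request A: returns immediately
# a concurrent request B calling wrapper.acquire_estimator() now blocks
# until request A calls wrapper.release_estimator(ctx)
```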

@hexisyztem
Contributor Author

hexisyztem commented Apr 2, 2025 via email

How did you test it? Did you test the whole service, or just the flow part? I'll take a look later.

@wang-TJ-20

wang-TJ-20 commented Apr 2, 2025

I tested the whole service: requests are sent to the flow module based on the llm module's output, and the flow latency is printed there.
