flow_cache version is prone to CUDA OOM #1165

Open
reusu opened this issue Apr 12, 2025 · 6 comments

Comments

@reusu

reusu commented Apr 12, 2025

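My GPU has limited VRAM. Running COSY2 before never caused any problems. Today I upgraded to the flow cache version, and even with use_flow_cache=False, quickly generating 5-6 utterances runs out of memory very soon. What could be the cause?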

TensorRT is not enabled.
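One way to tell whether memory is actually leaking across requests (rather than one long utterance simply peaking high) is to record the GPU memory still allocated after each finished request and look at the trend. The helper below is a minimal, hypothetical sketch: the readings are assumed to come from something like `torch.cuda.memory_allocated() / 2**20` taken after each request, and the 50 MiB tolerance is an arbitrary assumption.

```python
from typing import Sequence


def looks_like_leak(readings_mib: Sequence[float], tolerance_mib: float = 50.0) -> bool:
    """Heuristic leak check over per-request memory readings (hypothetical helper).

    readings_mib: GPU memory still allocated after each finished request, in MiB,
    e.g. torch.cuda.memory_allocated() / 2**20. A healthy server plateaus after
    warm-up; memory that rises by more than `tolerance_mib` after *every* request
    suggests per-request state (such as a cache) is never released.
    """
    if len(readings_mib) < 3:
        return False  # too few samples to call it a trend
    deltas = [later - earlier for earlier, later in zip(readings_mib, readings_mib[1:])]
    return all(delta > tolerance_mib for delta in deltas)


# Steady growth of ~600 MiB per request looks like a leak;
# a jump on the first request followed by a plateau does not.
print(looks_like_leak([2000, 2600, 3200, 3800]))  # True
print(looks_like_leak([2000, 5000, 5010, 4990]))  # False
```

A plateau after the first request is normal (the CUDA caching allocator and model warm-up both reserve memory once); only growth on every request is suspicious.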

@reusu
Author

reusu commented Apr 12, 2025

Rolling back to commit 08312f4 restores normal behavior.
The model used with it is at revision 9bd5b08fc085bd93d3f8edb16b67295606290350.

@reusu
Author

reusu commented Apr 12, 2025

```
==========
== CUDA ==

CUDA Version 12.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/diffusers/models/lora.py:393: FutureWarning: LoRACompatibleLinear is deprecated and will be removed in version 1.0.0. Use of LoRACompatibleLinear is deprecated. Please switch to PEFT backend by installing PEFT: pip install peft.
deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2025-04-12 16:22:16,366 INFO input frame rate=25
/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-04-12 16:22:17.653625681 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 8 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-04-12 16:22:17.655306299 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-04-12 16:22:17.655316013 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
text.cc: festival_Text_init
open voice lang map failed
2025-04-12 16:22:20,417 DEBUG Using selector: EpollSelector
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
2025-04-12 16:22:36,304 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:36,304 DEBUG Calling on_header_field with data[40:59]
2025-04-12 16:22:36,304 DEBUG Calling on_header_value with data[61:83]
2025-04-12 16:22:36,304 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,304 DEBUG Calling on_header_field with data[85:99]
2025-04-12 16:22:36,304 DEBUG Calling on_header_value with data[101:103]
2025-04-12 16:22:36,304 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,304 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:36,304 DEBUG Calling on_part_data with data[107:120]
2025-04-12 16:22:36,304 DEBUG Calling on_part_end with no data
2025-04-12 16:22:36,304 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:36,305 DEBUG Calling on_header_field with data[162:181]
2025-04-12 16:22:36,305 DEBUG Calling on_header_value with data[183:205]
2025-04-12 16:22:36,305 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,305 DEBUG Calling on_header_field with data[207:221]
2025-04-12 16:22:36,305 DEBUG Calling on_header_value with data[223:224]
2025-04-12 16:22:36,305 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,305 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:36,305 DEBUG Calling on_part_data with data[228:233]
2025-04-12 16:22:36,305 DEBUG Calling on_part_end with no data
2025-04-12 16:22:36,305 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:36,305 DEBUG Calling on_header_field with data[275:294]
2025-04-12 16:22:36,305 DEBUG Calling on_header_value with data[296:318]
2025-04-12 16:22:36,305 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,305 DEBUG Calling on_header_field with data[320:334]
2025-04-12 16:22:36,305 DEBUG Calling on_header_value with data[336:337]
2025-04-12 16:22:36,305 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,305 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:36,305 DEBUG Calling on_part_data with data[341:350]
2025-04-12 16:22:36,305 DEBUG Calling on_part_end with no data
2025-04-12 16:22:36,305 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:36,305 DEBUG Calling on_header_field with data[392:411]
2025-04-12 16:22:36,305 DEBUG Calling on_header_value with data[413:436]
2025-04-12 16:22:36,305 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,305 DEBUG Calling on_header_field with data[438:452]
2025-04-12 16:22:36,305 DEBUG Calling on_header_value with data[454:455]
2025-04-12 16:22:36,305 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,305 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:36,306 DEBUG Calling on_part_data with data[459:462]
2025-04-12 16:22:36,306 DEBUG Calling on_part_end with no data
2025-04-12 16:22:36,306 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:36,306 DEBUG Calling on_header_field with data[504:523]
2025-04-12 16:22:36,306 DEBUG Calling on_header_value with data[525:549]
2025-04-12 16:22:36,306 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,306 DEBUG Calling on_header_field with data[551:565]
2025-04-12 16:22:36,306 DEBUG Calling on_header_value with data[567:568]
2025-04-12 16:22:36,306 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,306 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:36,306 DEBUG Calling on_part_data with data[572:577]
2025-04-12 16:22:36,306 DEBUG Calling on_part_end with no data
2025-04-12 16:22:36,306 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:36,306 DEBUG Calling on_header_field with data[619:638]
2025-04-12 16:22:36,306 DEBUG Calling on_header_value with data[640:687]
2025-04-12 16:22:36,306 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,306 DEBUG Calling on_header_field with data[689:701]
2025-04-12 16:22:36,306 DEBUG Calling on_header_value with data[703:727]
2025-04-12 16:22:36,306 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,306 DEBUG Calling on_header_field with data[729:743]
2025-04-12 16:22:36,306 DEBUG Calling on_header_value with data[745:750]
2025-04-12 16:22:36,306 DEBUG Calling on_header_end with no data
2025-04-12 16:22:36,306 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:36,306 DEBUG Calling on_part_data with data[754:37346]
2025-04-12 16:22:36,307 DEBUG Calling on_part_data with data[0:20796]
2025-04-12 16:22:36,307 DEBUG Calling on_part_end with no data
2025-04-12 16:22:36,307 DEBUG Calling on_end with no data
INFO: 172.20.20.222:59920 - "POST /tts HTTP/1.1" 200 OK
0%| | 0/1 [00:00<?, ?it/s]2025-04-12 16:22:37,327 INFO synthesis text 你好!
2025-04-12 16:22:38,709 INFO yield speech len 1.12, rtf 1.2342697807720728
2025-04-12 16:22:38,709 INFO now index 1, sample rate 24000
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.36s/it]
2025-04-12 16:22:38,729 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:38,729 DEBUG Calling on_header_field with data[40:59]
2025-04-12 16:22:38,729 DEBUG Calling on_header_value with data[61:83]
2025-04-12 16:22:38,730 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,730 DEBUG Calling on_header_field with data[85:99]
2025-04-12 16:22:38,730 DEBUG Calling on_header_value with data[101:103]
2025-04-12 16:22:38,730 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,730 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:38,730 DEBUG Calling on_part_data with data[107:120]
2025-04-12 16:22:38,730 DEBUG Calling on_part_end with no data
2025-04-12 16:22:38,730 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:38,730 DEBUG Calling on_header_field with data[162:181]
2025-04-12 16:22:38,730 DEBUG Calling on_header_value with data[183:205]
2025-04-12 16:22:38,730 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,730 DEBUG Calling on_header_field with data[207:221]
2025-04-12 16:22:38,730 DEBUG Calling on_header_value with data[223:224]
2025-04-12 16:22:38,730 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,730 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:38,730 DEBUG Calling on_part_data with data[228:233]
2025-04-12 16:22:38,730 DEBUG Calling on_part_end with no data
2025-04-12 16:22:38,730 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:38,730 DEBUG Calling on_header_field with data[275:294]
2025-04-12 16:22:38,730 DEBUG Calling on_header_value with data[296:318]
2025-04-12 16:22:38,730 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,730 DEBUG Calling on_header_field with data[320:334]
2025-04-12 16:22:38,730 DEBUG Calling on_header_value with data[336:338]
2025-04-12 16:22:38,730 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,730 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:38,731 DEBUG Calling on_part_data with data[342:384]
2025-04-12 16:22:38,731 DEBUG Calling on_part_end with no data
2025-04-12 16:22:38,731 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:38,731 DEBUG Calling on_header_field with data[426:445]
2025-04-12 16:22:38,731 DEBUG Calling on_header_value with data[447:470]
2025-04-12 16:22:38,731 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,731 DEBUG Calling on_header_field with data[472:486]
2025-04-12 16:22:38,731 DEBUG Calling on_header_value with data[488:489]
2025-04-12 16:22:38,731 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,731 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:38,731 DEBUG Calling on_part_data with data[493:496]
2025-04-12 16:22:38,731 DEBUG Calling on_part_end with no data
2025-04-12 16:22:38,731 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:38,731 DEBUG Calling on_header_field with data[538:557]
2025-04-12 16:22:38,731 DEBUG Calling on_header_value with data[559:583]
2025-04-12 16:22:38,731 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,731 DEBUG Calling on_header_field with data[585:599]
2025-04-12 16:22:38,731 DEBUG Calling on_header_value with data[601:602]
2025-04-12 16:22:38,731 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,731 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:38,731 DEBUG Calling on_part_data with data[606:611]
2025-04-12 16:22:38,731 DEBUG Calling on_part_end with no data
2025-04-12 16:22:38,731 DEBUG Calling on_part_begin with no data
2025-04-12 16:22:38,731 DEBUG Calling on_header_field with data[653:672]
2025-04-12 16:22:38,732 DEBUG Calling on_header_value with data[674:721]
2025-04-12 16:22:38,732 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,732 DEBUG Calling on_header_field with data[723:735]
2025-04-12 16:22:38,732 DEBUG Calling on_header_value with data[737:761]
2025-04-12 16:22:38,732 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,732 DEBUG Calling on_header_field with data[763:777]
2025-04-12 16:22:38,732 DEBUG Calling on_header_value with data[779:784]
2025-04-12 16:22:38,732 DEBUG Calling on_header_end with no data
2025-04-12 16:22:38,732 DEBUG Calling on_headers_finished with no data
2025-04-12 16:22:38,732 DEBUG Calling on_part_data with data[788:14178]
2025-04-12 16:22:38,732 DEBUG Calling on_part_data with data[0:43998]
2025-04-12 16:22:38,732 DEBUG Calling on_part_end with no data
2025-04-12 16:22:38,732 DEBUG Calling on_end with no data
INFO: 172.20.20.222:59932 - "POST /tts HTTP/1.1" 200 OK
0%| | 0/1 [00:00<?, ?it/s]2025-04-12 16:22:38,779 INFO synthesis text 请问有什么我可以帮助你的吗?
0%| | 0/1 [00:02<?, ?it/s]
ERROR: Exception in ASGI application

  + Exception Group Traceback (most recent call last):
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi
  |     result = await app( # type: ignore[func-returns-value]
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
  |     return await self.app(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
  |     raise exc
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
  |     await self.app(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     raise exc
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
  |     await route.handle(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
  |     await self.app(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     raise exc
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
  |     await response(scope, receive, send)
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/responses.py", line 252, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 772, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/responses.py", line 255, in wrap
    |     await func()
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/responses.py", line 244, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/concurrency.py", line 62, in iterate_in_threadpool
    |     yield await anyio.to_thread.run_sync(_next, as_iterator)
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
    |     return await future
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 967, in run
    |     result = context.run(func, *args)
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/starlette/concurrency.py", line 51, in _next
    |     return next(iterator)
    |   File "/workspace/CosyVoice/server_pcm.py", line 35, in get_stream
    |     for i, j in model_output:
    |   File "/workspace/CosyVoice/cosyvoice/cli/cosyvoice.py", line 99, in inference_cross_lingual
    |     for model_output in self.model.tts(**model_input, stream=stream, speed=speed):
    |   File "/workspace/CosyVoice/cosyvoice/cli/model.py", line 448, in tts
    |     this_tts_speech = self.token2wav(token=this_tts_speech_token,
    |   File "/workspace/CosyVoice/cosyvoice/cli/model.py", line 363, in token2wav
    |     tts_mel, self.flow_cache_dict[uuid] = self.flow.inference(token=token.to(self.device),
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    |     return func(*args, **kwargs)
    |   File "/workspace/CosyVoice/cosyvoice/flow/flow.py", line 279, in inference
    |     feat, cache['decoder_cache'] = self.decoder(
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    |     return self._call_impl(*args, **kwargs)
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    |     return forward_call(*args, **kwargs)
    |   File "/opt/conda/envs/cosyvoice/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    |     return func(*args, **kwargs)
    |   File "/workspace/CosyVoice/cosyvoice/flow/flow_matching.py", line 220, in forward
    |     mel, cache = self.solve_euler(z, t_span=t_span, mu=mu, mask=mask, spks=spks, cond=cond, cache=cache)
    |   File "/workspace/CosyVoice/cosyvoice/flow/flow_matching.py", line 289, in solve_euler
    |     cache['mid_blocks_kv_cache'] = torch.concat([cache['mid_blocks_kv_cache'], mid_blocks_kv_cache_new], dim=4)
    | torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.05 GiB. GPU
    +------------------------------------
```

Error log when using the flow_cache version.
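The failing line, `torch.concat([cache['mid_blocks_kv_cache'], mid_blocks_kv_cache_new], dim=4)`, allocates a brand-new output tensor on every call. While it runs, the old cache, the new chunk and the enlarged result all coexist, and the cache itself grows with every chunk appended along dim 4. The arithmetic below is a rough sketch of that effect only; the shapes are made-up placeholders (not the real shapes in CosyVoice's `flow_matching.py`), and float32 at 4 bytes per element is an assumption.

```python
from math import prod
from typing import List, Sequence


def tensor_bytes(shape: Sequence[int], bytes_per_elem: int = 4) -> int:
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_elem


def concat_peak_bytes(cache_shape: Sequence[int], new_shape: Sequence[int],
                      dim: int, bytes_per_elem: int = 4) -> int:
    """Memory live at the moment of torch.concat([cache, new], dim=dim).

    The inputs are still referenced while the freshly allocated output is
    filled, so the peak is old cache + new chunk + concatenated result.
    """
    if len(cache_shape) != len(new_shape):
        raise ValueError("rank mismatch")
    for d, (a, b) in enumerate(zip(cache_shape, new_shape)):
        if d != dim and a != b:
            raise ValueError(f"size mismatch on dim {d}: {a} vs {b}")
    out_shape: List[int] = list(cache_shape)
    out_shape[dim] += new_shape[dim]
    return (tensor_bytes(cache_shape, bytes_per_elem)
            + tensor_bytes(new_shape, bytes_per_elem)
            + tensor_bytes(out_shape, bytes_per_elem))


# Hypothetical 5-D cache whose time axis is dim 4; each chunk adds 50 frames.
other_dims = [12, 2, 1, 512]  # placeholder sizes, purely illustrative
chunk = other_dims + [50]
for n_chunks in (1, 5, 10, 20):
    cache = other_dims + [50 * n_chunks]
    peak = concat_peak_bytes(cache, chunk, dim=4)
    print(f"{n_chunks:>2} chunks cached -> peak ~{peak / 2**20:.1f} MiB")
```

Under these assumptions the peak grows linearly with the number of cached chunks, so a long utterance, or cache entries that outlive their request, can push a small GPU over its limit even when each individual step is modest.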

@aluminumbox
Collaborator

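Cache mode uses more GPU memory: it trades space for time, so it is more prone to OOM. If you keep running out of memory, you can switch to offline generation.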
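Because the cache is held per request (the traceback shows `self.flow_cache_dict[uuid]` being assigned in `token2wav`), any code path that creates an entry but never deletes it, for example when a streaming response raises or the client disconnects mid-stream, would make memory accumulate across requests. Whether that actually happens in CosyVoice is not established in this thread. The sketch below only illustrates the general pattern of tying a per-request cache to a context manager so it is released on every exit path; `SessionCacheStore` and `synthesize` are hypothetical names, not CosyVoice APIs.

```python
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator, Optional, Tuple


class SessionCacheStore:
    """Hypothetical per-request cache registry (not CosyVoice's actual code)."""

    def __init__(self) -> None:
        self._caches: Dict[str, dict] = {}

    @contextmanager
    def session(self) -> Iterator[Tuple[str, dict]]:
        key = str(uuid.uuid4())
        self._caches[key] = {}
        try:
            yield key, self._caches[key]
        finally:
            # Runs on normal completion, on exceptions (e.g. an OOM error),
            # and when the consuming generator is closed early.
            self._caches.pop(key, None)

    def __len__(self) -> int:
        return len(self._caches)


def synthesize(store: SessionCacheStore, n_chunks: int,
               fail_at: Optional[int] = None) -> Iterator[int]:
    """Stand-in for a streaming TTS generator that keeps a per-request cache."""
    with store.session() as (_key, cache):
        for i in range(n_chunks):
            if i == fail_at:
                raise RuntimeError("simulated CUDA out of memory")
            cache["chunks_seen"] = cache.get("chunks_seen", 0) + 1
            yield i


store = SessionCacheStore()
list(synthesize(store, 3))      # request finished normally
stream = synthesize(store, 5)
next(stream)                    # client reads one chunk...
stream.close()                  # ...then disconnects
try:
    list(synthesize(store, 3, fail_at=1))
except RuntimeError:
    pass                        # request failed mid-stream
print(len(store))  # 0: no cache entries left behind
```

The `finally` clause is what matters: a plain `del cache_dict[key]` placed after the generation loop is skipped whenever the loop exits through an exception or an early `close()`.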

@reusu
Author

reusu commented Apr 15, 2025

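> Cache mode uses more GPU memory: it trades space for time, so it is more prone to OOM. If you keep running out of memory, you can switch to offline generation.

The problem is that the flow module still runs out of memory even with use_flow_cache=False. It looks like a bug.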

@aluminumbox
Collaborator

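> Cache mode uses more GPU memory: it trades space for time, so it is more prone to OOM. If you keep running out of memory, you can switch to offline generation.
>
> The problem is that the flow module still runs out of memory even with use_flow_cache=False. It looks like a bug.

In my tests, GPU memory stays stable at 5 GB in flow_cache mode, and memory usage in offline mode should be the same as before.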
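To compare the two settings the way the maintainer describes, one could run the same batch of requests once per setting and record the peak memory of each request. The harness below is a sketch: `run_request` stands for whatever triggers one synthesis in your server (hypothetical here), and in a real run `read_peak` / `reset_peak` would be `torch.cuda.max_memory_allocated` and `torch.cuda.reset_peak_memory_stats`. They are passed in as parameters so the logic can be exercised without a GPU.

```python
from typing import Callable, List


def measure_peaks(run_request: Callable[[], None], n_requests: int,
                  read_peak: Callable[[], int],
                  reset_peak: Callable[[], None]) -> List[int]:
    """Return the peak allocated bytes observed during each of n_requests runs."""
    peaks: List[int] = []
    for _ in range(n_requests):
        reset_peak()   # start each request from a clean peak counter
        run_request()  # must fully consume any streaming generator
        peaks.append(read_peak())
    return peaks


# With a GPU and a loaded model (the call below is an assumption, adapt it):
#   import torch
#   peaks = measure_peaks(
#       lambda: list(cosyvoice.inference_cross_lingual(text, prompt, stream=True)),
#       6, torch.cuda.max_memory_allocated, torch.cuda.reset_peak_memory_stats)
# Flat peaks match the "stable at ~5 GB" observation; peaks that climb
# request after request point to state that outlives each request.


class FakeAllocator:
    """Toy allocator that leaks 100 bytes per request, for a GPU-free demo."""

    def __init__(self) -> None:
        self.live = 0
        self.peak = 0

    def reset_peak(self) -> None:
        self.peak = self.live

    def run_request(self) -> None:
        self.live += 100                              # leaked, never freed
        self.peak = max(self.peak, self.live + 500)   # +500 transient working memory


fake = FakeAllocator()
print(measure_peaks(fake.run_request, 3, lambda: fake.peak, fake.reset_peak))  # [600, 700, 800]
```

Resetting the peak counter before each request is the key design choice: without it, `max_memory_allocated` reports the highest value since process start and hides whether later requests are growing.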

@reusu
Author

reusu commented Apr 15, 2025

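> Cache mode uses more GPU memory: it trades space for time, so it is more prone to OOM. If you keep running out of memory, you can switch to offline generation.
>
> The problem is that the flow module still runs out of memory even with use_flow_cache=False. It looks like a bug.
>
> In my tests, GPU memory stays stable at 5 GB in flow_cache mode, and memory usage in offline mode should be the same as before.

Could you explain what "offline mode" means?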
