
After offloading all layers to the GPU, the RAM used for model loading is not released #1964

Open
MATII13T opened this issue Mar 7, 2025 · 0 comments


MATII13T commented Mar 7, 2025

My graphics card is an RTX 3060 12G, and the model is Qwen2.5-7B-Instruct-Q4_K_M. Normally this model should only take 4~5 GB of VRAM, so I assumed my GPU had enough VRAM to handle the quantized model, but I found that my system RAM stays occupied the whole time. The per-application RAM usage shown in the Windows Task Manager does not add up to the actual total RAM usage it reports, and the memory is not released until the Python script exits. Is this RAM usage necessary, or is it just a bug?
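For reference, a minimal loading sketch, assuming the script uses llama-cpp-python (the model path is hypothetical). One plausible cause, not confirmed here, is that llama.cpp memory-maps the GGUF file by default, so the file's pages sit in the OS page cache and are attributed to the process until it exits; passing `use_mmap=False` reads the weights into ordinary RAM instead, which is one way to test whether the lingering usage is just the mmap page cache:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen2.5-7B-Instruct-Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,   # offload all layers to the GPU
    use_mmap=False,    # load weights into RAM instead of memory-mapping the file;
                       # if reported RAM drops after loading, the "unreleased" memory
                       # was likely just mmap-backed page cache, not a leak
)

print(llm("Hello", max_tokens=8))
```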
