Offloading into shared part of VRAM #1413

Open
2Fort2 opened this issue Mar 9, 2025 · 10 comments

Comments

2Fort2 commented Mar 9, 2025

Koboldcpp is offloading into the shared part of GPU memory instead of the dedicated part.

[screenshot]

Other things I've noticed: it says "unable to detect VRAM" on launch, and "device vulkan0 does not support async, host buffers or events" while offloading.

[screenshots]

OS: Windows 10 IoT Enterprise LTSC
CPU: Ryzen 5 1600
GPU: Radeon RX 9070
Kobold version: koboldcpp_nocuda 1.85.1

@LostRuins (Owner)

Can you try selecting a different GPU from the dropdown? You might have a few GPUs, and it could have picked the wrong one (the iGPU instead of the dedicated card).

[screenshot]

2Fort2 commented Mar 9, 2025

Hey, thanks for the fast reply.

GPU 1 is identified as the 9070. The others have no identifier, and when I try them the program just crashes.
I also tried the normal koboldcpp build instead of the nocuda version; same result, it just loads into shared memory, which is just normal RAM.

[screenshot]

@LostRuins (Owner)

Alright, so only ID 1 works. If you try manually setting the layers, does it work? Let's try setting 50 layers.
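If you're launching from the command line rather than the GUI, it would be something like this (flags from memory, so double-check against `koboldcpp.exe --help`; the model filename is a placeholder, and the device ID should be whichever one worked for you):

```
koboldcpp.exe --model yourmodel.gguf --usevulkan 1 --gpulayers 50
```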

2Fort2 commented Mar 9, 2025

Setting it manually yields the same result. Here are the logs.

[screenshots]

@LostRuins (Owner)

Looks like you loaded it correctly. I can see the AMD Radeon RX 9070 being selected in the log, and the Vulkan0 model buffer size is 6GB, which looks correct. So the offload should be working. If you do a partial offload, e.g. 20 layers, you'll see this value change (and it should also be reflected in Task Manager).

2Fort2 commented Mar 9, 2025

I've tried a partial offload; it does change the displayed value, but it doesn't change the final result.

[screenshot]

Strangely, even with the partial offload it still seems to occupy the shared part of GPU memory, rather than leaving that part emptier, since technically those layers have been loaded into normal RAM.

I've also tried loading models with LM Studio, and it shows exactly the same issue. That would make sense, though, given that both koboldcpp and LM Studio use the Vulkan backend of llama.cpp. Could it just be that Vulkan llama.cpp is currently bugged with the 9070 and has trouble identifying the VRAM, so it opts not to load into it?
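In case it helps narrow this down, here's a minimal sketch against the standard Vulkan C API (nothing koboldcpp-specific; the compile line is just illustrative) that dumps each device's memory heaps. If the driver doesn't advertise a large DEVICE_LOCAL heap for the 9070, that would point at the driver rather than at llama.cpp:

```c
// Minimal Vulkan memory-heap dump (standard Vulkan C API).
// Compile (illustrative): gcc vkheaps.c -o vkheaps -lvulkan
// (link vulkan-1.lib on Windows instead)
#include <stdio.h>
#include <vulkan/vulkan.h>

int main(void) {
    VkApplicationInfo app = { .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
                              .apiVersion = VK_API_VERSION_1_0 };
    VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
                                 .pApplicationInfo = &app };
    VkInstance inst;
    if (vkCreateInstance(&ici, NULL, &inst) != VK_SUCCESS) {
        fprintf(stderr, "vkCreateInstance failed\n");
        return 1;
    }

    // Enumerate all Vulkan-visible GPUs (iGPU, dGPU, software rasterizers).
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(inst, &count, NULL);
    VkPhysicalDevice devs[16];
    if (count > 16) count = 16;
    vkEnumeratePhysicalDevices(inst, &count, devs);

    for (uint32_t i = 0; i < count; i++) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(devs[i], &props);
        VkPhysicalDeviceMemoryProperties mem;
        vkGetPhysicalDeviceMemoryProperties(devs[i], &mem);
        printf("Device %u: %s\n", i, props.deviceName);
        for (uint32_t h = 0; h < mem.memoryHeapCount; h++) {
            // DEVICE_LOCAL heaps are dedicated VRAM; the rest is host/shared memory.
            int local = (mem.memoryHeaps[h].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT) != 0;
            printf("  heap %u: %.1f GiB %s\n", h,
                   mem.memoryHeaps[h].size / (1024.0 * 1024.0 * 1024.0),
                   local ? "(DEVICE_LOCAL, i.e. dedicated VRAM)" : "(host/shared)");
        }
    }
    vkDestroyInstance(inst, NULL);
    return 0;
}
```

On a healthy setup I'd expect the 9070 to show one big DEVICE_LOCAL heap (the dedicated VRAM) plus a smaller host-visible one.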

@LostRuins (Owner)

I doubt that. It could be your GPU settings instead.

Is it possible you configured your AMD graphics settings to always prefer shared memory? Perhaps some sort of integrated graphics power saver mode?

If you load a normal GPU workload, say a video game, does it also use shared memory?

One more option would be to try the ROCm fork https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.85.yr0-ROCm and see if that works for you.
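You could also run `vulkaninfo` from the Vulkan SDK (if you have it installed) and check the memoryHeaps section for the 9070: a healthy driver should list a large heap flagged with MEMORY_HEAP_DEVICE_LOCAL_BIT for the dedicated VRAM. If that flag is missing or the size is wrong, it's a driver problem rather than anything koboldcpp can fix.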

2Fort2 commented Mar 10, 2025

Where could I check those GPU settings? I have the AMD Adrenalin software, but it doesn't mention shared memory anywhere at all.

Games work fine, they all take up the dedicated memory.

I have tried the ROCm version; it says no ROCm devices were found and just loads into normal RAM, which isn't surprising, as the 9070/XT don't have ROCm support out of the box.

henk717 commented Mar 13, 2025

Others have reported the same thing and it seems to be a driver issue. Yesterday I helped someone for whom this did not occur, and occam (the Vulkan developer) confirmed this is beyond his control. Double-check that you are on the latest driver. If you are, then, assuming they were too, I don't know why the driver does not properly load into VRAM like it says it does.

2Fort2 commented Mar 14, 2025

Hey, thanks for the reply.

Yeah, I am on the latest drivers according to AMD Adrenalin. It still loads into shared memory. I guess all that can be done is to keep waiting for a fix?

It doesn't seem to affect everyone on the 9070/XT though.
