Gemma 3 vision not working #1422

Open
mercurial-moon opened this issue Mar 14, 2025 · 6 comments

Comments

@mercurial-moon

mercurial-moon commented Mar 14, 2025

OS Win10
Kobold version 1.86 and Gemma 3 GGUF

https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/resolve/main/google_gemma-3-12b-it-Q4_K_M.gguf?download=true

and also mmproj file
https://huggingface.co/bartowski/google_gemma-3-12b-it-GGUF/resolve/main/mmproj-google_gemma-3-12b-it-f16.gguf?download=true

I loaded both the GGUF and the mmproj into Kobold, but when I upload an image using Add Img (either pasting from the clipboard or uploading from a file) and then enter the prompt "analyse the image", I get this response:

I understand you want me to analyze an image. However, I need you to provide me with the image first! I don't have access to previous conversations or external files unless you give them to me.

Please either:

  Paste a link to the image.
  Upload the image directly (if the platform allows it).

Once you do that, I'll be happy to analyze it for you.

@mercurial-moon
Author

OK, I did some more tests. Vision seems to work on localhost but not from another PC on the same LAN. It looks like the image data is not being sent when connecting to Kobold over the LAN.
On localhost, though, I can see the base64-encoded data being printed to the console.
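
For anyone trying to narrow this down, one option is to bypass the browser UI and post the image to the server directly from the LAN machine. Below is a minimal sketch, not a confirmed recipe: it assumes the KoboldCpp generate endpoint is `/api/v1/generate`, that it accepts base64 images in an `images` list, and that the server listens on port 5001. The host address and file name in the usage comment are placeholders.

```python
import base64
import json
import urllib.request


def build_payload(prompt: str, image_bytes: bytes, max_length: int = 200) -> dict:
    """Build a generate request body with one base64-encoded image attached."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "prompt": prompt,
        "max_length": max_length,
        "images": [encoded],  # assumed field name for multimodal input
    }


def send(host: str, payload: dict, port: int = 5001, timeout: float = 120.0) -> dict:
    """POST the payload to the (assumed) generate endpoint and return the parsed JSON reply."""
    url = f"http://{host}:{port}/api/v1/generate"  # assumed endpoint path and port
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (placeholder host and file name):
#   with open("test.png", "rb") as f:
#       payload = build_payload("analyse the image", f.read())
#   print(send("192.168.1.50", payload))
```

If the model describes the image when called this way, the server side is fine and the problem is more likely in what the browser sends.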

@mercurial-moon
Author

OK, it seems to have started working. I pressed F12 to check the network traffic between the browser and Kobold, and the request showed the image data.
I'm not sure why it didn't work the first couple of times.
Will do some more tests...
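
The F12 check can also be done on a saved copy of the request body. This is a rough sketch that assumes the body is JSON and that images travel as base64 strings in an `images` list (an assumption about the request format, not something confirmed in this thread). It reports the decoded size of each attached image, so an empty result means no image data was sent.

```python
import base64
import binascii
import json


def image_sizes(request_body: str, field: str = "images") -> list:
    """Return the decoded byte size of each image in a JSON request body.

    An empty list means the request carried no image data at all.
    Entries that are not valid base64 are reported as size 0.
    """
    data = json.loads(request_body)
    sizes = []
    for item in data.get(field) or []:
        try:
            sizes.append(len(base64.b64decode(item, validate=True)))
        except (binascii.Error, ValueError, TypeError):
            sizes.append(0)
    return sizes


# Usage: paste the request body copied from the browser's Network tab.
#   print(image_sizes(copied_body_text))
```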

@cb88

cb88 commented Mar 14, 2025

Probably cached HTML UI?

@LostRuins
Owner

Make sure:

  • BOTH mmproj and the model must be loaded.
  • your image is actually being sent (click on it and see that it's enabled for Multimodal)

@mercurial-moon
Author

> Make sure:
>
> * BOTH mmproj and the model must be loaded.
> * your image is actually being sent (click on it and see that it's enabled for Multimodal)

It's working reliably now, but the first 2-3 times I tried, I couldn't see the base64-encoded image data in the console; I could only see my question about the image being sent.
I loaded both files (GGUF + mmproj).

When I click on the image, it says Multimodal (KCPP mmproj).

@mercurial-moon
Author

> Probably cached HTML UI?

Possibly, but unable to verify.

3 participants