Embedding unreachable, but llama is running #4112

Closed
LucaFulchir opened this issue Apr 3, 2025 · 5 comments

Comments


LucaFulchir commented Apr 3, 2025

Hello, I'm trying to run/test Tabby, but I have problems with the embedding instance. I'm using version 0.27 on a NixOS unstable server.

AI completion and AI chat seem to work, but I cannot add a git context provider for a public repo: it seems to clone successfully, but it can't parse a single file.

config.toml:

[model.completion.local]
model_id = "Qwen2.5-Coder-3B"

[model.chat.local]
model_id = "Qwen2.5-Coder-1.5B-Instruct"

[model.embedding.local]
model_id = "Nomic-Embed-Text"

running with:

tabby serve --model Qwen2.5-Coder-3B --host 192.168.1.10 --port 11029 --device rocm

testing on AMD Ryzen 7 8845HS w/ Radeon 780M Graphics

On the Tabby web interface, on the system page, I see "Unreachable" only under "Embedding", with the error "error decoding response body".

The llama.cpp instance seems to be up, and by dumping the local traffic I see the following requests/responses:

GET /health HTTP/1.1
accept: */*
host: 127.0.0.1:30888

HTTP/1.1 200 OK
Access-Control-Allow-Origin:
Content-Length: 15
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp
------
POST /tokenize HTTP/1.1
content-type: application/json
accept: */*
host: 127.0.0.1:30888
content-length: 25

{"content":"hello Tabby"}

HTTP/1.1 200 OK
Access-Control-Allow-Origin: 
Content-Length: 28
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp

{"tokens":[7592,21628,3762]}
-----------
POST /embeddings HTTP/1.1
content-type: application/json
accept: */*
host: 127.0.0.1:30888
content-length: 27

{"content":"hello Tabby\n"}

HTTP/1.1 200 OK
Access-Control-Allow-Origin:
Content-Length: 16226
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp

{"embedding":[0.0018252730369567871, **a lot more floats**,-0.024591289460659027],"index":0}

Additional Tabby log entries, even when running with RUST_LOG=debug, all look like:

WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:R1AWw5:::{"path":"/var/lib/tabby/repositories/[redacted]/src/connection/handshake/dirsync/req.rs","language":"rust","git_hash":"906b1491a1a0ecb98781568b24d8ba781d6765e2"}': Failed to embed chunk text: error decoding response body

What can I try / what am I doing wrong?

@LucaFulchir
Author

Updated to 0.27.1 and tried different models thanks to more RAM; the local embedding is still marked as 'Unreachable', with the same errors.

@LucaFulchir
Author

Workaround: use an HTTP embedding configuration instead of the local one.

I literally copied the llama-server command line and ran llama.cpp manually. Connecting this way works:

[model.embedding.http]
kind = "llama.cpp/embedding"
model_name = "Nomic-Embed-Text"
api_endpoint = "http://127.0.0.1:30887"

At this point I think the kind used for [model.embedding.local] is wrong.
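
For anyone reproducing the workaround, the manual launch was along these lines (a sketch only: the model path and port are placeholders, and the real flags were copied verbatim from the command line that Tabby itself spawns):

llama-server -m /path/to/nomic-embed-text.gguf --embedding --host 127.0.0.1 --port 30887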

Member

zwpaper commented Apr 18, 2025

Hi @LucaFulchir, did you use the llama.cpp that came with Tabby, or was it installed manually as a separate component?

@LucaFulchir
Author

Tabby is configured to use the NixOS llama.cpp, built with Vulkan support. It currently seems to be release b4154.

Now I notice that when I run llama.cpp manually, it instead uses release b5141, which is much newer.
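
For reference, the build number can be checked directly on each binary (assuming the flag is available in both builds; run it against whichever llama-server Tabby is configured to launch, and against the one started manually):

llama-server --version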

Member

zwpaper commented May 3, 2025

The llama.cpp included in the Tabby release should be functional.

The llama.cpp embedding API was updated after build b4356. Please verify your version and configure it accordingly.

For more detail, you can check: https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/

zwpaper closed this as completed May 3, 2025