Allow for possibly non-pooled embeddings #1380
Merged
Getting embeddings directly from "non-embedding" LLMs is pretty important for achieving SOTA performance (see the MTEB Leaderboard). This PR makes it so that the embed functions return token-level embeddings when the `pooling_type` ends up being `LLAMA_POOLING_TYPE_NONE`, and the appropriate sequence-level embeddings otherwise. After that, the user can do whatever type of pooling they wish, as in the sketch below. This should address most of the concerns raised in #1288 too.
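A rough usage sketch, assuming the `pooling_type` constructor argument and the token-level return shape described above (the model path is a placeholder):

```python
import numpy as np
import llama_cpp

llm = llama_cpp.Llama(
    model_path="./model.gguf",  # placeholder path
    embedding=True,
    pooling_type=llama_cpp.LLAMA_POOLING_TYPE_NONE,  # ask for token-level embeddings
)

# With LLAMA_POOLING_TYPE_NONE, embed returns one vector per token.
token_embeddings = np.array(llm.embed("The quick brown fox"))

# Pool however you like, e.g. mean pooling, then normalize afterwards.
pooled = token_embeddings.mean(axis=0)
pooled /= np.linalg.norm(pooled)
```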
There are still some gotchas. If you request mean/cls pooling on a model that doesn't support it, it will hard crash in `llama.cpp`. And it's hard to know ex ante what types of pooling a model supports. So it's probably just better to wait until this gets sorted out in `llama.cpp`.
Also, this changes the default for `normalize` to `False`. You typically want to normalize after pooling, so you would not want to normalize when getting token-level embeddings. The other option is defaulting to `False` for token-level and `True` for sequence-level, but that seems overly complicated.
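So for sequence-level use, the caller now opts in explicitly. A hypothetical example, assuming `embed` exposes `normalize` as a keyword argument and that the model actually supports mean pooling (see the gotcha above):

```python
import llama_cpp

seq_llm = llama_cpp.Llama(
    model_path="./model.gguf",  # placeholder path
    embedding=True,
    pooling_type=llama_cpp.LLAMA_POOLING_TYPE_MEAN,  # sequence-level pooling
)

# normalize now defaults to False, so pass it explicitly for unit-length vectors.
vector = seq_llm.embed("The quick brown fox", normalize=True)
```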
Note: this depends on the `llama_pooling_type` function, which just got merged in ggml-org/llama.cpp@b4e4b8a.